Unitree GO2 Pro Embodied AI Stack Survey
Author: Tachi
Date: 2026-02-22 (original: 2026-02-19)
Purpose: Comprehensive survey of resources, frameworks, and research for integrating the Unitree GO2 Pro quadruped with modern embodied AI systems, including ROS2, VLA/VLM models, Nvidia Isaac/GR00T, and LLM-based control.
Executive Summary
The Unitree GO2 Pro is a capable platform for embodied AI research with growing ecosystem support. Key findings:
- ROS2 Integration: Multiple mature SDKs exist (official and community) supporting WebRTC, CycloneDDS, and full sensor access
- Unitree’s Own VLA: Unitree has released UnifoLM-VLA-0, an open-source vision-language-action model trained on Unitree robots
- Nvidia Partnership: Unitree is an official Nvidia partner for GR00T foundation model development
- LLM Control: MCP servers enable natural language control via LLMs
- Simulation: Isaac Sim, MuJoCo, Gazebo, and PyBullet all have GO2 support
- Dimensional (dim.os): New Python-native framework with first-class MCP support — promising for agent-native robotics
1. ROS2 Integration
1.1 Official Unitree ROS2 SDK
Repository: github.com/unitreerobotics/unitree_ros2
The official SDK provides:
- C++ and Python interfaces
- Sport mode control for basic locomotion
- Example programs for walking patterns
- Direct integration with Unitree’s SDK2
Key Features:
- sport_mode_ctrl example demonstrates walking back and forth
- Supports all GO2 variants (AIR/PRO/EDU)
- Joint-level control capabilities
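Whichever interface is used, it is worth clamping velocity commands to conservative limits before dispatching them to the robot. A minimal sketch of such a safety wrapper, with illustrative field names and limits (not types from the official SDK):

```python
from dataclasses import dataclass

# Hypothetical safety wrapper for sport-mode velocity commands.
# Field names and limit values are illustrative, not from Unitree's SDK.
@dataclass
class VelocityCommand:
    vx: float    # forward velocity, m/s
    vy: float    # lateral velocity, m/s
    vyaw: float  # yaw rate, rad/s

def clamp_command(cmd: VelocityCommand,
                  max_lin: float = 0.8,
                  max_yaw: float = 1.0) -> VelocityCommand:
    """Clamp each component of a velocity command to conservative limits."""
    clip = lambda v, lim: max(-lim, min(lim, v))
    return VelocityCommand(clip(cmd.vx, max_lin),
                           clip(cmd.vy, max_lin),
                           clip(cmd.vyaw, max_yaw))
```

A guard like this sits naturally between any higher-level planner (or LLM) and the SDK call that actually moves the robot.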
1.2 Community ROS2 SDK (Highly Recommended)
Repository: github.com/abizovnuralem/go2_ros2_sdk
An unofficial but feature-rich SDK that provides:
| Feature | Status |
|---|---|
| URDF | ✅ |
| Joint states (real-time) | ✅ |
| IMU sync | ✅ |
| Joystick control | ✅ |
| LiDAR stream (PointCloud2) | ✅ |
| Camera stream | ✅ |
| Foot force sensors | ✅ |
| SLAM (slam_toolbox) | ✅ |
| Navigation (Nav2) | ✅ |
| Object detection (COCO) | ✅ |
| Multi-robot support | ✅ |
| Docker support | ✅ |
Protocols:
- WebRTC (Wi-Fi) - Remote control via internet
- CycloneDDS (Ethernet) - Low-latency local control
ROS2 Distributions: Humble, Iron, Rolling (Ubuntu 22.04)
Installation:
```bash
mkdir -p ros2_ws && cd ros2_ws
git clone --recurse-submodules https://github.com/abizovnuralem/go2_ros2_sdk.git src
sudo apt install ros-$ROS_DISTRO-image-tools ros-$ROS_DISTRO-vision-msgs
pip install -r src/requirements.txt
source /opt/ros/$ROS_DISTRO/setup.bash
rosdep install --from-paths src --ignore-src -r -y
colcon build
```
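Once built, the SDK exposes the sensor and control topics listed above, with teleop driven by Twist-style velocity commands. A sketch of the joystick-to-command mapping such a teleop node performs, assuming a standard `cmd_vel`-style interface and illustrative scaling:

```python
# Hedged sketch: map normalized joystick axes (-1..1) into a Twist-shaped
# command dict. The topic layout and scale factors are assumptions, not
# go2_ros2_sdk's actual implementation.
def joy_to_twist(axis_fwd: float, axis_turn: float,
                 max_lin: float = 0.5, max_yaw: float = 1.0) -> dict:
    """Scale joystick axes into linear/angular velocity fields."""
    return {
        "linear":  {"x": axis_fwd * max_lin, "y": 0.0, "z": 0.0},
        "angular": {"x": 0.0, "y": 0.0, "z": axis_turn * max_yaw},
    }
```

In a real node the returned values would populate a `geometry_msgs/msg/Twist` message rather than a dict.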
1.3 CHAMP Controller Integration
Repository: github.com/anujjain-dev/unitree-go2-ros2
Built on the CHAMP legged robots framework:
- Gazebo simulation support
- ros2-control integration
- Velodyne sensor support
- Robot localization package
Dependencies:
```bash
sudo apt install ros-humble-gazebo-ros2-control
sudo apt install ros-humble-xacro
sudo apt install ros-humble-robot-localization
sudo apt install ros-humble-ros2-controllers
sudo apt install ros-humble-ros2-control
```
1.4 Additional ROS2 Resources
| Repository | Description |
|---|---|
| OpenMind/unitree-sdk | Zenoh bridge integration for GO2/G1 |
| khaledgabr77/unitree_go2_ros2 | ROS2 Jazzy + Gazebo Harmonic support |
| grasp-lyrl/go2_ros2_webrtc_sdk | WebRTC-focused SDK |
| eppl-erau-db/amigo_ros2 | Isaac ROS integration with nvblox |
| Unitree-Go2-Robot/go2_robot | General ROS2 package |
1.5 Python SDK
Repository: github.com/legion1581/go2_python_sdk
Unofficial Python SDK supporting:
- CycloneDDS driver
- WebRTC (in development)
- Direct robot control without ROS2
2. VLA/VLM Integration
2.1 Unitree’s Official VLA Model: UnifoLM-VLA-0
Repository: github.com/unitreerobotics/unifolm-vla
Project Page: unigen-x.github.io/unifolm-vla.github.io
Unitree has released their own Vision-Language-Action model as open source:
Key Features:
- Designed for general-purpose humanoid robot manipulation
- Evolves from “vision-language understanding” to “embodied brain”
- Spatial semantic enhancement for 2D/3D understanding
- Manipulation generalization across 12 task categories
Model Checkpoints:
| Model | Description | Link |
|---|---|---|
| Unifolm-VLM-Base | Fine-tuned on image-text VQA + robot datasets | HuggingFace |
| UnifoLM-VLA-Base | Fine-tuned on Unitree open-source dataset | HuggingFace |
| UnifoLM-VLA-Libero | Fine-tuned on Libero dataset | HuggingFace |
Training Datasets (G1 Humanoid):
- G1_Stack_Block, G1_Bag_Insert, G1_Erase_Board
- G1_Clean_Table, G1_Pack_PencilBox, G1_Pour_Medicine
- G1_Pack_PingPong, G1_Prepare_Fruit, G1_Organize_Tools
- G1_Fold_Towel, G1_Wipe_Table, G1_DualRobot_Clean_Table
Installation:
```bash
conda create -n unifolm-vla python=3.10.18
conda activate unifolm-vla
git clone https://github.com/unitreerobotics/unifolm-vla.git
cd unifolm-vla
pip install --no-deps "lerobot @ git+https://github.com/huggingface/lerobot.git@0878c68"
pip install -e .
pip install "flash-attn==2.5.6" --no-build-isolation
```
Note: Currently focused on G1 humanoid manipulation, but the architecture is applicable to quadruped manipulation tasks.
2.2 OpenVLA
Repository: github.com/openvla/openvla
Project Page: openvla.github.io
An open-source vision-language-action model trained on 970K robot manipulation trajectories from the Open X-Embodiment dataset.
Key Features:
- Generalist robotic manipulation
- Trained on diverse tasks, scenes, and embodiments
- Supports fine-tuning on custom datasets
- RLDS format for data loading
Relevance: Can be fine-tuned for quadruped manipulation tasks using Unitree’s data.
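VLA models of this kind emit actions normalized per dataset, which must be mapped back to real units before execution. A minimal sketch of that de-normalization step, assuming simple per-dimension min/max bounds in place of the per-dataset statistics the model actually stores:

```python
# Sketch of action de-normalization: map model outputs in [-1, 1] back to
# each action dimension's physical range. `low`/`high` are stand-ins for
# the per-dataset statistics a VLA checkpoint ships with.
def unnormalize_action(norm, low, high):
    """Linearly rescale each normalized action component into [lo, hi]."""
    return [0.5 * (n + 1.0) * (hi - lo) + lo
            for n, lo, hi in zip(norm, low, high)]
```

The inverse mapping (used at training time) is the same affine transform run in reverse.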
2.3 VLA Learning Resources
| Resource | Description |
|---|---|
| Awesome-VLA-Learning-Guide | Systematic introduction to VLA models |
| awesome-embodied-vla-va-vln | Curated list of VLA/VLN research |
| Large-VLM-based-VLA-for-Robotic-Manipulation | VLM-based VLA models for manipulation |
| LLaVA-VLA | LLaVA-based VLA model |
| Awesome-VLA-Robotics | Comprehensive VLA papers/models/datasets |
2.4 QUARD Dataset
Paper: QUARD (QUAdruped Robot Dataset)
A dataset specifically designed for quadruped robot manipulation. Relevant for GO2 manipulation tasks.
3. Nvidia Isaac & GR00T
3.1 GR00T Foundation Model
Nvidia’s GR00T (Generalist Robot 00 Technology) is a foundation model for humanoid and quadruped robots.
Key Points:
- Unitree is an official Nvidia GR00T partner
- Enables complex tasks with minimal training
- 800 teraflops of 8-bit floating point AI performance on Jetson Thor
- Multimodal generative AI capabilities
GR00T N1.5 Performance on Unitree G1:
- 98.8% success rate on placing known fruits (vs 44.0% for N1)
- Post-trained with only 1,000 teleoperation episodes
- Supports both humanoid and quadruped platforms
3.2 Isaac Sim Quadruped Extension
Documentation: Isaac Sim Quadruped Extension
Features:
- Unitree A1 support with ROS2 camera data
- Visual-inertial odometry integration
- Stereo vision support
- Custom scene creation
3.3 Isaac ROS
GitHub Organization: github.com/NVIDIA-ISAAC-ROS
Documentation: nvidia-isaac-ros.github.io
NVIDIA-accelerated ROS 2 packages for autonomous robots:
Key Packages:
- isaac_ros_jetson - Jetson support packages
- nvblox - 3D scene reconstruction
- NITROS - Zero-copy ROS2 messaging
- Visual SLAM
- Object detection
Jetson Orin Integration:
- Full CUDA acceleration
- TensorRT model optimization
- Docker container support
4. LLM/MLLM Integration
4.1 MCP Server for Natural Language Control
Repository: github.com/lpigeon/unitree-go2-mcp-server
A Model Context Protocol (MCP) server that enables:
- Natural language control of GO2 via LLM
- Command interpretation by ChatGPT/Claude/etc.
- Integration with OpenAI and other LLM providers
Use Case: “Walk forward 3 meters and then turn left” → Robot executes commands.
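The interpretation step can be sketched as decomposing an instruction into timed motion primitives. The `(mode, vx, vyaw, duration)` format below is purely illustrative, not the actual API of unitree-go2-mcp-server:

```python
import re

# Illustrative sketch of instruction-to-primitive decomposition, the kind
# of translation an MCP server performs on LLM output. All names and the
# primitive tuple format are assumptions.
def parse_instruction(text: str, speed: float = 0.5):
    """Return a list of (mode, vx, vyaw, duration_s) primitives."""
    plan = []
    m = re.search(r"forward\s+(\d+(?:\.\d+)?)\s*meters?", text)
    if m:
        dist = float(m.group(1))
        plan.append(("move", speed, 0.0, dist / speed))  # walk at `speed` m/s
    if "turn left" in text:
        plan.append(("turn", 0.0, 0.5, 3.14))  # ~90 deg at 0.5 rad/s
    return plan
```

In practice the LLM itself does the heavy lifting via MCP tool calls; a parser like this only illustrates the shape of the resulting plan.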
4.2 Voice Interaction with OpenAI
Guide: Configuring Unitree Go2 EDU for Real-Time Voice Interaction
Setup guide for:
- Voice input via microphone
- Speech-to-text processing
- OpenAI API integration for command interpretation
- Robot command execution
Requirements:
- Unitree SDK (C++ or Python)
- OpenAI API key
- Audio processing libraries
4.3 WSO2 AI Agent Integration
Article: How We Gave Life to an AI Agent with Unitree Go2
Integration approach:
- Remote control via app
- SDK-based control (C++ and Python)
- AI agent for autonomous behavior
- Communication via multiple channels
4.4 Security Considerations
Research: Jailbreaking LLM-controlled robots
Important security research on LLM-controlled robots, including the Unitree GO2. Highlights the need for:
- Input validation
- Command filtering
- Rate limiting
- Safety boundaries
5. Reinforcement Learning
5.1 Unitree RL Gym
Repository: github.com/unitreerobotics/unitree_rl_gym
Official RL training environment:
- Supports GO2, H1, H1_2, and G1
- Isaac Gym integration
- PPO-based training
- Sim-to-real transfer
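PPO recipes for legged locomotion typically sum weighted reward terms, the most important of which rewards tracking the commanded base velocity. A representative sketch of such a term (the exponential form is common in legged-robot RL; the sigma value is illustrative, not taken from unitree_rl_gym):

```python
import math

# Sketch of a velocity-tracking reward term: exp of the negative squared
# tracking error, saturating at 1.0 for perfect tracking. `sigma` is a
# tuning knob, not a value from unitree_rl_gym.
def tracking_lin_vel_reward(cmd_vel, actual_vel, sigma=0.25):
    err = sum((c - a) ** 2 for c, a in zip(cmd_vel, actual_vel))
    return math.exp(-err / sigma)
```

Additional terms (torque penalties, foot air time, orientation) are weighted and summed into the total reward each simulation step.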
5.2 CHAMP Framework
Repository: CHAMP Legged Robots
Open-source quadruped controller:
- ROS-based control
- Gait generation
- Balance control
- Simulation support
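Gait generation of the kind CHAMP provides reduces to scheduling stance and swing phases per leg; a trot, for instance, moves diagonal leg pairs together with a half-cycle offset between pairs. A generic sketch (leg naming and conventions are illustrative, not CHAMP's actual code):

```python
# Generic trot-gait scheduler sketch: diagonal pairs (FL+RR, FR+RL)
# share a phase, offset by half a cycle from the other pair.
TROT_OFFSETS = {"FL": 0.0, "RR": 0.0, "FR": 0.5, "RL": 0.5}

def leg_phase(t: float, period: float, leg: str) -> float:
    """Gait phase in [0, 1) for the given leg at time t."""
    return ((t / period) + TROT_OFFSETS[leg]) % 1.0

def in_stance(t: float, period: float, leg: str, duty: float = 0.5) -> bool:
    """By the convention used here, the first `duty` fraction of the cycle is stance."""
    return leg_phase(t, period, leg) < duty
```

A full controller layers swing-foot trajectories and inverse kinematics on top of this schedule.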
6. Simulation Environments
6.1 Isaac Sim
Best for: High-fidelity simulation with GPU acceleration
Features:
- Photo-realistic rendering
- PhysX physics engine
- Domain randomization
- Synthetic data generation
GO2 Support: Via quadruped extension
6.2 MuJoCo
Best for: Fast physics simulation
- Open-source since 2021
- Excellent for RL training
- Contact dynamics
6.3 Gazebo
Best for: ROS2 integration testing
- Native ROS2 support
- Multiple physics engines
- Sensor plugins
6.4 PyBullet
Best for: Quick prototyping
- Python-native
- Fast simulation
- Good for RL
7. Curated Resource Collections
7.1 Awesome Unitree Robots
Repository: github.com/shaoxiang/awesome-unitree-robots
Comprehensive collection covering:
- G1, Go2, B2, H1+ robots
- ROS/ROS2 integration
- High-fidelity simulation
- Motion control
- RL training
- Vision systems
- Tutorials
7.2 Awesome Quadrupedal Robots
Repository: github.com/curieuxjy/Awesome_Quadrupedal_Robots
General quadruped resources including:
- Manipulation on quadrupeds
- Gait transitions
- Terrain adaptation
8. Dimensional (dim.os)
Repository: github.com/dimensionalOS/dimos
Added 2026-02-22. Announced 2026-02-19.
Dimensional (or “dim.os”) is a Python-native robotics framework that doesn’t require ROS but plays nice with it. The killer feature: Natural language control via MCP — you can literally tell your robot “hey, go find the kitchen” and it figures out the rest.
What is Dimensional?
Dimensional positions itself as “the agentive operating system for generalist robotics” — a Python-native framework with first-class MCP support for natural language control.
Key Features
Status icons: ✅ = stable/fully supported; 🟩 = mixed or partial support (some components in beta).
| Feature | Status | Notes |
|---|---|---|
| Non-ROS architecture | ✅ | Pure Python, no ROS required |
| MCP integration | ✅ | “vibecode” robots in natural language |
| Navigation & SLAM | ✅ | Built-in, also supports ROS Nav2 |
| 3D Perception | ✅ | VLMs, detectors, spatial memory |
| Simulation | ✅ | MuJoCo support built-in |
| Multi-robot | ✅ | Framework supports multiple robots |
| Hardware support | 🟩 | Unitree Go2 Pro/AIR stable, G1 beta |
Installation
```bash
# Quick install
uvx --python 3.12 --from 'dimos[base,unitree]' dimos --replay run unitree-go2

# With simulation
uv pip install 'dimos[base,unitree,sim]'
dimos --simulation run unitree-go2
```
Why This Matters
MCP hooks built-in from day one. Most of the other stacks in this survey need some hacking to connect LLMs. Dimensional has it as a first-class feature.
From their docs:
“Dimensional is agent native — ‘vibecode’ your robots in natural language and build (local & hosted) multi-agent systems that work seamlessly with your hardware.”
That aligns well with goals for natural language control through unified, LLM-friendly control stacks.
Technical Deep Dive
MCP Integration:
| Component | Details |
|---|---|
| MCP Server Port | 9990 (default) |
| Protocol | JSON-RPC 2.0 |
| Tool Format | tools/list, tools/call |
| Skill Discovery | Auto-discovers skills from modules |
Connection Pattern:
LLM Client → MCP (HTTP) → Dimensional → Unitree SDK → GO2 Hardware
Skills Available via MCP:
| Skill | Description |
|---|---|
| UnitreeSpeak | TTS through robot speakers (uses OpenAI TTS API) |
| FollowHuman | Visual servoing to follow a person |
| NavigateTo | Point-to-point navigation |
| (custom skills) | Users can register additional skills |
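The tools/call shape for one of these skills can be sketched as the JSON-RPC 2.0 payload an LLM client would POST to the MCP server on port 9990. The argument names for NavigateTo are assumptions, not Dimensional's documented schema:

```python
import json

# Sketch of an MCP tools/call request. The JSON-RPC envelope follows the
# MCP spec; the NavigateTo argument names are assumptions.
def make_tool_call(skill: str, arguments: dict, req_id: int = 1) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": skill, "arguments": arguments},
    })

request = make_tool_call("NavigateTo", {"x": 2.0, "y": 1.0})
```

A client would first issue `tools/list` to discover the available skills, then dispatch calls like the one above.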
Navigation & SLAM:
- Frontier Exploration — Autonomous map building
- A* Replanning — Dynamic path replanning
- Costmapper — Occupancy grid cost maps
- Visual Servoing — Image-based control
- ROS Nav2 Integration — Can leverage ROS navigation stack
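The A* replanning step can be illustrated with a generic 4-connected grid planner over an occupancy grid; this is a textbook sketch, not Dimensional's implementation:

```python
import heapq

# Textbook 4-connected A* on an occupancy grid (0 = free, 1 = occupied),
# with a Manhattan-distance heuristic. Illustrative only.
def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, None)]
    came, best_g = {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came:
            continue
        came[cur] = parent
        if cur == goal:  # reconstruct path by walking parents backwards
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc), cur))
    return None
```

Replanning then amounts to rerunning the search whenever the costmap changes along the current path.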
Perception Stack:
- Object detection (2D/3D)
- Object tracking (2D/3D)
- Spatial perception / point clouds
- Person tracking
- Object-scene registration
Key Dependencies:
| Library | Purpose |
|---|---|
| Pinocchio | Inverse kinematics for legged robots |
| OpenCV, Open3D | Computer vision and 3D processing |
| ReactiveX | Async stream processing |
| Numba | JIT compilation for occupancy mapping |
| rerun-sdk | Visualization (required) |
| dimos-lcm | LCM transport protocol |
System Requirements:
- OS: Ubuntu 22.04/24.04 (NixOS also supported)
- Python: 3.12+
- Hardware Access: WebRTC for remote, direct for local
- Simulation: MuJoCo support built-in (no hardware needed for testing)
Open Questions
- Stability of WebRTC for real-time control latency
- Comparison to direct ROS/DDS for low-latency applications
- Production readiness (project is explicitly alpha)
- Low-level joint control parity with Unitree’s official SDK
Bottom line: Keep an eye on this one. The Roboverse announcement on 2026-02-19 generated significant interest, and if the community adopts it, this could become a standard way to connect LLMs to Unitree robots.
9. Recommended Stack for GO2 Pro
9.1 Development Environment
┌─────────────────────────────────────────────────┐
│ GO2 Pro Platform │
├─────────────────────────────────────────────────┤
│ LLM Layer │ MCP Server (natural language) │
├─────────────────────────────────────────────────┤
│ VLA Layer │ OpenVLA / UnifoLM-VLA │
├─────────────────────────────────────────────────┤
│ ROS2 Layer │ go2_ros2_sdk (community) │
├─────────────────────────────────────────────────┤
│ Simulation │ Isaac Sim / Gazebo │
├─────────────────────────────────────────────────┤
│ Compute │ Jetson Orin / External PC │
└─────────────────────────────────────────────────┘
With Dimensional:
┌─────────────────────────────────────────────────┐
│ GO2 Pro Platform │
├─────────────────────────────────────────────────┤
│ LLM Layer │ Dimensional (MCP built-in) │
├─────────────────────────────────────────────────┤
│ VLA Layer │ OpenVLA / UnifoLM-VLA │
├─────────────────────────────────────────────────┤
│ Control Layer│ Dimensional (Python-native) │
├─────────────────────────────────────────────────┤
│ Simulation │ MuJoCo / Isaac Sim │
├─────────────────────────────────────────────────┤
│ Compute │ Jetson Orin / External PC │
└─────────────────────────────────────────────────┘
9.2 Quick Start Path
1. ROS2 Setup: Install go2_ros2_sdk for sensor access and control
2. Simulation: Test in Gazebo with the CHAMP controller
3. LLM Integration: Add the MCP server for natural language commands
4. VLA Training: Fine-tune OpenVLA on custom manipulation data
5. Deployment: Use Jetson Orin for onboard compute
Alternative (Dimensional):
1. Quick Start: uvx --python 3.12 --from 'dimos[base,unitree]' dimos --simulation run unitree-go2
2. MCP Integration: Already built-in; connect an LLM client to port 9990
3. Custom Skills: Implement and register as needed
9.3 Hardware Recommendations
| Component | Option | Notes |
|---|---|---|
| Onboard Compute | Jetson Orin Nano/AGX | Isaac ROS support |
| External Compute | Workstation with RTX 4070+ | VLA training |
| Sensors | Built-in + RealSense | Additional depth sensing |
| Communication | WebRTC (remote) / DDS (local) | Protocol selection |
10. References
Research Papers
- OpenVLA: An Open-Source Vision-Language-Action Model (CoRL 2024)
- RT-X: Open X-Embodiment Robot Learning (arXiv 2023)
- GR00T N1.5: VLA Model for Humanoid Robots (NVIDIA 2025)
11. Future Work
Potential directions for GO2 Pro research:
- Quadruped Manipulation: Mount arm on GO2, train VLA for mobile manipulation
- Navigation VLA: Adapt OmniVLA for quadruped navigation
- Multi-robot Coordination: Use ROS2 multi-robot support for fleet behavior
- Sim-to-Real: Isaac Sim → GR00T → Real GO2 pipeline
- LLM Reasoning: Chain-of-thought prompting for complex tasks
- Dimensional Evaluation: Test production readiness, latency, and joint control parity
Survey compiled by Tachi 🕷️ Last updated: 2026-02-22