Unitree GO2 Pro Embodied AI Stack Survey
Author: Tachi
Date: 2026-02-22 (original: 2026-02-19)
Purpose: Comprehensive survey of resources, frameworks, and research for integrating the Unitree GO2 Pro quadruped with modern embodied AI systems, including ROS2, VLA/VLM models, Nvidia Isaac/GR00T, and LLM-based control.
Executive Summary
The Unitree GO2 Pro is a capable platform for embodied AI research with growing ecosystem support. Key findings:
- ROS2 Integration: Multiple mature SDKs exist (official and community) supporting WebRTC, CycloneDDS, and full sensor access
- Unitree’s Own VLA: Unitree has released UnifoLM-VLA-0, an open-source vision-language-action model trained on Unitree robots
- Nvidia Partnership: Unitree is an official Nvidia partner for GR00T foundation model development
- LLM Control: MCP servers enable natural language control via LLMs
- Simulation: Isaac Sim, MuJoCo, Gazebo, and PyBullet all have GO2 support
- Dimensional (dim.os): New Python-native framework with first-class MCP support — promising for agent-native robotics
1. ROS2 Integration
1.1 Official Unitree ROS2 SDK
Repository: github.com/unitreerobotics/unitree_ros2
The official SDK provides:
- C++ and Python interfaces
- Sport mode control for basic locomotion
- Example programs for walking patterns
- Direct integration with Unitree’s SDK2
Key Features:
- sport_mode_ctrl example demonstrates walking back and forth
- Supports all GO2 variants (AIR/PRO/EDU)
- Joint-level control capabilities
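Whichever interface is used, it is worth clamping velocity commands to conservative limits before dispatching them to the robot. A minimal sketch of such a safety wrapper, with illustrative field names and limits (not types from the official SDK):

```python
from dataclasses import dataclass

# Hypothetical safety wrapper for sport-mode velocity commands.
# Field names and limit values are illustrative, not from Unitree's SDK.
@dataclass
class VelocityCommand:
    vx: float    # forward velocity, m/s
    vy: float    # lateral velocity, m/s
    vyaw: float  # yaw rate, rad/s

def clamp_command(cmd: VelocityCommand,
                  max_lin: float = 0.8,
                  max_yaw: float = 1.0) -> VelocityCommand:
    """Clamp each component of a velocity command to conservative limits."""
    clip = lambda v, lim: max(-lim, min(lim, v))
    return VelocityCommand(clip(cmd.vx, max_lin),
                           clip(cmd.vy, max_lin),
                           clip(cmd.vyaw, max_yaw))
```

A guard like this sits naturally between any higher-level planner (or LLM) and the SDK call that actually moves the robot.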
1.2 Community ROS2 SDK (Highly Recommended)
Repository: github.com/abizovnuralem/go2_ros2_sdk
An unofficial but feature-rich SDK that provides:
| Feature | Status |
|---|---|
| URDF | ✅ |
| Joint states (real-time) | ✅ |
| IMU sync | ✅ |
| Joystick control | ✅ |
| LiDAR stream (PointCloud2) | ✅ |
| Camera stream | ✅ |
| Foot force sensors | ✅ |
| SLAM (slam_toolbox) | ✅ |
| Navigation (Nav2) | ✅ |
| Object detection (COCO) | ✅ |
| Multi-robot support | ✅ |
| Docker support | ✅ |
Protocols:
- WebRTC (Wi-Fi) - Remote control via internet
- CycloneDDS (Ethernet) - Low-latency local control
ROS2 Distributions: Humble, Iron, Rolling (Ubuntu 22.04)
Installation:
```bash
mkdir -p ros2_ws && cd ros2_ws
git clone --recurse-submodules https://github.com/abizovnuralem/go2_ros2_sdk.git src
sudo apt install ros-$ROS_DISTRO-image-tools ros-$ROS_DISTRO-vision-msgs
pip install -r src/requirements.txt
source /opt/ros/$ROS_DISTRO/setup.bash
rosdep install --from-paths src --ignore-src -r -y
colcon build
```
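Once built, the SDK exposes the sensor and control topics listed above, with teleop driven by Twist-style velocity commands. A sketch of the joystick-to-command mapping such a teleop node performs, assuming a standard `cmd_vel`-style interface and illustrative scaling:

```python
# Hedged sketch: map normalized joystick axes (-1..1) into a Twist-shaped
# command dict. The topic layout and scale factors are assumptions, not
# go2_ros2_sdk's actual implementation.
def joy_to_twist(axis_fwd: float, axis_turn: float,
                 max_lin: float = 0.5, max_yaw: float = 1.0) -> dict:
    """Scale joystick axes into linear/angular velocity fields."""
    return {
        "linear":  {"x": axis_fwd * max_lin, "y": 0.0, "z": 0.0},
        "angular": {"x": 0.0, "y": 0.0, "z": axis_turn * max_yaw},
    }
```

In a real node the returned values would populate a `geometry_msgs/msg/Twist` message rather than a dict.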
1.3 CHAMP Controller Integration
Repository: github.com/anujjain-dev/unitree-go2-ros2
Built on the CHAMP legged robots framework:
- Gazebo simulation support
- ros2-control integration
- Velodyne sensor support
- Robot localization package
Dependencies:
```bash
sudo apt install ros-humble-gazebo-ros2-control
sudo apt install ros-humble-xacro
sudo apt install ros-humble-robot-localization
sudo apt install ros-humble-ros2-controllers
sudo apt install ros-humble-ros2-control
```
1.4 Additional ROS2 Resources
| Repository | Description |
|---|---|
| OpenMind/unitree-sdk | Zenoh bridge integration for GO2/G1 |
| khaledgabr77/unitree_go2_ros2 | ROS2 Jazzy + Gazebo Harmonic support |
| grasp-lyrl/go2_ros2_webrtc_sdk | WebRTC-focused SDK |
| eppl-erau-db/amigo_ros2 | Isaac ROS integration with nvblox |
| Unitree-Go2-Robot/go2_robot | General ROS2 package |
1.5 Python SDK
Repository: github.com/legion1581/go2_python_sdk
Unofficial Python SDK supporting:
- CycloneDDS driver
- WebRTC (in development)
- Direct robot control without ROS2
2. VLA/VLM Integration
2.1 Unitree’s Official VLA Model: UnifoLM-VLA-0
Repository: github.com/unitreerobotics/unifolm-vla
Project Page: unigen-x.github.io/unifolm-vla.github.io
Unitree has released their own Vision-Language-Action model as open source:
Key Features:
- Designed for general-purpose humanoid robot manipulation
- Evolves from “vision-language understanding” to “embodied brain”
- Spatial semantic enhancement for 2D/3D understanding
- Manipulation generalization across 12 task categories
Model Checkpoints:
| Model | Description | Link |
|---|---|---|
| Unifolm-VLM-Base | Fine-tuned on image-text VQA + robot datasets | HuggingFace |
| UnifoLM-VLA-Base | Fine-tuned on Unitree open-source dataset | HuggingFace |
| UnifoLM-VLA-Libero | Fine-tuned on Libero dataset | HuggingFace |
Training Datasets (G1 Humanoid):
- G1_Stack_Block, G1_Bag_Insert, G1_Erase_Board
- G1_Clean_Table, G1_Pack_PencilBox, G1_Pour_Medicine
- G1_Pack_PingPong, G1_Prepare_Fruit, G1_Organize_Tools
- G1_Fold_Towel, G1_Wipe_Table, G1_DualRobot_Clean_Table
Installation:
```bash
conda create -n unifolm-vla python=3.10.18
conda activate unifolm-vla
git clone https://github.com/unitreerobotics/unifolm-vla.git
cd unifolm-vla
pip install --no-deps "lerobot @ git+https://github.com/huggingface/lerobot.git@0878c68"
pip install -e .
pip install "flash-attn==2.5.6" --no-build-isolation
```
Note: Currently focused on G1 humanoid manipulation, but the architecture is applicable to quadruped manipulation tasks.
2.2 OpenVLA
Repository: github.com/openvla/openvla
Project Page: openvla.github.io
An open-source vision-language-action model trained on 970K robot manipulation trajectories from the Open X-Embodiment dataset.
Key Features:
- Generalist robotic manipulation
- Trained on diverse tasks, scenes, and embodiments
- Supports fine-tuning on custom datasets
- RLDS format for data loading
Relevance: Can be fine-tuned for quadruped manipulation tasks using Unitree’s data.
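VLA models of this kind emit actions normalized per dataset, which must be mapped back to real units before execution. A minimal sketch of that de-normalization step, assuming simple per-dimension min/max bounds in place of the per-dataset statistics the model actually stores:

```python
# Sketch of action de-normalization: map model outputs in [-1, 1] back to
# each action dimension's physical range. `low`/`high` are stand-ins for
# the per-dataset statistics a VLA checkpoint ships with.
def unnormalize_action(norm, low, high):
    """Linearly rescale each normalized action component into [lo, hi]."""
    return [0.5 * (n + 1.0) * (hi - lo) + lo
            for n, lo, hi in zip(norm, low, high)]
```

The inverse mapping (used at training time) is the same affine transform run in reverse.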
2.3 VLA Learning Resources
| Resource | Description |
|---|---|
| Awesome-VLA-Learning-Guide | Systematic introduction to VLA models |
| awesome-embodied-vla-va-vln | Curated list of VLA/VLN research |
| Large-VLM-based-VLA-for-Robotic-Manipulation | VLM-based VLA models for manipulation |
| LLaVA-VLA | LLaVA-based VLA model |
| Awesome-VLA-Robotics | Comprehensive VLA papers/models/datasets |
2.4 QUARD Dataset
Paper: QUARD (QUAdruped Robot Dataset)
A dataset specifically designed for quadruped robot manipulation. Relevant for GO2 manipulation tasks.
3. Nvidia Isaac & GR00T
3.1 GR00T Foundation Model
Nvidia’s GR00T (Generalist Robot 00 Technology) is a foundation model for humanoid and quadruped robots.
Key Points:
- Unitree is an official Nvidia GR00T partner
- Enables complex tasks with minimal training
- 800 teraflops of 8-bit floating point AI performance on Jetson Thor
- Multimodal generative AI capabilities
GR00T N1.5 Performance on Unitree G1:
- 98.8% success rate on placing known fruits (vs 44.0% for N1)
- Post-trained with only 1,000 teleoperation episodes
- Supports both humanoid and quadruped platforms
3.2 Isaac Sim Quadruped Extension
Documentation: Isaac Sim Quadruped Extension
Features:
- Unitree A1 support with ROS2 camera data
- Visual-inertial odometry integration
- Stereo vision support
- Custom scene creation
3.3 Isaac ROS
GitHub Organization: github.com/NVIDIA-ISAAC-ROS
Documentation: nvidia-isaac-ros.github.io
NVIDIA-accelerated ROS 2 packages for autonomous robots:
Key Packages:
- isaac_ros_jetson - Jetson support packages
- nvblox - 3D scene reconstruction
- NITROS - Zero-copy ROS2 messaging
- Visual SLAM
- Object detection
Jetson Orin Integration:
- Full CUDA acceleration
- TensorRT model optimization
- Docker container support
4. LLM/MLLM Integration
4.1 MCP Server for Natural Language Control
Repository: github.com/lpigeon/unitree-go2-mcp-server
A Model Context Protocol (MCP) server that enables:
- Natural language control of GO2 via LLM
- Command interpretation by ChatGPT/Claude/etc.
- Integration with OpenAI and other LLM providers
Use Case: “Walk forward 3 meters and then turn left” → Robot executes commands.
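The interpretation step can be sketched as decomposing an instruction into timed motion primitives. The `(mode, vx, vyaw, duration)` format below is purely illustrative, not the actual API of unitree-go2-mcp-server:

```python
import re

# Illustrative sketch of instruction-to-primitive decomposition, the kind
# of translation an MCP server performs on LLM output. All names and the
# primitive tuple format are assumptions.
def parse_instruction(text: str, speed: float = 0.5):
    """Return a list of (mode, vx, vyaw, duration_s) primitives."""
    plan = []
    m = re.search(r"forward\s+(\d+(?:\.\d+)?)\s*meters?", text)
    if m:
        dist = float(m.group(1))
        plan.append(("move", speed, 0.0, dist / speed))  # walk at `speed` m/s
    if "turn left" in text:
        plan.append(("turn", 0.0, 0.5, 3.14))  # ~90 deg at 0.5 rad/s
    return plan
```

In practice the LLM itself does the heavy lifting via MCP tool calls; a parser like this only illustrates the shape of the resulting plan.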
4.2 Voice Interaction with OpenAI
Guide: Configuring Unitree Go2 EDU for Real-Time Voice Interaction
Setup guide for:
- Voice input via microphone
- Speech-to-text processing
- OpenAI API integration for command interpretation
- Robot command execution
Requirements:
- Unitree SDK (C++ or Python)
- OpenAI API key
- Audio processing libraries
4.3 WSO2 AI Agent Integration
Article: How We Gave Life to an AI Agent with Unitree Go2
Integration approach:
- Remote control via app
- SDK-based control (C++ and Python)
- AI agent for autonomous behavior
- Communication via multiple channels
4.4 Security Considerations
Research: Jailbreaking LLM-controlled robots
Important security research on LLM-controlled robots, including the Unitree GO2. Highlights the need for:
- Input validation
- Command filtering
- Rate limiting
- Safety boundaries
5. Reinforcement Learning
5.1 Unitree RL Gym
Repository: github.com/unitreerobotics/unitree_rl_gym
Official RL training environment:
- Supports GO2, H1, H1_2, and G1
- Isaac Gym integration
- PPO-based training
- Sim-to-real transfer
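PPO recipes for legged locomotion typically sum weighted reward terms, the most important of which rewards tracking the commanded base velocity. A representative sketch of such a term (the exponential form is common in legged-robot RL; the sigma value is illustrative, not taken from unitree_rl_gym):

```python
import math

# Sketch of a velocity-tracking reward term: exp of the negative squared
# tracking error, saturating at 1.0 for perfect tracking. `sigma` is a
# tuning knob, not a value from unitree_rl_gym.
def tracking_lin_vel_reward(cmd_vel, actual_vel, sigma=0.25):
    err = sum((c - a) ** 2 for c, a in zip(cmd_vel, actual_vel))
    return math.exp(-err / sigma)
```

Additional terms (torque penalties, foot air time, orientation) are weighted and summed into the total reward each simulation step.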
5.2 CHAMP Framework
Repository: CHAMP Legged Robots
Open-source quadruped controller:
- ROS-based control
- Gait generation
- Balance control
- Simulation support
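Gait generation of the kind CHAMP provides reduces to scheduling stance and swing phases per leg; a trot, for instance, moves diagonal leg pairs together with a half-cycle offset between pairs. A generic sketch (leg naming and conventions are illustrative, not CHAMP's actual code):

```python
# Generic trot-gait scheduler sketch: diagonal pairs (FL+RR, FR+RL)
# share a phase, offset by half a cycle from the other pair.
TROT_OFFSETS = {"FL": 0.0, "RR": 0.0, "FR": 0.5, "RL": 0.5}

def leg_phase(t: float, period: float, leg: str) -> float:
    """Gait phase in [0, 1) for the given leg at time t."""
    return ((t / period) + TROT_OFFSETS[leg]) % 1.0

def in_stance(t: float, period: float, leg: str, duty: float = 0.5) -> bool:
    """By the convention used here, the first `duty` fraction of the cycle is stance."""
    return leg_phase(t, period, leg) < duty
```

A full controller layers swing-foot trajectories and inverse kinematics on top of this schedule.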
6. Simulation Environments
6.1 Isaac Sim
Best for: High-fidelity simulation with GPU acceleration
Features:
- Photo-realistic rendering
- PhysX physics engine
- Domain randomization
- Synthetic data generation
GO2 Support: Via quadruped extension
6.2 MuJoCo
Best for: Fast physics simulation
- Open-source since 2021
- Excellent for RL training
- Contact dynamics
6.3 Gazebo
Best for: ROS2 integration testing
- Native ROS2 support
- Multiple physics engines
- Sensor plugins
6.4 PyBullet
Best for: Quick prototyping
- Python-native
- Fast simulation
- Good for RL
7. Curated Resource Collections
7.1 Awesome Unitree Robots
Repository: github.com/shaoxiang/awesome-unitree-robots
Comprehensive collection covering:
- G1, Go2, B2, H1+ robots
- ROS/ROS2 integration
- High-fidelity simulation
- Motion control
- RL training
- Vision systems
- Tutorials
7.2 Awesome Quadrupedal Robots
Repository: github.com/curieuxjy/Awesome_Quadrupedal_Robots
General quadruped resources including:
- Manipulation on quadrupeds
- Gait transitions
- Terrain adaptation
8. Dimensional (dim.os)
Repository: github.com/dimensionalOS/dimos
Added 2026-02-22. Announced 2026-02-19.
Dimensional (or “dim.os”) is a Python-native robotics framework that doesn’t require ROS but plays nice with it. The killer feature: Natural language control via MCP — you can literally tell your robot “hey, go find the kitchen” and it figures out the rest.
What is Dimensional?
Dimensional positions itself as “the agentive operating system for generalist robotics” — a Python-native framework with first-class MCP support for natural language control.
Key Features
Status icons: ✅ = stable/fully supported; 🟩 = mixed or partial support (some components in beta).
| Feature | Status | Notes |
|---|---|---|
| Non-ROS architecture | ✅ | Pure Python, no ROS required |
| MCP integration | ✅ | “vibecode” robots in natural language |
| Navigation & SLAM | ✅ | Built-in, also supports ROS Nav2 |
| 3D Perception | ✅ | VLMs, detectors, spatial memory |
| Simulation | ✅ | MuJoCo support built-in |
| Multi-robot | ✅ | Framework supports multiple robots |
| Hardware support | 🟩 | Unitree Go2 Pro/AIR stable, G1 beta |
Installation
```bash
# Quick install
uvx --python 3.12 --from 'dimos[base,unitree]' dimos --replay run unitree-go2

# With simulation
uv pip install 'dimos[base,unitree,sim]'
dimos --simulation run unitree-go2
```
Why This Matters
MCP hooks built-in from day one. Most of the other stacks in this survey need some hacking to connect LLMs. Dimensional has it as a first-class feature.
From their docs:
“Dimensional is agent native — ‘vibecode’ your robots in natural language and build (local & hosted) multi-agent systems that work seamlessly with your hardware.”
That aligns well with goals for natural language control through unified, LLM-friendly control stacks.
Technical Deep Dive
MCP Integration:
| Component | Details |
|---|---|
| MCP Server Port | 9990 (default) |
| Protocol | JSON-RPC 2.0 |
| Tool Format | tools/list, tools/call |
| Skill Discovery | Auto-discovers skills from modules |
Connection Pattern:
LLM Client → MCP (HTTP) → Dimensional → Unitree SDK → GO2 Hardware
Skills Available via MCP:
| Skill | Description |
|---|---|
| UnitreeSpeak | TTS through robot speakers (uses OpenAI TTS API) |
| FollowHuman | Visual servoing to follow a person |
| NavigateTo | Point-to-point navigation |
| (custom skills) | Users can register additional skills |
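The tools/call shape for one of these skills can be sketched as the JSON-RPC 2.0 payload an LLM client would POST to the MCP server on port 9990. The argument names for NavigateTo are assumptions, not Dimensional's documented schema:

```python
import json

# Sketch of an MCP tools/call request. The JSON-RPC envelope follows the
# MCP spec; the NavigateTo argument names are assumptions.
def make_tool_call(skill: str, arguments: dict, req_id: int = 1) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": skill, "arguments": arguments},
    })

request = make_tool_call("NavigateTo", {"x": 2.0, "y": 1.0})
```

A client would first issue `tools/list` to discover the available skills, then dispatch calls like the one above.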
Navigation & SLAM:
- Frontier Exploration — Autonomous map building
- A* Replanning — Dynamic path replanning
- Costmapper — Occupancy grid cost maps
- Visual Servoing — Image-based control
- ROS Nav2 Integration — Can leverage ROS navigation stack
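The A* replanning step can be illustrated with a generic 4-connected grid planner over an occupancy grid; this is a textbook sketch, not Dimensional's implementation:

```python
import heapq

# Textbook 4-connected A* on an occupancy grid (0 = free, 1 = occupied),
# with a Manhattan-distance heuristic. Illustrative only.
def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, None)]
    came, best_g = {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came:
            continue
        came[cur] = parent
        if cur == goal:  # reconstruct path by walking parents backwards
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc), cur))
    return None
```

Replanning then amounts to rerunning the search whenever the costmap changes along the current path.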
Perception Stack:
- Object detection (2D/3D)
- Object tracking (2D/3D)
- Spatial perception / point clouds
- Person tracking
- Object-scene registration
Key Dependencies:
| Library | Purpose |
|---|---|
| Pinocchio | Inverse kinematics for legged robots |
| OpenCV, Open3D | Computer vision and 3D processing |
| ReactiveX | Async stream processing |
| Numba | JIT compilation for occupancy mapping |
| rerun-sdk | Visualization (required) |
| dimos-lcm | LCM transport protocol |
System Requirements:
- OS: Ubuntu 22.04/24.04 (NixOS also supported)
- Python: 3.12+
- Hardware Access: WebRTC for remote, direct for local
- Simulation: MuJoCo support built-in (no hardware needed for testing)
Open Questions
- Stability of WebRTC for real-time control latency
- Comparison to direct ROS/DDS for low-latency applications
- Production readiness (project is explicitly alpha)
- Low-level joint control parity with Unitree’s official SDK
Bottom line: Keep an eye on this one. The Roboverse announcement on 2026-02-19 generated significant interest, and if the community adopts it, this could become a standard way to connect LLMs to Unitree robots.
9. Recommended Stack for GO2 Pro
9.1 Development Environment
┌─────────────────────────────────────────────────┐
│ GO2 Pro Platform │
├─────────────────────────────────────────────────┤
│ LLM Layer │ MCP Server (natural language) │
├─────────────────────────────────────────────────┤
│ VLA Layer │ OpenVLA / UnifoLM-VLA │
├─────────────────────────────────────────────────┤
│ ROS2 Layer │ go2_ros2_sdk (community) │
├─────────────────────────────────────────────────┤
│ Simulation │ Isaac Sim / Gazebo │
├─────────────────────────────────────────────────┤
│ Compute │ Jetson Orin / External PC │
└─────────────────────────────────────────────────┘
With Dimensional:
┌─────────────────────────────────────────────────┐
│ GO2 Pro Platform │
├─────────────────────────────────────────────────┤
│ LLM Layer │ Dimensional (MCP built-in) │
├─────────────────────────────────────────────────┤
│ VLA Layer │ OpenVLA / UnifoLM-VLA │
├─────────────────────────────────────────────────┤
│ Control Layer│ Dimensional (Python-native) │
├─────────────────────────────────────────────────┤
│ Simulation │ MuJoCo / Isaac Sim │
├─────────────────────────────────────────────────┤
│ Compute │ Jetson Orin / External PC │
└─────────────────────────────────────────────────┘
9.2 Quick Start Path
1. ROS2 Setup: Install go2_ros2_sdk for sensor access and control
2. Simulation: Test in Gazebo with the CHAMP controller
3. LLM Integration: Add the MCP server for natural language commands
4. VLA Training: Fine-tune OpenVLA on custom manipulation data
5. Deployment: Use Jetson Orin for onboard compute
Alternative (Dimensional):
1. Quick Start: uvx --python 3.12 --from 'dimos[base,unitree]' dimos --simulation run unitree-go2
2. MCP Integration: Already built-in; connect an LLM client to port 9990
3. Custom Skills: Implement and register as needed
9.3 Hardware Recommendations
| Component | Option | Notes |
|---|---|---|
| Onboard Compute | Jetson Orin Nano/AGX | Isaac ROS support |
| External Compute | Workstation with RTX 4070+ | VLA training |
| Sensors | Built-in + RealSense | Additional depth sensing |
| Communication | WebRTC (remote) / DDS (local) | Protocol selection |
10. References
Research Papers
- OpenVLA: An Open-Source Vision-Language-Action Model (CoRL 2024)
- RT-X: Open X-Embodiment Robot Learning (arXiv 2023)
- GR00T N1.5: VLA Model for Humanoid Robots (NVIDIA 2025)
11. Future Work
Potential directions for GO2 Pro research:
- Quadruped Manipulation: Mount arm on GO2, train VLA for mobile manipulation
- Navigation VLA: Adapt OmniVLA for quadruped navigation
- Multi-robot Coordination: Use ROS2 multi-robot support for fleet behavior
- Sim-to-Real: Isaac Sim → GR00T → Real GO2 pipeline
- LLM Reasoning: Chain-of-thought prompting for complex tasks
- Dimensional Evaluation: Test production readiness, latency, and joint control parity
Survey compiled by Tachi 🕷️ Last updated: 2026-02-22