Local Planning Quickstart: Implementation Guide¶
Executive Summary¶
Critical Discovery: DIMOS VFH local planner eliminates SLAM dependency for MVP.
Revised Timeline: ~1 week to working embodied AI mission (vs 2-3 weeks with global planning)
Key Phases:
1. MockRobot (1-2 days) - Enable CI/CD
2. Local Planning Validation (2-3 days) - Test VFH on hardware
3. Perception Integration (1-2 days) - Connect YOLO to navigation
4. End-to-End Mission (1 day) - "Find the ball" working
Success Criteria: Robot executes "Find the red ball" mission using camera perception + reactive navigation.
Background: What Changed¶
Original Plan (BLOCKED)¶
Week 1-2: Map environment with SLAM
↓
Week 2-3: Test Nav2 global planning
↓
Week 3-4: Add camera perception
↓
Week 4: End-to-end mission
Risk: SLAM + Nav2 untested, complex, high failure rate
Timeline: 3-4 weeks minimum
New Plan (UNBLOCKED) ✅¶
Week 1: Test local planner + Add YOLO
↓
Week 1: End-to-end mission
Risk: Low (simpler stack, fewer dependencies)
Timeline: 1 week
Why Local Planning First¶
Benefits:
- ✅ No mapping phase required
- ✅ No localization failures
- ✅ Simpler to test and debug
- ✅ More robust (reactive vs planned)
- ✅ Sufficient for object search missions
Trade-offs:
- ⚠️ Less optimal paths (reactive, not planned)
- ⚠️ Odometry drift over long missions
- ⚠️ Can't return to specific locations
MVP Verdict: Benefits >> Trade-offs for "find object" missions
Phase 0: MockRobot Foundation (1-2 days)¶
Objective¶
Enable rapid development and CI/CD without hardware dependency.
Deliverables¶
1. MockRobot Class (shadowhound_robot/mock_robot.py)
# Rx imports (match the Rx package used elsewhere in the project)
import reactivex as rx
from reactivex import Observable, operators as ops


class MockRobot:
    """Mock robot for testing without hardware."""

    def __init__(self):
        self.pose = {"x": 0.0, "y": 0.0, "theta": 0.0}
        self.local_planner = MockLocalPlanner(robot=self)
        self.camera = MockCamera()
        self.lidar = MockLiDAR()

    def move_vel_control(self, x: float, y: float, yaw: float):
        """Mock velocity control (updates internal pose by simple integration)."""
        self.pose["x"] += x * 0.1
        self.pose["theta"] += yaw * 0.1

    def get_ros_video_stream(self) -> Observable:
        """Return mock video stream with test images."""
        return rx.interval(0.033).pipe(  # ~30 FPS
            ops.map(lambda _: self.camera.get_frame())
        )
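MockCamera and MockLiDAR are referenced above but not specified in this guide; a minimal sketch (class names match the constructor above, while the frame size and scan shape are assumptions):

import numpy as np


class MockCamera:
    """Returns a fixed-size test frame (assumed 480x640 RGB)."""

    def get_frame(self) -> np.ndarray:
        return np.zeros((480, 640, 3), dtype=np.uint8)


class MockLiDAR:
    """Returns a synthetic 360° scan with no obstacles (assumed 10 m max range)."""

    def get_scan(self) -> np.ndarray:
        return np.full(360, 10.0, dtype=np.float32)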
2. MockLocalPlanner (stub VFH interface)
import math
from typing import Dict, Optional, Tuple


class MockLocalPlanner:
    def __init__(self, robot=None):
        self.robot = robot  # Reference to the MockRobot, used for pose feedback
        self.goal_xy: Optional[Tuple[float, float]] = None
        self.goal_reached = False

    def set_goal(self, goal_xy, frame="odom"):
        self.goal_xy = goal_xy
        self.goal_reached = False

    def plan(self) -> Dict[str, float]:
        """Return mock velocity commands."""
        if self.goal_xy is None:
            return {"x_vel": 0.0, "angular_vel": 0.0}
        # Simple proportional controller toward the goal
        dx = self.goal_xy[0] - self.robot.pose["x"]
        dy = self.goal_xy[1] - self.robot.pose["y"]
        distance = math.sqrt(dx**2 + dy**2)
        if distance < 0.2:
            self.goal_reached = True
            return {"x_vel": 0.0, "angular_vel": 0.0}
        # Steer toward the goal heading relative to the current pose
        heading_error = math.atan2(dy, dx) - self.robot.pose["theta"]
        return {"x_vel": 0.3, "angular_vel": heading_error}

    def is_goal_reached(self) -> bool:
        return self.goal_reached
3. Pytest Fixtures (tests/conftest.py)
import pytest

from shadowhound_robot.mock_robot import MockRobot


@pytest.fixture
def mock_robot():
    """Provide MockRobot for tests."""
    return MockRobot()


@pytest.fixture
def mock_perception(mock_robot):
    """Provide a mock perception stream wrapping the mock robot."""
    return MockPerceptionStream(mock_robot)
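Example unit tests against these fixtures (a sketch, assuming the MockRobot behavior shown above; the file name tests/test_mock_robot.py is illustrative):

import pytest


def test_mock_robot_velocity_integration(mock_robot):
    """move_vel_control should integrate forward velocity into the pose."""
    mock_robot.move_vel_control(0.5, 0.0, 0.0)
    assert mock_robot.pose["x"] == pytest.approx(0.05)


def test_mock_planner_reports_goal_reached(mock_robot):
    """The mock planner should report success once the pose is near the goal."""
    mock_robot.local_planner.set_goal((0.1, 0.0), frame="odom")
    cmd = mock_robot.local_planner.plan()  # Already within the 0.2 m tolerance
    assert mock_robot.local_planner.is_goal_reached()
    assert cmd == {"x_vel": 0.0, "angular_vel": 0.0}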
4. CI Workflow (.github/workflows/test.yml)
name: Unit Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run unit tests
        run: |
          pytest tests/ --cov=shadowhound --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
Success Criteria¶
- ✅ MockRobot can be instantiated
- ✅ All existing tests pass with MockRobot
- ✅ CI workflow runs on every commit
- ✅ Test coverage > 80%
Effort Estimate¶
- Time: 1-2 days
- Resources: 1 developer
- Blockers: None
Phase 1: Local Planning Validation (2-3 days)¶
Objective¶
Validate VFH local planner works on Go2 hardware with real LiDAR.
Prerequisites¶
- [ ] go2_ros2_sdk publishing /local_costmap/costmap
- [ ] /odom topic available
- [ ] /cmd_vel topic connected
- [ ] Go2 battery charged, safe test environment
Test Scenarios¶
Test 1: Straight Line Navigation
# Goal: Drive forward 2 meters
robot.local_planner.set_goal((2.0, 0.0), frame="base_link")
success = navigate_to_goal_local(robot, (2.0, 0.0), timeout=20.0)
assert success
# Validate
final_distance = calculate_distance(robot.pose, (2.0, 0.0))
assert final_distance < 0.3 # Within 30cm
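navigate_to_goal_local is used throughout this guide but not defined here; a minimal sketch, assuming the planner exposes the same set_goal/plan/is_goal_reached interface as the mock above (with the real VFH planner, velocity commands may instead be issued by its own control loop):

import math
import time


def navigate_to_goal_local(robot, goal_xy_robot, timeout=30.0, distance=0.0, frame="base_link"):
    """Drive toward goal_xy_robot; optionally stop `distance` meters short of it."""
    gx, gy = goal_xy_robot
    if distance > 0.0:
        # Shorten the goal vector so the robot stops `distance` m away
        # (assumes the goal is given in the robot frame, as in this guide)
        norm = math.hypot(gx, gy)
        if norm > distance:
            scale = (norm - distance) / norm
            gx, gy = gx * scale, gy * scale
    robot.local_planner.set_goal((gx, gy), frame=frame)
    start = time.time()
    while time.time() - start < timeout:
        cmd = robot.local_planner.plan()
        robot.move_vel_control(cmd["x_vel"], 0.0, cmd["angular_vel"])
        if robot.local_planner.is_goal_reached():
            return True
        time.sleep(0.1)  # ~10 Hz control loop
    return False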
Test 2: Obstacle Avoidance
# Place obstacle between robot and goal
# Goal: Navigate around obstacle
robot.local_planner.set_goal((2.0, 0.0), frame="base_link")
success = navigate_to_goal_local(robot, (2.0, 0.0), timeout=30.0)
assert success
assert no_collisions_detected() # Robot avoided obstacle
Test 3: Goal in Multiple Frames
# Test frame transformations work
robot.local_planner.set_goal((2.0, 1.0), frame="base_link") # Robot frame
robot.local_planner.set_goal((5.0, 3.0), frame="odom") # Odom frame
# Both should navigate correctly
Test 4: Recovery Behavior
# Place robot in corner (trapped)
# Goal: Recovery behavior activates
robot.local_planner.set_goal((2.0, 0.0), frame="base_link")
# Robot should:
# 1. Detect stuck condition
# 2. Execute recovery (rotate in place)
# 3. Either escape or timeout gracefully
Test 5: Dynamic Obstacles
# Human walks in front of robot
# Goal: Robot stops and waits
robot.local_planner.set_goal((3.0, 0.0), frame="base_link")
# Robot should:
# 1. Navigate toward goal
# 2. Stop when human appears
# 3. Resume when path clears
Parameter Tuning¶
Initial Parameters:
VFHPurePursuitPlanner(
    safety_threshold=0.8,    # 80 cm from obstacles
    max_linear_vel=0.5,      # Conservative speed
    max_angular_vel=1.0,     # ~60°/s
    lookahead_distance=1.0,  # 1 m lookahead
    goal_tolerance=0.2,      # 20 cm goal accuracy
    control_frequency=10.0,  # 10 Hz control
)
Tuning Process:
1. Start conservative (slow speeds, large margins)
2. Run test scenarios
3. Adjust parameters based on behavior:
- Too cautious? Reduce safety_threshold, increase max_linear_vel
- Oscillating? Increase lookahead_distance
- Overshooting? Reduce goal_tolerance
Telemetry Collection¶
Log Data:
# For each navigation attempt, log:
{
    "timestamp": "2025-10-14T10:30:00Z",
    "test_id": "straight_line_2m",
    "success": True,
    "duration_seconds": 12.3,
    "distance_traveled": 2.1,
    "final_error_meters": 0.15,
    "recovery_activations": 0,
    "parameters": {
        "safety_threshold": 0.8,
        "max_linear_vel": 0.5,
        # ...
    }
}
Analysis:
- Success rate per scenario
- Average navigation time
- Parameter sensitivity analysis
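A small analysis pass over the logs (a sketch, assuming one JSON object per file under a telemetry/ directory) covers success rate and average navigation time:

import json
from collections import defaultdict
from pathlib import Path

runs = [json.loads(p.read_text()) for p in Path("telemetry/").glob("*.json")]

by_test = defaultdict(list)
for run in runs:
    by_test[run["test_id"]].append(run)

for test_id, attempts in by_test.items():
    successes = [a for a in attempts if a["success"]]
    rate = len(successes) / len(attempts)
    avg_time = sum(a["duration_seconds"] for a in successes) / max(len(successes), 1)
    print(f"{test_id}: success={rate:.0%}, avg_time={avg_time:.1f}s")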
Success Criteria¶
- ✅ 90%+ success rate on all test scenarios
- ✅ No collisions during testing
- ✅ Recovery behaviors work correctly
- ✅ Frame transformations validated
- ✅ Parameters documented and tuned
Effort Estimate¶
- Time: 2-3 days
- Resources: 1 developer + Go2 hardware
- Blockers: go2_ros2_sdk local costmap availability
Phase 2: Perception Integration (1-2 days)¶
Objective¶
Connect YOLO object detection to local planning navigation.
Architecture¶
Video Stream (30 FPS)
↓
YOLO Detector (10 FPS sampling)
↓
Metric3D Depth Estimation
↓
3D Position Calculation (base_link)
↓
Transform to Odom Frame
↓
Local Planner Goal Setting
↓
VFH Navigation
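The 3D position step is standard pinhole back-projection; a sketch of the math (the function name and intrinsics layout are assumptions, and the real pipeline takes depth from Metric3D):

import numpy as np


def bbox_to_3d(bbox, depth_m, intrinsics):
    """Back-project a bbox center (pixels) and depth (m) to a 3D point in the camera frame.

    intrinsics: dict with fx, fy, cx, cy in pixels (assumed layout, not the DIMOS API).
    """
    x1, y1, x2, y2 = bbox
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0  # Bbox center in pixel coordinates
    x = (u - intrinsics["cx"]) * depth_m / intrinsics["fx"]
    y = (v - intrinsics["cy"]) * depth_m / intrinsics["fy"]
    z = depth_m
    # Camera frame; still needs camera -> base_link -> odom transforms
    return np.array([x, y, z])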
Implementation¶
1. ObjectDetectionStream Setup
# shadowhound_perception/object_detection_node.py
from rclpy.node import Node
from reactivex import operators as ops  # match the project's Rx package

from dimos.perception.object_detection_stream import ObjectDetectionStream
from dimos.perception.detection2d.yolo_2d_det import Yolo2DDetector


class PerceptionNode(Node):
    def __init__(self, robot, target_class: str = "ball"):
        super().__init__('perception_node')
        self.robot = robot
        self.target_class = target_class

        # Initialize YOLO detector
        self.detector = Yolo2DDetector(
            model="yolo11n.pt",  # Nano model for speed
            device="cuda",
            threshold=0.6
        )

        # Create detection stream
        self.detection_stream = ObjectDetectionStream(
            detector=self.detector,
            camera_intrinsics=self.robot.camera_intrinsics,
            transform_to_map=self.robot.ros_control.transform_pose,
            video_stream=self.robot.get_ros_video_stream()
        ).get_stream()

        # Subscribe to detections
        self.detection_stream.pipe(
            ops.sample(0.1),  # 10 FPS
        ).subscribe(
            on_next=self.on_detection,
            on_error=self.on_error
        )

    def on_detection(self, detection):
        """Handle new detection."""
        objects = detection.get("objects", [])
        # Filter for the target class
        target_objects = [obj for obj in objects if obj["label"] == self.target_class]
        if target_objects:
            # Navigate to the closest object
            closest = min(target_objects, key=lambda obj: obj["position"]["distance"])
            self.navigate_to_object(closest)

    def navigate_to_object(self, obj):
        """Navigate to the detected object."""
        position = obj["position"]
        # Set goal in odom frame (already transformed)
        self.robot.local_planner.set_goal(
            (position["x"], position["y"]),
            frame="odom"
        )

    def on_error(self, error):
        """Log stream errors."""
        self.get_logger().error(f"Detection stream error: {error}")
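A minimal entry point for this node (a sketch; create_robot is a hypothetical placeholder for however the Go2 robot interface is constructed in the project):

import rclpy


def main(args=None):
    rclpy.init(args=args)
    # The robot object must expose camera_intrinsics, ros_control,
    # get_ros_video_stream(), and local_planner as used above.
    robot = create_robot()  # hypothetical factory; replace with the real constructor
    node = PerceptionNode(robot, target_class="ball")
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == "__main__":
    main()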
2. Frame Validation Tests
def test_frame_transformation():
    """Validate detections transform correctly."""
    # Mock detection in base_link
    detection_base_link = {
        "position": {"x": 2.0, "y": 0.0, "z": 0.5},
        "frame": "base_link"
    }
    # Transform to odom
    detection_odom = transform_detection(detection_base_link, "odom")
    # Validate
    assert detection_odom["frame"] == "odom"
    assert detection_odom["position"]["x"] > 0  # Forward in odom


def test_detection_to_navigation():
    """Test full pipeline: detection → navigation."""
    # Place object at known position
    object_pos_base_link = (2.0, 1.0)
    # Detect object
    detections = yolo_stream.pipe(ops.take(1)).run()
    assert len(detections["objects"]) > 0
    # Verify position accuracy
    detected_pos = detections["objects"][0]["position"]
    error = calculate_distance(detected_pos, object_pos_base_link)
    assert error < 0.5  # Within 50 cm (depth estimation error)
3. Integration Test (Real Hardware)
# Terminal 1: Launch robot + perception
ros2 launch shadowhound_bringup perception.launch.py
# Terminal 2: Place ball 2m in front of robot
# Terminal 3: Trigger detection
ros2 topic pub /perception/target_class std_msgs/String "data: 'ball'"
# Expected:
# 1. YOLO detects ball
# 2. Position calculated: (2.0, 0.0) in odom
# 3. Local planner set_goal((2.0, 0.0))
# 4. Robot navigates to ball
# 5. Stops within 1m of ball
Success Criteria¶
- ✅ YOLO detections appear in stream (10 FPS)
- ✅ Depth estimation working (< 20% error)
- ✅ Frame transforms correct (detection → odom)
- ✅ Navigation starts when object detected
- ✅ Robot reaches object (within 1m)
Effort Estimate¶
- Time: 1-2 days
- Resources: 1 developer + Go2 hardware
- Blockers: Phase 1 completion
Phase 3: End-to-End Mission (1 day)¶
Objective¶
Demonstrate complete embodied AI mission: "Find the red ball"
Mission Specification¶
Natural Language: "Find the red ball"
Decomposition:
1. Search: Rotate and scan environment with camera
2. Detect: YOLO finds "ball" candidates
3. Verify: VLM confirms "red" (optional for MVP)
4. Navigate: Local planner drives to ball
5. Stop: Stop within 1m of ball
6. Report: Confirm mission success
Implementation¶
Option A: YOLO Only (MVP Minimum)
import logging
import math
import time

from reactivex import operators as ops  # match the project's Rx package

logger = logging.getLogger(__name__)


class FindBallMission:
    def __init__(self, robot):
        self.robot = robot
        self.target_class = "ball"
        self.found = False

    def execute(self):
        # Phase 1: Search (rotate 360°)
        logger.info("Searching for ball...")
        self.rotate_and_scan()

        # Phase 2: Detect + Navigate
        detection_stream = self.create_detection_stream()
        detection_stream.pipe(
            ops.take(1)  # First detection
        ).subscribe(
            on_next=self.navigate_to_object,
            on_error=self.on_failure
        )

        # Wait for completion
        timeout = 60.0
        start_time = time.time()
        while not self.found and time.time() - start_time < timeout:
            time.sleep(0.1)

        if self.found:
            logger.info("Mission complete: Ball found!")
            return True
        else:
            logger.error("Mission failed: Ball not found")
            return False

    def rotate_and_scan(self):
        """Rotate 360° to scan for objects."""
        for angle in range(0, 360, 45):  # 45° increments
            self.robot.local_planner.set_goal(
                (0.0, 0.0),  # Don't move
                goal_theta=math.radians(angle)
            )
            time.sleep(2)  # Wait for rotation

    def navigate_to_object(self, detection):
        """Navigate to detected object."""
        logger.info(f"Ball detected at {detection['position']}")
        success = navigate_to_goal_local(
            self.robot,
            goal_xy_robot=(detection["position"]["x"],
                           detection["position"]["y"]),
            distance=1.0,  # Stop 1 m away
            timeout=30.0
        )
        self.found = success
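Running the mission (a sketch, assuming a robot object constructed as in the earlier phases):

mission = FindBallMission(robot)
if mission.execute():
    print("Ball found and reached")
else:
    print("Ball not found within timeout")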
Option B: YOLO + VLM (Better)
class FindRedBallMission:
    def execute(self):
        # Phase 1: YOLO finds "ball" candidates
        yolo_stream = self.create_yolo_stream(target_class="ball")

        # Phase 2: VLM verifies "red"
        verified_stream = yolo_stream.pipe(
            ops.sample(2.0),  # Throttle VLM
            ops.map(lambda det: self.verify_with_vlm(det, "Is this ball red?")),
            ops.filter(lambda det: det["verified"])
        )

        # Phase 3: Navigate to verified red ball
        verified_stream.pipe(ops.take(1)).subscribe(
            on_next=self.navigate_to_object
        )
Test Protocol¶
Setup:
1. Place red ball 3m in front of robot
2. Add distractors (blue ball, yellow ball)
3. Clear obstacles in path
Execution:
# Launch system
ros2 launch shadowhound_bringup full_stack.launch.py
# Send mission
ros2 service call /mission/execute shadowhound_interfaces/srv/ExecuteMission \
"command: 'Find the red ball'"
Expected Behavior:
1. Robot rotates to scan environment (5-10 seconds)
2. YOLO detects balls (blue, yellow, red)
3. VLM verifies red ball (optional)
4. Local planner navigates to red ball (10-15 seconds)
5. Robot stops 1m from red ball
6. Mission reports success
Success Metrics:
- Mission completion time: < 30 seconds
- Navigation accuracy: within 1m of ball
- Success rate: > 90% (10 trials)
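A small trial harness (a sketch, assuming the FindBallMission class above) can record these metrics over the 10 trials:

import time


def run_trials(robot, n_trials=10):
    results = []
    for i in range(n_trials):
        mission = FindBallMission(robot)
        start = time.time()
        success = mission.execute()
        results.append({"trial": i, "success": success, "duration": time.time() - start})
        input("Reset the ball and robot, then press Enter for the next trial...")
    rate = sum(r["success"] for r in results) / n_trials
    print(f"Success rate: {rate:.0%}")
    return results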
Edge Cases to Test¶
- No object found: Robot rotates 360°, reports failure
- Object out of reach: Robot navigates as close as possible, reports partial success
- Multiple objects: Robot navigates to closest one
- Object moves: Robot re-detects and updates goal
- Path blocked: Local planner avoids obstacle, finds alternate path
Success Criteria¶
- ✅ Mission completes in < 30 seconds
- ✅ Robot finds correct object (red ball, not distractors)
- ✅ Navigation successful (reaches within 1m)
- ✅ No collisions during mission
- ✅ Success rate > 90% over 10 trials
Effort Estimate¶
- Time: 1 day
- Resources: 1 developer + Go2 hardware
- Blockers: Phase 1 and 2 completion
Phase 4: VLM Integration (Optional, +1-2 days)¶
Objective¶
Add semantic reasoning to handle nuanced queries.
Use Cases¶
- "Find the RED ball" (not just any ball)
- "Find the person in the blue shirt" (specific person)
- "Find the open door" (state reasoning)
Implementation¶
See hybrid_perception_architecture.md for detailed patterns.
Quick Integration (use existing DIMOS code):
from dimos.models.qwen.video_query import get_bbox_from_qwen_frame
def verify_with_vlm(detection, query: str) -> bool:
    """Verify detection with VLM."""
    frame = detection["frame"]
    bbox = detection["bbox"]

    # Crop to the detected object
    x1, y1, x2, y2 = map(int, bbox)
    roi = frame[y1:y2, x1:x2]

    # Query Qwen VLM
    vlm_bbox, confidence = get_bbox_from_qwen_frame(roi, object_name=query)

    # If the VLM finds the object in the ROI, verification succeeds
    return vlm_bbox is not None and confidence > 0.5
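Note that verify_with_vlm returns a bool, while the Option B stream filters on a "verified" key; a small adapter in the pipeline bridges the two (a sketch, passing the target as the object name):

verified_stream = yolo_stream.pipe(
    ops.sample(2.0),  # Throttle VLM queries
    ops.map(lambda det: {**det, "verified": verify_with_vlm(det, "red ball")}),
    ops.filter(lambda det: det["verified"])
)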
Effort Estimate¶
- Time: 1-2 days
- Resources: 1 developer
- Blockers: Phase 3 completion
- Priority: Optional for MVP
Timeline Summary¶
Week 1: Core MVP¶
| Day | Phase | Deliverable | Status |
|---|---|---|---|
| 1 | MockRobot | CI/CD enabled | Not started |
| 2 | MockRobot | Tests passing | Not started |
| 3 | Local Planning | VFH validated on hardware | Not started |
| 4 | Local Planning | Parameters tuned | Not started |
| 5 | Perception | YOLO → Navigation working | Not started |
| 6 | Perception | Frame transforms validated | Not started |
| 7 | End-to-End | "Find the ball" mission working | Not started |
Deliverable: Working embodied AI mission (YOLO + local planning)
Week 2: Refinement (Optional)¶
| Day | Phase | Deliverable | Status |
|---|---|---|---|
| 1-2 | VLM | Semantic verification working | Optional |
| 3-4 | Mission Agent | Natural language interface | Optional |
| 5 | Documentation | User guide, demos | Optional |
Deliverable: Enhanced mission with VLM reasoning
Risk Assessment¶
High Risk Items¶
1. go2_ros2_sdk Local Costmap
- Risk: May not publish /local_costmap/costmap
- Impact: VFH planner can't run (blocks Phase 1)
- Mitigation: Check go2_ros2_sdk source; generate a costmap from /scan if needed (see the sketch after this list)
- Probability: Medium (30%)
2. Frame Transform Errors
- Risk: base_link → odom transforms incorrect
- Impact: Robot navigates to wrong position
- Mitigation: Extensive frame validation tests in Phase 2
- Probability: Medium (40%)
3. Depth Estimation Accuracy
- Risk: Metric3D depth errors > 50cm
- Impact: Goal positions inaccurate
- Mitigation: Calibrate depth model, use stereo if available
- Probability: Low (20%)
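If go2_ros2_sdk does not publish a local costmap, a stopgap node can rasterize /scan into an occupancy grid (a sketch; topic names, window size, and resolution are assumptions):

import math
import numpy as np
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import LaserScan
from nav_msgs.msg import OccupancyGrid


class ScanToCostmap(Node):
    def __init__(self):
        super().__init__("scan_to_costmap")
        self.size_m = 6.0        # 6 m x 6 m window around the robot
        self.resolution = 0.05   # 5 cm cells
        self.cells = int(self.size_m / self.resolution)
        self.pub = self.create_publisher(OccupancyGrid, "/local_costmap/costmap", 1)
        self.create_subscription(LaserScan, "/scan", self.on_scan, 1)

    def on_scan(self, scan: LaserScan):
        grid = np.zeros((self.cells, self.cells), dtype=np.int8)
        angle = scan.angle_min
        for r in scan.ranges:
            if scan.range_min < r < scan.range_max:
                # Obstacle position in the laser frame
                x = r * math.cos(angle)
                y = r * math.sin(angle)
                col = int((x + self.size_m / 2) / self.resolution)
                row = int((y + self.size_m / 2) / self.resolution)
                if 0 <= row < self.cells and 0 <= col < self.cells:
                    grid[row, col] = 100  # Occupied
            angle += scan.angle_increment

        msg = OccupancyGrid()
        msg.header = scan.header
        msg.info.resolution = self.resolution
        msg.info.width = self.cells
        msg.info.height = self.cells
        msg.info.origin.position.x = -self.size_m / 2
        msg.info.origin.position.y = -self.size_m / 2
        msg.data = grid.flatten().tolist()
        self.pub.publish(msg)


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(ScanToCostmap())
    rclpy.shutdown()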
Medium Risk Items¶
4. VFH Parameter Tuning
- Risk: Default parameters don't work well
- Impact: Navigation jittery or fails
- Mitigation: Systematic tuning in Phase 1
- Probability: High (60%), low impact
5. YOLO False Positives
- Risk: Detects wrong objects
- Impact: Navigates to wrong target
- Mitigation: Add VLM verification
- Probability: Medium (30%)
Low Risk Items¶
6. Odometry Drift
- Risk: Odom frame drifts over time
- Impact: Position errors accumulate
- Mitigation: Short missions (< 1 min), visual goal updates
- Probability: Low (10% for MVP missions)
Success Metrics¶
MVP Success Criteria (Must Have)¶
| Metric | Target | Measurement |
|---|---|---|
| Mission Success Rate | > 90% | 10 trials, "Find the ball" |
| Navigation Accuracy | < 1m error | Final distance to object |
| Mission Duration | < 30 seconds | Time to completion |
| Collision Rate | 0% | No collisions in 10 trials |
| Code Coverage | > 80% | Pytest coverage report |
Quality Metrics (Nice to Have)¶
| Metric | Target | Measurement |
|---|---|---|
| Detection Latency | < 200ms | Frame → navigation goal |
| Path Efficiency | > 70% | Straight-line distance / actual distance traveled |
| Recovery Success | > 80% | Successful escapes from stuck |
| Frame Transform Accuracy | < 10cm | Detection position error |
Hardware Requirements¶
Minimum¶
- ✅ Unitree Go2 robot
- ✅ Laptop with ROS2 (Ubuntu 22.04)
- ✅ Wi-Fi connection to Go2
- ✅ Test environment (5m x 5m clear space)
- ✅ Test objects (colored balls, boxes)
Optional (Improved Performance)¶
- ⏸️ External GPU (for VLM on laptop)
- ⏸️ Better depth camera (Intel RealSense)
- ⏸️ Motion capture system (ground truth validation)
Deliverables¶
Code¶
- [ ] shadowhound_robot/mock_robot.py - MockRobot implementation
- [ ] shadowhound_perception/object_detection_node.py - Perception node
- [ ] shadowhound_mission/find_ball_mission.py - Mission implementation
- [ ] tests/ - Unit and integration tests
- [ ] Launch files for all phases
Documentation¶
- [ ] docs/guides/local_planning_quickstart.md - Setup guide
- [ ] docs/guides/perception_integration.md - Integration guide
- [ ] docs/guides/mission_creation.md - Creating new missions
- [ ] Video demos of working missions
Data¶
- [ ] Telemetry logs from Phase 1 testing
- [ ] Parameter tuning results
- [ ] Benchmark results (success rate, timing)
Next Steps After MVP¶
Phase 2: Advanced Capabilities (2-3 weeks)¶
Semantic Spatial Memory (NEW - Critical for Persistent Intelligence)
- Integrate DIMOS SpatialMemory (CLIP embeddings)
- Store observations with XY locations in ChromaDB
- Query interface: "Where did I see X?"
- Scene similarity for transfer learning
- Memory-guided search (check memory before exploring)
- Effort: 1-2 days (infrastructure exists in DIMOS)
Global Planning Integration
- Add Nav2 + SLAM for multi-room navigation
- Return-to-dock capability
- Semantic location memory ("kitchen")
VLM Enhancement
- Local VLM deployment (LLaVA on Thor GPU)
- Advanced semantic reasoning
- Multi-step verification
- VLM queries spatial memory
Mission Agent
- Natural language mission interface
- Mission planning and decomposition
- RAG context from spatial memory
- Telemetry and logging
Phase 3: Persistent Intelligence (4-6 weeks)¶
Trajectory Logging with Scene Embeddings
- Log all decisions and outcomes
- CLIP embeddings for every frame
- Link trajectories to spatial memory
- WAL (Write-Ahead Logging) for durability
- Message contracts for data consistency
Transfer Learning via Semantic Memory
- Query similar past situations (scene embeddings)
- Retrieve successful strategies
- Apply learned parameters
- "This hallway looks like that hallway where I got stuck"
Learning Loop
- Offline analysis of trajectories
- Scene similarity clustering (via CLIP embeddings)
- Adapt parameters based on outcomes
- Test improvements in Isaac Sim
Multi-Brain Deployment
- Thor (mobile brainstem)
- Spark (cortex for learning)
- Tower (simulation/testing)
See persistent_intelligence_mvp.md for complete roadmap with semantic memory integration.
Conclusion¶
Key Takeaways¶
- Local planning is sufficient for MVP - No SLAM required
- Timeline dramatically reduced - 1 week vs 2-3 weeks
- Lower risk - Simpler stack, fewer dependencies
- Clear path forward - Well-defined phases and deliverables
Decision Points¶
After Phase 1 (Local Planning Validation):
- ✅ Continue to Phase 2 if VFH works well
- ⏸️ Debug/tune if navigation issues arise
- ⚠️ Fall back to global planning if critical issues remain
After Phase 3 (End-to-End Mission):
- ✅ Ship MVP if success rate > 90%
- ⏸️ Add VLM if semantic reasoning is needed
- ⏸️ Add global planning if multi-room navigation is needed
Open Questions¶
- [ ] Does go2_ros2_sdk provide local costmap?
- [ ] What is costmap update rate?
- [ ] Camera calibration parameters available?
- [ ] Depth camera quality sufficient?
- [ ] VLM API keys / rate limits?
References¶
- Local Planning Architecture: local_planning_architecture.md
- Hybrid Perception: hybrid_perception_architecture.md
- Persistent Intelligence: persistent_intelligence_dimos_integration.md
- Original MVP: ../project_overview/mvp_embodied_ai_platform.md
- DIMOS Documentation: src/dimos-unitree/docs/