# ShadowHound Performance Analysis & Optimization Plan

## Purpose
Preserve historical context while signaling that this page requires verification against the current workflow.
## Prerequisites
- Review the legacy notes below to understand original assumptions and instructions.
- Cross-check commands and links with the latest tooling before execution.
## Steps
- Read through the legacy notes captured under Legacy Notes and flag outdated guidance.
- Update or replace the content with validated procedures as time permits.
- Record verification outcomes in the validation checklist and mark follow-up tasks in the backlog.
## Legacy Notes
- **Date:** October 7, 2025
- **Branch:** `feature/dimos-integration`
- **Status:** Pre-VLM Integration Analysis
## 🎯 Objective

Identify and fix latency issues in the DIMOS-integrated mission execution pipeline before adding VLM integration, which will add an additional 1-3 seconds per query.
## 📊 Current State

### What's Working
- ✅ Vision skills package implemented and tested
- ✅ Web UI with camera feed
- ✅ Nav2 and teleop: Very responsive
- ✅ Basic infrastructure solid
### Observed Issues
- ⏱️ Agent response latency: Noticeable delay between command and response
- 🤔 Unclear bottleneck: Cloud API vs DIMOS skill execution
### Key Question
Is the delay from OpenAI cloud API calls or DIMOS skill execution?
## 🔍 Investigation Plan

### Phase 1: Instrumentation ✅ DONE
Commit: f248557

Added a timing breakdown to `mission_executor.py`:

```
⏱️ Timing breakdown:
   Agent call: X.XXs   # LLM + skill execution
   Total: X.XXs        # End-to-end
```
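For reference, a minimal sketch of what that instrumentation might look like. `run_observable_query().run()` is the DIMOS entry point noted in the References section below; the method shape and logger calls are illustrative assumptions, not the actual code:

```python
import time

# Illustrative sketch of the Phase 1 instrumentation, assumed to live in a
# node method that wraps the DIMOS agent call. Real code: mission_executor.py.
def execute_mission(self, command: str) -> str:
    t_start = time.monotonic()

    t_agent = time.monotonic()
    response = self.agent.run_observable_query(command).run()  # LLM + skills
    agent_duration = time.monotonic() - t_agent

    total_duration = time.monotonic() - t_start  # includes any pre/post work
    self.get_logger().info(
        "⏱️ Timing breakdown:\n"
        f"   Agent call: {agent_duration:.2f}s   # LLM + skill execution\n"
        f"   Total: {total_duration:.2f}s   # End-to-end"
    )
    return response
```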
### Phase 2: Data Collection (NEXT)

#### Test Scenarios
1. **Simple Commands (Baseline)**

    ```bash
    # Test these commands and record timing
    take one step forward
    rotate 90 degrees
    stop
    ```

    Expected: 1-2s (mostly cloud API)

2. **Navigation Commands (Nav2 skills)**

    ```bash
    go to the kitchen
    navigate to waypoint A
    ```

    Expected: 2-5s (API + skill execution + robot movement)

3. **Complex Commands (Multi-step)**

    ```bash
    patrol the hallway
    explore the room and return
    ```

    Expected: Variable (depends on planning)
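If stepping through these scenarios by hand gets tedious, a small publisher script can drive them automatically. The `/mission_command` topic and `std_msgs/String` type follow the Action Items section below; the script itself is just a convenience sketch:

```python
import time

import rclpy
from rclpy.node import Node
from std_msgs.msg import String

# Baseline commands from scenario 1; extend with the Nav2 and multi-step ones.
COMMANDS = ["take one step forward", "rotate 90 degrees", "stop"]

def main():
    rclpy.init()
    node = Node("timing_probe")
    pub = node.create_publisher(String, "/mission_command", 10)
    time.sleep(1.0)  # give the subscriber time to discover the publisher
    for cmd in COMMANDS:
        node.get_logger().info(f"sending: {cmd!r}")
        pub.publish(String(data=cmd))
        time.sleep(15.0)  # leave room for the mission to finish first
    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```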
#### Metrics to Collect

For each command:

- Agent call duration: `agent_duration` (from logs)
- Total duration: `total_duration` (from logs)
- User-perceived latency: time from pressing Enter to seeing the response
- Robot response time: time from the response to the robot starting movement
#### Data Collection Template
| Command | Agent Call | Total | User Perceived | Robot Action | Notes |
|---------|-----------|-------|----------------|--------------|-------|
| "forward" | 1.2s | 1.3s | ~1.5s | Immediate | Quick |
| "kitchen" | 3.5s | 3.6s | ~4s | 0.5s delay | Nav2 plan |
### Phase 3: Analysis

#### Expected Breakdown
```
User Command → Mission Executor → DIMOS Agent → OpenAI API → Skill Execution → Robot
     0ms           +50ms            +100ms        +1-2s        +50-500ms      Response
```

Hypothesis:

- 1-2s: OpenAI API call (GPT-4-turbo inference)
- 50-500ms: DIMOS skill execution (varies by skill)
- 50-100ms: ROS/Python overhead
### Phase 4: Optimization Strategies

Based on the findings, choose an approach:
### If Bottleneck = Cloud API (Most Likely)

Options:

1. **Switch to a faster model**
    - gpt-4-turbo → gpt-3.5-turbo (3-5x faster, cheaper)
    - Trade-off: less capable reasoning
2. **Optimize prompts**
    - Shorter system prompts
    - Remove unnecessary context
3. **Stream responses** (show progress)
4. **Local LLM (future)**
    - Ollama with llama3 or mistral
    - 100-500ms inference on good hardware
    - Trade-off: setup complexity, needs a GPU
5. **Hybrid approach** (see the sketch after this list)
    - Simple commands → local/fast model
    - Complex commands → GPT-4
    - Best of both worlds
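As a concrete illustration of options 1 and 5, here is a hedged sketch of command routing. The keyword list and model names are assumptions, not existing configuration:

```python
# Hypothetical router for the "hybrid approach": cheap model for trivially
# simple commands, stronger model for anything that needs planning.
SIMPLE_KEYWORDS = ("forward", "back", "rotate", "turn", "stop")

def pick_model(command: str) -> str:
    text = command.lower()
    if any(word in text for word in SIMPLE_KEYWORDS):
        return "gpt-3.5-turbo"  # ~3-5x faster; fine for single-step motion
    return "gpt-4-turbo"        # keep stronger reasoning for multi-step missions
```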
### If Bottleneck = DIMOS Skill Execution

Options:

1. **Profile individual skills** (see the sketch after this list)
    - Add timing to each skill
    - Identify slow skills
2. **Optimize skill implementations**
    - Remove unnecessary waits
    - Parallelize where possible
3. **Cache expensive operations**
4. **Skill execution feedback**
    - Show an "Executing..." message immediately
    - Stream progress updates
    - Better UX even if not faster
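For option 1, a lightweight decorator can time each skill without touching its body. A sketch, assuming skills are plain Python callables; the decorator name and print-based reporting are illustrative:

```python
import functools
import time

def timed_skill(func):
    """Hypothetical decorator for profiling individual DIMOS skills."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # Report even if the skill raises, so failures are timed too.
            print(f"⏱️ skill {func.__name__}: {time.monotonic() - start:.3f}s")
    return wrapper

# Usage: decorate a skill's entry point, e.g.
# @timed_skill
# def navigate_to(waypoint): ...
```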
### If Bottleneck = ROS/Communication

Options:

1. **Optimize topic communication** (see the sketch after this list)
    - Use compressed messages
    - Reduce message size
    - Batch operations
2. **Reduce unnecessary callbacks**
    - Profile callback timing
    - Combine multiple callbacks
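For option 1, a minimal rclpy sketch of consuming the compressed camera stream instead of raw frames. The topic name `/camera/image_raw/compressed` is an assumption and should be checked against the robot's actual remappings:

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import CompressedImage

class CompressedCamSub(Node):
    def __init__(self):
        super().__init__("compressed_cam_sub")
        self.create_subscription(
            CompressedImage, "/camera/image_raw/compressed", self.on_image, 10
        )

    def on_image(self, msg: CompressedImage):
        # JPEG frames are typically an order of magnitude smaller than raw BGR.
        self.get_logger().debug(f"compressed frame: {len(msg.data)} bytes")

def main():
    rclpy.init()
    rclpy.spin(CompressedCamSub())

if __name__ == "__main__":
    main()
```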
## 🎬 VLM Integration Impact

### Current Vision Skills Timing
From testing:

- `vision.snapshot`: <50ms (local, no network)
- `vision.describe_scene`: 1-3s (VLM API call)
- `vision.locate_object`: 1-3s (VLM API call)
- `vision.detect_objects`: 1-3s (VLM API call)
### Total Mission Timing with VLM
Example: "describe what you see and move forward"
```
User Command
  ↓ 50ms (ROS overhead)
DIMOS Agent
  ↓ 1-2s (OpenAI LLM: understand command + plan)
Vision Skill (describe_scene)
  ↓ 1-3s (Qwen VLM API: analyze image)
Navigation Skill (move forward)
  ↓ 50-500ms (execute movement)
Response

Total: 2.5-6s
```
**Implication:** We need to optimize the base latency first!
## 🚀 Recommended Path Forward

### Option A: Optimize First, Then Integrate VLM ⭐ RECOMMENDED

- **Timeline:** 2-3 hours
- **Benefit:** Fast baseline + optimized VLM experience
- ✅ Add timing instrumentation (DONE)
- ⏳ Collect timing data (15 min)
- ⏳ Analyze bottlenecks (15 min)
- ⏳ Implement optimizations (1-2 hours)
    - Likely: switch to gpt-3.5-turbo for simple commands
    - Add streaming/progress feedback
- ⏳ Test and verify improvements
- ⏳ Then integrate VLM with optimized base
**Pro:** Clean, fast system; VLM builds on a solid foundation. **Con:** Delays VLM integration by 2-3 hours.
### Option B: Integrate VLM Now, Optimize Later

- **Timeline:** 1-2 hours for VLM, unknown for optimization
- **Benefit:** VLM working quickly
- ⏳ Wire vision skills to mission_agent
- ⏳ Register with DIMOS MyUnitreeSkills
- ⏳ Test end-to-end vision missions
- ⏳ Deal with 4-8s total latency
- ⏳ Optimize later (harder with more complexity)
**Pro:** VLM features available sooner. **Con:** Slower system, and harder to optimize once VLM complexity is added.
### Option C: Parallel Development

- **Timeline:** 2-3 hours total
- **Benefit:** Both done simultaneously
- ⏳ You: Test missions, collect timing data, analyze
- ⏳ Me: Integrate VLM skills with mission_agent
- ⏳ Merge both when ready
**Pro:** Fastest total time. **Con:** Requires coordination; potential merge conflicts.
## 📋 Action Items

### Immediate (Option A - Recommended)
```bash
# 1. Rebuild with timing instrumentation
cd /workspaces/shadowhound
colcon build --packages-select shadowhound_mission_agent
source install/setup.bash

# 2. Launch agent
ros2 launch shadowhound_bringup shadowhound_bringup.launch.py

# 3. Test commands and watch logs for timing
# In web UI or via:
ros2 topic pub /mission_command std_msgs/String "data: 'take one step forward'"

# 4. Collect timing from logs:
# Look for: "⏱️ Timing breakdown:"
```
### Data to Collect
| Command Type | Example | Expected Agent Call | Notes |
|---|---|---|---|
| Simple motion | "forward" | 1-2s | Baseline |
| Simple motion | "rotate 90" | 1-2s | Baseline |
| Navigation | "go to kitchen" | 2-4s | Nav2 planning |
| Complex | "patrol" | 3-6s | Multi-step |
### Analysis Questions

- Is `agent_duration` consistently 1-2s? → Cloud API bottleneck.
- Does `agent_duration` vary widely? → Skill execution varies.
- Is `total_duration` much greater than `agent_duration`? → Overhead issue.
## 🎯 Success Criteria

### Before VLM Integration
- ✅ Understand where latency comes from
- ✅ Optimize bottlenecks (target: <2s for simple commands)
- ✅ Add progress feedback for better UX
- ✅ Document baseline performance
### After VLM Integration
- ✅ Vision missions work end-to-end
- ✅ Total latency <5s for vision+action commands
- ✅ Good UX with progress indicators
- ✅ System feels responsive
## 📚 References

- Timing code: `mission_executor.py`, line 214
- Agent call: `run_observable_query().run()` (DIMOS OpenAIAgent)
- Vision skills: `src/shadowhound_skills/shadowhound_skills/vision.py`
- DIMOS docs: `docs/dimos_vision_capabilities.md`
## 🔮 Future Optimizations

### Short-term (After VLM)
- [ ] Stream LLM responses (show thinking in real-time)
- [ ] Parallel skill execution where possible
- [ ] Cache common queries/responses (see the sketch after this list)
- [ ] Compressed image topic (reduce bandwidth)
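As a concrete example of the query/response caching item above, a minimal sketch. It assumes plans for exact-repeat commands are reusable; a real version would need TTL and state-aware invalidation, since the same command can mean different things as the robot moves. `run_observable_query().run()` is the DIMOS entry point from the References section; the class itself is hypothetical:

```python
import functools

class CachedPlanner:
    """Sketch: memoize agent plans for exact-repeat commands."""

    def __init__(self, agent):
        self.agent = agent
        # Cache keys on the command string only; self is captured in the bound method.
        self._plan = functools.lru_cache(maxsize=256)(self._plan_uncached)

    def _plan_uncached(self, command: str) -> str:
        # Assumed DIMOS entry point, per the References section.
        return self.agent.run_observable_query(command).run()

    def plan(self, command: str) -> str:
        return self._plan(command)
```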
### Long-term
- [ ] Local LLM for simple commands
- [ ] Hybrid cloud/local routing
- [ ] Predictive skill loading
- [ ] Advanced caching with embeddings
## 💡 Recommendation

Start with Option A: optimize the base system first (2-3 hours), then add VLM.

**Why:** VLM will add 1-3s per query. If the base system is already slow, vision missions will total 4-8s, which is poor UX. Fix the foundation first, then build on it.

**Next Step:** Run test missions and collect timing data; ~15 minutes should be enough to identify the bottleneck clearly.
## Validation
- [ ] Legacy guidance reviewed for accuracy and converted to the new workflow where applicable.
- [ ] Links updated to use vault-friendly wikilinks or confirmed for external references.
- [ ] Outstanding migration work captured as tasks in the backlog.