ShadowHound Architecture Review Summary¶
Date: October 3, 2025
Changes Made¶
1. Comprehensive Architecture Design¶
Problem: Documentation was aspirational with vague boundaries and missing implementation details.
Solution: Created a clear four-layer architecture with explicit responsibilities:
Figure 1: Complete system architecture showing the layered design from Web UI through Mission Agent, DIMOS Skills, ROS2 Bridge, to GO2 hardware.
- Application Layer: Launch files, configs, deployment
- Agent Layer: LLM/VLM orchestration, mission planning
- Skills Layer: Execution engine with safety guards
- Robot Layer: ROS2 bridge to hardware
- Hardware Layer: go2_ros2_sdk (external)
2. Concrete Package Structure¶
Before: Generic package names with unclear purposes After: Five core packages with clear roles:
shadowhound_interfaces/ → Custom ROS2 messages/services/actions
shadowhound_robot/ → Hardware interface (go2_ros2_sdk bridge)
shadowhound_skills/ → Skills registry + implementations
shadowhound_agent/ → Mission planner + LLM integration
shadowhound_bringup/ → Launch files and configurations
Each package now has detailed internal structure documented.
3. Clear Data Flows¶
Command Flow: User → Agent → Skills → Robot → Hardware Feedback Flow: Hardware → Robot → Skills → Agent → User
Both flows now explicitly documented with interfaces at each boundary.
4. Phase-Based Implementation Plan¶
Before: Phases marked as "complete" with no actual code After: Honest status assessment and actionable phases:
- Phase 0 (CURRENT): Bootstrap - Package scaffolding
- Phase 1: Basic Skills - Implement without hardware
- Phase 2: Robot Integration - Connect to Go2
- Phase 3: Agent Integration - Natural language missions
- Phase 4: Advanced Navigation - Mapping and autonomy
- Phase 5: Vision & Perception - VLM integration
- Phase 6: Onboard Deployment - Run on Jetson Thor
Each phase has specific tasks and exit criteria.
5. Skills API Specification¶
Before: Abstract concept without implementation guidance After: Complete specification with: - Registration pattern with decorators - Validation requirements - Result structure (SkillResult with data + telemetry) - Safety requirements (timeouts, clamps, error handling) - Code examples for implementation and usage
6. Robot Interface Patterns¶
New: Explicit separation between Skills and ROS topics - Skills never publish directly to /cmd_vel - All hardware access through RobotInterface - Safety clamps built into interface - State monitoring and aggregation
7. Updated Documentation¶
Files Updated:
1. docs/project.md (Backed up old version)
- Complete architecture with diagrams
- Full package structure with file listings
- Skills API reference
- Development workflow
- All 6 implementation phases
- Safety constraints and ROS topics
- Configuration and deployment guides
.github/copilot-instructions.md- Phase-aware guidance
- Code templates for skills
- Common patterns and anti-patterns
- Troubleshooting guides
-
Testing patterns
-
README.md - User-focused quick start
- Clear status (Phase 0)
- Architecture overview
- Development workflow
- Skills API examples
- Roadmap with checkboxes
Key Architectural Decisions¶
1. Skills-First Design¶
- All robot control goes through Skills API
- No direct ROS topic publishing from agents
- Enforces safety, validation, and telemetry
2. Type-Safe Interfaces¶
- Python type hints required
- Parameter validation before execution
- Structured result objects (not bare returns)
3. Safety by Default¶
- Every skill has timeout
- Velocity commands clamped to safe ranges
- Input validation required
- Telemetry for all operations
4. Layered Responsibilities¶
- Agent doesn't know about ROS topics
- Skills don't know about LLMs
- Robot interface doesn't know about missions
- Clear separation of concerns
5. Container-First Development¶
- All development in devcontainer
- Consistent environment across team
- Pre-configured tools and aliases
What's Next¶
Immediate (Phase 0)¶
- Create
shadowhound_interfacespackage with custom messages - Create skeleton packages for robot, skills, agent, bringup
- Set up basic launch files
- Create
shadowhound.reposfor go2_ros2_sdk - Verify clean build:
cbsucceeds without errors
Short Term (Phase 1)¶
- Implement Skills registry and executor
- Create 4 basic skills (say, stop, rotate, snapshot)
- Write unit tests for all skills
- CLI tool for testing skills
Medium Term (Phase 2-3)¶
- Import and integrate go2_ros2_sdk
- Implement RobotInterface
- Test skills on real hardware
- Add LLM client and mission planner
- Natural language mission execution
Documentation Structure¶
ShadowHound/
├── README.md # Quick start, user-focused
├── docs/
│ └── project.md # Complete architecture (THIS IS THE SOURCE OF TRUTH)
├── .github/
│ └── copilot-instructions.md # AI coding agent guide
└── src/
├── shadowhound_interfaces/
├── shadowhound_robot/
├── shadowhound_skills/
├── shadowhound_agent/
└── shadowhound_bringup/
Documentation Hierarchy:
1. docs/project.md - Architecture authority
2. .github/copilot-instructions.md - Development patterns
3. README.md - User onboarding
4. Package READMEs - Implementation details
Success Metrics¶
Phase 0 Complete When:¶
- [ ] All 5 packages created with proper structure
- [ ]
cbbuilds workspace without errors - [ ]
ros2 pkg list | grep shadowhoundshows all packages - [ ] Each package has README.md
Phase 1 Complete When:¶
- [ ] Skills registry functional
- [ ] 4 basic skills implemented and tested
- [ ] Unit tests pass:
pytest src/shadowhound_skills/test/ - [ ] Skills callable via CLI
Overall Project Complete When:¶
- [ ] Demo mission works: "Find the blue ball in the kitchen"
- [ ] Natural language → navigation → perception → reporting
- [ ] Works on real Go2 hardware
- [ ] Deployable on Jetson Thor
Feedback & Iteration¶
This architecture provides: - ✅ Clear boundaries between layers - ✅ Testable components (can test skills without hardware) - ✅ Safe defaults (all control through validated Skills API) - ✅ Extensible (easy to add new skills) - ✅ Maintainable (clear responsibilities)
Questions for team: 1. Does the package structure make sense? 2. Are the phase priorities correct? 3. Should we add any additional safety mechanisms? 4. Any concerns about the Skills API design?