ShadowHound Architecture Review Summary

Date: October 3, 2025

Changes Made

1. Comprehensive Architecture Design

Problem: Documentation was aspirational with vague boundaries and missing implementation details.

Solution: Created a clear layered architecture, four ShadowHound layers above an external hardware layer, each with explicit responsibilities:

Figure 1: Complete system architecture showing the layered design from Web UI through Mission Agent, DIMOS Skills, ROS2 Bridge, to GO2 hardware.

Application Layer: Launch files, configs, deployment
Agent Layer: LLM/VLM orchestration, mission planning
Skills Layer: Execution engine with safety guards
Robot Layer: ROS2 bridge to hardware
Hardware Layer: go2_ros2_sdk (external)

2. Concrete Package Structure

Before: Generic package names with unclear purposes.

After: Five core packages with clear roles:

shadowhound_interfaces/   → Custom ROS2 messages/services/actions
shadowhound_robot/        → Hardware interface (go2_ros2_sdk bridge)
shadowhound_skills/       → Skills registry + implementations
shadowhound_agent/        → Mission planner + LLM integration
shadowhound_bringup/      → Launch files and configurations

Each package now has detailed internal structure documented.

3. Clear Data Flows

Command Flow: User → Agent → Skills → Robot → Hardware

Feedback Flow: Hardware → Robot → Skills → Agent → User

Both flows now explicitly documented with interfaces at each boundary.

4. Phase-Based Implementation Plan

Before: Phases marked as "complete" with no actual code.

After: Honest status assessment and actionable phases:

Phase 0 (CURRENT): Bootstrap - Package scaffolding
Phase 1: Basic Skills - Implement without hardware
Phase 2: Robot Integration - Connect to Go2
Phase 3: Agent Integration - Natural language missions
Phase 4: Advanced Navigation - Mapping and autonomy
Phase 5: Vision & Perception - VLM integration
Phase 6: Onboard Deployment - Run on Jetson Thor

Each phase has specific tasks and exit criteria.

5. Skills API Specification

Before: Abstract concept without implementation guidance.

After: Complete specification with:

- Registration pattern with decorators
- Validation requirements
- Result structure (SkillResult with data + telemetry)
- Safety requirements (timeouts, clamps, error handling)
- Code examples for implementation and usage
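The registration pattern and result structure listed above might look roughly like the following. This is a minimal sketch, not the project's actual API: `register_skill`, `execute`, the `_SKILLS` dictionary, the `error` field, and the `MAX_YAW_RATE` value are assumptions made for illustration. Only the `SkillResult` name and its data and telemetry parts come from the specification summary.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SkillResult:
    """Structured result: success flag, payload data, and telemetry."""
    success: bool
    data: Dict[str, Any] = field(default_factory=dict)
    telemetry: Dict[str, Any] = field(default_factory=dict)
    error: str = ""  # hypothetical field, not named in the source


# Hypothetical module-level registry mapping skill names to callables.
_SKILLS: Dict[str, Callable[..., SkillResult]] = {}

MAX_YAW_RATE = 1.0  # rad/s, illustrative limit only


def register_skill(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func: Callable[..., SkillResult]):
        if name in _SKILLS:
            raise ValueError(f"skill '{name}' is already registered")
        _SKILLS[name] = func
        return func
    return decorator


@register_skill("rotate")
def rotate(yaw_rate: float, duration_s: float) -> SkillResult:
    """Validate and clamp a rotation request (no hardware is touched here)."""
    if duration_s <= 0:
        return SkillResult(False, error="duration_s must be positive")
    clamped = max(-MAX_YAW_RATE, min(MAX_YAW_RATE, yaw_rate))
    return SkillResult(
        True,
        data={"yaw_rate": clamped},
        telemetry={"clamped": clamped != yaw_rate, "duration_s": duration_s},
    )


def execute(name: str, **params: Any) -> SkillResult:
    """Look up a skill, run it, and attach elapsed time to the telemetry."""
    skill = _SKILLS.get(name)
    if skill is None:
        return SkillResult(False, error=f"unknown skill '{name}'")
    start = time.monotonic()
    try:
        result = skill(**params)
    except Exception as exc:  # convert failures into structured results
        result = SkillResult(False, error=str(exc))
    result.telemetry["elapsed_s"] = time.monotonic() - start
    return result
```

Calling `execute("rotate", yaw_rate=5.0, duration_s=1.0)` in this sketch returns a successful result whose `data["yaw_rate"]` is clamped to `1.0`, while an unknown skill name returns a failed result instead of raising.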

6. Robot Interface Patterns

New: Explicit separation between Skills and ROS topics:

- Skills never publish directly to /cmd_vel
- All hardware access through RobotInterface
- Safety clamps built into interface
- State monitoring and aggregation
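One way to picture the clamping behaviour is a `RobotInterface` that receives its transport as an injected callable, so the same class can be exercised without ROS2 or hardware. This is a sketch under stated assumptions: the constructor signature, the `move` and `stop` methods, `VelocityLimits`, and the numeric limits are all hypothetical. In a real node the callable would presumably wrap a publisher on /cmd_vel. State monitoring is not covered here.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class VelocityLimits:
    max_linear: float = 0.5   # m/s, illustrative value
    max_angular: float = 1.0  # rad/s, illustrative value


def _clamp(value: float, limit: float) -> float:
    """Clamp `value` into [-limit, limit]; reject NaN and infinity."""
    if not math.isfinite(value):
        raise ValueError(f"non-finite velocity: {value!r}")
    return max(-limit, min(limit, value))


class RobotInterface:
    """Single choke point for motion commands (hypothetical shape)."""

    def __init__(
        self,
        send_velocity: Callable[[float, float], None],
        limits: VelocityLimits = VelocityLimits(),
    ) -> None:
        self._send = send_velocity  # e.g. a wrapper around a ROS2 publisher
        self._limits = limits

    def move(self, linear: float, angular: float) -> Tuple[float, float]:
        """Clamp the request, send it, and return what was actually sent."""
        safe_linear = _clamp(linear, self._limits.max_linear)
        safe_angular = _clamp(angular, self._limits.max_angular)
        self._send(safe_linear, safe_angular)
        return safe_linear, safe_angular

    def stop(self) -> None:
        """Send a zero-velocity command."""
        self._send(0.0, 0.0)
```

Because skills only ever see `move` and `stop`, the limits cannot be bypassed by a skill implementation, and a test can pass a plain list-appending function as `send_velocity`.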

7. Updated Documentation

Files Updated:

1. docs/project.md (old version backed up)
   - Complete architecture with diagrams
   - Full package structure with file listings
   - Skills API reference
   - Development workflow
   - All implementation phases (Phase 0 through Phase 6)
   - Safety constraints and ROS topics
   - Configuration and deployment guides
2. .github/copilot-instructions.md
   - Phase-aware guidance
   - Code templates for skills
   - Common patterns and anti-patterns
   - Troubleshooting guides
   - Testing patterns
3. README.md
   - User-focused quick start
   - Clear status (Phase 0)
   - Architecture overview
   - Development workflow
   - Skills API examples
   - Roadmap with checkboxes

Key Architectural Decisions

1. Skills-First Design

All robot control goes through Skills API
No direct ROS topic publishing from agents
Enforces safety, validation, and telemetry

2. Type-Safe Interfaces

Python type hints required
Parameter validation before execution
Structured result objects (not bare returns)
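Type hints can double as the source of truth for parameter validation before a skill runs. The helper below is a minimal sketch of that idea using the standard `inspect` module. `validate_params` and the example `say` function are hypothetical names, and the check only handles plain classes such as `str`, `int`, and `float` (with `int` accepted where `float` is annotated), not generic types like `List[int]`.

```python
import inspect
from typing import Any, Callable, Dict, List


def validate_params(func: Callable[..., Any], params: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors: List[str] = []
    signature = inspect.signature(func)

    for name, param in signature.parameters.items():
        if name not in params:
            # Only parameters without a default are required.
            if param.default is inspect.Parameter.empty:
                errors.append(f"missing required parameter '{name}'")
            continue
        expected = param.annotation
        if not isinstance(expected, type):
            continue  # unannotated or complex annotation: skipped in this sketch
        accepted = (int, float) if expected is float else expected
        value = params[name]
        if not isinstance(value, accepted):
            errors.append(
                f"parameter '{name}' expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )

    for name in params:
        if name not in signature.parameters:
            errors.append(f"unexpected parameter '{name}'")
    return errors


def say(text: str, volume: float = 0.5) -> str:
    """Example skill signature used only to demonstrate validation."""
    return f"{text}@{volume}"
```

With this sketch, `validate_params(say, {"text": "hi"})` returns an empty list, while `validate_params(say, {"text": 3})` reports that `text` expected `str` and got `int`.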

3. Safety by Default

Every skill has timeout
Velocity commands clamped to safe ranges
Input validation required
Telemetry for all operations
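A per-skill timeout could be enforced in several ways; the sketch below uses a worker thread plus a cancellation event, built only on the standard library. It is an illustration rather than the project's executor: `run_with_timeout`, the dictionary result shape, and the convention that a skill takes a `cancel` event as its first argument are all assumptions. Python cannot forcibly kill a thread, so the skill has to check the event cooperatively.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Dict


def run_with_timeout(
    skill: Callable[..., Any], timeout_s: float, *args: Any
) -> Dict[str, Any]:
    """Run `skill(cancel, *args)` and give up after `timeout_s` seconds."""
    cancel = threading.Event()
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(skill, cancel, *args)
    try:
        return {"success": True, "data": future.result(timeout=timeout_s)}
    except FutureTimeout:
        cancel.set()  # ask the skill to stop; it must honor the event
        return {"success": False, "error": f"timed out after {timeout_s}s"}
    except Exception as exc:  # the skill itself raised
        return {"success": False, "error": str(exc)}
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the worker


def slow_skill(cancel: threading.Event, seconds: float) -> str:
    """Example skill: waits, but returns early if cancellation is requested."""
    if cancel.wait(timeout=seconds):
        return "cancelled"
    return "done"
```

In this sketch `run_with_timeout(slow_skill, 0.05, 1.0)` returns a failed result after roughly 50 ms, and the worker exits shortly afterwards because `slow_skill` observes the cancel event.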

4. Layered Responsibilities

Agent doesn't know about ROS topics
Skills don't know about LLMs
Robot interface doesn't know about missions
Clear separation of concerns
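The separation described above can be expressed with small structural interfaces, so that each layer depends only on the one directly beneath it. The following is a minimal sketch, assuming hypothetical class names (`Robot`, `SkillRunner`, `Skills`, `Agent`, `FakeRobot`) and a plan format invented for the example. It also shows how a fake robot lets the upper layers run without hardware.

```python
from typing import Any, Dict, List, Protocol, Tuple


class Robot(Protocol):
    """What the skills layer is allowed to know about the robot layer."""
    def move(self, linear: float, angular: float) -> None: ...


class SkillRunner(Protocol):
    """What the agent layer is allowed to know about the skills layer."""
    def run(self, name: str, **params: Any) -> Dict[str, Any]: ...


class FakeRobot:
    """Records commands instead of talking to ROS2 or hardware."""
    def __init__(self) -> None:
        self.commands: List[Tuple[float, float]] = []

    def move(self, linear: float, angular: float) -> None:
        self.commands.append((linear, angular))


class Skills:
    """Skills layer: knows the robot interface, nothing about LLMs."""
    def __init__(self, robot: Robot) -> None:
        self._robot = robot

    def run(self, name: str, **params: Any) -> Dict[str, Any]:
        if name == "stop":
            self._robot.move(0.0, 0.0)
            return {"skill": name, "success": True}
        if name == "rotate":
            self._robot.move(0.0, params["yaw_rate"])
            return {"skill": name, "success": True}
        return {"skill": name, "success": False}


class Agent:
    """Agent layer: turns a plan into skill calls, never touches topics."""
    def __init__(self, skills: SkillRunner) -> None:
        self._skills = skills

    def execute_plan(self, plan: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [
            self._skills.run(step["skill"], **step.get("params", {}))
            for step in plan
        ]
```

Wiring `Agent(Skills(FakeRobot()))` and executing a two-step plan (rotate, then stop) leaves exactly two recorded commands on the fake robot, which is the kind of hardware-free test the layering is meant to enable.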

5. Container-First Development

All development in devcontainer
Consistent environment across team
Pre-configured tools and aliases

What's Next

Immediate (Phase 0)

Create shadowhound_interfaces package with custom messages
Create skeleton packages for robot, skills, agent, bringup
Set up basic launch files
Create shadowhound.repos for go2_ros2_sdk
Verify clean build: cb succeeds without errors

Short Term (Phase 1)

Implement Skills registry and executor
Create 4 basic skills (say, stop, rotate, snapshot)
Write unit tests for all skills
Build a CLI tool for testing skills

Medium Term (Phase 2-3)

Import and integrate go2_ros2_sdk
Implement RobotInterface
Test skills on real hardware
Add LLM client and mission planner
Natural language mission execution

Documentation Structure

ShadowHound/
├── README.md                          # Quick start, user-focused
├── docs/
│   └── project.md                     # Complete architecture (THIS IS THE SOURCE OF TRUTH)
├── .github/
│   └── copilot-instructions.md        # AI coding agent guide
└── src/
    ├── shadowhound_interfaces/
    ├── shadowhound_robot/
    ├── shadowhound_skills/
    ├── shadowhound_agent/
    └── shadowhound_bringup/

Documentation Hierarchy:

1. docs/project.md - Architecture authority
2. .github/copilot-instructions.md - Development patterns
3. README.md - User onboarding
4. Package READMEs - Implementation details

Success Metrics

Phase 0 Complete When:

[ ] All 5 packages created with proper structure
[ ] cb builds workspace without errors
[ ] ros2 pkg list | grep shadowhound shows all packages
[ ] Each package has README.md

Phase 1 Complete When:

[ ] Skills registry functional
[ ] 4 basic skills implemented and tested
[ ] Unit tests pass: pytest src/shadowhound_skills/test/
[ ] Skills callable via CLI

Overall Project Complete When:

[ ] Demo mission works: "Find the blue ball in the kitchen"
[ ] Natural language → navigation → perception → reporting
[ ] Works on real Go2 hardware
[ ] Deployable on Jetson Thor

Feedback & Iteration

This architecture provides:

- ✅ Clear boundaries between layers
- ✅ Testable components (can test skills without hardware)
- ✅ Safe defaults (all control through validated Skills API)
- ✅ Extensible (easy to add new skills)
- ✅ Maintainable (clear responsibilities)

Questions for team:

1. Does the package structure make sense?
2. Are the phase priorities correct?
3. Should we add any additional safety mechanisms?
4. Any concerns about the Skills API design?