Troubleshooting Index¶
Purpose¶
Centralize troubleshooting guides to reduce mean time to recovery across hardware, software, and networking failures. This index covers startup validation, robot testing procedures, and diagnostic workflows.
Prerequisites¶
- Access to telemetry logs or observability dashboards
- Knowledge of the impacted subsystem
- Familiarity with ROS 2 diagnostics tools
Active Troubleshooting Guides¶
Startup & Validation¶
- Startup Validation Flow — Two-layer LLM backend validation (pre-flight checks + runtime)
- Start script pre-flight checks (fail fast)
- Mission agent runtime validation
- Ollama and OpenAI backend validation
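A fail-fast pre-flight check for the Ollama backend could look like the sketch below. It is not the project's start script: the function names `has_model` and `preflight`, the `OLLAMA_URL` and `MODEL` variables, and the default of Ollama's standard port 11434 are assumptions for illustration. It relies only on Ollama's `/api/tags` endpoint, which lists locally available models.

```shell
# Hypothetical defaults; adjust to match the actual backend configuration.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="${MODEL:-phi4:14b}"

# Succeeds if the JSON read from stdin contains a "name" field equal to $1.
has_model() {
    grep -q "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Fails fast if Ollama is unreachable or the expected model is missing.
preflight() {
    local tags
    # -f: treat HTTP errors as failures; --max-time: never hang the start script
    if ! tags="$(curl -fsS --max-time 5 "$OLLAMA_URL/api/tags")"; then
        echo "pre-flight: Ollama not reachable at $OLLAMA_URL" >&2
        return 1
    fi
    if ! printf '%s' "$tags" | has_model "$MODEL"; then
        echo "pre-flight: model $MODEL not found" >&2
        return 1
    fi
    echo "pre-flight: ok"
}
```

Calling `preflight || exit 1` before launching the mission agent stops startup before any runtime validation is attempted.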
Robot Testing¶
- Quick Start: Robot Testing — Complete testing procedure with local LLM
- GPU setup and monitoring (jtop)
- Ollama configuration (phi4:14b)
- End-to-end robot command testing
- Performance validation
Common Issues & Solutions¶
LLM Backend Issues¶
Symptom: Mission agent fails to start or hangs
Solution: See Startup Validation for pre-flight checks
Symptom: Slow or no responses from the LLM
Solution: Check the backend configuration in LLM Backend Validation
Robot Connectivity Issues¶
Symptom: Robot not responding to commands
Solution:
1. Verify DDS connectivity: DDS Direct Test
2. Check WebRTC connection: WebRTC Direct Test
3. Validate network topology: Network Topologies
ROS 2 Topic Issues¶
Symptom: Topics not visible or no data
Diagnostic Commands:
# List all topics
ros2 topic list
# Check topic info
ros2 topic info /topic_name
# Echo topic data
ros2 topic echo /topic_name
# Check the ROS 2 daemon, which caches discovery data for the CLI
ros2 daemon status
ros2 daemon stop # Only if needed, to clear stale discovery state
ros2 daemon start
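The commands above can be combined into a small triage helper that separates "topic not visible" from "topic visible but silent". This is a sketch under assumptions: `topic_listed` and `check_topic` are hypothetical names, the 5-second wait is arbitrary, and it expects a sourced ROS 2 environment plus the coreutils `timeout` command.

```shell
# Succeeds if the exact topic name appears in a topic list read from stdin.
topic_listed() {
    grep -Fxq -- "$1"
}

# Returns 1 if the topic is not advertised, 2 if it is advertised but
# no message arrives within 5 seconds, 0 otherwise.
check_topic() {
    local topic="$1"
    if ! ros2 topic list | topic_listed "$topic"; then
        echo "$topic: not visible in ros2 topic list"
        return 1
    fi
    # --once exits after the first message; timeout bounds the wait
    if ! timeout 5 ros2 topic echo --once "$topic" >/dev/null 2>&1; then
        echo "$topic: visible but no message within 5 s"
        return 2
    fi
    echo "$topic: ok"
}
```

A return code of 1 points toward discovery or daemon problems, while 2 suggests the publisher is up but not sending data.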
Diagnostic Workflow¶
1. Identify Subsystem¶
- Hardware: Power, sensors, networking → See Hardware Docs
- Software: ROS 2, agent, skills → See Software Docs
- Networking: DDS, WebRTC, WiFi → See Networking Docs
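For scripted triage, the mapping above can be expressed as a lookup. The function name `subsystem_for` and the lowercase keywords are illustrative assumptions; only the groupings come from the list above.

```shell
# Map a component keyword to the docs section that covers it.
# Prints "unknown" and returns 1 for anything not listed.
subsystem_for() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        power|sensors)     echo "Hardware Docs" ;;
        ros2|agent|skills) echo "Software Docs" ;;
        dds|webrtc|wifi)   echo "Networking Docs" ;;
        *)                 echo "unknown"; return 1 ;;
    esac
}
```

For example, `subsystem_for DDS` prints `Networking Docs`.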
2. Gather Information¶
# Check system logs
journalctl -xe
# ROS 2 node status
ros2 node list
ros2 node info /node_name
# Network connectivity
ping -c 4 192.168.10.103 # GO2 robot (-c 4 stops after four probes)
ping -c 4 192.168.10.1 # Router
# GPU status (on Thor)
jtop
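To capture these checks in one pass for a bug report, a wrapper along the following lines may help. It is a sketch, not an existing project script: `run_check` and `collect_diagnostics` are hypothetical names, and the probe counts and line limits are arbitrary. `jtop` is left out because it is interactive.

```shell
# Run one diagnostic command, label its output, and keep going if it fails.
run_check() {
    local label="$1"
    shift
    echo "=== $label ==="
    if ! "$@" 2>&1; then
        echo "[failed: $*]"
    fi
    echo
}

# Non-interactive versions of the checks listed above.
collect_diagnostics() {
    run_check "system logs (last 50 lines)" journalctl -x --no-pager -n 50
    run_check "ROS 2 nodes" ros2 node list
    run_check "GO2 robot reachability" ping -c 3 -W 2 192.168.10.103
    run_check "router reachability" ping -c 3 -W 2 192.168.10.1
}
```

Running `collect_diagnostics > "diag-$(date +%Y%m%d-%H%M%S).txt"` writes everything to a timestamped file, and a failed check does not stop the remaining ones.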
3. Apply Solution¶
- Follow relevant troubleshooting guide
- Document resolution steps
- Update this index if a new failure pattern is found
4. Verify Resolution¶
- Test the fixed functionality
- Monitor for recurrence
- Update telemetry/alerting if needed
Steps¶
- Identify affected subsystem using diagnostic workflow above
- Gather diagnostic information (logs, topic status, network connectivity)
- Follow relevant troubleshooting guide from Active Guides section
- Verify resolution and document lessons learned
Validation¶
- [ ] Each troubleshooting guide tested on current build
- [ ] Diagnostic commands validated and produce expected output
- [ ] Resolution procedures documented with verification steps
- [ ] Cross-links to related docs verified
See Also¶
- LLM Backend Validation — Runtime backend health checks
- DDS Direct Test — ROS 2 connectivity validation
- WebRTC Direct Test — Robot WiFi validation
- Network Topologies — Wiring and connectivity reference
- Start Script — Startup sequence and validation
References¶
- Documentation Root
- Hardware Index
- Software Index
- Networking Index
- ROS 2 Beginner CLI Tools tutorials (Humble): https://docs.ros.org/en/humble/Tutorials/Beginner-CLI-Tools.html