Quick Start: Robot Testing with phi4:14b¶
Date: 2025-10-11
Goal: Test Ollama local LLM (phi4:14b) with GO2 robot
Time Estimate: 2-3 hours total
☕ Morning Setup (30 min)¶
Step 1: Reboot Thor for Clean GPU State¶
# On Thor
sudo reboot
Wait ~2 minutes for the system to come back up
Step 2: Install jtop for GPU Monitoring¶
# SSH back into Thor after reboot
ssh thor
# Pull latest code
cd ~/shadowhound
git pull origin feature/local-llm-support
# Install jtop (requires sudo, ~10 min)
sudo ./scripts/install_jtop_thor.sh
# Verify installation
systemctl status jtop.service # Should be active (running)
sudo jtop # Should show GPU metrics, not N/A
# Press 'q' to exit
Success Check:
- ✅ jtop shows GPU memory (MiB/GiB)
- ✅ jtop shows GPU utilization %
- ✅ No "N/A" values
Step 3: Verify Ollama Container Running¶
# On Thor
docker ps | grep ollama
# Should show: ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04
# If not running, start it
./scripts/setup_ollama_thor.sh
Step 4: Test phi4:14b Baseline Performance¶
# On Thor
docker exec ollama ollama run --verbose phi4:14b "Count from 1 to 10, one number per line."
# Watch the output for speed metrics:
# eval rate: should be 15-20 tok/s ✅
# If <10 tok/s, something is wrong ❌
Example Good Output:
1
2
3
...
10
total duration: 2.5s
load duration: 800ms
prompt eval rate: 50 tok/s
eval rate: 18.5 tok/s ← CHECK THIS!
If slow (<10 tok/s): GPU degradation issue present, see troubleshooting below
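If you want to script this check instead of eyeballing the stats, a minimal sketch is below. It assumes the `eval rate:` line format shown in the example output above (pipe the `docker exec ollama ollama run --verbose ...` output into it); the 10 tok/s cutoff is this doc's threshold.

```python
import re

SLOW_THRESHOLD = 10.0  # tok/s; the doc's cutoff for "something is wrong"

def parse_eval_rate(verbose_output: str) -> float:
    """Extract the generation eval rate (tok/s) from `ollama run --verbose` stats."""
    # Anchor on line start so the "prompt eval rate" line is not matched.
    m = re.search(r"^eval rate:\s*([\d.]+)", verbose_output, re.MULTILINE)
    if not m:
        raise ValueError("no 'eval rate:' line found")
    return float(m.group(1))

stats = """total duration: 2.5s
load duration: 800ms
prompt eval rate: 50 tok/s
eval rate: 18.5 tok/s"""
rate = parse_eval_rate(stats)
print(f"{rate} tok/s -> {'OK' if rate >= SLOW_THRESHOLD else 'SLOW'}")  # 18.5 tok/s -> OK
```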
Step 5: Setup Monitoring Terminals¶
On Thor, open 3 terminal windows:
Terminal 1 (jtop - GPU monitoring):
sudo jtop
Leave this open to watch GPU memory and utilization
Terminal 2 (Docker stats - container monitoring):
watch -n 5 'docker stats ollama --no-stream'
Terminal 3 (Ollama logs - if needed):
docker logs -f ollama
🤖 Robot Testing (2 hours)¶
Pre-Flight Checklist¶
- [ ] Thor rebooted (clean GPU state)
- [ ] jtop installed and showing metrics
- [ ] phi4:14b baseline verified (15-20 tok/s)
- [ ] GO2 robot powered on
- [ ] ROS2 bridge running (go2_ros2_sdk on Thor)
- [ ] Monitoring terminals open
Test Phase 1: Launch Mission Agent (10 min)¶
On laptop in devcontainer:
# Terminal 1: Launch mission agent
ros2 launch shadowhound_mission_agent mission_agent.launch.py \
    agent_backend:=ollama \
    ollama_base_url:=http://192.168.50.10:11434 \
    ollama_model:=phi4:14b \
    web_host:=0.0.0.0 \
    web_port:=8080
Watch for:
============================================================
🔍 VALIDATING LLM BACKEND CONNECTION
============================================================
Testing ollama backend...
✅ Ollama service responding
✅ Model 'phi4:14b' available
✅ Test prompt succeeded
============================================================
✅ Ollama backend validation PASSED
============================================================
Check jtop on Thor: GPU memory should jump to ~7-8GB when model loads
Open browser: http://localhost:8080 - Should show: Backend: OLLAMA, Model: phi4:14b
✅ Success: Mission agent running, backend validated, dashboard accessible
Test Phase 2: Text-Only Tests (15 min)¶
Test 1: Simple Response
Mission: "Describe what you are in one sentence"
Expected: Fast response (<3s), coherent answer
Test 2: JSON Generation
Mission: "Generate JSON with these keys: action='test', status='ok', value=42"
Expected: Valid JSON output, <5s response
Test 3: Navigation Plan
Mission: "Create a navigation plan to move forward 2 meters then rotate 90 degrees left"
Expected JSON:
{
"steps": [
{"action": "nav.goto", "params": {"x": 2.0, "y": 0.0}},
{"action": "nav.rotate", "params": {"yaw": 1.57}}
]
}
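When checking Test 3's output by hand, two things matter: the JSON must parse, and each step must use an action the agent knows. A minimal validator sketch is below; the action names come from this doc's example plans, but the agent's full action vocabulary is an assumption. Note also that `yaw: 1.57` is radians (90° left is +π/2 ≈ 1.57, counterclockwise-positive per ROS convention).

```python
import json

# Action names taken from the example plans in this doc; the mission agent's
# full action vocabulary is an assumption here.
KNOWN_ACTIONS = {"nav.goto", "nav.rotate"}

def validate_plan(raw: str) -> list:
    """Parse LLM output and check it matches the plan shape shown above."""
    plan = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    steps = plan.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' list")
    for step in steps:
        if step.get("action") not in KNOWN_ACTIONS:
            raise ValueError(f"unknown action: {step.get('action')}")
        if not isinstance(step.get("params"), dict):
            raise ValueError("each step needs a 'params' dict")
    return steps

raw = ('{"steps": [{"action": "nav.goto", "params": {"x": 2.0, "y": 0.0}},'
       ' {"action": "nav.rotate", "params": {"yaw": 1.57}}]}')
print(len(validate_plan(raw)))  # 2
```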
✅ Success Criteria:
- All 3 responses coherent
- Response times <5 seconds
- JSON valid and correct
- No errors in logs
Test Phase 3: Robot Motion (30 min)¶
⚠️ Safety First:
- Clear area around robot (3m radius)
- Emergency stop ready
- Start with small movements
Test 4: Simple Forward Motion
Mission: "Move forward 0.5 meters"
Watch for:
1. LLM generates plan (JSON with nav.goto)
2. /cmd_vel topic starts publishing
3. Robot moves forward ~0.5m
4. Mission reports success
On laptop Terminal 2:
ros2 topic echo /cmd_vel
✅ Success: Robot moves approximately correct distance
Test 5: Rotation
Mission: "Rotate 45 degrees to the left"
Expected: Robot turns left ~45 degrees
Test 6: Multi-Step Mission
Mission: "Move forward 1 meter, rotate 90 degrees right, then move forward 1 meter"
Expected:
- LLM generates 3-step plan
- Robot executes each step in sequence
- Final position ~1m forward, ~1m right of start
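The expected final position for Test 6 can be sanity-checked with simple 2D dead reckoning, sketched below (robot starts at the origin facing +x; positive yaw is counterclockwise/left, per ROS convention):

```python
import math

def dead_reckon(steps, x=0.0, y=0.0, yaw=0.0):
    """Track pose through ('forward', meters) / ('rotate', radians) steps."""
    for kind, value in steps:
        if kind == "forward":
            x += value * math.cos(yaw)
            y += value * math.sin(yaw)
        elif kind == "rotate":
            yaw += value
    return x, y, yaw

# Test 6: forward 1 m, rotate 90 degrees right (-pi/2), forward 1 m.
x, y, yaw = dead_reckon([("forward", 1.0), ("rotate", -math.pi / 2), ("forward", 1.0)])
print(round(x, 2), round(y, 2))  # 1.0 -1.0 -> 1 m forward, 1 m to the right
```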
Test 7: Error Handling
Mission: "Move forward 100 meters"
(Assuming you don't have 100m of clear space)
Expected:
- LLM generates plan
- Navigation attempts execution
- Nav2 reports error (obstacle/timeout)
- System doesn't crash
- Clear error message to user
✅ Success Criteria:
- 4+ robot motions successful
- Movements approximately correct (±20% tolerance)
- System stable, no crashes
- Error handling works
Test Phase 4: Performance Check (15 min)¶
Run 5 missions consecutively:
1. "Move forward 0.5m"
2. "Rotate left 45 degrees"
3. "Move forward 0.5m"
4. "Rotate right 45 degrees"
5. "Return to starting position"
Monitor on Thor (jtop):
- GPU memory: should stay ~7-8GB (stable)
- GPU utilization: spikes during LLM responses
- Temperature: <80°C (typical for Orin)
Time each mission: Response time should stay <5 seconds
✅ Success Criteria:
- All 5 missions complete successfully
- Response times consistent (<5s, low variance)
- GPU memory stable (no leaks)
- No performance degradation
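To turn "consistent, low variance" into a number, feed your stopwatch readings for the 5 missions into a small summary like this sketch (the readings shown are hypothetical; the 5 s limit is this doc's target):

```python
from statistics import mean, stdev

def timing_report(times_s, limit=5.0):
    """Summarize mission response times against the <5 s target."""
    return {
        "avg_s": round(mean(times_s), 2),
        "stdev_s": round(stdev(times_s), 2),
        "all_under_limit": max(times_s) < limit,
    }

# Hypothetical stopwatch readings for the 5 missions above:
report = timing_report([3.2, 2.9, 3.5, 3.1, 4.0])
print(report)
```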
⚠️ If Degradation Occurs:
- Note the pattern (after which mission?)
- Check jtop for memory/thermal issues
- This is the known GPU degradation issue
- Continue testing but document the behavior
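One simple way to distinguish "memory stable" from a leak: jot down the jtop GPU-memory reading after each mission and look at the drift from first to last. A sketch, with hypothetical readings and a threshold that is an assumption, not a measured limit:

```python
def memory_drift_gb(samples_gb):
    """Difference between the last and first GPU-memory readings, in GB."""
    return samples_gb[-1] - samples_gb[0]

# Hypothetical jtop readings taken after each of the 5 missions:
readings = [7.4, 7.5, 7.4, 7.5, 7.5]
drift = memory_drift_gb(readings)
# Steady readings (drift well under ~0.5 GB) suggest no leak;
# monotonic growth across missions suggests one.
print(f"drift: {drift:+.1f} GB")
```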
📊 Results Documentation (15 min)¶
Update Deployment Checklist¶
Open docs/OLLAMA_DEPLOYMENT_CHECKLIST.md and fill in:
Test Results Summary:
Date Tested: 2025-10-11
Tested By: [Your Name]
System: Thor + GO2 Robot
Overall Results: [ ] PASS / [ ] PASS WITH ISSUES / [ ] FAIL
Performance Summary:
| Metric | Target | Actual | Status |
|---------------------|--------|--------|--------|
| Mission Success | >95% | ____% | ✅/❌ |
| Avg Response Time | <5s | ___s | ✅/❌ |
| Memory Stability | Stable | Stable/Leak | ✅/❌ |
| Robot Motion Accuracy | ±20% | ____% | ✅/❌ |
Issues Encountered:
1. Issue: [Description]
Severity: Critical / High / Medium / Low
Workaround: [If any]
🎯 Decision Time¶
If All Tests PASS ✅¶
You're ready to merge!
- Update main README.md with Ollama setup instructions
- Merge feature/local-llm-support → dev
- Tag release: v1.1.0-ollama-support
- Document production model: phi4:14b
- 🎉 Celebrate!
cd /workspaces/shadowhound
git checkout dev
git merge feature/local-llm-support
git push
git tag -a v1.1.0 -m "Add Ollama local LLM support with phi4:14b"
git push --tags
If Tests FAIL ❌¶
Common issues and fixes:
Issue: Slow responses (>10s)
- Check baseline: docker exec ollama ollama run --verbose phi4:14b "test"
- If baseline also slow: GPU degradation, reboot Thor
- If only during robot test: Check CPU/memory on Thor (htop)
Issue: Invalid JSON
- Try simpler prompt: "Generate JSON: {action: test}"
- May need to switch to qwen2.5-coder:32b (better JSON)
- Check LLM prompt templates in mission_agent.py
Issue: Robot doesn't move
- Check ROS2 topics: ros2 topic list | grep cmd_vel
- Check nav2 stack: ros2 node list | grep nav
- Check robot bridge: ros2 topic echo /robot_state
Issue: Mission agent crashes
- Check logs for specific error
- Verify backend validation passed
- Check Thor network connectivity
- Try restarting mission agent
If GPU Degradation Occurs 🟡¶
This is a known issue, document it:
- Note when it occurred (after how many missions?)
- Measure the degradation (baseline speed vs current)
- Check jtop for:
- Memory fragmentation
- Clock speed reduction
- Thermal throttling
- Document in docs/OLLAMA_STATUS_AND_TODOS.md
- Decision:
- If degradation is gradual (>20 missions): ACCEPTABLE, merge with known issue
- If degradation is immediate (<5 missions): INVESTIGATE before merge
🚨 Emergency Procedures¶
Robot Behaving Unsafely¶
# STOP EVERYTHING
Ctrl+C in mission agent terminal
# Or emergency stop on robot (physical button)
Ollama Container Crashes¶
# On Thor
docker stop ollama
docker rm ollama
./scripts/setup_ollama_thor.sh
Mission Agent Won't Start¶
# Check backend manually
curl http://192.168.50.10:11434/api/tags
# If no response, check Thor:
ssh thor
docker ps | grep ollama
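The `curl` above returns JSON listing the installed models. If you want the check to be a pass/fail decision rather than raw output, a sketch is below; it assumes the standard Ollama `/api/tags` response shape (a `models` list of objects with a `name` field). Feed it the curl output.

```python
import json

def backend_ready(tags_json: str, model: str = "phi4:14b") -> bool:
    """Return True if an Ollama /api/tags response lists the required model."""
    try:
        models = json.loads(tags_json).get("models", [])
    except ValueError:
        return False  # not JSON: service unreachable proxy page or error text
    return any(m.get("name") == model for m in models)

# Example: paste in the output of `curl http://192.168.50.10:11434/api/tags`.
sample = '{"models": [{"name": "phi4:14b"}]}'
print(backend_ready(sample))  # True
```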
Forgot to Reboot Thor?¶
If baseline performance is slow:
# Stop testing
# On Thor
sudo reboot
# Wait and restart from Step 1
📚 Reference Documents¶
- Full deployment checklist: docs/OLLAMA_DEPLOYMENT_CHECKLIST.md
- Status and TODOs: docs/OLLAMA_STATUS_AND_TODOS.md
- Performance notes: docs/THOR_PERFORMANCE_NOTES.md
- Backend validation: docs/LLM_BACKEND_VALIDATION.md
- jtop security analysis: docs/SECURITY_ANALYSIS_JTOP.md
🎯 Quick Success Checklist¶
Before Testing¶
- [ ] Thor rebooted
- [ ] jtop installed and showing GPU metrics
- [ ] phi4:14b baseline verified (15-20 tok/s)
- [ ] GO2 powered on
- [ ] Monitoring terminals open
During Testing¶
- [ ] Mission agent starts successfully
- [ ] Backend validation passes
- [ ] 3+ text-only tests pass
- [ ] 3+ robot motion tests pass
- [ ] Performance stable over 5+ missions
After Testing¶
- [ ] Results documented in checklist
- [ ] Issues documented (if any)
- [ ] Decision made: merge or investigate
- [ ] Celebrate or plan next steps! 🚀
Good luck! You've got this! 💪
Remember: The goal is to validate that phi4:14b works well enough for robot testing. Perfect performance is not required; it just needs to be stable, safe, and functional.
If you hit any blockers, see:
- docs/OLLAMA_STATUS_AND_TODOS.md - Known issues and TODOs
- docs/THOR_PERFORMANCE_NOTES.md - GPU degradation workarounds
Last Updated: 2025-10-10 23:30
Next Update: After testing 2025-10-11