# LLM Backend Validation

## Overview
The ShadowHound mission agent now includes startup validation for LLM backends. This catches configuration errors immediately when the node starts, rather than failing silently when the first mission is attempted.
## Why This Matters
Before validation:

- Mission agent appears to start successfully
- First mission command fails with cryptic errors
- User must debug to find the root cause (wrong URL, model not loaded, service down, etc.)
- Poor debugging experience
After validation:

- Mission agent validates backend connection on startup
- Clear error messages if backend is unreachable
- Fails fast with actionable diagnostics
- Excellent debugging experience
## What Gets Validated

### Ollama Backend (`agent_backend=ollama`)

- ✅ Service reachability - Ollama API responding at configured URL
- ✅ Model availability - Configured model is pulled and ready
- ✅ Test prompt - Send simple prompt, verify response received
Example validation output:

```
============================================================
🔍 VALIDATING LLM BACKEND CONNECTION
============================================================
Testing ollama backend...
URL: http://192.168.50.10:11434
Model: qwen2.5-coder:32b
Checking Ollama service...
✅ Ollama service responding
✅ Model 'qwen2.5-coder:32b' available
Sending test prompt...
✅ Test prompt succeeded (response: 'OK')
============================================================
✅ Ollama backend validation PASSED
============================================================
```
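The three checks above can be sketched with the Python standard library against Ollama's `/api/tags` and `/api/generate` endpoints. This is an illustrative sketch, not the mission agent's actual implementation; the function names and the test prompt text are assumptions.

```python
import json
import urllib.request

def model_available(tags_response: dict, model: str) -> bool:
    """Check a parsed /api/tags response for the configured model."""
    names = [m.get("name", "") for m in tags_response.get("models", [])]
    return model in names

def validate_ollama(base_url: str, model: str, timeout: float = 30.0) -> bool:
    """Run the three startup checks against an Ollama server (sketch)."""
    # 1. Service reachability: /api/tags only answers if Ollama is up
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        tags = json.load(resp)
    # 2. Model availability: configured model must be in the pulled list
    if not model_available(tags, model):
        return False
    # 3. Test prompt: non-streaming /api/generate call, expect any response text
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(
            {"model": model, "prompt": "Reply with OK", "stream": False}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return bool(json.load(resp).get("response"))
```

Splitting the model check into a pure helper like `model_available` keeps the parsing logic testable without a live server.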
### OpenAI Backend (`agent_backend=openai`)

- ✅ API key present - `OPENAI_API_KEY` environment variable is set
- ✅ Test prompt - Send simple prompt to OpenAI, verify response
Example validation output:

```
============================================================
🔍 VALIDATING LLM BACKEND CONNECTION
============================================================
Testing openai backend...
✅ OPENAI_API_KEY found
Base URL: https://api.openai.com/v1
Model: gpt-4-turbo
Sending test prompt...
✅ Test prompt succeeded (response: 'OK')
============================================================
✅ OpenAI backend validation PASSED
============================================================
```
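The two OpenAI checks can be sketched similarly, using the standard library against the Chat Completions endpoint. This is a hedged illustration: the function names are invented here, and the real validator may use the official `openai` client instead of raw HTTP.

```python
import json
import os
import urllib.request

def api_key_present() -> bool:
    """Check 1: OPENAI_API_KEY must be set and non-empty."""
    return bool(os.environ.get("OPENAI_API_KEY"))

def validate_openai(model: str = "gpt-4-turbo",
                    base_url: str = "https://api.openai.com/v1",
                    timeout: float = 30.0) -> bool:
    """Check 2: send a short test prompt and confirm a response arrives."""
    if not api_key_present():
        return False
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": "Reply with OK"}],
            "max_tokens": 5,
        }).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return bool(body.get("choices"))
```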
## Common Errors Caught

### Ollama Service Not Running

```
❌ Cannot connect to Ollama at http://192.168.50.10:11434
Error: [Errno 111] Connection refused
Check that Ollama is running and URL is correct
```

Fix:

```bash
# On Thor or gaming PC
systemctl status ollama   # Check if running
docker ps | grep ollama   # If containerized
```
### Model Not Pulled

```
❌ Model 'qwen2.5-coder:32b' not found in Ollama
Available models: llama3.1:70b, phi4:14b, qwen2.5:14b
Pull the model with: ollama pull qwen2.5-coder:32b
```

Fix:

```bash
# On Thor or gaming PC
ollama pull qwen2.5-coder:32b
```
### Wrong URL / Network Issues

```
❌ Timeout connecting to Ollama at http://192.168.50.10:11434
Check network connectivity and Ollama status
```

Fix:

```bash
# Check Thor is reachable
ping 192.168.50.10

# Check correct port
curl http://192.168.50.10:11434/api/tags

# Verify parameter in launch file
grep ollama_base_url launch/shadowhound_bringup.launch.py
```
### OpenAI API Key Missing

```
❌ OPENAI_API_KEY environment variable not set
Set it with: export OPENAI_API_KEY='sk-...'
```

Fix:

```bash
export OPENAI_API_KEY='sk-...'
# Or add to ~/.bashrc for persistence
```
### Slow Model Response

```
❌ Timeout waiting for test prompt response (>30s)
Model 'qwen2.5-coder:32b' may be too slow or not loaded
```

Fix:

- Model is likely cold-loading (first request after pull)
- Wait for the model to fully load, then restart the mission agent
- Consider using a faster model (e.g., phi4:14b for testing)
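One way to avoid the cold-load timeout is to warm the model up before validation runs. The sketch below relies on Ollama's documented behavior that a `/api/generate` request with an empty prompt loads the model into memory without generating text; the helper names are illustrative, not part of the mission agent.

```python
import json
import urllib.request

def warmup_payload(model: str) -> bytes:
    """An empty prompt tells Ollama to load the model without generating text."""
    return json.dumps({"model": model, "prompt": "", "stream": False}).encode()

def warm_up_model(base_url: str, model: str, timeout: float = 300.0) -> bool:
    """Block until Ollama reports the model loaded (can take minutes for 32B)."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=warmup_payload(model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("done", False)
```

Note the much longer timeout here: loading a 32B model from disk is far slower than answering a prompt once resident.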
## Testing Validation Manually

Use the standalone test script:

```bash
cd /workspaces/shadowhound

# Test Ollama validation (adjust THOR_IP if needed)
export THOR_IP=192.168.50.10
python3 test_backend_validation.py

# Test OpenAI validation (requires API key)
export OPENAI_API_KEY='sk-...'
python3 test_backend_validation.py
```
Expected output:

```
🧪 Testing Ollama Backend Validation
------------------------------------------------------------
============================================================
🔍 TESTING OLLAMA BACKEND VALIDATION
============================================================
URL: http://192.168.50.10:11434
Model: qwen2.5-coder:32b

1. Checking Ollama service...
✅ Ollama service responding

2. Checking model availability...
Available models: qwen2.5-coder:32b, phi4:14b, llama3.1:70b
✅ Model 'qwen2.5-coder:32b' available

3. Sending test prompt...
✅ Test prompt succeeded
Response: 'OK'

============================================================
✅ Ollama backend validation PASSED
============================================================

============================================================
📊 VALIDATION TEST SUMMARY
============================================================
Ollama: ✅ PASS
OpenAI: ⏭️ SKIPPED
============================================================
```
## Integration with Mission Agent

The validation is integrated into the mission agent startup sequence:

```python
# In mission_agent.py
self.mission_executor.initialize()

# Validate LLM backend connection on startup
if not self._validate_llm_backend():
    raise RuntimeError(
        "LLM backend validation failed. Check logs above for details. "
        "Ensure the backend service is running and accessible."
    )

self.get_logger().info("MissionExecutor ready!")
```

If validation fails, the node exits immediately with a clear error message:

```
[ERROR] [shadowhound_mission_agent]: LLM backend validation failed. Check logs above for details. Ensure the backend service is running and accessible.
```
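Internally, a validator like this typically dispatches on the configured backend. The standalone sketch below shows that fail-fast dispatch pattern; the function signature and validator table are assumptions for illustration, not the mission agent's actual API.

```python
from typing import Callable, Dict

def validate_llm_backend(backend: str,
                         validators: Dict[str, Callable[[], bool]]) -> bool:
    """Fail fast: unknown backends and failed checks both return False."""
    validator = validators.get(backend)
    if validator is None:
        # Misconfigured agent_backend parameter is itself a validation failure
        print(f"Unknown agent_backend: {backend!r}")
        return False
    return validator()
```

Passing the validators in as a table keeps the dispatch testable and makes adding a new backend a one-line change.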
## Disabling Validation (Not Recommended)

If you need to bypass validation for testing (e.g., no network access), you can comment out the validation check in mission_agent.py:

```python
# Validate LLM backend connection on startup
# if not self._validate_llm_backend():
#     raise RuntimeError(...)
```

⚠️ Warning: This is NOT recommended for production deployments. The validation catches real issues that will cause mission failures.
## Performance Impact
- Ollama validation: ~2-5 seconds (service check + model check + test prompt)
- OpenAI validation: ~1-3 seconds (test prompt via internet)
The validation adds minimal startup time but provides significant value in catching misconfigurations early.
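If you want to confirm these timings in your own deployment, a tiny wrapper around `time.perf_counter` is enough; the helper name here is illustrative.

```python
import time
from typing import Callable, Tuple

def timed_check(check: Callable[[], bool]) -> Tuple[bool, float]:
    """Run a validation step and report how long it took, in seconds."""
    start = time.perf_counter()
    ok = check()
    return ok, time.perf_counter() - start
```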
## Future Enhancements
Potential improvements:
- VLM validation - Add validation for vision-language models when implemented
- Model warm-up - Pre-load model during validation to reduce first mission latency
- Health check endpoint - Expose validation status via ROS service or web API
- Retry logic - Automatic retry with exponential backoff for transient failures
- Fallback backend - Auto-switch to OpenAI if Ollama validation fails
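Of these, retry with exponential backoff is straightforward to sketch. This is a generic illustration, not code from the mission agent; the injectable `sleep` parameter exists only to make the backoff schedule testable.

```python
import time
from typing import Callable

def retry_with_backoff(check: Callable[[], bool],
                       attempts: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry a validation check, doubling the delay after each failure."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return False
```

This tolerates transient failures (e.g., Ollama still starting up) without masking a genuinely down backend.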
## Related Documentation

- Ollama Deployment: `docs/OLLAMA_DEPLOYMENT_CHECKLIST.md`
- Benchmark System: `docs/OLLAMA_BENCHMARK_MEMORY_MANAGEMENT.md`
- Architecture: `docs/project_context.md`
- Test Script: `test_backend_validation.py`
## Troubleshooting Tips

### "Connection refused" error

The most common issue. Check:

1. Is Ollama running? `systemctl status ollama` or `docker ps`
2. Is the URL correct? Check launch file parameters
3. Is Thor reachable? `ping 192.168.50.10`
4. Is the port correct? Ollama uses 11434 by default
"Model not found" error¶
Check available models:
curl http://192.168.50.10:11434/api/tags | jq '.models[].name'
Pull the missing model:
ollama pull qwen2.5-coder:32b
### Validation passes but missions still fail

This suggests an issue with DIMOS integration or robot communication, not the LLM backend. Check:

1. Robot connectivity (CycloneDDS or WebRTC)
2. ROS topic diagnostics in startup logs
3. DIMOS skill library initialization
Last Updated: 2025-01-XX
Status: ✅ Implemented and tested