# Ollama Model Recommendations for ShadowHound

**Last Updated:** 2025-10-09

**Target Hardware:** NVIDIA Jetson AGX Thor (128GB RAM)
## Model Selection Guide

**IMPORTANT:** llama3.1 is only available in 8B, 70B, and 405B sizes. There is NO 13B variant!

See: https://ollama.com/library/llama3.1/tags
### Primary Recommendation: llama3.1:70b

Why this model:

- Best model Thor can run - 128GB RAM is perfect for 70B!
- 70B parameters - near GPT-4 level quality
- ~60-70 GB RAM during inference (plenty of headroom for ROS + navigation)
- Response time: 2-5 seconds for typical mission commands
- Quality: exceptional instruction following, reasoning, and planning

**Download size:** ~43 GB

**Runtime RAM:** ~60-70 GB

**Recommended for:** primary mission agent operation

```bash
docker exec ollama ollama pull llama3.1:70b
```
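Once the pull finishes, you can sanity-check the model through Ollama's standard `/api/generate` REST endpoint. A minimal sketch, assuming you run it on Thor and the container publishes the default port 11434:

```bash
# One-off smoke test against the Ollama REST API (default port 11434);
# "stream": false returns a single JSON response instead of chunks
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:70b",
  "prompt": "Reply with OK.",
  "stream": false
}'
```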
## Alternative Models
### Backup: llama3.1:8b

When to use:

- Faster responses needed (0.5-1.5s typical)
- Running alongside heavy perception workloads
- Testing/development with quick iteration
- When you need snappy responses over maximum quality

**Download size:** ~4.9 GB

**Runtime RAM:** ~10-12 GB

**Trade-off:** lower-quality reasoning compared to 70B, but still very capable

```bash
docker exec ollama ollama pull llama3.1:8b
```
### For Experimentation: llama3.2:3b

When to use:

- Minimal resource footprint
- Testing/development only (not for production missions)
- Very fast responses (<0.5s)

**Download size:** ~2 GB

**Runtime RAM:** ~3-4 GB

**Trade-off:** reduced instruction following, simpler reasoning

```bash
docker exec ollama ollama pull llama3.2:3b
```
### High-End Option: llama3.1:405b ⚠️

**Status:** NOT recommended for Thor

Why avoid:

- Requires ~250+ GB RAM - exceeds Thor's 128GB
- Model size: ~243 GB just to download
- Would cause out-of-memory errors
- Better suited for a high-end server with 512GB+ RAM
## Model Comparison
| Model | Size (GB) | RAM (GB) | Speed | Quality | Thor Compatible |
|---|---|---|---|---|---|
| llama3.1:70b | 43 | 60-70 | ★★★☆☆ | ★★★★★ | ✅ Recommended |
| llama3.1:8b | 4.9 | 10-12 | ★★★★★ | ★★★★☆ | ✅ Faster backup |
| llama3.2:3b | 2.0 | 3-4 | ★★★★★ | ★★★☆☆ | ✅ Testing only |
| llama3.1:405b | 243+ | 250+ | ★☆☆☆☆ | ★★★★★ | ❌ Too large |
## Performance Expectations

### llama3.1:70b on Thor (128GB RAM)
| Task Type | Expected Time | vs OpenAI (gpt-4-turbo) |
|---|---|---|
| Simple command ("rotate 90 degrees") | 2-4s | 5x faster (was 12s) |
| Multi-step plan ("explore the lab") | 3-6s | 4x faster (was 25s) |
| Complex reasoning | 4-8s | 3x faster (was 20s) |
**Quality:** near GPT-4 level - significantly better than the 8B model

**Network overhead:** +0.1-0.3s (laptop → Thor via LAN)
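These timings are easy to verify from the laptop side. A rough sketch, assuming Ollama is reachable at Thor's LAN address (shown here as a `<thor-ip>` placeholder) on the default port 11434:

```bash
# Measure end-to-end latency including LAN overhead; run it twice,
# since the first request also pays the model-load cost
time curl -s http://<thor-ip>:11434/api/generate -d '{
  "model": "llama3.1:70b",
  "prompt": "rotate 90 degrees",
  "stream": false
}' > /dev/null
```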
### llama3.1:8b on Thor (for comparison)
| Task Type | Expected Time | vs 70B Quality |
|---|---|---|
| Simple command | 0.5-1.5s | Good enough |
| Multi-step plan | 1-3s | Noticeably simpler |
| Complex reasoning | 2-4s | May miss nuance |
## Model Management Commands

### Pull a model

```bash
docker exec ollama ollama pull <model-name>
```

### List installed models

```bash
docker exec ollama ollama list
```

### Remove a model

```bash
docker exec ollama ollama rm <model-name>
```

### Test a model interactively

```bash
docker exec -it ollama ollama run llama3.1:70b
```

### Check model info

```bash
docker exec ollama ollama show llama3.1:70b
```
## Switching Models at Runtime

### Option 1: Launch Parameter

```bash
ros2 launch shadowhound_mission_agent mission_agent.launch.py \
    agent_backend:=ollama \
    ollama_model:=llama3.1:8b  # Change model here
```

### Option 2: Config File

Edit configs/laptop_dev_ollama.yaml:

```yaml
ollama_model: "llama3.1:8b"  # Change from 70b to 8b
```

Then launch:

```bash
ros2 launch shadowhound_bringup shadowhound.launch.py \
    config:=configs/laptop_dev_ollama.yaml
```
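If the agent declares `ollama_model` as a dynamically reconfigurable ROS 2 parameter (an assumption - check the node's parameter handling; the node name `/mission_agent` below is also illustrative), you may be able to switch without relaunching:

```bash
# Hypothetical: requires the node to accept runtime updates to ollama_model
ros2 param set /mission_agent ollama_model llama3.1:8b
```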
## Advanced: Quantization Variants

Ollama uses Q4_0 quantization by default (good balance).

For more control, you can specify variants (tag names follow the `<size>-instruct-<quant>` pattern; see the tags page linked above for the full list):

```bash
# Higher quality, more RAM
docker exec ollama ollama pull llama3.1:70b-instruct-q8_0

# Lower RAM, faster
docker exec ollama ollama pull llama3.1:70b-instruct-q4_K_M

# Ultra-low RAM
docker exec ollama ollama pull llama3.1:70b-instruct-q3_K_S
```

Default Q4_0 is recommended - good balance without manual tuning.
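As a rough sanity check on these RAM figures: weight memory is approximately parameter count × bits per weight ÷ 8. Q4_0 stores ~4.5 effective bits per weight, so 70B works out to 70 × 4.5 / 8 ≈ 39 GB of weights, consistent with the ~43 GB download above; Q8_0 (~8.5 bits/weight) roughly doubles that to ~74 GB before KV cache, which is why an 8-bit 70B is a tight fit alongside ROS on a 128GB Thor.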
## Multi-Model Setup

You can have multiple models installed and switch between them:

```bash
# Pull both models
docker exec ollama ollama pull llama3.1:70b
docker exec ollama ollama pull llama3.1:8b

# Use 70b for missions
ros2 launch ... ollama_model:=llama3.1:70b

# Use 8b for testing/development
ros2 launch ... ollama_model:=llama3.1:8b
```

**Disk usage:** models are stored in ~/ollama-data/ on Thor

- 70B: ~43 GB
- 8B: ~4.9 GB
- Total: ~48 GB for both
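Before launching with a given tag, you can confirm from the laptop which models Thor actually has installed via Ollama's `/api/tags` endpoint (again using a `<thor-ip>` placeholder):

```bash
# List installed models over the LAN without SSHing into Thor
curl -s http://<thor-ip>:11434/api/tags
```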
## Troubleshooting

### Model download fails

```bash
# Check Thor's internet connection
ping google.com

# Check disk space
df -h ~/ollama-data

# Retry download
docker exec ollama ollama pull llama3.1:70b
```
### Out of memory during inference

```bash
# Check memory usage
docker exec ollama free -h

# Switch to a smaller model:
# use llama3.1:8b instead of 70b,
# or close other applications on Thor
```
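Recent Ollama CLI versions can also show and evict loaded models; verify these subcommands against your installed version:

```bash
# Show which models are currently loaded and how much memory they use
docker exec ollama ollama ps

# Unload a model immediately to free RAM for other workloads
docker exec ollama ollama stop llama3.1:70b
```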
### Slow inference

```bash
# Check GPU utilization on Thor
nvidia-smi

# Verify GPU is being used by the container
docker exec ollama nvidia-smi

# Check whether the model is loaded: the first request is slower,
# subsequent requests should be faster
```
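Since the first request pays the model-load cost, you can pre-warm the model after boot and keep it resident. `keep_alive` is part of Ollama's generate API; the 30-minute value here is just an example:

```bash
# Pre-load the model without generating anything (no prompt field)
# and keep it resident for 30 minutes
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:70b",
  "keep_alive": "30m"
}'
```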
## Recommended Setup

For ShadowHound development with Thor's 128GB RAM:

```bash
# Primary for production - BEST quality
docker exec ollama ollama pull llama3.1:70b

# Backup for fast iteration during development
docker exec ollama ollama pull llama3.1:8b
```

This gives you flexibility to switch based on needs:

- 70B: best quality for actual missions, complex planning, near GPT-4 performance
- 8B: faster responses during development/testing

**Total disk:** ~48 GB (totally fine on Thor)

**Peak RAM:** ~70 GB when running 70B (still leaves ~58 GB free for ROS/Nav/Perception)
## Future: Specialized Models

As the Ollama ecosystem grows, consider these for specific tasks:
- codellama:13b - For code generation skills
- mistral:7b - Alternative to llama, good reasoning
- phi-2 - Tiny (2.7B) but surprisingly capable
- neural-chat:7b - Optimized for dialogue
For now: stick with llama3.1:70b - it's well-tested and reliable.
This guide is based on NVIDIA Jetson AGX Thor specifications (128GB RAM, integrated GPU). Performance may vary with workload and concurrent processes.