Ollama Model Selection Guide

Tool/Function Calling Support

Shadow Hound's planning agent requires function calling (also known as tool use) support from the LLM. This is critical because the agent needs to:

  1. Parse structured robot skill calls
  2. Generate JSON-formatted skill parameters
  3. Chain multiple skills together in a plan
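
These requirements map directly onto Ollama's tools API. As a minimal sketch in Python: the `move_forward` skill and its parameters below are hypothetical illustrations, not Shadow Hound's actual skill set, and `extract_skill_calls` is an assumed helper, not project code.

```python
# Hypothetical robot skill exposed to the LLM via the Ollama tools API.
# The skill name and parameters are illustrative, not Shadow Hound's real API.
move_forward_tool = {
    "type": "function",
    "function": {
        "name": "move_forward",
        "description": "Drive the robot forward a given distance.",
        "parameters": {
            "type": "object",
            "properties": {
                "distance_m": {"type": "number", "description": "Distance in meters"},
            },
            "required": ["distance_m"],
        },
    },
}

def extract_skill_calls(chat_response: dict) -> list:
    """Pull (skill_name, arguments) pairs out of an /api/chat response."""
    calls = chat_response.get("message", {}).get("tool_calls", [])
    return [(c["function"]["name"], c["function"]["arguments"]) for c in calls]

# Shape of a tool-calling response from a model that supports tools:
response = {
    "message": {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "move_forward", "arguments": {"distance_m": 5}}}
        ],
    }
}
print(extract_skill_calls(response))  # [('move_forward', {'distance_m': 5})]
```

Models without tool support never emit `tool_calls`, which is why the planning agent cannot run on them.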

✅ Models With Function Calling

| Model | Size | Tool Support | Notes |
|-------|------|--------------|-------|
| qwen2.5-coder:32b | 19GB | ✅ Excellent | Best choice for robotics, strong function calling |
| llama3.3:70b | 43GB | ✅ Good | Large but capable, requires more VRAM |
| deepseek-r1:7b | 4.7GB | ⚠️ Experimental | Smaller, may have limitations |

❌ Models Without Function Calling

| Model | Size | Tool Support | Notes |
|-------|------|--------------|-------|
| phi4:14b | 9GB | ❌ No | Reasoning-focused, no function calling. ⚠️ UNSTABLE on Jetson AGX Orin - crashes frequently (see #18) |
| llama3.2:3b | 2GB | ❌ No | Too small |
| phi3:3.8b | 2.3GB | ❌ No | Chat only |

How to Check Tool Support

Method 1: Test with curl

curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MODEL_NAME",
    "messages": [{"role": "user", "content": "test"}],
    "tools": [{"type": "function", "function": {"name": "test", "description": "test"}}],
    "stream": false
  }'

Expected responses:

  • ✅ 200 OK: the model supports tools
  • ❌ 400 Bad Request with "does not support tools": no tool support
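The same probe can be scripted using only the standard library. This is a sketch mirroring the curl command above; `interpret_probe` and `supports_tools` are assumed helper names, and the 200/400 convention is the one just described.

```python
import json
import urllib.request
import urllib.error

def interpret_probe(status: int, body: str) -> bool:
    """Map an /api/chat probe result to tool support per the rules above."""
    if status == 200:
        return True
    if status == 400 and "does not support tools" in body:
        return False
    raise RuntimeError(f"unexpected probe response: {status} {body!r}")

def supports_tools(model: str, host: str = "http://localhost:11434") -> bool:
    """Send the same probe request as the curl example and interpret it."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "test"}],
        "tools": [{"type": "function",
                   "function": {"name": "test", "description": "test"}}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(f"{host}/api/chat", data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req) as resp:
            return interpret_probe(resp.status, "")
    except urllib.error.HTTPError as e:
        return interpret_probe(e.code, e.read().decode())
```

Any other status is surfaced as an error rather than silently treated as "no tool support".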

Method 2: Check Ollama Model Card

docker exec ollama ollama show MODEL_NAME

Look for mentions of:

  • "function calling"
  • "tool use"
  • "structured output"

Configuration

For Planning Agent (Requires Tools)

# .env
AGENT_BACKEND=ollama
OLLAMA_MODEL=qwen2.5-coder:32b
USE_PLANNING_AGENT=true  # Requires function calling

For Simple Agent (No Tools Needed)

# .env
AGENT_BACKEND=ollama
OLLAMA_MODEL=phi4:14b  # Can use non-tool models
USE_PLANNING_AGENT=false  # Text-only responses
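
A startup sanity check can catch a mismatched configuration early. A hedged sketch: the `TOOL_CAPABLE` set comes from the tables above, while `validate_agent_config` itself is a hypothetical helper, not Shadow Hound's actual startup code.

```python
# Models from the tables above that are known to support function calling.
TOOL_CAPABLE = {"qwen2.5-coder:32b", "llama3.3:70b", "deepseek-r1:7b"}

def validate_agent_config(env: dict) -> None:
    """Fail fast if the planning agent is paired with a non-tool model.

    Hypothetical check; call as validate_agent_config(dict(os.environ)).
    """
    model = env.get("OLLAMA_MODEL", "")
    planning = env.get("USE_PLANNING_AGENT", "false").lower() == "true"
    if planning and model not in TOOL_CAPABLE:
        raise ValueError(
            f"USE_PLANNING_AGENT=true requires a tool-capable model; "
            f"{model!r} is not one of {sorted(TOOL_CAPABLE)}")

# Matches the planning-agent .env above: passes without error.
validate_agent_config({"OLLAMA_MODEL": "qwen2.5-coder:32b",
                       "USE_PLANNING_AGENT": "true"})
```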

Keep Model Loaded (Prevent Cold Start 500 Errors)

By default, Ollama unloads models after 5 minutes of inactivity. This causes HTTP 500 errors on the next request while the model reloads.

Option 1: Container Environment Variable (Recommended for Development)

Edit scripts/setup_ollama_thor.sh to add OLLAMA_KEEP_ALIVE environment variable:

docker run -d \
  --name "$CONTAINER_NAME" \
  --gpus all \
  --tty \
  -p ${OLLAMA_PORT}:11434 \
  -v "${DATA_DIR}:/data" \
  -e OLLAMA_KEEP_ALIVE=-1 \
  --restart unless-stopped \
  "$OLLAMA_IMAGE"

Values:

  • -1: keep loaded forever (good for active development/testing)
  • 30m: keep for 30 minutes (balanced)
  • 0: unload immediately after each request (good for benchmarking cold starts)

Option 2: Per-Request Keep-Alive (Automatic)

The start.sh validation automatically includes keep_alive: 30m in test requests to keep the model loaded for 30 minutes after validation. This happens automatically when you run ./start.sh.
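
For reference, a per-request keep-alive is just an extra field on the request body. A sketch of the kind of request start.sh sends; the model name here is an example:

```python
import json

# keep_alive can be set on any /api/chat or /api/generate request.
payload = {
    "model": "qwen2.5-coder:32b",  # example model
    "messages": [{"role": "user", "content": "ping"}],
    "stream": False,
    "keep_alive": "30m",  # keep the model resident for 30 minutes after this call
}
body = json.dumps(payload)
```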

Impact on Benchmarking:

  • The scripts/benchmark_ollama_models.sh script has UNLOAD_BETWEEN_MODELS=true by default
  • This ensures accurate measurements by unloading models between tests
  • Setting OLLAMA_KEEP_ALIVE=-1 on the container does NOT interfere with manual ollama pull/rm commands
  • The benchmark script explicitly unloads models for clean measurements

Impact on Robot Restarts:

  • ✅ No impact: keep-alive only affects model memory, not the robot driver
  • ✅ The robot driver runs separately from Ollama
  • ✅ Restarting the robot driver (start.sh stop/start) does not affect Ollama
  • ⚠️ Only restarting Thor itself or the Ollama container will clear loaded models

Performance Benchmarking Targets

Based on tool support, focus benchmarking on:

  1. qwen2.5-coder:32b - Primary target
  2. llama3.3:70b - High capability baseline
  3. deepseek-r1:7b - Resource-constrained option

Benchmark Metrics

For each model, measure:

  • Latency: Time to first token, total completion time
  • Accuracy: Correct skill selection, parameter extraction
  • Resource Usage: GPU VRAM, CPU load
  • Reliability: Success rate, error handling
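
The two latency numbers can be collected from any streamed response. A minimal sketch; the benchmark script may measure these differently, and `latency_metrics` is a hypothetical helper:

```python
import time

def latency_metrics(token_stream) -> dict:
    """Measure time-to-first-token and total completion time over any
    iterable of tokens (e.g. a streamed Ollama response)."""
    start = time.monotonic()
    first = None
    count = 0
    for _ in token_stream:
        if first is None:
            first = time.monotonic() - start  # time to first token
        count += 1
    total = time.monotonic() - start          # total completion time
    return {"ttft_s": first if first is not None else float("nan"),
            "total_s": total,
            "tokens": count}
```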

Test Cases

  1. Simple navigation: "Go forward 5 meters"
  2. Multi-step: "Go to waypoint A, turn around, take a photo"
  3. Complex reasoning: "Find the red object and approach it"
  4. Error handling: Invalid requests, out-of-bounds coordinates
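
The test cases above can be expressed as prompt/expected-skill pairs for automated scoring. The skill names and `score_plan` helper below are hypothetical; substitute Shadow Hound's real skill set.

```python
# Test prompts paired with the skill chains a correct plan should contain.
# Skill names are hypothetical examples, not Shadow Hound's actual skills.
TEST_CASES = [
    ("Go forward 5 meters", ["move_forward"]),
    ("Go to waypoint A, turn around, take a photo",
     ["goto_waypoint", "turn", "take_photo"]),
    ("Find the red object and approach it", ["search_object", "approach"]),
    ("Teleport to the moon", []),  # invalid request: expect refusal, no skills
]

def score_plan(expected: list, actual: list) -> float:
    """Fraction of expected skills that appear, in order, in the plan."""
    it = iter(actual)
    hit = sum(1 for skill in expected if skill in it)
    return hit / len(expected) if expected else float(actual == [])
```

The in-order check matters for multi-step prompts: taking the photo before reaching the waypoint should not score as a fully correct plan.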

Known Limitations

Phi4:14b (⚠️ Jetson AGX Orin Issue)

  • UNSTABLE on ARM64/Jetson - llama runner crashes with exit status 2
  • ❌ Cannot use with planning agent (no tool support)
  • ❌ Transient HTTP 500 errors during inference
  • ✅ Can use with simple agent (USE_PLANNING_AGENT=false) when it works
  • Recommendation: Use qwen2.5-coder:7b instead on Thor
  • Tracking: Issue #18

Phi4:14b (General)

  • ❌ Cannot use with planning agent
  • ✅ Can use with simple agent (USE_PLANNING_AGENT=false)
  • Strong at reasoning, weak at structured output

Qwen2.5-Coder:32b

  • ✅ Excellent function calling
  • ⚠️ Requires ~20GB VRAM
  • May be slower than smaller models

LLaMA3.3:70b

  • ✅ Very capable
  • ❌ Requires ~50GB VRAM (may need multiple GPUs or high quantization)
  • Better for high-stakes missions

Future Work

See shadowhound#12 for vLLM evaluation, which may improve:

  • Batching efficiency
  • Multi-model serving
  • Quantization options

References