vLLM Tool Calling Configuration Issue¶
Date: October 12, 2025
Status: RESOLVED ✅
Priority: HIGH (blocks agent function calling)
Problem¶
Agent fails with 400 Bad Request when trying to use tool calling:
INFO:httpx:HTTP Request: POST http://192.168.10.116:8000/v1/chat/completions "HTTP/1.1 400 Bad Request"
ERROR - Unexpected error in API call: Error code: 400 - {'error': {'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set', 'type': 'BadRequestError', 'param': None, 'code': 400}}
Context¶
- LocalSemanticMemory embeddings fix ✅ WORKING
- vLLM server running on Thor:8000 ✅ RESPONDING
- Agent using OpenAIAgent with function calling ✅ CONFIGURED
- vLLM server missing tool calling flags ❌ BLOCKING
Root Cause¶
The vLLM server must be started with specific flags to enable tool/function calling:
# Both of these flags were MISSING from the launch command:
vllm serve MODEL \
--enable-auto-tool-choice \
--tool-call-parser hermes
Without these flags, vLLM rejects any request that carries tool_choice="auto", which is the effective default whenever tools are included in the request (as the OpenAI SDK does when the agent provides its skills).
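For reference, here is a minimal client-side sketch of the request path that triggers the error (assumes the openai Python package; the endpoint and model name are the ones used elsewhere in this doc, and the "move" tool mirrors the curl example below):
from openai import OpenAI

client = OpenAI(base_url="http://192.168.10.116:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "move",
        "description": "Move the robot forward",
        "parameters": {
            "type": "object",
            "properties": {"distance": {"type": "number"}},
        },
    },
}]

# Against a server started WITHOUT the two flags, this call fails with the
# 400 BadRequestError shown above; with the flags, it returns a tool call.
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-7B-Instruct",
    messages=[{"role": "user", "content": "Move forward 1 meter"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)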
Why This Matters¶
DIMOS agents (OpenAIAgent, PlanningAgent) use function calling for robot control:
- Agent receives mission: "spin left 90 degrees"
- LLM generates function call: SpinLeft(angle=90)
- Agent executes skill via MyUnitreeSkills.spin_left(90)
Without tool calling support, the agent cannot control the robot.
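To make the dependency concrete, here is a hypothetical sketch of the dispatch step the agent relies on (illustrative glue code, not the actual DIMOS implementation; the SpinLeft/spin_left names follow the example above):
import json

def dispatch_tool_calls(response, skills):
    # skills is assumed to expose methods like spin_left(), as in MyUnitreeSkills.
    message = response.choices[0].message
    for call in message.tool_calls or []:
        name = call.function.name                   # e.g. "SpinLeft"
        args = json.loads(call.function.arguments)  # e.g. {"angle": 90}
        if name == "SpinLeft":
            skills.spin_left(args["angle"])
        # ...map the remaining tool names to their skill methods
Without the server-side flags, the request fails with the 400 before this point, so no skill is ever invoked.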
Solution¶
Fix Applied¶
Updated scripts/setup_vllm_thor.sh to include tool calling flags:
# OLD (broken):
VLLM_ATTENTION_BACKEND=FLASHINFER vllm serve "$MODEL" \
--port 8000 \
--host 0.0.0.0 \
--trust-remote-code \
--max-model-len 8192 \
--gpu-memory-utilization 0.8 \
--tensor-parallel-size 1
# NEW (fixed):
VLLM_ATTENTION_BACKEND=FLASHINFER vllm serve "$MODEL" \
--port 8000 \
--host 0.0.0.0 \
--trust-remote-code \
--max-model-len 8192 \
--gpu-memory-utilization 0.8 \
--tensor-parallel-size 1 \
--enable-auto-tool-choice \
--tool-call-parser hermes  # new flags: enable auto tool choice, parse calls with the Hermes format
Deployment Steps¶
On Thor:
# 1. Stop current vLLM server (Ctrl+C)
# 2. Pull latest script
cd ~/shadowhound
git pull origin feature/local-llm-support
# 3. Restart with new flags
./scripts/setup_vllm_thor.sh
On Laptop:
# No changes needed - agent already configured correctly
# Just wait for Thor to restart, then test
./start.sh --prod
Verification¶
Test 1: Direct API Call¶
# From laptop - test tool calling directly
curl -X POST http://192.168.10.116:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "Qwen/Qwen2.5-Coder-7B-Instruct",
"messages": [{"role": "user", "content": "Move forward 1 meter"}],
"tools": [{
"type": "function",
"function": {
"name": "move",
"description": "Move the robot forward",
"parameters": {
"type": "object",
"properties": {
"distance": {"type": "number"}
}
}
}
}],
"tool_choice": "auto"
}'
Expected: No 400 error, response includes tool call
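With the flags active, the returned message should carry a tool_calls entry rather than plain text content, roughly of this shape (standard OpenAI chat-completions format; the exact id and argument values will vary):
"message": {
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "chatcmpl-tool-...",
    "type": "function",
    "function": {"name": "move", "arguments": "{\"distance\": 1}"}
  }]
}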
Test 2: Agent Mission¶
# From laptop - full integration test
./start.sh --prod
# In web UI (http://localhost:8501):
# Send: "spin left 90 degrees"
Expected:
- ✅ No 400 error in logs
- ✅ LLM generates SpinLeft function call
- ✅ Agent executes skill
- ✅ Robot spins left
Technical Details¶
Tool Call Parser Options¶
vLLM supports several parsers:
| Parser | Models | Quality | Speed |
|---|---|---|---|
| hermes | General | Good | Fast |
| mistral | Mistral models | Best | Fast |
| internlm | InternLM models | Good | Fast |
We use hermes because:
- Works with Qwen2.5-Coder-7B-Instruct ✅
- Fast and reliable ✅
- Standard format ✅
Alternative: Mistral Parser¶
If you switch to Mistral models:
--tool-call-parser mistral
Verification of Server Config¶
Check vLLM server logs to confirm flags are active:
ssh thor
docker logs vllm-server 2>&1 | grep -E "enable-auto-tool-choice|tool-call-parser"
Should see:
enable_auto_tool_choice=True
tool_call_parser=hermes
Related Issues¶
- LocalSemanticMemory import bug - ✅ Fixed (moved import to module level)
- Mock mode topic dependency - ⏸️ Documented, not yet fixed
- Start script issues - ⏸️ Documented, prioritized
Timeline¶
- 2025-10-12 13:21 - Error discovered during robot testing
- 2025-10-12 13:30 - Root cause identified (missing vLLM flags)
- 2025-10-12 13:35 - Fix applied to setup_vllm_thor.sh
References¶
- vLLM OpenAI Compatible Server
- vLLM Tool Calling Support
- Hermes Tool Call Format
- DIMOS: agents/agent_openai.py - function calling implementation
Next Steps¶
- ✅ Fix applied to script
- ⏳ User restarts vLLM on Thor
- ⏳ Test agent function calling
- ⏳ Update vLLM quickstart docs with new flags
- ⏳ Consider adding flag validation to start script (Issue #8)