# Forcing Function Calling with System Prompts

## Problem

The Mistral LLM (served via vLLM) returns explanations instead of calling functions:

```text
User:     "take one small step back"
Expected: Reverse(x=-0.1, y=0.0, yaw=0.0, duration=1.0)
Actual:   "To physically move the robot one small step back using the
           provided functions, I would use the `Reverse` function with a small
           backward velocity. Here's an example of how to do this..."
```
**Root Cause:**

- The OpenAI chat completions API controls tool use via the `tool_choice` parameter: `"auto"` lets the model decide, while `"required"` (or a named function) forces a call (see the sketch below)
- DIMOS `OpenAIAgent` never sets this parameter
- Without it, the LLM treats the provided tools as suggestions rather than requirements
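For reference, a minimal sketch of the kind of request an explicit `tool_choice` fix would send. The base URL, API key, and tool schema below are illustrative assumptions, not taken from the DIMOS code; the model name matches the vLLM deployment used elsewhere in this doc:

```python
from openai import OpenAI

# Assumed local vLLM endpoint; adjust base_url for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Minimal JSON-schema tool definition for the Reverse skill (illustrative only).
tools = [{
    "type": "function",
    "function": {
        "name": "Reverse",
        "description": "Move the robot backward.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "number"},
                "y": {"type": "number"},
                "yaw": {"type": "number"},
                "duration": {"type": "number"},
            },
        },
    },
}]

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "take one small step back"}],
    tools=tools,
    tool_choice="auto",  # "required" (or a named function) forces a tool call
    temperature=0.0,
)
print(response.choices[0].message.tool_calls)
```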
## Why We Can't Fix It Properly

**Ideal Solution:** Modify `dimos/agents/agent.py` to add:

```python
response = self.client.chat.completions.create(
    model=self.model_name,
    messages=messages,
    tools=self.skill_library.get_tools(),
    tool_choice="auto",  # ← Add this ("required" would force a tool call)
    temperature=0.0,     # ← And this for deterministic, terse output
)
```
**Blocker:**

- DIMOS is a git submodule (`src/dimos-unitree/`)
- Project policy: never edit submodule files directly
- See: docs/submodule_policy.md
- Changes must go through the upstream DIMOS contribution process
## Workaround: System Prompt Engineering
Instead of modifying DIMOS, we use aggressive prompt engineering to force function calling behavior.
### Implementation

**File:** `src/shadowhound_mission_agent/shadowhound_mission_agent/mission_executor.py`
```python
from dataclasses import dataclass


@dataclass
class MissionExecutorConfig:
    # Reduced tokens for faster responses
    max_output_tokens: int = 150  # Was 512

    # Aggressive system prompt
    system_prompt: str = (
        "You are a quadruped robot controller. "
        "You MUST call the provided functions to control the robot. "
        "NEVER explain how to use functions - ALWAYS call them directly. "
        "When the user says 'move forward', call Move(). "
        "When the user says 'step back', call Reverse(). "
        "When the user says 'spin left', call SpinLeft(). "
        "When the user says 'spin right', call SpinRight(). "
        "Be extremely brief with any text responses. "
        "Your PRIMARY job is to execute functions, not to chat."
    )
```

Inside the executor, the custom prompt and token limit are passed to the agent:

```python
# Pass to agent
agent_kwargs = {
    "system_query": self.config.system_prompt,  # ← Use custom prompt
    "max_output_tokens_per_request": self.config.max_output_tokens,
    # ... other params
}
```
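To check the prompt's effect outside of DIMOS, a small A/B test against the vLLM endpoint can compare the default behavior with the aggressive prompt. This is a sketch under the same assumptions as the earlier snippet (local endpoint, illustrative tool schema); `elicits_tool_call` is a hypothetical helper, not part of the project:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Abbreviated copy of the aggressive system prompt shown above.
AGGRESSIVE_PROMPT = (
    "You are a quadruped robot controller. "
    "You MUST call the provided functions to control the robot. "
    "NEVER explain how to use functions - ALWAYS call them directly."
)

# Illustrative tool definition (real skills come from the DIMOS skill library).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "Reverse",
        "description": "Move the robot backward.",
        "parameters": {"type": "object", "properties": {"duration": {"type": "number"}}},
    },
}]


def elicits_tool_call(system_prompt: str, command: str) -> bool:
    """Return True if the model answers with at least one tool call."""
    messages = [{"role": "user", "content": command}]
    if system_prompt:
        messages.insert(0, {"role": "system", "content": system_prompt})
    response = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.3",
        messages=messages,
        tools=TOOLS,
        max_tokens=150,
        temperature=0.0,
    )
    return bool(response.choices[0].message.tool_calls)


# Compare default vs. aggressive prompt on the failing command.
print(elicits_tool_call("", "take one small step back"))
print(elicits_tool_call(AGGRESSIVE_PROMPT, "take one small step back"))
```

If the second call returns `True` while the first stays `False`, the prompt is doing its job.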
### Prompt Design Principles

1. **Imperative Language**
   - "You MUST call" not "You can call"
   - "NEVER explain" not "Try not to explain"
   - Creates strong behavioral constraints
2. **Explicit Examples** (see the sketch after this list)
   - Maps user language to function names: "move forward" → `Move()`, "step back" → `Reverse()`
   - Reduces ambiguity for the LLM
3. **Role Definition**
   - "You are a robot controller" (not "assistant")
   - "PRIMARY job is to execute functions"
   - Sets the expectation hierarchy
4. **Brevity Enforcement**
   - "Be extremely brief"
   - Reduced max_tokens (512 → 150)
   - Faster inference, less rambling
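If more phrase-to-function examples are needed (principle 2), they can be generated from a single mapping so the prompt stays in sync with the skill set. A hypothetical helper, not existing project code:

```python
# Hypothetical: keep the phrase → function examples in one place.
PHRASE_TO_SKILL = {
    "move forward": "Move()",
    "step back": "Reverse()",
    "spin left": "SpinLeft()",
    "spin right": "SpinRight()",
}


def build_example_lines(mapping: dict[str, str]) -> str:
    """Render "When the user says X, call Y." lines for the system prompt."""
    return " ".join(
        f"When the user says '{phrase}', call {skill}."
        for phrase, skill in mapping.items()
    )


# Example: append to the base instructions when constructing system_prompt.
# system_prompt = BASE_INSTRUCTIONS + " " + build_example_lines(PHRASE_TO_SKILL)
```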
## Testing

### Before (Default DIMOS Prompt)

```text
User:     take one small step back
Time:     ~30 seconds
Response: [200+ tokens explaining how to use Reverse()]
Result:   ❌ No robot motion
```

### After (Custom Prompt)

```text
User:     take one small step back
Time:     ~5-10 seconds (expected)
Response: [Function call to Reverse() + brief confirmation]
Result:   ✅ Robot moves backward
```
### Test Commands

```bash
# On laptop
cd ~/shadowhound
rm -rf build/ install/ log/
./start.sh --prod
```

Then, in the web UI (http://localhost:8501):

1. "take one small step back" (should call `Reverse()`)
2. "move forward 2 meters" (should call `Move()`)
3. "spin left 90 degrees" (should call `SpinLeft()`)
4. "turn around" (should call `SpinLeft(180)` or `SpinRight(180)`)
## Effectiveness Analysis

### Strengths ✅

- **No DIMOS changes** - respects the submodule policy
- **Fast to implement** - just config changes
- **Easy to tune** - adjust the prompt text as needed
- **Portable** - works with any OpenAI-compatible backend
### Weaknesses ⚠️

- **Not guaranteed** - the LLM can still ignore prompts
- **Model-dependent** - some models respect instructions better than others
- **Needs tuning** - may require iteration for different models
- **Indirect** - prompt engineering instead of an API parameter
### When It Works Best

- Instruction-following models (Mistral, Llama 3.1, GPT-4)
- Clear, unambiguous commands
- Function names that match natural language
- Low temperature settings (if the model supports it)
### When It Fails
- Very creative/open-ended queries
- Models trained primarily for chat (not tool use)
- Complex multi-step reasoning tasks
- Ambiguous user commands
## Alternative Approaches (If This Fails)

### Option 1: Client Wrapper

Wrap the OpenAI client to inject `tool_choice` before requests reach DIMOS:
```python
from openai import OpenAI

class ForcedToolCallingClient(OpenAI):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        original_create = self.chat.completions.create
        # Wrap the nested completions.create call to inject tool_choice
        def create_with_tool_choice(*call_args, **call_kwargs):
            if call_kwargs.get("tools"):
                call_kwargs.setdefault("tool_choice", "auto")
                call_kwargs.setdefault("temperature", 0.0)
            return original_create(*call_args, **call_kwargs)
        self.chat.completions.create = create_with_tool_choice

# Pass to agent
client = ForcedToolCallingClient(base_url=...)
agent = OpenAIAgent(openai_client=client, ...)
```
**Pros:** Guarantees `tool_choice` is set.

**Cons:** Fragile; depends on OpenAI client internals.
### Option 2: Fork DIMOS

Create a temporary fork with the `tool_choice` fix:

```bash
cd src/dimos-unitree
git checkout -b fix/tool-choice-auto
# Make changes
# Use forked version in .gitmodules
```

**Pros:** Proper fix; can PR upstream later.

**Cons:** Diverges from upstream; merge conflicts.
### Option 3: Upstream Contribution

Submit a PR to DIMOS with the `tool_choice` fix.

**Pros:** Benefits everyone; proper solution.

**Cons:** Takes time; no guarantee of acceptance.
## Success Metrics

Track these to evaluate prompt effectiveness:

```python
# In mission_executor.py or mission_agent.py
metrics = {
    "function_calls": 0,     # Count of actual function executions
    "text_responses": 0,     # Count of pure text responses
    "avg_response_time": 0,  # Seconds per command
    "avg_token_count": 0,    # Tokens in the response
    "success_rate": 0,       # Function calls / total commands
}
```

**Target:** >90% success rate (9/10 commands trigger function calls).
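A sketch of how those numbers could be collected, assuming some `send_command()` entry point and a `made_tool_call()` predicate on the response (both hypothetical; the real hooks depend on how mission_executor.py dispatches commands):

```python
import time

records = []  # one entry per user command


def record_command(command, send_command, made_tool_call):
    """Time one command and note whether it produced a function call."""
    start = time.monotonic()
    response = send_command(command)            # hypothetical dispatch hook
    records.append({
        "elapsed_s": time.monotonic() - start,
        "tool_call": made_tool_call(response),  # hypothetical predicate
    })


def summarize(records):
    total = max(len(records), 1)
    calls = sum(r["tool_call"] for r in records)
    return {
        "function_calls": calls,
        "text_responses": total - calls,
        "avg_response_time": sum(r["elapsed_s"] for r in records) / total,
        "success_rate": calls / total,  # target: > 0.9
    }
```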
## Configuration Options

Users can customize the prompt in their launch files:

```python
# In a launch file or config
custom_config = MissionExecutorConfig(
    agent_backend="openai",
    openai_model="mistralai/Mistral-7B-Instruct-v0.3",
    max_output_tokens=100,  # Even terser
    system_prompt=(
        "Robot controller. Call functions. No explanations. "
        "Move=Move(), Back=Reverse(), Left=SpinLeft(), Right=SpinRight()."
    ),
)
```
## Status

**Current State:** Implemented, awaiting testing

**Next Steps:**

1. ✅ Commit changes
2. ✅ Push to remote
3. ⏳ Test with the real robot
4. ⏳ Measure success rate
5. ⏳ Tune the prompt if needed
6. ⏳ Document results

**Expected Outcome:**

- Commands trigger function calls 90%+ of the time
- Response time: 5-10 seconds (down from ~30 seconds)
- Robot responds to natural language commands

**Fallback Plan:** If prompt engineering isn't effective enough, implement the client wrapper (Option 1 above).
## Related Issues
- vllm_tool_calling_configuration - vLLM setup for tool calling
- vllm_mistral_tokenizer_hang - Mistral tokenizer fix
- DIMOS Agent Architecture - Why OpenAIAgent is required
## References
- OpenAI Tool Calling Docs: https://platform.openai.com/docs/guides/function-calling
- vLLM Tool Calling: https://docs.vllm.ai/en/stable/features/tool_calling.html
- Mistral Function Calling: https://docs.mistral.ai/capabilities/function_calling/