DIMOS Agent Architecture - Complete Reference¶
Date: October 12, 2025
Context: Comprehensive investigation of DIMOS agent types, capabilities, and ShadowHound integration strategy
Related: dimos_development_policy.md, dimos_branch_consolidation.md, local_llm_memory_roadmap.md
TL;DR - Critical Findings¶
Agent Capabilities Matrix:
| Agent Type | Local LLM | Cloud LLM | Skills/Tools | Default Memory | Status |
|---|---|---|---|---|---|
| OpenAIAgent | ✅ (vLLM) | ✅ | ✅ Function Calling | OpenAISemanticMemory | ⭐ Recommended |
| ClaudeAgent | ❌ | ✅ | ✅ Function Calling | OpenAISemanticMemory | Production Ready |
| PlanningAgent | ✅ (vLLM) | ✅ | ✅ Function Calling | OpenAISemanticMemory | Production Ready |
| HuggingFaceLocalAgent | ✅ | ❌ | ❌ No Tools | LocalSemanticMemory | Limited Use |
| HuggingFaceRemoteAgent | ❌ | ✅ (HF API) | ❌ No Tools | LocalSemanticMemory | Legacy |
| CTransformersGGUFAgent | ✅ (GGUF) | ❌ | ❌ No Tools | LocalSemanticMemory | Experimental |
Key Insights:
- ✅ Skills/function calling REQUIRES OpenAIAgent, ClaudeAgent, or PlanningAgent
- ✅ HuggingFaceLocalAgent has NO tools support - Cannot use skills-based robot control
- ✅ OpenAIAgent works with vLLM - OpenAI-compatible API support
- ✅ Local embeddings work with ANY agent - Just pass LocalSemanticMemory() explicitly
- ⚠️ DIMOS README is misleading - Claims "OpenAI API key required for all LLMAgents" (NOT true)
- ⚠️ Missing import bug in LocalSemanticMemory - SentenceTransformer not imported (line 147)
ShadowHound Decision:
- Using: OpenAIAgent + vLLM + LocalSemanticMemory + MyUnitreeSkills
- Why: Only combination supporting local LLM + local memory + robot control skills
- Blocker: 1-line import fix needed in DIMOS chroma_impl.py
Table of Contents¶
- Agent Type Comparison
- Skills/Function Calling Architecture
- Memory System Architecture
- Local LLM Support
- Local Embeddings Support
- ShadowHound Implementation
- Configuration Examples
- Known Issues and Bugs
- Recommendations
Agent Type Comparison¶
1. OpenAIAgent (⭐ Recommended for ShadowHound)¶
File: dimos/agents/agent.py (line 660)
Inherits: LLMAgent
Capabilities:
- ✅ OpenAI-compatible API (works with OpenAI, Azure, vLLM, LocalAI, etc.)
- ✅ Function calling / tools support
- ✅ Vision capabilities (multimodal)
- ✅ Structured outputs (response_model)
- ✅ RAG with agent memory
- ✅ Token counting and limits
Parameters:
def __init__(self,
dev_name: str,
agent_type: str = "Vision",
query: str = "What do you see?",
input_query_stream: Optional[Observable] = None,
input_data_stream: Optional[Observable] = None,
input_video_stream: Optional[Observable] = None,
output_dir: str = "assets/agent",
agent_memory: Optional[AbstractAgentSemanticMemory] = None, # ⚠️ Defaults to OpenAISemanticMemory
system_query: Optional[str] = None,
max_input_tokens_per_request: int = 128000,
max_output_tokens_per_request: int = 16384,
model_name: str = "gpt-4o",
prompt_builder: Optional[PromptBuilder] = None,
tokenizer: Optional[AbstractTokenizer] = None,
rag_query_n: int = 4,
rag_similarity_threshold: float = 0.45,
skills: Optional[Union[AbstractSkill, list[AbstractSkill], SkillLibrary]] = None, # ⭐ Skills support
response_model: Optional[BaseModel] = None,
frame_processor: Optional[FrameProcessor] = None,
image_detail: str = "low",
pool_scheduler: Optional[ThreadPoolScheduler] = None,
process_all_inputs: Optional[bool] = None,
):
Skills Implementation:
# Lines 748-758 in agent.py
self.skills = skills
if isinstance(self.skills, SkillLibrary):
    self.skill_library = self.skills
elif isinstance(self.skills, list):
    self.skill_library = SkillLibrary()
    for skill in self.skills:
        self.skill_library.add(skill)
elif isinstance(self.skills, AbstractSkill):
    self.skill_library = SkillLibrary()
    self.skill_library.add(self.skills)
Use Cases:
- Primary agent for cloud OpenAI/Azure deployments
- ⭐ ShadowHound: Local vLLM + local embeddings + robot skills
- Any scenario requiring function calling / tools
- Vision tasks requiring image understanding
Example:
from dimos.agents.agent import OpenAIAgent
from dimos.agents.memory.chroma_impl import LocalSemanticMemory
from dimos.robot.unitree.unitree_skills import MyUnitreeSkills
agent = OpenAIAgent(
dev_name="ShadowHound",
model_name="Qwen/Qwen2.5-Coder-7B-Instruct",
agent_memory=LocalSemanticMemory(), # ⚠️ Must pass explicitly for local
skills=MyUnitreeSkills(robot=robot), # ⭐ Robot control skills
)
2. ClaudeAgent¶
File: dimos/agents/claude_agent.py (line 77)
Inherits: LLMAgent
Capabilities:
- ✅ Anthropic Claude API
- ✅ Function calling / tools support (Claude-specific format)
- ✅ Extended thinking mode (thinking_budget_tokens)
- ✅ Vision capabilities
- ✅ Structured outputs
- ✅ RAG with agent memory
Parameters:
def __init__(self,
dev_name: str,
agent_type: str = "Vision",
# ... (similar to OpenAIAgent)
model_name: str = "claude-3-7-sonnet-20250219",
skills: Optional[AbstractSkill] = None, # ⭐ Skills support
thinking_budget_tokens: Optional[int] = 2000, # Claude-specific
):
Skills Implementation: Similar to OpenAIAgent but converts tools to Claude-specific format:
def _convert_tools_to_claude_format(self, tools: List[Dict]) -> List[Dict]:
# Converts OpenAI tool format to Claude format
Use Cases:
- Cloud deployments requiring Claude's extended thinking
- Tasks requiring very long context (200K+ tokens)
- Alternative to OpenAI with similar capabilities
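Example (illustrative, using only the parameters listed above; assumes an Anthropic API key is available in the environment and that robot was created elsewhere, as in the OpenAIAgent example):
from dimos.agents.claude_agent import ClaudeAgent
from dimos.robot.unitree.unitree_skills import MyUnitreeSkills

agent = ClaudeAgent(
    dev_name="ShadowHoundClaude",
    model_name="claude-3-7-sonnet-20250219",
    skills=MyUnitreeSkills(robot=robot),      # same skills object as the OpenAIAgent example
    thinking_budget_tokens=2000,              # Claude-specific extended thinking budget
)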
3. PlanningAgent (Specialized OpenAIAgent)¶
File: dimos/agents/planning_agent.py (line 38)
Inherits: OpenAIAgent
Capabilities:
- ✅ All OpenAIAgent capabilities
- ✅ Task decomposition and planning
- ✅ Multi-turn dialogue for plan refinement
- ✅ Structured plan output (list of steps)
- ✅ Plan confirmation workflow
Parameters:
def __init__(self,
dev_name: str = "PlanningAgent",
model_name: str = "gpt-4",
input_query_stream: Optional[Observable] = None,
use_terminal: bool = False,
skills: Optional[AbstractSkill] = None, # ⭐ Skills support
):
Specialized Behavior:
- Returns structured JSON: {"type": "dialogue"|"plan", "content": ..., "needs_confirmation": bool}
- Maintains conversation history for iterative refinement
- Breaks complex tasks into executable skill calls
Use Cases:
- Complex missions requiring multi-step planning
- Tasks where user confirmation is needed before execution
- High-level mission orchestration
Example:
from dimos.agents.planning_agent import PlanningAgent
agent = PlanningAgent(
dev_name="MissionPlanner",
model_name="gpt-4o",
skills=MyUnitreeSkills(robot=robot),
)
# Returns plans like: ["Move forward 2m", "Rotate 90 degrees", "Move forward 1m"]
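Handling the structured response (illustrative sketch; only the JSON shape documented above is assumed - how the reply is obtained depends on whether the agent runs in terminal or stream mode):
import json

raw = '{"type": "plan", "content": ["Move forward 2m", "Rotate 90 degrees"], "needs_confirmation": true}'
result = json.loads(raw)

if result["type"] == "dialogue":
    print(result["content"])                        # clarification / refinement turn
elif result["type"] == "plan":
    if result["needs_confirmation"]:
        print("Proposed plan:", result["content"])  # ask the operator before executing
    else:
        for step in result["content"]:
            print("Execute:", step)                 # hand each step to the skills layer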
4. HuggingFaceLocalAgent¶
File: dimos/agents/agent_huggingface_local.py (line 47)
Inherits: LLMAgent
Capabilities:
- ✅ 100% local LLM execution (no API)
- ✅ Loads models via transformers (AutoModelForCausalLM)
- ✅ GPU/CPU auto-detection
- ✅ Local embeddings by default
- ❌ NO skills/function calling support
- ❌ No vision capabilities
Parameters:
def __init__(self,
dev_name: str,
agent_type: str = "HF-LLM",
query: str = "What do you see?",
model_name: str = "Qwen/Qwen2.5-3B",
system_query: Optional[str] = None,
agent_memory: Optional[AbstractAgentSemanticMemory] = None, # ⚠️ Defaults to LocalSemanticMemory
max_output_tokens_per_request: int = 512,
do_sample: bool = True,
temperature: float = 0.1,
top_p: float = 0.95,
top_k: int = 50,
):
Implementation Details:
# Line 79: Defaults to local memory
agent_memory = agent_memory or LocalSemanticMemory()

# Lines 135-220: Simple text generation, NO tools
def _send_query(self, query: str, images: list = None) -> str:
    # Just generates text - no function calling!
    prompt = self._build_prompt(query)
    inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
    outputs = self.model.generate(...)
    return self.tokenizer.decode(outputs[0], skip_special_tokens=True)
⚠️ Critical Limitation:
$ grep -n "skills\|tools\|function" dimos/agents/agent_huggingface_local.py
# Returns: (empty) - NO skills support!
Use Cases:
- Simple text generation tasks
- Deployments requiring 100% local operation (no API dependencies)
- Scenarios where skills/function calling are NOT needed
- ❌ NOT suitable for ShadowHound - cannot control the robot via skills
Example:
from dimos.agents.agent_huggingface_local import HuggingFaceLocalAgent
agent = HuggingFaceLocalAgent(
dev_name="TextAgent",
model_name="Qwen/Qwen2.5-3B",
# agent_memory defaults to LocalSemanticMemory automatically
)
response = agent.send_query("Describe the scene") # Text only, no actions
5. HuggingFaceRemoteAgent¶
File: dimos/agents/agent_huggingface_remote.py (line 44)
Inherits: LLMAgent
Capabilities:
- ✅ HuggingFace Inference API
- ✅ Access to HF hosted models
- ✅ Local embeddings by default
- ❌ NO skills/function calling support
- ❌ No vision capabilities
Status: Similar to HuggingFaceLocalAgent but uses HF API instead of local loading. Legacy option.
6. CTransformersGGUFAgent¶
File: dimos/agents/agent_ctransformers_gguf.py (line 86)
Inherits: LLMAgent
Capabilities:
- ✅ Runs GGUF quantized models
- ✅ CPU-optimized inference
- ✅ Low memory footprint
- ✅ Local embeddings by default
- ❌ NO skills/function calling support
Status: Experimental. For resource-constrained deployments. No skills support.
Skills/Function Calling Architecture¶
What Are Skills?¶
Skills in DIMOS are executable functions that agents can call to interact with the world:
from dimos.skills.skills import AbstractSkill, SkillLibrary
from pydantic import Field
class Move(AbstractSkill):
    """Move the robot using direct velocity commands."""
    x: float = Field(..., description="Forward velocity (m/s)")
    y: float = Field(default=0.0, description="Left/right velocity (m/s)")
    yaw: float = Field(default=0.0, description="Rotational velocity (rad/s)")

    def __call__(self):
        super().__call__()
        return self._robot.move_vel(x=self.x, y=self.y, yaw=self.yaw)
How Skills Work¶
1. Registration: Skills are added to a SkillLibrary
2. Tool Schema Generation: Skills are converted to OpenAI tool format
3. LLM Decision: The agent decides which skill to call based on context
4. Execution: The skill is invoked with the parsed parameters
5. Result: The skill returns its result to the agent for the next decision
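A minimal sketch of steps 1-5 against an OpenAI-compatible endpoint. The tool-schema helper and dispatch loop below are illustrative, not DIMOS's actual SkillLibrary internals, and assume the vLLM endpoint used throughout this doc with tool calling enabled:
import json
from openai import OpenAI
from pydantic import BaseModel, Field

class Move(BaseModel):
    """Move the robot using direct velocity commands."""
    x: float = Field(..., description="Forward velocity (m/s)")
    yaw: float = Field(0.0, description="Rotational velocity (rad/s)")

def to_tool(skill_cls) -> dict:
    # Step 2: pydantic model -> OpenAI function-tool schema
    return {
        "type": "function",
        "function": {
            "name": skill_cls.__name__,
            "description": skill_cls.__doc__ or "",
            "parameters": skill_cls.model_json_schema(),
        },
    }

client = OpenAI(base_url="http://192.168.10.116:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-7B-Instruct",
    messages=[{"role": "user", "content": "Move forward at 0.5 m/s"}],
    tools=[to_tool(Move)],  # Steps 1-2: registered skill exposed as a tool
)

# Steps 3-5: LLM decision -> execution -> result
for call in resp.choices[0].message.tool_calls or []:
    if call.function.name == "Move":
        skill = Move(**json.loads(call.function.arguments))
        print(f"Would execute Move(x={skill.x}, yaw={skill.yaw})")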
Agent Support Matrix¶
| Agent Type | Skills Support | Implementation |
|---|---|---|
| OpenAIAgent | ✅ Yes | Native OpenAI function calling |
| ClaudeAgent | ✅ Yes | Claude tool use API |
| PlanningAgent | ✅ Yes | Inherits from OpenAIAgent |
| HuggingFaceLocalAgent | ❌ No | Simple text generation only |
| HuggingFaceRemoteAgent | ❌ No | Simple text generation only |
| CTransformersGGUFAgent | ❌ No | Simple text generation only |
ShadowHound Skills¶
MyUnitreeSkills provides robot control:
from dimos.robot.unitree.unitree_skills import MyUnitreeSkills
class MyUnitreeSkills(SkillLibrary):
    # Dynamically generated skills from UNITREE_ROS_CONTROLS
    # + Custom skills:

    class Move(AbstractRobotSkill):
        """Move the robot using direct velocity commands."""
        # Calls: self._robot.move_vel()

    class Reverse(AbstractRobotSkill):
        """Reverse the robot using direct velocity commands."""
        # Calls: self._robot.move_vel() with negative x

    class SpinLeft(AbstractRobotSkill):
        """Spin the robot left using degree commands."""
        # Calls: self._robot.spin(degrees=positive)

    class SpinRight(AbstractRobotSkill):
        """Spin the robot right using degree commands."""
        # Calls: self._robot.spin(degrees=negative)

    class Wait(AbstractSkill):
        """Wait for a specified amount of time."""
        # Calls: time.sleep()
Critical Point: Without skills support, ShadowHound cannot control the robot. The agent would only generate text describing what it should do, but couldn't execute commands.
Memory System Architecture¶
Memory Types in DIMOS¶
Location: dimos/agents/memory/chroma_impl.py
DIMOS provides two ChromaDB-based memory implementations:
1. OpenAISemanticMemory (Cloud)¶
class OpenAISemanticMemory(ChromaAgentSemanticMemory):
    """Uses OpenAI embeddings API (requires OPENAI_API_KEY)"""

    def __init__(
        self,
        collection_name: str = "agent_memory",
        model_name: str = "text-embedding-3-large",
        persist_dir: str = "~/.chroma",
    ):
        # Uses OpenAIEmbeddings from langchain-openai
        embedding_function = OpenAIEmbeddings(model=model_name)
Dependencies: langchain-openai, chromadb
API Endpoint: https://api.openai.com/v1/embeddings
Cost: ~$0.13 per 1M tokens
2. LocalSemanticMemory (Local) ⭐¶
class LocalSemanticMemory(ChromaAgentSemanticMemory):
    """Uses local sentence-transformers for embeddings (no API required)"""

    def __init__(
        self,
        collection_name: str = "agent_memory",
        model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
        persist_dir: str = "~/.chroma",
    ):
        # ⚠️ BUG: Line 147 - SentenceTransformer not imported!
        self.model = SentenceTransformer(self.model_name, device=device)
Dependencies: sentence-transformers, chromadb, langchain-chroma
Model: 384-dimensional embeddings, ~80MB download
Cost: Free (runs locally)
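Quick local check of the model and dimensionality (sentence-transformers only, independent of DIMOS):
from sentence_transformers import SentenceTransformer

# Downloads ~80MB on first run, then loads from ~/.cache/huggingface/
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vec = model.encode("ShadowHound patrols the hallway")
print(vec.shape)  # (384,)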
⚠️ Known Bug (Line 147):
# Missing import at top of file:
from sentence_transformers import SentenceTransformer
# Error when initializing:
NameError: name 'SentenceTransformer' is not defined
Default Memory by Agent Type¶
| Agent Type | Default Memory |
|---|---|
| OpenAIAgent | OpenAISemanticMemory() |
| ClaudeAgent | OpenAISemanticMemory() |
| PlanningAgent | OpenAISemanticMemory() (inherits from OpenAI) |
| HuggingFaceLocalAgent | LocalSemanticMemory() ✅ |
| HuggingFaceRemoteAgent | LocalSemanticMemory() ✅ |
| CTransformersGGUFAgent | LocalSemanticMemory() ✅ |
Key Insight: You can override any agent's default memory by passing agent_memory parameter explicitly!
# OpenAIAgent with LOCAL embeddings (our use case!)
agent = OpenAIAgent(
dev_name="ShadowHound",
agent_memory=LocalSemanticMemory(), # ⭐ Override default
skills=MyUnitreeSkills(robot=robot),
)
Local LLM Support¶
Option 1: HuggingFaceLocalAgent (Fully Local)¶
Best for: Simple text generation without skills/function calling
from dimos.agents.agent_huggingface_local import HuggingFaceLocalAgent
agent = HuggingFaceLocalAgent(
dev_name="LocalAgent",
model_name="Qwen/Qwen2.5-3B",
# agent_memory defaults to LocalSemanticMemory() automatically
)
Pros:
- ✅ 100% local (no API dependencies)
- ✅ Defaults to LocalSemanticMemory automatically
- ✅ GPU/CPU auto-detection

Cons:
- ❌ NO skills/function calling support
- ❌ Cannot control robot via MyUnitreeSkills
- ❌ Text generation only
Option 2: OpenAIAgent + vLLM (⭐ ShadowHound's Choice)¶
Best for: Local LLM with skills/function calling support
Architecture:
┌─────────────────────────────────────┐
│ OpenAIAgent │
│ ├─ skills: MyUnitreeSkills │ ← Robot control
│ ├─ agent_memory: LocalSemanticM. │ ← Local embeddings
│ └─ model: vLLM API endpoint │ ← Local LLM
└──────────────┬──────────────────────┘
│
▼
Thor: vLLM (OpenAI-compatible API)
- Model: Qwen/Qwen2.5-Coder-7B-Instruct
- Port: 8000
- Endpoint: http://192.168.10.116:8000/v1
Configuration:
# .env
AGENT_BACKEND=openai
OPENAI_BASE_URL=http://192.168.10.116:8000/v1
OPENAI_MODEL=Qwen/Qwen2.5-Coder-7B-Instruct
# USE_LOCAL_EMBEDDINGS auto-detected as true (non-OpenAI URL)
Code:
from dimos.agents.agent import OpenAIAgent
from dimos.agents.memory.chroma_impl import LocalSemanticMemory
from dimos.robot.unitree.unitree_skills import MyUnitreeSkills
agent = OpenAIAgent(
dev_name="ShadowHound",
model_name="Qwen/Qwen2.5-Coder-7B-Instruct",
agent_memory=LocalSemanticMemory(), # ⚠️ Must pass explicitly!
skills=MyUnitreeSkills(robot=robot), # ⭐ Robot control
)
Pros:
- ✅ Local LLM via vLLM (no cloud API costs)
- ✅ Local embeddings via sentence-transformers
- ✅ Skills/function calling support
- ✅ Robot control via MyUnitreeSkills
- ✅ OpenAI-compatible (easy to switch to cloud if needed)

Cons:
- ⚠️ Requires vLLM server setup (separate process)
- ⚠️ Must explicitly pass LocalSemanticMemory (doesn't default)
- ⚠️ Requires fixing DIMOS import bug (1 line)
vLLM Setup on Thor (Jetson AGX Orin)¶
Container:
docker run -d \
--name vllm \
--gpus all \
--shm-size 8g \
-p 8000:8000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/vllm:25.09-py3 \
--model Qwen/Qwen2.5-Coder-7B-Instruct \
--dtype float16 \
--max-model-len 4096 \
--gpu-memory-utilization 0.9
Verify:
curl http://192.168.10.116:8000/v1/models
# Response: {"data": [{"id": "Qwen/Qwen2.5-Coder-7B-Instruct", ...}]}
⚠️ Important: vLLM does NOT provide /v1/embeddings endpoint - this is why we need LocalSemanticMemory!
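A quick probe of what the server does and does not expose (illustrative; endpoint and model name are the ones assumed throughout this doc, and the exact error code for the embeddings route varies by vLLM version):
import requests

BASE = "http://192.168.10.116:8000/v1"

# Chat-completion side: served by vLLM
print(requests.get(f"{BASE}/models").status_code)  # expect 200

# Embeddings side: not served for a chat-only model
r = requests.post(
    f"{BASE}/embeddings",
    json={"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "input": "hello"},
)
print(r.status_code)  # expect a 4xx/5xx - OpenAISemanticMemory cannot work against this server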
Local Embeddings Support¶
LocalSemanticMemory Implementation¶
File: dimos/agents/memory/chroma_impl.py
class LocalSemanticMemory(ChromaAgentSemanticMemory):
    """Uses local sentence-transformers for embeddings (no API required)"""

    def __init__(
        self,
        collection_name: str = "agent_memory",
        model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
        persist_dir: str = "~/.chroma",
    ):
        # Embedding function using local sentence-transformers
        embedding_function = self._create_embedding_function()

        # Initialize ChromaDB
        self.collection_name = collection_name
        self.model_name = model_name
        # ...
Dependencies¶
Required packages:
pip install chromadb>=0.4.22 langchain-chroma sentence-transformers
Model download (first run only):
- sentence-transformers/all-MiniLM-L6-v2 (~80MB)
- Cached in ~/.cache/huggingface/
Verifying Local Embeddings¶
Test script: scripts/test_local_embeddings.py
cd /workspaces/shadowhound
python3 scripts/test_local_embeddings.py
Expected output:
✅ All dependencies installed
✅ Model loaded: sentence-transformers/all-MiniLM-L6-v2
✅ Test embeddings generated: 384 dimensions
✅ ChromaDB collection created
✅ Documents stored successfully
✅ Semantic search working
✅ All tests passed!
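The same checks can be reproduced in a few lines (illustrative sketch; not necessarily the actual contents of scripts/test_local_embeddings.py):
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
docs = ["The robot turned left at the door", "Battery level is at 80 percent"]
embeddings = model.encode(docs).tolist()
print(f"Embedding dimensions: {len(embeddings[0])}")  # 384

client = chromadb.Client()  # in-memory; PersistentClient(path=...) for ~/.chroma-style storage
collection = client.create_collection("smoke_test")
collection.add(ids=["d0", "d1"], documents=docs, embeddings=embeddings)

query_vec = model.encode(["Where did the robot go?"]).tolist()
result = collection.query(query_embeddings=query_vec, n_results=1)
print(result["documents"][0][0])  # expect the "turned left" document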
⚠️ Known Bug: Missing Import¶
Location: dimos/agents/memory/chroma_impl.py line 147
Error:
NameError: name 'SentenceTransformer' is not defined
Root Cause:
# Line 147:
self.model = SentenceTransformer(self.model_name, device=device)
# But imports section (lines 1-40) is MISSING:
from sentence_transformers import SentenceTransformer
Fix (1 line):
# Add to imports at top of file:
from sentence_transformers import SentenceTransformer
Status: Identified, fix pending (waiting for DIMOS branch consolidation)
DIMOS README Claim vs Reality¶
The DIMOS README states:
Agent API keys
Full functionality will require API keys for the following:
Requirements:
- OpenAI API key (required for all LLMAgents due to OpenAIEmbeddings)
- Claude API key (required for ClaudeAgent)
- Alibaba API key (required for Navigation skills)
This is WRONG!
- HuggingFaceLocalAgent defaults to LocalSemanticMemory() (no OpenAI key needed)
- Any agent can use LocalSemanticMemory if you pass it explicitly
- An OpenAI key is only required when using OpenAISemanticMemory
Why the confusion?
- Most DIMOS examples use OpenAIAgent without passing agent_memory
- OpenAIAgent defaults to OpenAISemanticMemory() if no agent_memory provided
- But this is just a default, not a requirement!
Test Examples from DIMOS Repository¶
With HuggingFaceLocalAgent + Local Embeddings¶
# tests/test_agent_huggingface_local.py
from dimos.agents.agent_huggingface_local import HuggingFaceLocalAgent
agent = HuggingFaceLocalAgent(
dev_name="HuggingFaceLLMAgent",
model_name="Qwen/Qwen2.5-3B",
# agent_memory defaults to LocalSemanticMemory()
)
With OpenAIAgent + OpenAI Embeddings¶
# tests/test_unitree_agent.py
from dimos.agents.agent import OpenAIAgent
agent = OpenAIAgent(
dev_name="UnitreePerceptionAgent",
# agent_memory defaults to OpenAISemanticMemory()
)
ChromaDB Direct Usage (No Agent)¶
# tests/test_standalone_chromadb.py
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
embeddings = OpenAIEmbeddings(
model="text-embedding-3-large",
api_key=OPENAI_API_KEY,
)
db_connection = Chroma(
collection_name="my_collection",
embedding_function=embeddings,
)
⚠️ Missing: No example of OpenAIAgent + LocalSemanticMemory + vLLM!
Summary and Key Takeaways¶
What We Learned¶
1. Skills/Function Calling is Critical
   - Only OpenAIAgent, ClaudeAgent, and PlanningAgent support skills
   - HuggingFaceLocalAgent, HuggingFaceRemoteAgent, CTransformersGGUFAgent do NOT
   - Without skills, the agent cannot control the robot via MyUnitreeSkills
2. Local LLM + Local Embeddings is Possible
   - vLLM provides an OpenAI-compatible API for local LLMs
   - sentence-transformers provides local embeddings
   - OpenAIAgent works with both (not just OpenAI cloud)
3. Agent Memory is Separate from the LLM
   - Can mix: local LLM + cloud embeddings
   - Can mix: cloud LLM + local embeddings
   - Default memory depends on agent type
4. DIMOS Has Bugs
   - LocalSemanticMemory missing import (line 147)
   - AgentMemoryConnectionError.__str__() bug (fixed)
   - Documentation is misleading/incomplete
5. OpenAIAgent is Most Flexible
   - Works with OpenAI, Azure, vLLM, LocalAI
   - Supports skills/function calling
   - Can use any memory backend
ShadowHound Architecture (Final)¶
┌─────────────────────────────────────────────────────┐
│ ShadowHound Mission Agent │
│ │
│ OpenAIAgent │
│ ├─ LLM: vLLM (Thor Jetson) │ ← Qwen2.5-Coder-7B
│ │ └─ http://192.168.10.116:8000/v1 │
│ │ │
│ ├─ Memory: LocalSemanticMemory │ ← ChromaDB + sentence-transformers
│ │ ├─ Model: all-MiniLM-L6-v2 │
│ │ └─ Storage: ~/.chroma/ │
│ │ │
│ └─ Skills: MyUnitreeSkills │ ← Robot control
│ ├─ Move(x, y, yaw, duration) │
│ ├─ Reverse(x, y, yaw, duration) │
│ ├─ SpinLeft(degrees) │
│ ├─ SpinRight(degrees) │
│ └─ Wait(seconds) │
│ │
└──────────────┬──────────────────────────────────────┘
│
▼
Unitree Go2 Robot
(WebRTC via go2_ros2_sdk)
Status:
- ✅ Architecture validated
- ✅ Dependencies installed
- ⚠️ Blocked by DIMOS import bug
- 📋 Next: Consolidate DIMOS branches + apply fix
References and Links¶
DIMOS Source Files¶
- Base Agent Interface: dimos/agents/agent.py
- OpenAIAgent: dimos/agents/agent.py (line 660)
- ClaudeAgent: dimos/agents/claude_agent.py
- PlanningAgent: dimos/agents/planning_agent.py
- HuggingFaceLocalAgent: dimos/agents/agent_huggingface_local.py
- Memory Implementations: dimos/agents/memory/chroma_impl.py
- Skills Base: dimos/skills/skills.py
- Unitree Skills: dimos/robot/unitree/unitree_skills.py
DIMOS Test Files¶
- HuggingFace Local Tests: tests/test_agent_huggingface_local*.py
- Unitree Agent Tests: tests/test_unitree_agent.py
- ChromaDB Tests: tests/test_standalone_chromadb.py
ShadowHound Files¶
- Mission Executor: src/shadowhound_mission_agent/shadowhound_mission_agent/mission_executor.py
- Test Script: scripts/test_local_embeddings.py
- Documentation:
  - This file: docs/dimos_local_llm_findings.md
  - Branch consolidation: docs/dimos_branch_consolidation.md
  - Development policy: docs/dimos_development_policy.md
  - Local memory roadmap: docs/local_llm_memory_roadmap.md
External Resources¶
- vLLM Documentation: https://docs.vllm.ai/
- sentence-transformers: https://www.sbert.net/
- ChromaDB: https://docs.trychroma.com/
- OpenAI API (compatibility reference): https://platform.openai.com/docs/api-reference
Document Version: 2.0
Last Updated: October 12, 2025
Next Review: After DIMOS branch consolidation and fix application
Bug Found and Fixed¶
AgentMemoryConnectionError.__str__() AttributeError¶
Location: dimos/exceptions/agent_memory_exceptions.py line 44
Bug:
def __str__(self):
return f"{self.message}\nCaused by: {repr(self.cause)}" if self.cause else self.message
# ERROR: self.message doesn't exist!
Why it failed:
- Python's Exception class stores the message in args[0], not as a message attribute
- When exception is converted to string (e.g., during logging), it crashes
Fix:
def __str__(self):
# Python Exception stores message in args[0], not as self.message attribute
message = self.args[0] if self.args else "Unknown error"
return f"{message}\nCaused by: {repr(self.cause)}" if self.cause else message
Fixed in DIMOS commit: cfcaa24
ShadowHound Implementation¶
Architecture Decision¶
Goal: Fully local AI stack with robot control
Requirements:
1. ✅ Local LLM (no cloud API costs/latency)
2. ✅ Local embeddings (no cloud API costs)
3. ✅ Skills/function calling (robot control)
4. ✅ Vector memory (RAG for context)
Agent Evaluation:
| Requirement | HuggingFaceLocalAgent | OpenAIAgent + vLLM |
|---|---|---|
| Local LLM | ✅ Yes | ✅ Yes (via vLLM) |
| Local Embeddings | ✅ Yes (default) | ✅ Yes (explicit) |
| Skills Support | ❌ NO | ✅ YES |
| Robot Control | ❌ Cannot use MyUnitreeSkills | ✅ Full support |
Decision: OpenAIAgent + vLLM + LocalSemanticMemory ⭐
Why: Only combination meeting all requirements. HuggingFaceLocalAgent cannot control robot due to missing skills support.
Implementation in mission_executor.py¶
File: src/shadowhound_mission_agent/shadowhound_mission_agent/mission_executor.py
1. Auto-Detection Logic (Lines 260-285)¶
# Determine embeddings strategy based on backend and configuration
use_local_env = os.getenv("USE_LOCAL_EMBEDDINGS", "").lower()
if use_local_env in ("true", "false"):
    # User explicitly set preference
    use_local_embeddings = use_local_env == "true"
else:
    # Auto-detect based on OPENAI_BASE_URL
    base_url = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
    use_local_embeddings = "api.openai.com" not in base_url

self.logger.info(f"📊 Using local embeddings: {use_local_embeddings}")
Logic:
1. Explicit USE_LOCAL_EMBEDDINGS=true/false takes precedence
2. Otherwise, detect based on OPENAI_BASE_URL:
- Contains api.openai.com → Use OpenAI embeddings
- Any other URL (vLLM, LocalAI, etc.) → Use local embeddings
2. LocalSemanticMemory Initialization (Lines 288-323)¶
if use_local_embeddings:
    try:
        from dimos.agents.memory.chroma_impl import LocalSemanticMemory

        agent_memory = LocalSemanticMemory(
            collection_name="shadowhound_memory",
            model_name="sentence-transformers/all-MiniLM-L6-v2",
        )
        self.logger.info("✅ LocalSemanticMemory initialized")
    except ImportError as e:
        # Missing dependencies
        self.logger.warning("⚠ LocalSemanticMemory dependencies not installed")
        self.logger.warning(f"  Install: pip install chromadb langchain-chroma sentence-transformers")
        agent_memory = "skip"  # Prevent None (which triggers OpenAISemanticMemory)
    except Exception as e:
        # DIMOS bugs or initialization failures
        error_msg = str(e)
        self.logger.warning(f"⚠ Failed to initialize LocalSemanticMemory: {error_msg}")
        if "SentenceTransformer" in error_msg or "name 'SentenceTransformer' is not defined" in error_msg:
            self.logger.error("🐛 DIMOS Bug: Missing import in chroma_impl.py line 147")
            self.logger.error("  Fix: Add 'from sentence_transformers import SentenceTransformer'")
            agent_memory = "skip"
        else:
            agent_memory = None  # OpenAIAgent will auto-create OpenAISemanticMemory
Graceful Fallback:
- Catches ImportError (missing packages)
- Catches Exception (DIMOS bugs)
- Sets agent_memory="skip" to prevent None (which triggers OpenAISemanticMemory)
- Provides actionable error messages
3. OpenAIAgent Initialization (Lines 334-360)¶
# Prepare agent kwargs
agent_kwargs = {
    "dev_name": "shadowhound",
    "model_name": model_name,
    "skills": self.skills,  # ⭐ MyUnitreeSkills
    "input_video_stream": self.robot.video_rx_stream,
    "system_query": dedent("""
        You are ShadowHound, an autonomous Unitree Go2 quadruped robot...
    """),
}

# Only pass agent_memory if we successfully created one
if agent_memory != "skip":
    agent_kwargs["agent_memory"] = agent_memory
    self.logger.info("✅ Using configured agent_memory")
else:
    self.logger.warning("⚠ Running without memory (no RAG)")

# Initialize agent
self.agent = OpenAIAgent(**agent_kwargs)
Key Points:
- Always passes skills=MyUnitreeSkills for robot control
- Only passes agent_memory if initialization succeeded
- If agent_memory not passed and backend is OpenAI cloud, defaults to OpenAISemanticMemory
- If agent_memory not passed and backend is vLLM, will try OpenAISemanticMemory and fail (no embeddings endpoint)
4. Skills Initialization (Lines 175-215)¶
from dimos.robot.unitree.unitree_skills import MyUnitreeSkills
self.skills = MyUnitreeSkills(robot=self.robot)
self.logger.info(f"✅ Initialized {len(list(self.skills))} robot skills:")
for skill in self.skills:
self.logger.info(f" - {skill.__name__}")
Output:
✅ Initialized 5 robot skills:
- Move
- Reverse
- SpinLeft
- SpinRight
- Wait
Configuration Files¶
.env (Development - vLLM on Thor)¶
# Agent Backend
AGENT_BACKEND=openai
OPENAI_BASE_URL=http://192.168.10.116:8000/v1
OPENAI_MODEL=Qwen/Qwen2.5-Coder-7B-Instruct
# OPENAI_API_KEY not needed for vLLM
# Embeddings (auto-detected as local due to non-OpenAI URL)
# USE_LOCAL_EMBEDDINGS=true # Optional explicit override
# Robot
MOCK_ROBOT=false
CONN_TYPE=webrtc
GO2_IP=192.168.1.103
.env (Production - Cloud Fallback)¶
# Agent Backend
AGENT_BACKEND=openai
# OPENAI_BASE_URL defaults to https://api.openai.com/v1
OPENAI_MODEL=gpt-4o
OPENAI_API_KEY=sk-...
# Embeddings (auto-detected as cloud)
# Uses OpenAISemanticMemory automatically
# Robot
MOCK_ROBOT=false
CONN_TYPE=webrtc
GO2_IP=192.168.1.103
Testing the Stack¶
1. Test embeddings dependencies:
python3 scripts/test_local_embeddings.py
# Should pass after DIMOS bug fixed
2. Test mission agent:
./start.sh
# Check logs for:
# ✅ LocalSemanticMemory initialized
# ✅ Initialized 5 robot skills
# ✅ DIMOS OpenAI-compatible agent initialized
3. Test robot control:
# In mission agent terminal, give command:
"Move forward 2 meters"
# Should see:
# - LLM generates skill call: Move(x=0.5, duration=4.0)
# - Skill executes: robot moves forward
# - Result stored in memory for future context
Configuration Examples¶
1. Local LLM + Local Embeddings (⭐ ShadowHound Production)¶
Use Case: Fully autonomous robot with no cloud dependencies
# .env
AGENT_BACKEND=openai
OPENAI_BASE_URL=http://192.168.10.116:8000/v1
OPENAI_MODEL=Qwen/Qwen2.5-Coder-7B-Instruct
# USE_LOCAL_EMBEDDINGS auto-detected as true
Stack:
- LLM: vLLM on Thor (Jetson AGX Orin)
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- Vector DB: ChromaDB (~/.chroma/)
- Agent: OpenAIAgent with MyUnitreeSkills

Benefits:
- ✅ Zero cloud costs
- ✅ Low latency (LAN only)
- ✅ Works offline
- ✅ Full robot control
- ✅ Semantic memory/RAG
2. Cloud LLM + Cloud Embeddings¶
Use Case: Maximum quality, don't care about costs
# .env
AGENT_BACKEND=openai
# OPENAI_BASE_URL defaults to https://api.openai.com/v1
OPENAI_MODEL=gpt-4o
OPENAI_API_KEY=sk-...
# USE_LOCAL_EMBEDDINGS auto-detected as false
Stack:
- LLM: OpenAI GPT-4o
- Embeddings: OpenAI text-embedding-3-large
- Vector DB: ChromaDB (~/.chroma/)
- Agent: OpenAIAgent with MyUnitreeSkills

Benefits:
- ✅ Highest quality LLM
- ✅ Best embeddings
- ✅ No local GPU needed

Costs:
- LLM: ~$5/1M input tokens, ~$15/1M output tokens
- Embeddings: ~$0.13/1M tokens
3. Hybrid: Local LLM + Cloud Embeddings¶
Use Case: Save on LLM costs but use best embeddings
# .env
AGENT_BACKEND=openai
OPENAI_BASE_URL=http://192.168.10.116:8000/v1
OPENAI_MODEL=Qwen/Qwen2.5-Coder-7B-Instruct
USE_LOCAL_EMBEDDINGS=false # ⚠️ Explicit override
OPENAI_API_KEY=sk-... # For embeddings only
Stack:
- LLM: vLLM (free)
- Embeddings: OpenAI (paid)
- Vector DB: ChromaDB (~/.chroma/)
- Agent: OpenAIAgent with MyUnitreeSkills
4. Hybrid: Cloud LLM + Local Embeddings¶
Use Case: Best LLM, no embeddings costs
# .env
AGENT_BACKEND=openai
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-4o
OPENAI_API_KEY=sk-...
USE_LOCAL_EMBEDDINGS=true # ⚠️ Explicit override
Stack:
- LLM: OpenAI GPT-4o (paid)
- Embeddings: sentence-transformers (free)
- Vector DB: ChromaDB (~/.chroma/)
- Agent: OpenAIAgent with MyUnitreeSkills
5. Simple Text Agent (No Skills)¶
Use Case: Local text generation without robot control
from dimos.agents.agent_huggingface_local import HuggingFaceLocalAgent
agent = HuggingFaceLocalAgent(
dev_name="TextAgent",
model_name="Qwen/Qwen2.5-3B",
# agent_memory defaults to LocalSemanticMemory()
)
Limitations:
- ❌ NO skills/function calling
- ❌ Cannot control robot
- ✅ Good for chatbots, summarization, etc.
Known Issues and Bugs¶
1. Missing Import in LocalSemanticMemory ⚠️ CRITICAL¶
File: src/dimos-unitree/dimos/agents/memory/chroma_impl.py
Line: 147
Status: Identified, fix pending
Error:
NameError: name 'SentenceTransformer' is not defined
Root Cause:
# Line 147:
self.model = SentenceTransformer(self.model_name, device=device)
# But import is missing at top of file (lines 1-40)
Fix:
# Add to imports section:
from sentence_transformers import SentenceTransformer
Impact:
- ❌ LocalSemanticMemory initialization fails
- ❌ Falls back to no memory (no RAG)
- ❌ Blocks local embeddings for OpenAIAgent + vLLM

Workaround:
- Use cloud embeddings temporarily
- OR fix DIMOS locally (violates submodule policy but documented in emergency workflow)

Resolution Plan:
1. Consolidate DIMOS branches (dev + fix/webrtc)
2. Apply fix to consolidated dev branch
3. Test with ShadowHound
4. Submit PR to upstream DIMOS
5. Update submodule SHA in ShadowHound
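A quick way to confirm whether the installed copy still has the missing import, before and after applying the fix (illustrative check; assumes DIMOS is importable in the current environment):
import importlib
import inspect

# Inspect the installed module source for the required import
mod = importlib.import_module("dimos.agents.memory.chroma_impl")
src = inspect.getsource(mod)
if "from sentence_transformers import SentenceTransformer" in src:
    print("Import present - LocalSemanticMemory should initialize")
else:
    print("Import MISSING - apply the one-line fix above")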
2. README Misleading Claim¶
File: dimos-unitree/README.md
Claim:
Agent API keys
Full functionality will require API keys for the following:
Requirements: - OpenAI API key (required for all LLMAgents due to OpenAIEmbeddings)
Reality:
- ❌ NOT required for HuggingFaceLocalAgent (uses LocalSemanticMemory by default)
- ❌ NOT required for any agent if you pass agent_memory=LocalSemanticMemory() explicitly
- ✅ Only required if using OpenAISemanticMemory (which is just a default, not a requirement)
Impact: Misleading documentation discourages local deployments
Resolution: Update DIMOS README to clarify local options
3. OpenAIAgent Memory Default¶
File: dimos/agents/agent.py line 95
Issue:
self.agent_memory = agent_memory or OpenAISemanticMemory()
This means if you pass agent_memory=None, it creates OpenAISemanticMemory which:
- Requires OpenAI API key
- Makes API calls to /v1/embeddings
- Fails with vLLM (no embeddings endpoint)
Workaround: Always pass an explicit memory object; passing None (or omitting the parameter) falls back to OpenAISemanticMemory
Our Solution: Use agent_memory="skip" as sentinel value to prevent None
4. AgentMemoryConnectionError.__str__() Bug (FIXED)¶
File: dimos/exceptions/agent_memory_exceptions.py line 44
Status: ✅ Fixed in DIMOS commit cfcaa24
Bug:
def __str__(self):
return f"{self.message}\nCaused by: {repr(self.cause)}" if self.cause else self.message
# ERROR: self.message doesn't exist! Python stores in args[0]
Fix:
def __str__(self):
message = self.args[0] if self.args else "Unknown error"
return f"{message}\nCaused by: {repr(self.cause)}" if self.cause else message
5. No vLLM + LocalSemanticMemory Examples¶
Issue: DIMOS documentation has no examples of:
- OpenAIAgent with vLLM backend
- OpenAIAgent with LocalSemanticMemory
- Hybrid configurations
Impact: Users don't know these combinations are possible
Resolution: Add examples to DIMOS docs (and this document!)
Recommendations¶
For DIMOS Project¶
If contributing back to DIMOS:
1. Fix LocalSemanticMemory import (1 line, critical):
   from sentence_transformers import SentenceTransformer
2. Update README to clarify:
   - OpenAI key is optional
   - Local embeddings available via LocalSemanticMemory
   - HuggingFaceLocalAgent works completely locally
3. Add examples of:
   - OpenAIAgent + vLLM backend
   - OpenAIAgent + LocalSemanticMemory
   - Hybrid configurations
4. Document agent capabilities clearly:
   - Which agents support skills/tools
   - Which agents support vision
   - Which agents support local operation
5. Add dependency checking in LocalSemanticMemory:
   try:
       from sentence_transformers import SentenceTransformer
   except ImportError:
       raise ImportError("sentence-transformers not installed. Run: pip install sentence-transformers")
For ShadowHound Project¶
Immediate:
1. ✅ Consolidate DIMOS branches (dev + fix/webrtc)
2. ✅ Apply LocalSemanticMemory import fix
3. ✅ Test full stack with local LLM + local embeddings
4. ✅ Verify robot control via skills works end-to-end

Short-term:
1. Monitor vLLM performance and quality
2. Benchmark local embeddings vs OpenAI embeddings (see the sketch below)
3. Consider contributing fixes back to DIMOS
4. Document any additional issues found

Long-term:
1. Evaluate switching to PlanningAgent for complex missions
2. Consider ClaudeAgent for scenarios requiring extended thinking
3. Explore vision capabilities (multimodal missions)
4. Benchmark different embedding models
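A starting point for the embeddings benchmark mentioned above (local side only; swap in OpenAI embeddings for the cloud comparison - model name and sample size here are arbitrary):
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
texts = ["Patrol waypoint reached"] * 256

start = time.perf_counter()
vecs = model.encode(texts, batch_size=32)
elapsed = time.perf_counter() - start
print(f"{len(texts)} embeddings in {elapsed:.2f}s "
      f"({len(texts) / elapsed:.0f} docs/s, dim={vecs.shape[1]})")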
Test Examples in DIMOS¶
- DIMOS Agent Interface: dimos/agents/agent.py
- HuggingFace Local Agent: dimos/agents/agent_huggingface_local.py
- Memory Implementations: dimos/agents/memory/chroma_impl.py
- Test Examples: tests/test_agent_huggingface_local*.py
- ShadowHound Implementation: src/shadowhound_mission_agent/shadowhound_mission_agent/mission_executor.py