LLM Integration Documentation¶
Purpose¶
Central hub for local LLM backend documentation covering vLLM deployment, Ollama alternatives, performance benchmarking, and integration guides.
Prerequisites¶
- NVIDIA Jetson AGX Thor or compatible GPU system
- Understanding of LLM concepts and model serving
- Familiarity with Docker/containers
Quick Start Guides¶
Production (Recommended)¶
- vLLM Quick Start — Deploy Hermes-2-Pro on Thor using NVIDIA's official container
Alternative Backends¶
- Ollama Setup — Alternative local LLM backend (24x faster than cloud)
- Ollama Backend Integration — Integration details
- llama.cpp Migration — Migration from llama.cpp
Backend Selection & Validation¶
Validation & Testing¶
- LLM Backend Validation — Startup validation system
- Backend Validation Summary — Validation results
- Backend Quick Reference — Quick command reference
- Ollama Testing Guide — Test procedures
Model Selection¶
- Ollama Models — Available models overview
- Ollama Model Selection — Choosing the right model
- Ollama Model Comparison — Performance comparison
Performance & Benchmarking¶
Benchmarking Guides¶
- Ollama Benchmarking — Benchmark methodology
- Benchmark Results — Performance data
- Benchmark Memory Management — Memory optimization
- Quality Scoring Summary — Quality metrics
- Ollama Quality Scoring — Detailed scoring
Thor-Specific Notes¶
- Thor Resources — Thor GPU/memory specs
- Thor Performance Notes — Performance observations
- Security Analysis (jtop) — Security considerations
Deployment & Operations¶
Deployment¶
- Ollama Deployment Checklist — Pre-deployment verification
- Feature Complete (Ollama) — Feature status
- Local AI Implementation Status — Overall status
Integration¶
- Local LLM Integration Summary — Integration overview
- Local LLM Memory Roadmap — Memory/RAG roadmap
- Ollama Status & TODOs — Current status and tasks
Authentication & Configuration¶
- vLLM HuggingFace Auth — Authentication setup (if needed)
Architecture Overview¶
Backend Options¶
| Backend | Speed | Use Case | Status |
|---|---|---|---|
| vLLM (Thor) | 1-2s | Production, autonomous | ✅ Recommended |
| Ollama (Gaming PC) | 0.5-1s | Development, fastest | ✅ Validated |
| Ollama (Thor) | 1-2s | Alternative to vLLM | ✅ Validated |
| Cloud (GPT-4) | 3-7s | Fallback, highest quality | ✅ Default |
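The vLLM and cloud backends speak the OpenAI-compatible HTTP API, and Ollama also exposes an OpenAI-compatible endpoint, so switching between the options above is mostly a matter of pointing the agent at a different base URL. The mapping below is an illustrative sketch only: the hostnames and key names are assumptions, and the actual configuration lives in the environment variables linked under See Also.

```python
# Illustrative backend → base URL mapping (hostnames and key names are assumptions;
# vLLM defaults to port 8000, Ollama to 11434, both with an OpenAI-compatible /v1 path).
BACKEND_URLS = {
    "vllm_thor":   "http://thor:8000/v1",
    "ollama_pc":   "http://gaming-pc:11434/v1",
    "ollama_thor": "http://thor:11434/v1",
    "cloud":       "https://api.openai.com/v1",  # GPT-4 fallback
}
```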
Key Components¶
- Mission Planner — Converts natural language to skill sequences
- Backend Validation — Startup health checks for LLM connectivity
- Embeddings — ChromaDB for local vector storage
- Skills API — Typed interface for robot commands
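As a concrete illustration of how the Mission Planner and an LLM backend fit together, the sketch below asks an OpenAI-compatible local server (vLLM or Ollama) to turn a natural-language task into a list of skill names. The prompt, model name, and output format are assumptions for illustration, not the project's actual planner code.

```python
import json
import requests  # simple HTTP client; the real agent may use an SDK instead

BASE_URL = "http://localhost:8000/v1"  # assumed local vLLM endpoint

def plan_mission(task: str) -> list[str]:
    """Illustrative only: request a JSON array of skill names from the LLM."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": "NousResearch/Hermes-2-Pro-Llama-3-8B",
            "messages": [
                {"role": "system",
                 "content": "Respond with a JSON array of skill names only."},
                {"role": "user", "content": task},
            ],
            "temperature": 0.2,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["choices"][0]["message"]["content"])

# Example (hypothetical skill names):
# plan_mission("Patrol the hallway and return to the dock")
# -> ["navigate_to", "patrol", "return_to_dock"]
```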
Common Tasks¶
Deploy vLLM on Thor¶
```bash
# See vllm_quickstart.md for full guide
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  nvcr.io/nvidia/pytorch:24.07-py3 \
  /bin/bash -c "pip install vllm && vllm serve NousResearch/Hermes-2-Pro-Llama-3-8B"
```
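Once the container is up, a quick smoke test against vLLM's OpenAI-compatible API confirms the model is actually serving. This is a sketch; the port and model name must match the `vllm serve` invocation above.

```python
import requests

BASE = "http://localhost:8000/v1"  # vLLM's OpenAI-compatible server (default port)

# List the models the server exposes; Hermes-2-Pro should appear here.
print(requests.get(f"{BASE}/models", timeout=10).json())

# One short completion to confirm end-to-end inference works.
resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": "NousResearch/Hermes-2-Pro-Llama-3-8B",
        "messages": [{"role": "user", "content": "Say hello in five words."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```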
Test Backend Connection¶
```bash
# Validate backend at startup
ros2 launch shadowhound_bringup mission_agent.launch.py

# Check logs for validation results
# Should see: "✅ LLM backend validated successfully"
```
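For reference, a startup validation of this kind usually amounts to a reachability probe with a timeout, a few retries, and a clear log line. The sketch below is an assumption about the general shape of such a check, not the actual shadowhound validator.

```python
import logging
import requests

log = logging.getLogger("backend_validation")

def validate_backend(base_url: str, retries: int = 3, timeout: float = 5.0) -> bool:
    """Illustrative health check; the real validator may probe differently."""
    for attempt in range(1, retries + 1):
        try:
            requests.get(f"{base_url}/models", timeout=timeout).raise_for_status()
            log.info("✅ LLM backend validated successfully")
            return True
        except requests.RequestException as exc:
            log.warning("Backend check %d/%d failed: %s", attempt, retries, exc)
    log.error("LLM backend unreachable at %s", base_url)
    return False
```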
Switch Backends¶
```bash
# Edit .env file
# AGENT_BACKEND=cloud   # or 'local' for vLLM/Ollama

# Restart mission agent
ros2 launch shadowhound_bringup mission_agent.launch.py
```
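Behind the scenes, backend switching is environment-driven. The sketch below shows one plausible way an agent could resolve `AGENT_BACKEND` to a base URL; the extra variable names and defaults (`LOCAL_LLM_URL`, `OPENAI_BASE_URL`) are hypothetical.

```python
import os

def resolve_base_url() -> str:
    """Pick an LLM endpoint from environment configuration (illustrative only)."""
    backend = os.getenv("AGENT_BACKEND", "cloud").lower()
    if backend == "local":
        # vLLM or Ollama, depending on which local server is running.
        return os.getenv("LOCAL_LLM_URL", "http://localhost:8000/v1")
    # Cloud fallback (GPT-4 via the OpenAI API).
    return os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
```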
Validation¶
- [ ] vLLM deployment tested on Thor
- [ ] Ollama benchmarks validated
- [ ] Backend validation working correctly
- [ ] Model selection guide accurate
- [ ] Performance metrics up-to-date
See Also¶
- Environment Variables — LLM backend configuration
- Agent Architecture — How agent uses LLMs
- Hardware Topologies — Thor power/network setup
- Software Index — Complete software documentation
References¶
- Documentation Root
- vLLM Documentation: https://docs.vllm.ai/
- Ollama Documentation: https://ollama.ai/
- Hermes-2-Pro Model: https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B
- NVIDIA PyTorch Containers: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch