LLM Integration Documentation¶
Purpose¶
Central hub for local LLM backend documentation covering vLLM deployment, Ollama alternatives, performance benchmarking, and integration guides.
Prerequisites¶
- NVIDIA Jetson AGX Thor or compatible GPU system
- Understanding of LLM concepts and model serving
- Familiarity with Docker/containers
Quick Start Guides¶
Production (Recommended)¶
- vLLM Quick Start — Deploy Hermes-2-Pro on Thor using NVIDIA's official container
Alternative Backends¶
- Ollama Setup — Alternative local LLM backend (24x faster than cloud)
- Ollama Backend Integration — Integration details
- llama.cpp Migration — Migration from llama.cpp
Backend Selection & Validation¶
Validation & Testing¶
- LLM Backend Validation — Startup validation system
- Backend Validation Summary — Validation results
- Backend Quick Reference — Quick command reference
- Ollama Testing Guide — Test procedures
Model Selection¶
- Ollama Models — Available models overview
- Ollama Model Selection — Choosing the right model
- Ollama Model Comparison — Performance comparison
Performance & Benchmarking¶
Benchmarking Guides¶
- Ollama Benchmarking — Benchmark methodology
- Benchmark Results — Performance data
- Benchmark Memory Management — Memory optimization
- Quality Scoring Summary — Quality metrics
- Ollama Quality Scoring — Detailed scoring
Thor-Specific Notes¶
- Thor Resources — Thor GPU/memory specs
- Thor Performance Notes — Performance observations
- Security Analysis (jtop) — Security considerations
Deployment & Operations¶
Deployment¶
- Ollama Deployment Checklist — Pre-deployment verification
- Feature Complete (Ollama) — Feature status
- Local AI Implementation Status — Overall status
Integration¶
- Local LLM Integration Summary — Integration overview
- Local LLM Memory Roadmap — Memory/RAG roadmap
- Ollama Status & TODOs — Current status and tasks
Authentication & Configuration¶
- vLLM HuggingFace Auth — Authentication setup (if needed)
Architecture Overview¶
Backend Options¶
| Backend | Speed | Use Case | Status |
|---|---|---|---|
| vLLM (Thor) | 1-2s | Production, autonomous | ✅ Recommended |
| Ollama (Gaming PC) | 0.5-1s | Development, fastest | ✅ Validated |
| Ollama (Thor) | 1-2s | Alternative to vLLM | ✅ Validated |
| Cloud (GPT-4) | 3-7s | Fallback, highest quality | ✅ Default |
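The vLLM and cloud backends speak the OpenAI-compatible HTTP API, and Ollama also exposes an OpenAI-compatible endpoint, so switching between the options above is mostly a matter of pointing the agent at a different base URL. The mapping below is an illustrative sketch only: the hostnames and key names are assumptions, and the actual configuration lives in the environment variables linked under See Also.

```python
# Illustrative backend → base URL mapping (hostnames and key names are assumptions;
# vLLM defaults to port 8000, Ollama to 11434, both with an OpenAI-compatible /v1 path).
BACKEND_URLS = {
    "vllm_thor":   "http://thor:8000/v1",
    "ollama_pc":   "http://gaming-pc:11434/v1",
    "ollama_thor": "http://thor:11434/v1",
    "cloud":       "https://api.openai.com/v1",  # GPT-4 fallback
}
```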
Key Components¶
- Mission Planner — Converts natural language to skill sequences
- Backend Validation — Startup health checks for LLM connectivity
- Embeddings — ChromaDB for local vector storage
- Skills API — Typed interface for robot commands
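As a concrete illustration of how the Mission Planner and an LLM backend fit together, the sketch below asks an OpenAI-compatible local server (vLLM or Ollama) to turn a natural-language task into a list of skill names. The prompt, model name, and output format are assumptions for illustration, not the project's actual planner code.

```python
import json
import requests  # simple HTTP client; the real agent may use an SDK instead

BASE_URL = "http://localhost:8000/v1"  # assumed local vLLM endpoint

def plan_mission(task: str) -> list[str]:
    """Illustrative only: request a JSON array of skill names from the LLM."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": "NousResearch/Hermes-2-Pro-Llama-3-8B",
            "messages": [
                {"role": "system",
                 "content": "Respond with a JSON array of skill names only."},
                {"role": "user", "content": task},
            ],
            "temperature": 0.2,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["choices"][0]["message"]["content"])

# Example (hypothetical skill names):
# plan_mission("Patrol the hallway and return to the dock")
# -> ["navigate_to", "patrol", "return_to_dock"]
```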
Common Tasks¶
Deploy vLLM on Thor¶
```bash
# See vllm_quickstart.md for full guide
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  nvcr.io/nvidia/pytorch:24.07-py3 \
  /bin/bash -c "pip install vllm && vllm serve NousResearch/Hermes-2-Pro-Llama-3-8B"
```
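Once the container is up, a quick smoke test against vLLM's OpenAI-compatible API confirms the model is actually serving. This is a sketch; the port and model name must match the `vllm serve` invocation above.

```python
import requests

BASE = "http://localhost:8000/v1"  # vLLM's OpenAI-compatible server (default port)

# List the models the server exposes; Hermes-2-Pro should appear here.
print(requests.get(f"{BASE}/models", timeout=10).json())

# One short completion to confirm end-to-end inference works.
resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": "NousResearch/Hermes-2-Pro-Llama-3-8B",
        "messages": [{"role": "user", "content": "Say hello in five words."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```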
Test Backend Connection¶
```bash
# Validate backend at startup
ros2 launch shadowhound_bringup mission_agent.launch.py

# Check logs for validation results
# Should see: "✅ LLM backend validated successfully"
```
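For reference, a startup validation of this kind usually amounts to a reachability probe with a timeout, a few retries, and a clear log line. The sketch below is an assumption about the general shape of such a check, not the actual shadowhound validator.

```python
import logging
import requests

log = logging.getLogger("backend_validation")

def validate_backend(base_url: str, retries: int = 3, timeout: float = 5.0) -> bool:
    """Illustrative health check; the real validator may probe differently."""
    for attempt in range(1, retries + 1):
        try:
            requests.get(f"{base_url}/models", timeout=timeout).raise_for_status()
            log.info("✅ LLM backend validated successfully")
            return True
        except requests.RequestException as exc:
            log.warning("Backend check %d/%d failed: %s", attempt, retries, exc)
    log.error("LLM backend unreachable at %s", base_url)
    return False
```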
Switch Backends¶
```bash
# Edit .env file
# AGENT_BACKEND=cloud   # or 'local' for vLLM/Ollama

# Restart mission agent
ros2 launch shadowhound_bringup mission_agent.launch.py
```
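Behind the scenes, backend switching is environment-driven. The sketch below shows one plausible way an agent could resolve `AGENT_BACKEND` to a base URL; the extra variable names and defaults (`LOCAL_LLM_URL`, `OPENAI_BASE_URL`) are hypothetical.

```python
import os

def resolve_base_url() -> str:
    """Pick an LLM endpoint from environment configuration (illustrative only)."""
    backend = os.getenv("AGENT_BACKEND", "cloud").lower()
    if backend == "local":
        # vLLM or Ollama, depending on which local server is running.
        return os.getenv("LOCAL_LLM_URL", "http://localhost:8000/v1")
    # Cloud fallback (GPT-4 via the OpenAI API).
    return os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
```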
Validation¶
- [ ] vLLM deployment tested on Thor
- [ ] Ollama benchmarks validated
- [ ] Backend validation working correctly
- [ ] Model selection guide accurate
- [ ] Performance metrics up-to-date
See Also¶
- Environment Variables — LLM backend configuration
- Agent Architecture — How agent uses LLMs
- Hardware Topologies — Thor power/network setup
- Software Index — Complete software documentation
References¶
- Documentation Root
- vLLM Documentation: https://docs.vllm.ai/
- Ollama Documentation: https://ollama.ai/
- Hermes-2-Pro Model: https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B
- NVIDIA PyTorch Containers: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch