LLM Integration Documentation

Purpose

Central hub for local LLM backend documentation covering vLLM deployment, Ollama alternatives, performance benchmarking, and integration guides.

Prerequisites

  • NVIDIA Jetson AGX Thor or compatible GPU system
  • Understanding of LLM concepts and model serving
  • Familiarity with Docker/containers

Quick Start Guides

  • vLLM Quick Start — Deploy Hermes-2-Pro on Thor using NVIDIA's official container

Alternative Backends

Backend Selection & Validation

Validation & Testing

Model Selection

Performance & Benchmarking

Benchmarking Guides

Thor-Specific Notes

Deployment & Operations

Deployment

Integration

Authentication & Configuration

Architecture Overview

Backend Options

Backend              Speed    Use Case                   Status
vLLM (Thor)          1-2s     Production, autonomous     ✅ Recommended
Ollama (Gaming PC)   0.5-1s   Development, fastest       ✅ Validated
Ollama (Thor)        1-2s     Alternative to vLLM        ✅ Validated
Cloud (GPT-4)        3-7s     Fallback, highest quality  ✅ Default
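
Both local backends speak an OpenAI-compatible HTTP API, so the latencies above can be sanity-checked with the same request shape. A minimal smoke test, assuming vLLM's default port 8000 (Ollama serves a compatible /v1 route on 11434 by default; adjust host and port to match your deployment):

# Rough end-to-end latency check; swap host/port for other backends
time curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "NousResearch/Hermes-2-Pro-Llama-3-8B",
       "messages": [{"role": "user", "content": "Reply with OK."}],
       "max_tokens": 8}'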

Key Components

  • Mission Planner — Converts natural language to skill sequences
  • Backend Validation — Startup health checks for LLM connectivity (see the manual check after this list)
  • Embeddings — ChromaDB for local vector storage
  • Skills API — Typed interface for robot commands
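
The backend validation step is a connectivity check that runs before the agent accepts missions; you can reproduce it by hand. A minimal sketch, assuming the default ports used on this page (vLLM exposes a /health endpoint; Ollama answers on its root URL):

# Manual version of the startup health check
curl -sf http://localhost:8000/health >/dev/null && echo "vLLM reachable"
curl -sf http://localhost:11434/ >/dev/null && echo "Ollama reachable"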

Common Tasks

Deploy vLLM on Thor

# See vllm_quickstart.md for full guide
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  nvcr.io/nvidia/pytorch:24.07-py3 \
  /bin/bash -c "pip install vllm && vllm serve NousResearch/Hermes-2-Pro-Llama-3-8B"
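
Once the server inside the container finishes loading, confirm the model actually registered before wiring anything else up. vLLM's OpenAI-compatible server lists loaded models at /v1/models (assuming the port mapping above):

# Verify the server is up and Hermes-2-Pro is loaded
curl -s http://localhost:8000/v1/models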

Test Backend Connection

# Validate backend at startup
ros2 launch shadowhound_bringup mission_agent.launch.py

# Check logs for validation results
# Should see: "✅ LLM backend validated successfully"

Switch Backends

# Edit .env file
# AGENT_BACKEND=cloud  # or 'local' for vLLM/Ollama

# Restart mission agent
ros2 launch shadowhound_bringup mission_agent.launch.py
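
If you switch backends often, the .env edit can be scripted. A sketch, assuming .env sits in the directory you launch from (adjust the path to match your setup):

# Flip to the local backend and relaunch
sed -i 's/^AGENT_BACKEND=.*/AGENT_BACKEND=local/' .env
ros2 launch shadowhound_bringup mission_agent.launch.py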

Validation

  • [ ] vLLM deployment tested on Thor
  • [ ] Ollama benchmarks validated
  • [ ] Backend validation working correctly
  • [ ] Model selection guide accurate
  • [ ] Performance metrics up-to-date

References

  • Documentation Root
  • vLLM Documentation: https://docs.vllm.ai/
  • Ollama Documentation: https://ollama.ai/
  • Hermes-2-Pro Model: https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B
  • NVIDIA PyTorch Containers: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch