Thor GPU Resources and Future Directions¶
Last Updated: 2025-10-10
Purpose: Reference links and notes for Thor optimization and advanced features
🔗 Key Resources¶
Performance Optimization¶
vLLM on Thor (Promising Alternative to Ollama)¶
- URL: https://forums.developer.nvidia.com/t/performance-comparison-of-qwen3-30b-a3b-awq-on-jetson-thor-vs-orin-agx-64gb/345449/5
- Topic: Performance comparison of Qwen3-30B-A3B-AWQ on Thor vs Orin AGX 64GB
- Why Important:
- vLLM may offer better performance than Ollama on Thor
- Direct performance comparisons between Thor and Orin AGX
- AWQ quantization techniques for Jetson platforms
- Community discussion of real-world results
- Potential Impact:
- Alternative to Ollama if GPU degradation issue persists
- May solve performance consistency problems
- Worth investigating if current setup has issues
- Next Steps:
- [ ] Test vLLM on Thor with phi4:14b or qwen models
- [ ] Compare vLLM vs Ollama performance (speed, stability, memory)
- [ ] Check if vLLM has model unload/reload degradation issue
- [ ] Evaluate ease of integration with mission agent
Jetson AI Stack Documentation¶
- URL: https://elinux.org/Jetson/L4T/Jetson_AI_Stack#AGX_Thor
- Topic: Official Jetson AI Stack documentation for Thor
- Why Important:
- Comprehensive guide to AI stack on Thor
- Official NVIDIA recommendations
- Integration patterns and best practices
- Supported frameworks and optimizations
- Contains:
- Installation instructions for AI frameworks
- Performance tuning guidelines
- Container configurations
- Known issues and workarounds
- Relevant Sections:
- AGX Thor specific configurations
- JetPack 7.x compatibility notes
- GPU optimization settings
- Memory management best practices
- Next Steps:
- [ ] Review recommended AI stack configuration
- [ ] Compare our setup vs official recommendations
- [ ] Check for Thor-specific optimizations we're missing
- [ ] Look for CUDA persistence mode settings
Advanced Features and Use Cases¶
Jetson Thor Setup and Demo Guide (PDF)¶
- URL: https://international.download.nvidia.com/JetsonThorReview/Jetson-Thor-Setup-and-Demo-Guide.pdf
- Topic: Official Thor setup guide with demo applications
- Why Important:
- Official NVIDIA documentation
- Real-world application examples
- Performance benchmarks and expectations
- Advanced feature demonstrations
Key Sections:
- Gr00t (Robot Foundation Model)
- Humanoid robot control
- Foundation model for robotics
- May be relevant for advanced GO2 behaviors
-
Potential future integration for complex navigation/manipulation
-
VSS (Video Search and Summarization) ⭐
- Relevance: Mentioned for separate project
- Video processing on Thor GPU
- Real-time video analysis
- Content summarization capabilities
- Use Cases:
- Robot camera feed analysis
- Environment understanding from video
- Mission logging and review
- Separate project requirements
-
Next Steps for VSS Project:
- [ ] Review VSS implementation details in PDF
- [ ] Check hardware requirements (VRAM, compute)
- [ ] Evaluate integration with Thor setup
- [ ] Test VSS demo on Thor
-
Other Demos (scan PDF for):
- [ ] LLM inference benchmarks
- [ ] Vision model examples (relevant for GO2 perception)
- [ ] Multi-modal AI demonstrations
- [ ] Performance optimization techniques
🔍 Investigation Priorities¶
Priority 1: Solve Current GPU Degradation Issue¶
Before exploring alternatives: - [ ] Complete robot testing with phi4:14b + Ollama - [ ] Install jtop and monitor GPU behavior - [ ] Test CUDA persistence mode - [ ] Document reproducible test case
If degradation persists: - [ ] Test vLLM as Ollama alternative (see forum link above) - [ ] Review Jetson AI Stack docs for optimization hints - [ ] Contact NVIDIA/Ollama maintainers with findings
Priority 2: vLLM Evaluation (If Needed)¶
Why Consider vLLM: - Specifically optimized for inference performance - May handle model loading/unloading better - Active Jetson community support - Performance data available (see forum link)
Evaluation Criteria: 1. Performance: Speed vs Ollama on same models 2. Stability: Does model cycling cause degradation? 3. Memory: VRAM usage vs Ollama 4. Integration: Ease of use with mission agent 5. Features: Model support, API compatibility
Test Plan (if pursuing):
# Install vLLM on Thor
pip install vllm
# Test with phi4:14b or qwen models
python -m vllm.entrypoints.openai.api_server \
--model phi4:14b \
--host 0.0.0.0 \
--port 11435
# Benchmark against Ollama
# Use same test prompts
# Compare: speed, memory, stability over time
Integration Notes:
- vLLM has OpenAI-compatible API
- Mission agent should work with minimal changes
- Update OLLAMA_BASE_URL to point to vLLM server
- May need to adjust model names/formats
Priority 3: Advanced Features Exploration¶
Gr00t (Robot Foundation Model): - Potential for advanced GO2 behaviors - Complex navigation and manipulation - Multi-modal robot control - Future consideration after basic LLM integration stable
VSS (Video Search and Summarization): - Read PDF section on VSS implementation - Evaluate for robot camera feed analysis - Consider for separate project needs - May complement GO2 perception skills
Other Thor Capabilities: - Review full PDF for relevant demos - Check performance benchmarks - Identify applicable optimizations
📚 Related Documentation¶
In This Repo:
- docs/THOR_PERFORMANCE_NOTES.md - Current GPU issues and workarounds
- docs/OLLAMA_STATUS_AND_TODOS.md - Current Ollama setup status
- scripts/setup_ollama_thor.sh - Current Ollama container setup
- scripts/install_jtop_thor.sh - GPU monitoring installation
External: - NVIDIA Jetson Forums: https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/ - vLLM Documentation: https://docs.vllm.ai/ - Ollama Documentation: https://github.com/ollama/ollama/tree/main/docs
🎯 Decision Framework¶
When to Consider vLLM¶
Switch if: - ✅ Ollama GPU degradation unsolvable - ✅ vLLM shows 2x+ performance improvement - ✅ vLLM is more stable over time - ✅ Integration effort is reasonable (<1 day)
Stay with Ollama if: - ✅ GPU degradation is solved (CUDA persistence, keep_alive, etc.) - ✅ Performance is acceptable (>15 tok/s sustained) - ✅ System is stable with model kept loaded - ✅ Current integration is working well
Evaluation Timeline¶
Phase 1: Current Setup (Tomorrow 2025-10-11) - Test Ollama + phi4:14b on robot - Monitor with jtop - Document performance and stability
Phase 2: Optimization (If needed, 2025-10-12+) - Try CUDA optimizations - Test keep_alive strategies - Review Jetson AI Stack recommendations
Phase 3: Alternative Evaluation (If Phase 2 fails, TBD) - Set up vLLM test environment - Benchmark vLLM vs Ollama - Evaluate integration effort - Make switch decision
💡 Notes and Ideas¶
vLLM vs Ollama Considerations¶
Ollama Pros: - ✅ Already set up and working - ✅ Simple API - ✅ Good model ecosystem (Modelfile format) - ✅ Easy model management - ✅ Mission agent integration complete
Ollama Cons: - ❌ GPU degradation issue (current blocker) - ❌ May not be optimized for Jetson - ❌ Limited control over inference settings
vLLM Pros: - ✅ Designed for inference performance - ✅ Jetson-specific optimizations available - ✅ Community reports good Thor performance - ✅ More inference control (batch size, tensor parallel, etc.)
vLLM Cons: - ❌ Not yet tested on our setup - ❌ May require code changes - ❌ Different model format (HuggingFace) - ❌ Additional setup complexity
VSS Project Notes¶
From PDF (to be filled in after reading): - Hardware requirements: ___ - Software stack: ___ - Performance expectations: ___ - Integration points: ___
Potential Applications: 1. Robot Camera Analysis: Real-time video understanding from GO2 cameras 2. Mission Logging: Summarize robot missions from video 3. Environment Mapping: Video-based scene understanding 4. Separate Project: (User mentioned other use case)
Next Steps: - [ ] Read VSS section of PDF thoroughly - [ ] Document hardware/software requirements - [ ] Evaluate feasibility on Thor with current setup - [ ] Consider resource sharing with LLM (memory, GPU)
🔗 Quick Links Reference¶
| Resource | URL | Purpose |
|---|---|---|
| vLLM on Thor Forum | Link | Performance data, alternative to Ollama |
| Jetson AI Stack Docs | Link | Official AI stack guide |
| Thor Setup Guide PDF | Link | Gr00t, VSS, demos |
| NVIDIA Jetson Forums | Link | Community support |
| vLLM Documentation | Link | vLLM setup and usage |
| Ollama Documentation | Link | Current setup reference |
📝 Action Items¶
Immediate (Tomorrow)¶
- [ ] Test current Ollama setup on robot
- [ ] Monitor with jtop during testing
- [ ] Document any performance issues
Short-term (This Week)¶
- [ ] Read Thor Setup PDF (Gr00t and VSS sections)
- [ ] Review Jetson AI Stack documentation
- [ ] Investigate CUDA persistence mode
- [ ] Test model keep_alive strategies
Medium-term (If Needed)¶
- [ ] Evaluate vLLM on Thor (forum link)
- [ ] Benchmark vLLM vs Ollama
- [ ] Test VSS demo from PDF
- [ ] Consider integration changes
Long-term (Future)¶
- [ ] Explore Gr00t for advanced robot behaviors
- [ ] Evaluate VSS for separate project
- [ ] Optimize Thor AI stack configuration
- [ ] Performance tuning based on production data
Remember: Current setup may be fine! These are backup options and future directions if needed.
Priority: Make current Ollama + phi4:14b work well before exploring alternatives.
Last Updated: 2025-10-10 23:50
Status: Resources captured, ready for future reference