Session 1: Overview of Transformers

Video: Not yet posted to YouTube playlist

Status: ⚠️ Slide insights only — awaiting lecture video for complete synthesis

Session Overview

Brief intro and overview of the history of ML/NLP, Transformers and how they work, and their impact on robotics, autonomy, and embodied AI. Discussion about recent trends, breakthroughs, applications, and current challenges/weaknesses.

View the full slide deck

Key Insights from Slides

(Extracted manually from Week 1 Overview slides - 2026-04-02)

1. Transformer Architecture: Core Components

Self-attention mechanism — Lets every token weigh its relationship to every other token in the sequence
Position encoding — Provides information about token position in the sequence
Feed-forward layers — Apply the same nonlinear transformation to each position independently, after attention has mixed information across positions
Layer normalization — Stabilizes training and speeds up convergence
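
To make the four components above concrete, here is a minimal pure-Python sketch. It is illustrative only and not taken from the slides: the function names and the toy inputs are assumptions, learned projection matrices for queries, keys, and values are omitted (Q = K = V = x), and layer normalization has no learned scale or shift. The position encoding uses the sinusoidal form from "Attention Is All You Need".

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def self_attention(x):
    """Scaled dot-product attention with Q = K = V = x (simplifying assumption)."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is a weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def feed_forward(vec, w1, w2):
    """Position-wise FFN for one token: ReLU(vec @ w1) @ w2."""
    hidden = [max(0.0, sum(v * w for v, w in zip(vec, col))) for col in zip(*w1)]
    return [sum(h * w for h, w in zip(hidden, col)) for col in zip(*w2)]

def layer_norm(vec, eps=1e-5):
    """Normalize one token vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

# Toy input: three one-hot "token embeddings" of width 4 (hypothetical data).
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
pe = positional_encoding(len(tokens), 4)
x = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pe)]

attended = self_attention(x)
# Residual connection followed by layer normalization.
normed = [layer_norm([a + b for a, b in zip(xi, ai)]) for xi, ai in zip(x, attended)]
```

Note that `self_attention` is the only function here that mixes information across positions; `feed_forward` and `layer_norm` each operate on a single token vector at a time.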

2. Evolution of NLP/ML

3. Impact on Robotics/Autonomy

World models — Transformers enable learning predictive models for robot decision-making
Embodied AI — Vision-language models (VLMs) for robot perception and control
Long-horizon planning — Extended context windows enable multi-step reasoning
Efficiency for edge deployment — Smaller, quantized models for real-time control on robots
Safety-critical applications — Medical AI (Med-PaLM) demonstrates validation patterns for high-stakes domains
Sim-to-real transfer — Challenges in transferring simulation success to real-world deployment
Interpretability — Understanding how models make decisions is crucial for trustworthy autonomy
Catastrophic forgetting — Models can lose previously learned skills when trained on new tasks or data
Sample efficiency — Training requires massive datasets
Computational cost — Inference can be expensive without optimization
Robustness — Distribution shifts and adversarial inputs can cause unpredictable behavior
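
As a rough illustration of the "smaller, quantized models" item above, the sketch below shows symmetric per-tensor int8 quantization of a weight list. This is one common scheme chosen for illustration, not necessarily the one discussed in the lecture; the function names and example weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]

# Hypothetical weights for illustration.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four (for float32), at the cost of a rounding error of at most `scale / 2` per value.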

4. Current Trends

Multimodality — Vision, language, audio integration
Tool use — Function calling, API integration
Personalization — Fine-tuning for specific domains
Mixture of Experts — Routing each input to a small subset of specialized expert sub-networks within one model
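
The Mixture of Experts idea can be sketched as top-k routing: a gate scores every expert for a given token, only the k best experts run, and their outputs are blended by the renormalized gate weights. The code below is a minimal sketch under stated assumptions; `moe_forward`, the gate weights, and the toy experts (plain scaling functions) are all hypothetical stand-ins for learned networks.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, k=2):
    """Route one token vector to its top-k experts and blend their outputs."""
    # One gate logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate scores over the selected experts only.
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, top

# Toy experts: each simply scales its input (stand-ins for learned FFNs).
experts = [
    lambda v: [1.0 * a for a in v],
    lambda v: [2.0 * a for a in v],
    lambda v: [3.0 * a for a in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

out, chosen = moe_forward([1.0, 2.0], gate_weights, experts, k=2)
```

Because only k experts run per token, total parameter count can grow with the number of experts while per-token compute stays roughly constant.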

5. Open Challenges

Interpretability — Black-box nature makes understanding difficult
Hallucination — Models can generate confident but incorrect outputs
Alignment — Ensuring model behavior matches human intent
Compute sustainability — Environmental cost of training and deployment
Regulation — Governance frameworks for autonomous systems

Reference Links

Course Site: CS25: Transformers United
Slides: Week 1 Overview (PDF)
YouTube Playlist: CS25 V6 Playlist
Related Reading: Attention Is All You Need — foundational paper

Next Steps

Watch Week 1 lecture on YouTube once posted
Fill in Key Takeaways section with detailed notes
Complete Open Questions section with questions for the speakers
Add connections to autonomy research section

Awaiting Video

The following sections will be populated once the lecture video is posted to YouTube:

Transcript highlights — Key moments with timestamps
Answers to pre-read questions — Which questions from the Pre-Read were addressed?
Additional papers mentioned — Any papers referenced in lecture not in pre-read
Speaker insights — Perspectives shared verbally vs. slides

Upcoming Sessions

Week 2: JEPA — Hazel Nam & Lucas Maes (Brown University) — world models for robotics
Week 3: SSMs — Albert Gu (CMU) — Mamba creator, efficient long-context alternatives
Week 6: Interpretability — Andrew Lampinen (Anthropic) — understanding transformer reasoning
Week 7: Med-PaLM — Vivek Natarajan (DeepMind) — safety-critical deployment patterns

Last updated: 2026-04-04