CS25: Transformers United V6

Stanford University • Spring 2026

View Course Site • YouTube Playlist


Overview

CS25 has become one of Stanford’s hottest seminar courses, featuring top researchers at the forefront of Transformers research. Each week dives into the latest breakthroughs in AI, from large language models to applications in art, biology, and robotics.

Quarter: Spring 2026 (March 30 - June 3)
When: Thursdays 4:30 - 5:50 pm PDT
Where: Skilling Auditorium + Zoom livestream (open to public)

This page tracks my notes, speaker research, and key insights from the course.


Note: Some pre-reads are based on speaker research only, with topics still TBD. These will be updated as the course announces topics. Check the official course site for the latest schedule.


Schedule

| Date   | Topic                    | Speaker(s)                     | Slides        | Pre-Read | Session Notes |
|--------|--------------------------|--------------------------------|---------------|----------|---------------|
| Apr 2  | Overview of Transformers | Instructors                    | Course Slides | Pre-Read | Notes         |
| Apr 9  | JEPA & World Models      | Hazel Nam & Lucas Maes (Brown) | Slides        | Pre-Read | Notes         |
| Apr 16 | SSMs & Mamba             | Albert Gu (CMU)                |               | Pre-Read | Notes         |
| Apr 23 | Ultra-Scale Training     | Nouamane Tazi (Hugging Face)   |               | Pre-Read |               |
| Apr 30 | TBA                      | TBD                            |               |          |               |
| May 7  | TBD                      | Andrew Lampinen (Anthropic)    |               | Pre-Read |               |
| May 14 | TBD                      | Vivek Natarajan (DeepMind)     |               | Pre-Read |               |
| May 21 | TBA                      | TBD                            |               |          |               |
| May 28 | TBD                      | Charles Frye (Modal)           |               | Pre-Read |               |

  • Speaker Research — Bio, papers, question banks, and relevance to autonomy for all course speakers
  • Themes & Connections — Cross-lecture patterns and connections
  • Deep Dives — Extended technical explanations (empty — content now in session notes)

Deep Dives

| Topic | Description |
|-------|-------------|
| Object-Centric vs. Patch-Based Representations & Bidirectional Transformers | Why JEPA uses object-centric representations instead of patches, how Slot Attention resolves binding, and what bidirectional transformers enable |
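The deep-dive topic above refers to Slot Attention's competition mechanism: a softmax over the *slot* axis makes slots compete for each input feature, which is what resolves the binding problem. Below is a minimal NumPy sketch of that iteration under simplifying assumptions (shared random projections, an additive slot update instead of the paper's GRU + MLP); it is illustrative, not the course's or the paper's reference implementation.

```python
import numpy as np

def slot_attention(inputs, num_slots=4, num_iters=3, dim=16, seed=0):
    """Minimal Slot Attention sketch (after Locatello et al., 2020).

    inputs: (n, dim) array of flattened patch/feature vectors.
    Returns (num_slots, dim) object-centric slot vectors.
    Simplified: random projections, additive update (paper uses a GRU).
    """
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
    slots = rng.normal(size=(num_slots, dim))
    k, v = inputs @ Wk, inputs @ Wv
    for _ in range(num_iters):
        q = slots @ Wq
        logits = k @ q.T / np.sqrt(dim)              # (n, num_slots)
        # Softmax over the SLOT axis: slots compete for each input,
        # unlike standard attention, which normalizes over inputs.
        attn = np.exp(logits - logits.max(axis=1, keepdims=True))
        attn = attn / attn.sum(axis=1, keepdims=True)
        # Weighted mean of inputs assigned to each slot.
        attn = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = slots + attn.T @ v                   # simplified update
    return slots
```

Patch-based transformers skip this competition step entirely, which is one way to frame the object-centric vs. patch-based contrast the session covers.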

  • Course Site
  • YouTube Playlist
  • Live session access / Q&A / community / auditor signup: See the official course site for the current links and participation details.

Relevance to Autonomy Research

This course is particularly relevant for:

  • Embodied AI: JEPA architectures for world models (Week 2)
  • Long-horizon reasoning: SSMs/Mamba for extended context (Week 3)
  • Safety-critical deployment: Med-PaLM validation patterns (Week 7)
  • Efficient inference: Block importance, efficient transformers (Week 6)
  • Infrastructure: Serverless GPU deployment (Week 10)
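The long-horizon reasoning point above rests on why SSMs scale to extended context: per-step state cost is constant in sequence length. A minimal sketch of the discretized recurrence, with illustrative names (this is the generic linear SSM, not Mamba's parameterization, which makes the matrices input-dependent and uses a parallel scan):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal discretized state-space recurrence for one input channel.

    h_t = A * h_{t-1} + B * x_t ;  y_t = C . h_t
    A is a diagonal state matrix stored as a vector; B, C are input/output
    projections. All names are illustrative, not the Mamba formulation.
    """
    h = np.zeros_like(A)
    ys = []
    for x_t in x:
        h = A * h + B * x_t      # O(state_dim) per step, independent
        ys.append(C @ h)         # of how long the sequence is
    return np.array(ys)
```

For example, `ssm_scan([1.0, 1.0], np.array([0.5]), np.array([1.0]), np.array([2.0]))` returns `[2.0, 3.0]`: the state decays by 0.5 each step while accumulating the input, which is the fixed-size memory that lets these models handle contexts far beyond a transformer's attention window.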

Last updated: 2026-04-22