CS25: Transformers United V6

Stanford University • Spring 2026

View Course Site • YouTube Playlist


Overview

CS25 has become one of Stanford’s hottest seminar courses, featuring top researchers at the forefront of Transformers research. Each week dives into the latest breakthroughs in AI, from large language models to applications in art, biology, and robotics.

Quarter: Spring 2026 (March 30 - June 3)
When: Thursdays 4:30 - 5:50 pm PDT
Where: Skilling Auditorium + Zoom livestream (open to public)

This page tracks my notes, speaker research, and key insights from the course.


Note: Some pre-reads are based on speaker research only, with topics still TBD. These will be updated as the course announces topics. Check the official course site for the latest schedule.


Schedule

| Date   | Topic                    | Speaker(s)                     | Slides        | Pre-Read | Session Notes |
|--------|--------------------------|--------------------------------|---------------|----------|---------------|
| Apr 2  | Overview of Transformers | Instructors                    | Course Slides | Pre-Read | Notes         |
| Apr 9  | JEPA & World Models      | Hazel Nam & Lucas Maes (Brown) | Slides        | Pre-Read | Notes         |
| Apr 16 | SSMs & Mamba             | Albert Gu (CMU)                |               | Pre-Read | Notes         |
| Apr 23 | Ultra-Scale Training     | Nouamane Tazi (Hugging Face)   |               | Pre-Read |               |
| Apr 30 | TBA                      | TBD                            |               |          |               |
| May 7  | TBD                      | Andrew Lampinen (Anthropic)    |               | Pre-Read |               |
| May 14 | TBD                      | Vivek Natarajan (DeepMind)     |               | Pre-Read |               |
| May 21 | TBA                      | TBD                            |               |          |               |
| May 28 | TBD                      | Charles Frye (Modal)           |               | Pre-Read |               |

  • Speaker Research — Bio, papers, question banks, and relevance to autonomy for all course speakers
  • Themes & Connections — Cross-lecture patterns and connections
  • Deep Dives — Extended technical explanations (empty — content now in session notes)

Deep Dives

| Topic | Description |
|-------|-------------|
| Object-Centric vs. Patch-Based Representations & Bidirectional Transformers | Why JEPA uses object-centric representations instead of patches, how Slot Attention resolves binding, and what bidirectional transformers enable |
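The deep-dive topic above refers to Slot Attention's competition mechanism: a softmax over the *slot* axis makes slots compete for each input feature, which is what resolves the binding problem. Below is a minimal NumPy sketch of that iteration under simplifying assumptions (shared random projections, an additive slot update instead of the paper's GRU + MLP); it is illustrative, not the course's or the paper's reference implementation.

```python
import numpy as np

def slot_attention(inputs, num_slots=4, num_iters=3, dim=16, seed=0):
    """Minimal Slot Attention sketch (after Locatello et al., 2020).

    inputs: (n, dim) array of flattened patch/feature vectors.
    Returns (num_slots, dim) object-centric slot vectors.
    Simplified: random projections, additive update (paper uses a GRU).
    """
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
    slots = rng.normal(size=(num_slots, dim))
    k, v = inputs @ Wk, inputs @ Wv
    for _ in range(num_iters):
        q = slots @ Wq
        logits = k @ q.T / np.sqrt(dim)              # (n, num_slots)
        # Softmax over the SLOT axis: slots compete for each input,
        # unlike standard attention, which normalizes over inputs.
        attn = np.exp(logits - logits.max(axis=1, keepdims=True))
        attn = attn / attn.sum(axis=1, keepdims=True)
        # Weighted mean of inputs assigned to each slot.
        attn = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = slots + attn.T @ v                   # simplified update
    return slots
```

Patch-based transformers skip this competition step entirely, which is one way to frame the object-centric vs. patch-based contrast the session covers.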

  • Course Site
  • YouTube Playlist
  • Live session access / Q&A / community / auditor signup: See the official course site for the current links and participation details.

Relevance to Autonomy Research

This course is particularly relevant for:

  • Embodied AI: JEPA architectures for world models (Week 2)
  • Long-horizon reasoning: SSMs/Mamba for extended context (Week 3)
  • Safety-critical deployment: Med-PaLM validation patterns (Week 7)
  • Efficient inference: Block importance, efficient transformers (Week 6)
  • Infrastructure: Serverless GPU deployment (Week 10)
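The long-horizon reasoning point above rests on why SSMs scale to extended context: per-step state cost is constant in sequence length. A minimal sketch of the discretized recurrence, with illustrative names (this is the generic linear SSM, not Mamba's parameterization, which makes the matrices input-dependent and uses a parallel scan):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal discretized state-space recurrence for one input channel.

    h_t = A * h_{t-1} + B * x_t ;  y_t = C . h_t
    A is a diagonal state matrix stored as a vector; B, C are input/output
    projections. All names are illustrative, not the Mamba formulation.
    """
    h = np.zeros_like(A)
    ys = []
    for x_t in x:
        h = A * h + B * x_t      # O(state_dim) per step, independent
        ys.append(C @ h)         # of how long the sequence is
    return np.array(ys)
```

For example, `ssm_scan([1.0, 1.0], np.array([0.5]), np.array([1.0]), np.array([2.0]))` returns `[2.0, 3.0]`: the state decays by 0.5 each step while accumulating the input, which is the fixed-size memory that lets these models handle contexts far beyond a transformer's attention window.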

Last updated: 2026-04-22