This blog is a treasure trove for the curious mind—an open classroom where modern artificial intelligence is unpacked, explained, and built piece by piece. It contains not only the complete text of Build a Large Language Model from Scratch but also an accompanying video series in which the author walks through every chapter step by step, explaining the code and concepts in real time.
Build a Large Language Model from Scratch offers a practical, hands-on exploration of how today’s AI systems like GPT actually work—layer by layer, tensor by tensor. It leads you from raw text and tokenization through attention mechanisms, training loops, and text generation, grounding every idea in both mathematics and executable code. Instead of treating large language models as mysterious black boxes, this work illuminates their inner workings with clarity and rigor. By the end, you’ll not only understand how an LLM thinks—you’ll have built one yourself.
Read the book — Build a Large Language Model From Scratch (PDF)
Below is a carefully organized, lesson-by-lesson index of the LLM From Scratch video series. Each link opens the corresponding MP4 directly so you can follow along from anywhere.
Tip: The videos are grouped by chapter (Unit). Begin with Unit 1 for setup and foundations, explore tokenization in Unit 2, dive into attention in Unit 3, and finish strong with training and text generation in Unit 5.
Unit 1 — Setup & Foundations
Python Environment Setup
Prepare your workstation and Python toolchain so all later notebooks and scripts run without fuss.
▶︎U01M01 Python Environment Setup Video.mp4
Foundations to Build a Large Language Model (From Scratch)
Big-picture tour of what an LLM is, the building blocks you’ll implement, and how the pieces fit.
▶︎U01M02 Foundations to Build a Large Language Model (From Scratch).mp4
Unit 2 — Tokenization & Data Pipeline
Prerequisites to Chapter 2
Short preface on goals and required background for the tokenization chapter.
▶︎U02M01 Prerequisites to Chapter 2.mp4
Tokenizing Text
Turn raw text into tokens—the basic symbols your model understands.
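Before watching, the core idea can be sketched with a simple regex splitter. This is a toy stand-in, not the chapter's exact preprocessing; the pattern below is illustrative:

```python
import re

def tokenize(text):
    # Split on whitespace and common punctuation, keeping the
    # punctuation marks as tokens of their own.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p.strip() for p in parts if p.strip()]

tokens = tokenize("Hello, world. Is this-- a test?")
```

Production tokenizers work on subwords rather than whole words (see the BPE lesson in this unit), but the contract is the same: text in, a list of discrete symbols out.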
▶︎U02M02 Tokenizing text.mp4
Converting Tokens into Token IDs
Map tokens to integer IDs, the numeric form used by embeddings and models.
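As a minimal sketch, with a toy vocabulary built from the text itself (a real pipeline builds the vocabulary over a whole corpus):

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Toy vocabulary: each unique token gets an integer ID.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
inverse_vocab = {i: tok for tok, i in vocab.items()}

def encode(toks):
    return [vocab[t] for t in toks]

def decode(ids):
    return [inverse_vocab[i] for i in ids]

ids = encode(tokens)  # [4, 0, 3, 2, 4, 1]
```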
▶︎U02M03 Converting tokens into token IDs.mp4
Adding Special Context Tokens
Insert markers like BOS/EOS and separators to give structure and intent to sequences.
▶︎U02M04 Adding special context tokens.mp4
Byte Pair Encoding (BPE)
Learn subword tokenization to balance vocabulary size and coverage.
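The heart of BPE is repeatedly merging the most frequent adjacent symbol pair. A single merge step might look like this (toy words for illustration, not the GPT-2 merge table):

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs across all words.
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    # Replace each occurrence of the pair with one merged symbol.
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

words = [list("hug"), list("hugs"), list("hub")]
pair = most_frequent_pair(words)  # ('h', 'u') occurs in all three words
words = merge_pair(words, pair)
```

Full BPE repeats this until a target vocabulary size is reached; in practice you load a pretrained tokenizer (e.g. via the tiktoken library) rather than learning merges yourself.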
▶︎U02M05 Byte pair encoding.mp4
Data Sampling with a Sliding Window
Build training sequences efficiently by sliding across long texts.
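A rough sketch of the idea, assuming the text is already a list of integer token IDs (the chapter wraps this logic in a PyTorch DataLoader):

```python
def sliding_windows(token_ids, context_length, stride):
    # Pair each input chunk with a target chunk shifted one token to
    # the right: the model's job is next-token prediction.
    inputs, targets = [], []
    for i in range(0, len(token_ids) - context_length, stride):
        inputs.append(token_ids[i : i + context_length])
        targets.append(token_ids[i + 1 : i + context_length + 1])
    return inputs, targets

ids = list(range(10))
X, Y = sliding_windows(ids, context_length=4, stride=4)
# X = [[0, 1, 2, 3], [4, 5, 6, 7]]
# Y = [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Setting `stride` equal to `context_length`, as here, yields non-overlapping chunks; a smaller stride trades more training samples for overlap between them.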
▶︎U02M06 Data sampling with a sliding window.mp4
Creating Token Embeddings
Convert token IDs into dense vectors that capture meaning.
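Conceptually, an embedding layer is just a trainable lookup table. A pure-Python stand-in for `torch.nn.Embedding`, with random toy weights and no training:

```python
import random

random.seed(0)
vocab_size, embed_dim = 6, 3

# One row of weights per token ID; in a real model these rows are
# trained along with the rest of the network.
embedding = [[random.gauss(0.0, 1.0) for _ in range(embed_dim)]
             for _ in range(vocab_size)]

def embed(token_ids):
    return [embedding[i] for i in token_ids]

vectors = embed([2, 0, 2])  # the same ID always maps to the same vector
```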
▶︎U02M07 Creating token embeddings.mp4
Encoding Word Positions
Add positional information so the model knows where words occur.
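GPT-style models use learned absolute position embeddings: one vector per position index, added elementwise to the token embedding. With made-up numbers:

```python
context_length, embed_dim = 4, 3

# Stand-in token and position vectors (a trained model learns both).
tok_emb = [[0.1 * (i + 1)] * embed_dim for i in range(context_length)]
pos_emb = [[0.01 * p] * embed_dim for p in range(context_length)]

# Input to the first transformer block: token vector + position vector.
x = [[t + p for t, p in zip(tv, pv)] for tv, pv in zip(tok_emb, pos_emb)]
```

Without this step, self-attention would treat the sequence as an unordered bag of tokens.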
▶︎U02M08 Encoding word positions.mp4
Unit 3 — Attention Basics
Prerequisites to Chapter 3
What to expect before you implement attention mechanisms.
▶︎U03M01 Prerequisites to Chapter 3.mp4
A Simple Self-Attention Mechanism (No Trainable Weights) — Part 1
Build intuition for how tokens attend to one another without jumping into full transformer math.
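As a preview, here is the whole mechanism in plain Python: attention scores are dot products between token vectors, softmax turns them into weights, and each output is a weighted sum of the inputs (the vectors below are made up for illustration):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def simple_self_attention(inputs):
    context = []
    for q in inputs:
        # Score each token against the query via a dot product.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in inputs]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of all input vectors.
        ctx = [sum(w * v[d] for w, v in zip(weights, inputs))
               for d in range(len(q))]
        context.append(ctx)
    return context

inputs = [[0.43, 0.15, 0.89],
          [0.55, 0.87, 0.66],
          [0.57, 0.85, 0.64]]
context = simple_self_attention(inputs)
```

Each output lies between the smallest and largest input along every dimension, since the weights are positive and sum to 1; trainable query/key/value projections come later in the chapter.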
▶︎U03M02 A simple self-attention mechanism without trainable weights Part 1.mp4
Unit 5 — Training & Text Generation
Prerequisites to Chapter 5
Scope, datasets, and what “training loop” really means here.
▶︎U05M01 Prerequisites to Chapter 5.mp4
Using GPT to Generate Text
Wire up a generation function and produce your first outputs.
▶︎U05M02 Using GPT to generate text.mp4
Text Generation Loss: Cross-Entropy & Perplexity
Measure how well the model predicts the next token, and interpret the metrics.
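The two metrics are tightly linked: cross-entropy is the average negative log-probability the model assigned to the correct next tokens, and perplexity is simply `exp(loss)`. With hypothetical probabilities:

```python
import math

def cross_entropy(probs_for_targets):
    # Average negative log-probability of each correct next token.
    n = len(probs_for_targets)
    return -sum(math.log(p) for p in probs_for_targets) / n

def perplexity(loss):
    # The effective number of tokens the model is "choosing among".
    return math.exp(loss)

# Hypothetical probabilities the model gave the true next tokens:
probs = [0.25, 0.5, 0.125]
loss = cross_entropy(probs)
ppl = perplexity(loss)  # 4.0: as uncertain as a uniform pick among 4 tokens
```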
▶︎U05M03 Calculating the text generation loss: cross entropy and perplexity.mp4
Training & Validation Losses
Track learning progress and catch overfitting early.
▶︎U05M04 Calculating the training and validation set losses.mp4
Training an LLM
Put the pieces together: optimizer, batches, checkpoints, and sanity checks.
▶︎U05M05 Training an LLM.mp4
Decoding Strategies to Control Randomness
Greedy, sampling, and friends—trade off diversity vs. determinism.
▶︎U05M06 Decoding strategies to control randomness.mp4
Temperature Scaling
Tune output randomness with a single, powerful knob.
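The knob works by dividing the logits by the temperature before softmax: T < 1 sharpens the distribution toward the top token, T > 1 flattens it. A minimal sketch with made-up logits:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_with_temperature(logits, temperature):
    # Scale the logits before softmax; temperature must be > 0.
    return softmax([l / temperature for l in logits])

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # more deterministic
flat = softmax_with_temperature(logits, 2.0)   # more diverse
```

Sampling from `sharp` almost always returns the top token; sampling from `flat` spreads probability toward the alternatives.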
▶︎U05M07 Temperature scaling.mp4
Top-k Sampling
Clip the candidate pool to the k most likely tokens for cleaner generations.
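One common way to implement this is to mask every logit outside the top k with negative infinity, so those tokens get zero probability after softmax (a sketch with made-up logits):

```python
import math

def top_k_filter(logits, k):
    # Keep the k largest logits; mask the rest with -inf.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [l if l >= threshold else float("-inf") for l in logits]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [4.0, 1.0, 3.0, 0.5]
probs = softmax(top_k_filter(logits, k=2))
# Only the two strongest candidates keep nonzero probability.
```

Top-k combines naturally with temperature scaling: filter first, then sample from the rescaled survivors.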
▶︎U05M08 Top-k sampling.mp4
Modifying the Text Generation Function
Extend your generator to support new strategies and constraints.
▶︎U05M09 Modifying the text generation function.mp4
Loading & Saving Model Weights in PyTorch
Serialize models cleanly; resume training or deploy for inference.
▶︎U05M10 Loading and saving model weights in PyTorch.mp4
Loading Pretrained Weights from OpenAI
Plug in existing weights to compare, validate, or bootstrap experiments.
▶︎U05M11 Loading pretrained weights from OpenAI.mp4
