This blog is a treasure trove for the curious mind—an open classroom where modern artificial intelligence is unpacked, explained, and built piece by piece. It contains not only the complete text of Build a Large Language Model from Scratch but also an accompanying video series in which the author walks through every chapter step by step, explaining the code and concepts in real time.
Build a Large Language Model from Scratch offers a practical, hands-on exploration of how today’s AI systems like GPT actually work—layer by layer, tensor by tensor. It leads you from raw text and tokenization through attention mechanisms, training loops, and text generation, grounding every idea in both mathematics and executable code. Instead of treating large language models as mysterious black boxes, this work illuminates their inner workings with clarity and rigor. By the end, you’ll not only understand how an LLM thinks—you’ll have built one yourself.
Read the book — Build a Large Language Model From Scratch (PDF)
Below is a carefully organized, lesson-by-lesson index of the LLM From Scratch video series. Each link opens the corresponding MP4 directly so you can follow along from anywhere.
Tip: The videos are grouped by chapter (Unit). Begin with Unit 1 for setup and foundations, explore tokenization in Unit 2, dive into attention in Unit 3, and finish strong with training and text generation in Unit 5.
Unit 1 — Setup & Foundations
Python Environment Setup
Prepare your workstation and Python toolchain so all later notebooks and scripts run without fuss.
▶︎U01M01 Python Environment Setup Video.mp4
Foundations to Build a Large Language Model (From Scratch)
Big-picture tour of what an LLM is, the building blocks you’ll implement, and how the pieces fit.
▶︎U01M02 Foundations to Build a Large Language Model (From Scratch).mp4
Unit 2 — Tokenization & Data Pipeline
Prerequisites to Chapter 2
Short preface on goals and required background for the tokenization chapter.
▶︎U02M01 Prerequisites to Chapter 2.mp4
Tokenizing Text
Turn raw text into tokens—the basic symbols your model understands.
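Before watching, the core idea can be sketched with a simple regex splitter. This is a toy stand-in, not the chapter's exact preprocessing; the pattern below is illustrative:

```python
import re

def tokenize(text):
    # Split on whitespace and common punctuation, keeping the
    # punctuation marks as tokens of their own.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p.strip() for p in parts if p.strip()]

tokens = tokenize("Hello, world. Is this-- a test?")
```

Production tokenizers work on subwords rather than whole words (see the BPE lesson in this unit), but the contract is the same: text in, a list of discrete symbols out.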
▶︎U02M02 Tokenizing text.mp4
Converting Tokens into Token IDs
Map tokens to integer IDs, the numeric form used by embeddings and models.
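As a minimal sketch, with a toy vocabulary built from the text itself (a real pipeline builds the vocabulary over a whole corpus):

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Toy vocabulary: each unique token gets an integer ID.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
inverse_vocab = {i: tok for tok, i in vocab.items()}

def encode(toks):
    return [vocab[t] for t in toks]

def decode(ids):
    return [inverse_vocab[i] for i in ids]

ids = encode(tokens)  # [4, 0, 3, 2, 4, 1]
```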
▶︎U02M03 Converting tokens into token IDs.mp4
Adding Special Context Tokens
Insert markers like BOS/EOS and separators to give structure and intent to sequences.
▶︎U02M04 Adding special context tokens.mp4
Byte Pair Encoding (BPE)
Learn subword tokenization to balance vocabulary size and coverage.
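The heart of BPE is repeatedly merging the most frequent adjacent symbol pair. A single merge step might look like this (toy words for illustration, not the GPT-2 merge table):

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs across all words.
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    # Replace each occurrence of the pair with one merged symbol.
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

words = [list("hug"), list("hugs"), list("hub")]
pair = most_frequent_pair(words)  # ('h', 'u') occurs in all three words
words = merge_pair(words, pair)
```

Full BPE repeats this until a target vocabulary size is reached; in practice you load a pretrained tokenizer (e.g. via the tiktoken library) rather than learning merges yourself.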
▶︎U02M05 Byte pair encoding.mp4
Data Sampling with a Sliding Window
Build training sequences efficiently by sliding across long texts.
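A rough sketch of the idea, assuming the text is already a list of integer token IDs (the chapter wraps this logic in a PyTorch DataLoader):

```python
def sliding_windows(token_ids, context_length, stride):
    # Pair each input chunk with a target chunk shifted one token to
    # the right: the model's job is next-token prediction.
    inputs, targets = [], []
    for i in range(0, len(token_ids) - context_length, stride):
        inputs.append(token_ids[i : i + context_length])
        targets.append(token_ids[i + 1 : i + context_length + 1])
    return inputs, targets

ids = list(range(10))
X, Y = sliding_windows(ids, context_length=4, stride=4)
# X = [[0, 1, 2, 3], [4, 5, 6, 7]]
# Y = [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Setting `stride` equal to `context_length`, as here, yields non-overlapping chunks; a smaller stride trades more training samples for overlap between them.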
▶︎U02M06 Data sampling with a sliding window.mp4
Creating Token Embeddings
Convert token IDs into dense vectors that capture meaning.
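Conceptually, an embedding layer is just a trainable lookup table. A pure-Python stand-in for `torch.nn.Embedding`, with random toy weights and no training:

```python
import random

random.seed(0)
vocab_size, embed_dim = 6, 3

# One row of weights per token ID; in a real model these rows are
# trained along with the rest of the network.
embedding = [[random.gauss(0.0, 1.0) for _ in range(embed_dim)]
             for _ in range(vocab_size)]

def embed(token_ids):
    return [embedding[i] for i in token_ids]

vectors = embed([2, 0, 2])  # the same ID always maps to the same vector
```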
▶︎U02M07 Creating token embeddings.mp4
Encoding Word Positions
Add positional information so the model knows where words occur.
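GPT-style models use learned absolute position embeddings: one vector per position index, added elementwise to the token embedding. With made-up numbers:

```python
context_length, embed_dim = 4, 3

# Stand-in token and position vectors (a trained model learns both).
tok_emb = [[0.1 * (i + 1)] * embed_dim for i in range(context_length)]
pos_emb = [[0.01 * p] * embed_dim for p in range(context_length)]

# Input to the first transformer block: token vector + position vector.
x = [[t + p for t, p in zip(tv, pv)] for tv, pv in zip(tok_emb, pos_emb)]
```

Without this step, self-attention would treat the sequence as an unordered bag of tokens.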
▶︎U02M08 Encoding word positions.mp4
Unit 3 — Attention Basics
Prerequisites to Chapter 3
What to expect before you implement attention mechanisms.
▶︎U03M01 Prerequisites to Chapter 3.mp4
A Simple Self-Attention Mechanism (No Trainable Weights) — Part 1
Build intuition for how tokens attend to one another without jumping into full transformer math.
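As a preview, here is the whole mechanism in plain Python: attention scores are dot products between token vectors, softmax turns them into weights, and each output is a weighted sum of the inputs (the vectors below are made up for illustration):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def simple_self_attention(inputs):
    context = []
    for q in inputs:
        # Score each token against the query via a dot product.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in inputs]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of all input vectors.
        ctx = [sum(w * v[d] for w, v in zip(weights, inputs))
               for d in range(len(q))]
        context.append(ctx)
    return context

inputs = [[0.43, 0.15, 0.89],
          [0.55, 0.87, 0.66],
          [0.57, 0.85, 0.64]]
context = simple_self_attention(inputs)
```

Each output lies between the smallest and largest input along every dimension, since the weights are positive and sum to 1; trainable query/key/value projections come later in the chapter.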
▶︎U03M02 A simple self-attention mechanism without trainable weights Part 1.mp4
Unit 5 — Training & Text Generation
Prerequisites to Chapter 5
Scope, datasets, and what “training loop” really means here.
▶︎U05M01 Prerequisites to Chapter 5.mp4
Using GPT to Generate Text
Wire up a generation function and produce your first outputs.
▶︎U05M02 Using GPT to generate text.mp4
Text Generation Loss: Cross-Entropy & Perplexity
Measure how well the model predicts the next token, and interpret the metrics.
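The two metrics are tightly linked: cross-entropy is the average negative log-probability the model assigned to the correct next tokens, and perplexity is simply `exp(loss)`. With hypothetical probabilities:

```python
import math

def cross_entropy(probs_for_targets):
    # Average negative log-probability of each correct next token.
    n = len(probs_for_targets)
    return -sum(math.log(p) for p in probs_for_targets) / n

def perplexity(loss):
    # The effective number of tokens the model is "choosing among".
    return math.exp(loss)

# Hypothetical probabilities the model gave the true next tokens:
probs = [0.25, 0.5, 0.125]
loss = cross_entropy(probs)
ppl = perplexity(loss)  # 4.0: as uncertain as a uniform pick among 4 tokens
```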
▶︎U05M03 Calculating the text generation loss: cross entropy and perplexity.mp4
Training & Validation Losses
Track learning progress and catch overfitting early.
▶︎U05M04 Calculating the training and validation set losses.mp4
Training an LLM
Put the pieces together: optimizer, batches, checkpoints, and sanity checks.
▶︎U05M05 Training an LLM.mp4
Decoding Strategies to Control Randomness
Greedy, sampling, and friends—trade off diversity vs. determinism.
▶︎U05M06 Decoding strategies to control randomness.mp4
Temperature Scaling
Tune output randomness with a single, powerful knob.
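The knob works by dividing the logits by the temperature before softmax: T < 1 sharpens the distribution toward the top token, T > 1 flattens it. A minimal sketch with made-up logits:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_with_temperature(logits, temperature):
    # Scale the logits before softmax; temperature must be > 0.
    return softmax([l / temperature for l in logits])

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # more deterministic
flat = softmax_with_temperature(logits, 2.0)   # more diverse
```

Sampling from `sharp` almost always returns the top token; sampling from `flat` spreads probability toward the alternatives.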
▶︎U05M07 Temperature scaling.mp4
Top-k Sampling
Clip the candidate pool to the k most likely tokens for cleaner generations.
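One common way to implement this is to mask every logit outside the top k with negative infinity, so those tokens get zero probability after softmax (a sketch with made-up logits):

```python
import math

def top_k_filter(logits, k):
    # Keep the k largest logits; mask the rest with -inf.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [l if l >= threshold else float("-inf") for l in logits]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [4.0, 1.0, 3.0, 0.5]
probs = softmax(top_k_filter(logits, k=2))
# Only the two strongest candidates keep nonzero probability.
```

Top-k combines naturally with temperature scaling: filter first, then sample from the rescaled survivors.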
▶︎U05M08 Top-k sampling.mp4
Modifying the Text Generation Function
Extend your generator to support new strategies and constraints.
▶︎U05M09 Modifying the text generation function.mp4
Loading & Saving Model Weights in PyTorch
Serialize models cleanly; resume training or deploy for inference.
▶︎U05M10 Loading and saving model weights in PyTorch.mp4
Loading Pretrained Weights from OpenAI
Plug in existing weights to compare, validate, or bootstrap experiments.
▶︎U05M11 Loading pretrained weights from OpenAI.mp4
