[YouTube Lecture Summary] Andrej Karpathy - Deep Dive into LLMs like ChatGPT

Introduction

Pre-Training

Step 1: Download and preprocess the internet

Step 2: Tokenization

Step 3: Neural network training

Step 4: Inference

Base model

Post-Training: Supervised Finetuning

Conversations

Hallucinations

Knowledge of Self

Models need tokens to think

Things the model cannot do well

Post-Training: Reinforcement Learning

Reinforcement learning

DeepSeek-R1

AlphaGo

Reinforcement learning from human feedback (RLHF)

Preview of things to come

Keeping track of LLMs

Where to find LLMs

Post-Training: Supervised Finetuning

1️⃣ Limitations of the Base Model

  • The pretrained base model is simply a “text predictor” that has learned the statistical characteristics of Internet documents.

  • That is, if you ask it a specific question, it is more likely to produce text that resembles an Internet document than to provide a meaningful answer.

  • Used directly as an assistant, the base model is therefore inefficient and often does not behave as desired (see the sketch after this list).
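
A minimal sketch of this behavior, assuming the Hugging Face transformers library with GPT-2 as a stand-in base model (the lecture works with much larger models; the model choice, prompt, and sampling settings here are illustrative assumptions):

```python
# Sketch: a base model is a text predictor, not an assistant.
# Assumes: pip install transformers torch; GPT-2 stands in for a
# modern base model purely for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "What is the capital of France?"
out = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.8)

# The base model continues the prompt the way a typical internet
# document would (e.g. with more quiz-style questions), rather than
# replying with a direct answer like "Paris."
print(out[0]["generated_text"])
```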


2️⃣ The need for post-training

  • Post-training is needed to turn the base model into an “AI assistant” that provides useful answers to user questions, rather than one that simply generates text.

  • This is the process of refining the base model so that it can be used as a conversational AI, rather than simply simulating documents.

  • Through post-training, the model is adjusted to give more logical and consistent responses and to behave in line with human expectations (see the data-format sketch below).
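
A minimal sketch of what this refinement trains on: conversations are serialized into a single token stream with special role tokens, and the model is finetuned to continue that stream as the assistant. The tokenizer name below is an illustrative assumption; any chat-tuned tokenizer with a chat template would show the same idea:

```python
# Sketch: how supervised-finetuning (SFT) data is represented. A
# conversation is flattened into one token sequence with special role
# markers, which the model learns to continue as the assistant.
# Assumes: pip install transformers; the model name is illustrative.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]

# apply_chat_template inserts the special tokens (role markers, turn
# delimiters) that the model sees during supervised finetuning.
print(tok.apply_chat_template(messages, tokenize=False))
```

During finetuning, the loss is typically computed only on the assistant tokens, so the model learns to respond to the user rather than to imitate the user.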