[YouTube Lecture Summary] Andrej Karpathy - Deep Dive into LLMs like ChatGPT

Introduction

Pre-Training

Step 1: Download and preprocess the internet

Step 2: Tokenization

Step 3: Neural network training

Step 4: Inference

Base model

Post-Training: Supervised Finetuning

Conversations

Hallucinations

Knowledge of Self

Models need tokens to think

Things the model cannot do well

Post-Training: Reinforcement Learning

Reinforcement learning

DeepSeek-R1

AlphaGo

Reinforcement learning from human feedback (RLHF)

Preview of things to come

Keeping track of LLMs

Where to find LLMs

Reinforcement learning

📌 1. How an LLM (Large Language Model) solves problems

Two factors matter when writing a solution to a problem:

1️⃣ Deriving the correct answer – the top priority is that the final answer is right.
2️⃣ Readability and logical explanation – presenting the solution so that people can easily follow it.

From the perspective of deriving the correct answer, it is difficult for humans to specify the best solution path in advance, because humans and LLMs "think" differently: a step that is trivial for a person may be hard for the model, and vice versa. Therefore, LLMs must discover the most effective solution method on their own through reinforcement learning.


🔍 2. Finding the optimal solution using reinforcement learning

Reinforcement Learning (RL) is the process by which an LLM learns, through trial and error, which solutions lead to the correct answer.
The basic procedure is as follows:

🔄 The process of reinforcement learning

1️⃣ The model attempts the same problem many times, generating a variety of solutions.
2️⃣ Each solution is checked: correct final answer (✅) or incorrect (❌).
3️⃣ The solutions (token sequences) that reached the correct answer are reinforced so the model becomes more likely to produce them, while the ones that failed are made less likely.
4️⃣ This is repeated many times across a large set of problems, so the model gradually learns problem-solving patterns that work well for it.
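The loop above can be sketched as a toy Python program. This is a minimal illustration under heavy assumptions, not how production LLM training is implemented: the "model" is only a softmax over four hypothetical candidate solutions (stand-ins for whole token sequences), the reward is 1 for a correct final answer and 0 otherwise, and the update is a plain REINFORCE policy-gradient step with a batch-average baseline. The names `CANDIDATES`, `reward`, and `train` are made up for this example.

```python
import math
import random

# Hypothetical toy problem: the "model" can produce one of four candidate
# solutions (stand-ins for whole token sequences); each ends in a final answer.
CANDIDATES = [
    ("guess immediately", 7),
    ("set up an equation and solve step by step", 3),
    ("skip a step and slip on the arithmetic", 4),
    ("estimate first, then check the estimate", 3),
]
CORRECT_ANSWER = 3


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(answer):
    # The ✅ / ❌ check: 1 for a correct final answer, 0 otherwise.
    return 1.0 if answer == CORRECT_ANSWER else 0.0


def train(steps=500, samples_per_step=16, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # 1) Attempt the same problem many times by sampling from the policy.
        picks = rng.choices(range(len(CANDIDATES)), weights=probs, k=samples_per_step)
        # 2) Score every attempt.
        rewards = [reward(CANDIDATES[i][1]) for i in picks]
        baseline = sum(rewards) / len(rewards)
        # 3) REINFORCE: raise the log-probability of attempts that beat the batch
        #    average and lower it for the rest. For a softmax policy,
        #    d log p(i) / d logit(j) = [i == j] - p(j).
        grad = [0.0] * len(logits)
        for i, r in zip(picks, rewards):
            advantage = r - baseline
            for j in range(len(logits)):
                grad[j] += advantage * ((1.0 if i == j else 0.0) - probs[j])
        logits = [x + lr * g / samples_per_step for x, g in zip(logits, grad)]
    # 4) After many repetitions, return the learned distribution.
    return softmax(logits)


if __name__ == "__main__":
    for (name, answer), p in zip(CANDIDATES, train()):
        print(f"p={p:.3f}  answer={answer}  {name}")
```

Running it should show most of the probability mass moving onto the two candidates whose final answer is 3, since only those ever earn reward. Note that the update does not prefer one correct path over the other; it simply reinforces whatever reached the right answer.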

As a result, the LLM discovers effective problem-solving methods through its own trial and error rather than by copying human solutions.


📚 3. The LLM Training Process (Compared with How Humans Learn)

The way an LLM is trained resembles the way a student learns from a textbook.

① Pre-training: The model accumulates knowledge by training on large amounts of text. Human analogy: reading the textbook's exposition to learn concepts and theory.
② Supervised Fine-tuning (SFT): The model learns from expert-written correct answers (worked solutions). Human analogy: studying the worked examples provided by the teacher.
③ Reinforcement Learning (RL): The model learns effective solution methods by attempting many problems itself. Human analogy: building problem-solving skill by working through practice problems where only the final answer is given.

With supervised fine-tuning alone, the LLM simply imitates expert solutions without necessarily understanding them deeply, and those solutions may not suit the way the model itself works.
Reinforcement learning lets it find, on its own, the solution methods that work best for it.
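The difference between imitating and discovering can be illustrated with a toy Python contrast. Everything here is hypothetical: the strategy names and their success rates (the probability that this particular model executes each strategy correctly) are invented numbers. The SFT stand-in always copies the expert's demonstrated strategy; the RL stand-in is simplified to plain trial and error (try every strategy, keep the one with the most correct answers), a bandit-style shortcut rather than the policy-gradient training real systems use.

```python
import random

# Hypothetical strategies with invented success rates: the probability that
# THIS model reaches the correct answer when it follows each strategy.
STRATEGIES = {
    "expert shortcut (do the arithmetic in one step)": 0.40,
    "write out every intermediate step": 0.95,
    "answer first, justify afterwards": 0.30,
}
EXPERT_CHOICE = "expert shortcut (do the arithmetic in one step)"


def sft_policy():
    # SFT stand-in: copy the demonstrated strategy, however well the model executes it.
    return EXPERT_CHOICE


def rl_policy(trials_per_strategy=200, seed=0):
    # RL stand-in, simplified to trial and error: attempt problems with every
    # strategy, count correct final answers, keep the strategy with the most.
    rng = random.Random(seed)
    wins = {
        name: sum(rng.random() < p for _ in range(trials_per_strategy))
        for name, p in STRATEGIES.items()
    }
    return max(wins, key=wins.get)


def accuracy(strategy, trials=1000, seed=1):
    # Fraction of simulated attempts that reach the correct answer.
    rng = random.Random(seed)
    hits = sum(rng.random() < STRATEGIES[strategy] for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    sft, rl = sft_policy(), rl_policy()
    print(f"SFT imitates {sft!r}: accuracy {accuracy(sft):.2f}")
    print(f"RL settles on {rl!r}: accuracy {accuracy(rl):.2f}")
```

Under these assumed numbers, trial and error settles on writing out the intermediate steps even though the expert demonstrated the shortcut: the strategy that feels natural to a human is not necessarily the one the model executes most reliably.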