[YouTube Lecture Summary] Andrej Karpathy - Deep Dive into LLMs like ChatGPT

Introduction

Pre-Training

Step 1: Download and preprocess the internet

Step 2: Tokenization

Step 3: Neural network training

Step 4: Inference

Base model

Post-Training: Supervised Finetuning

Conversations

Hallucinations

Knowledge of Self

Models need tokens to think

Things the model cannot do well

Post-Training: Reinforcement Learning

Reinforcement learning

DeepSeek-R1

AlphaGo

Reinforcement learning from human feedback (RLHF)

Preview of things to come

Keeping track of LLMs

Where to find LLMs

Post-Training: Reinforcement Learning

1. The training process of large language models

Large language models (LLMs) are typically developed through three main training stages, in sequence:

📌 1.1 Pre-training – Building the base model

🔹 Training method: Learn from Internet documents by predicting the next word (token); a minimal sketch follows below
🔹 Goal: Learn language patterns and the ability to understand context
🔹 Characteristics
✅ Acquires vast knowledge (learns about a wide range of topics)
✅ Can generate fluent, natural sentences
⚠️ However, the result is only an Internet-document simulator, so the raw base model has limited practical use
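
As a rough illustration of what "predict the next word" means in practice, here is a minimal PyTorch sketch of the pre-training objective. The tiny model, vocabulary size, and random token batch are placeholder assumptions, not the Transformer architecture or dataset discussed in the lecture.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a real run uses a Transformer and a huge web-scale text corpus.
vocab_size, embed_dim, seq_len = 1000, 64, 16

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # a score (logit) for every possible next token
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# A batch of token IDs standing in for tokenized Internet text.
tokens = torch.randint(0, vocab_size, (8, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # target = the token that comes next at each position

logits = model(inputs)  # shape: (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
print(f"next-token prediction loss: {loss.item():.3f}")
```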


📌 1.2 Supervised Fine-Tuning (SFT) – Turning the base model into an AI assistant

🔹 Training method:

  • Train on human-written conversation datasets instead of raw Internet documents

  • Human labelers provide a question (prompt) and an ideal answer (the target response); a data-format sketch follows this list

🔹 Features
✅ More natural and useful conversations
✅ Improves performance on specific tasks by imitating expert answers
⚠️ However, it is pure imitation learning, so it struggles to solve genuinely new problems
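
To make the conversation-dataset idea concrete, here is a rough sketch of how a single human-written dialogue might be serialized into a training sequence. The special-token names and helper function below are illustrative assumptions; each model family defines its own chat format.

```python
# Hypothetical serialization of one SFT example; the special tokens are made up for illustration.
conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]

def to_training_text(messages):
    """Flatten a conversation into one text stream the model is trained to continue."""
    parts = []
    for msg in messages:
        parts.append(f"<|start|>{msg['role']}\n{msg['content']}<|end|>\n")
    return "".join(parts)

print(to_training_text(conversation))
# SFT reuses the same next-token objective as pre-training, but on these
# conversations; typically only the assistant's tokens contribute to the loss,
# so the model learns to imitate the ideal answers written by human labelers.
```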


📌 1.3 Reinforcement Learning (RL) – Improving reliability and optimizing behavior

🔹 Training method:

  • The model discovers solution strategies on its own

  • The problem (prompt) and the final answer are given, but the model explores the intermediate solution steps itself (see the sketch after this list)

  • Reinforcement learning from human feedback (RLHF) is used to fold human preferences into the reward

🔹 Features
✅ Creative problem solving (flexible responses to new questions)
✅ Fewer hallucinations (less likely to fabricate information)
⚠️ However, training is expensive and time-consuming
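
The following is a minimal, simplified sketch of the RL idea described above: sample many solution attempts, score them, and reinforce the ones that reach the correct answer. The `generate` and `reward` functions are stubs invented for illustration; in practice `generate` samples from the LLM, and the reward comes either from an automatic checker (for verifiable problems like math) or, in RLHF, from a reward model trained on human preference rankings.

```python
import random

def generate(prompt: str) -> str:
    """Stub for sampling one solution attempt from the model (hypothetical)."""
    return random.choice([
        "... working through the steps, the answer is 4",
        "... working through the steps, the answer is 5",
    ])

def reward(prompt: str, attempt: str) -> float:
    """1.0 if the final answer is correct, else 0.0 (a verifiable reward)."""
    return 1.0 if attempt.endswith("4") else 0.0

prompt = "What is 2 + 2? Show your reasoning."
attempts = [generate(prompt) for _ in range(16)]        # explore many solution paths
good = [a for a in attempts if reward(prompt, a) > 0]   # keep the attempts that reached the right answer

# The actual update step is omitted: the model's probability of producing the
# token sequences in `good` is increased (e.g. via a policy-gradient update),
# so successful reasoning patterns are reinforced over time.
print(f"{len(good)}/{len(attempts)} sampled solutions earned reward")
```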


2. Why reinforcement learning is needed and how companies use it

📌 2.1 Why reinforcement learning is needed

  • Supervised (SFT) models simply imitate expert answers

  • Reinforcement learning enables more reliable answers and more creative solutions


📌 2.2 Use in industry (e.g., OpenAI)

Companies like OpenAI run a specialized team for each training stage to improve the model.

1️⃣ Pre-training team → Trains on Internet documents and builds the base model
2️⃣ Supervised fine-tuning team → Trains on human-provided conversation data and turns it into an AI assistant
3️⃣ Reinforcement learning team → Improves the quality and reliability of the model's responses


3. Textbook example: a metaphor for the training process

The training process of a large language model is similar to how we study in school.

📖 Pre-training → Reading the textbook

  • Students acquire background knowledge by reading the textbook's expository text

  • AI likewise acquires language knowledge by training on Internet documents

📝 Supervised fine-tuning → Studying worked examples

  • Students learn problem-solving techniques by studying worked examples with model answers

  • AI likewise learns from human-provided demonstrations to produce better answers

🎯 Reinforcement learning → Solving practice problems

  • Students solve practice problems on their own, with only the final answers in the back of the book, and discover solution methods themselves

  • AI likewise tries many solution attempts and reinforces the ones that reach the correct answer