[YouTube Lecture Summary] Andrej Karpathy - Deep Dive into LLMs like ChatGPT

Introduction

Pre-Training

Step 1: Download and preprocess the internet

Step 2: Tokenization

Step 3: Neural network training

Step 4: Inference

Base model

Post-Training: Supervised Finetuning

Conversations

Hallucinations

Knowledge of Self

Models need tokens to think

Things the model cannot do well

Post-Training: Reinforcement Learning

Reinforcement learning

DeepSeek-R1

AlphaGo

Reinforcement learning from human feedback (RLHF)

Preview of things to come

Keeping track of LLMs

Where to find LLMs

Conversations

The large language model (LLM) is pre-trained on Internet documents, but in this state it is only a simple text predictor and cannot function as a natural conversational AI assistant.

To address this, we fine-tune the model using a new conversation dataset during the post-training process.


1. Conversation Data Creation Process

🔹 1.1. Structure of conversation data

The conversation dataset is basically structured as pairs of “user query → ideal assistant response,” often spanning multiple turns.

  • Example:

    • User: "What is 2 + 2?"

    • Assistant: "2 + 2 is 4."

    • User: "What if it's '*' instead of '+'?"

    • Assistant: "2 × 2 is 4."
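To make the data format concrete, here is a minimal Python sketch of how one such multi-turn training example could be stored before tokenization. The role/content field names are a common convention used for illustration, not necessarily the exact schema from the lecture:

```python
# One fine-tuning example: a multi-turn conversation stored as a list of turns.
# The "role"/"content" keys are an illustrative convention, not a specific lab's schema.
conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 is 4."},
    {"role": "user", "content": "What if it's '*' instead of '+'?"},
    {"role": "assistant", "content": "2 * 2 is 4."},
]

for turn in conversation:
    print(f"{turn['role']}: {turn['content']}")
```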


🔹 1.2. How data is generated

Conversational data is primarily generated by human labelers, although some automatic generation by AI has been used recently.

(1) Utilization of human labelers

  • Professional labelers create conversations directly.

    • Example: Programming questions are answered by developers, scientific questions by scientists.

  • Example conversation data:

    • "5 ways to regain passion for my career?"

    • "Translate the following sentence into Spanish."

    • "What are the five must-see landmarks in Paris?"

(2) Utilization of AI + Human Review (Synthetic Data)

  • Nowadays, it is common for AI to first generate responses, and then labelers review and correct them.

  • Most modern conversational datasets are built from “synthetic data” generated by AI.

  • Example: Projects like OpenAssistant use crowdsourcing to allow users to create and review questions and answers.


2. Model Learning Process (Fine-Tuning on Conversations)

  • The conversation is converted into a token sequence so that the model can process it.

  • For example:

<|im_start|>user<|im_sep|>What are the top 5 must-see landmarks in Paris?<|im_end|><|im_start|>assistant<|im_sep|>1. Eiffel Tower 2. Louvre 3. Notre Dame Cathedral 4. Champs-Elysees 5. Montmartre<|im_end|>

  • The model learns these patterns and can then respond in the same style in new conversations.
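As a rough sketch of that conversion, the toy code below flattens a conversation into the special-token format shown above and marks which characters belong to the assistant. Real pipelines operate on token IDs from the model's own tokenizer and compute the loss only on the assistant's tokens; this version just illustrates the idea:

```python
# Toy sketch: flatten a conversation into the <|im_start|>... format above and
# mark which characters the model is trained to predict (assistant text only).
# Real training uses token IDs from the model's tokenizer, not raw characters.
IM_START, IM_SEP, IM_END = "<|im_start|>", "<|im_sep|>", "<|im_end|>"

def render(conversation):
    text, train_mask = "", []
    for turn in conversation:
        chunk = f"{IM_START}{turn['role']}{IM_SEP}{turn['content']}{IM_END}"
        text += chunk
        # Supervise only the assistant's text; user text serves as context.
        train_mask.extend([turn["role"] == "assistant"] * len(chunk))
    return text, train_mask

conversation = [
    {"role": "user", "content": "What are the top 5 must-see landmarks in Paris?"},
    {"role": "assistant", "content": "1. Eiffel Tower 2. Louvre 3. Notre Dame Cathedral 4. Champs-Elysees 5. Montmartre"},
]
text, mask = render(conversation)
print(text)                                    # the flattened training string
print(sum(mask), "of", len(mask), "characters are supervised")
```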


3. ChatGPT's response principle

  • When a user asks a question, the model generates an answer by statistically predicting "how would the labeler answer this question?"

  • That is, rather than the AI thinking on its own, it simulates the labeler's responses.

  • If a similar question appears in the training data, the model is likely to generate a nearly identical response.

  • Even for questions that are not in the training data, it produces a “similar-feeling” answer based on its existing knowledge.
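A toy decoding loop makes this concrete: the assistant's reply is just repeated next-token prediction conditioned on the chat-formatted prompt, stopping at the end-of-turn token. The next_token_distribution function below is a hard-coded stand-in for the real neural network, included only so the loop runs:

```python
import random

# The prompt ends with the assistant header, so "completing the document"
# means writing the assistant's reply.
prompt_tokens = ["<|im_start|>", "user", "<|im_sep|>", "Top 5 landmarks in Paris?",
                 "<|im_end|>", "<|im_start|>", "assistant", "<|im_sep|>"]

def next_token_distribution(context_tokens):
    # Stand-in for the trained network: a real model returns probabilities
    # over ~100,000 tokens; here we hard-code a tiny reply so the loop runs.
    canned = ["1. Eiffel Tower", " 2. Louvre", " 3. Notre Dame", " ...", "<|im_end|>"]
    i = len(context_tokens) - len(prompt_tokens)
    return {canned[min(i, len(canned) - 1)]: 1.0}

context = list(prompt_tokens)
while True:
    probs = next_token_distribution(context)
    token = random.choices(list(probs), weights=list(probs.values()))[0]
    if token == "<|im_end|>":            # end of the assistant's turn
        break
    context.append(token)

print("".join(context[len(prompt_tokens):]))   # the assistant's reply
```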