[YouTube Lecture Summary] Andrej Karpathy - Deep Dive into LLMs like ChatGPT

Introduction

Pre-Training

Step 1: Download and preprocess the internet

Step 2: Tokenization

Step 3: Neural network training

Step 4: Inference

Base model

Post-Training: Supervised Finetuning

Conversations

Hallucinations

Knowledge of Self

Models need tokens to think

Things the model cannot do well

Post-Training: Reinforcement Learning

Reinforcement learning

DeepSeek-R1

AlphaGo

Reinforcement learning from human feedback (RLHF)

Preview of things to come

Keeping track of LLMs

Where to find LLMs

Hallucinations

1. Why Hallucinations Occur

  • The LLM is trained to imitate the style of its training data.

  • In the training data, a question like "Who is ~~?" is always followed by a confident answer containing the correct information.

  • For this reason, when the model is asked a question it doesn't know, it doesn't answer "I don't know"; instead it
    generates the most statistically plausible sentence (see the short sketch below).

  • Example: "Who is Orson Kovats?" → "He is an American writer" (Not true. Orson Kovats is a fictional name.)


2. Solutions for Hallucinations

(1) Teaching the model to answer “I don’t know” (see the Meta paper)

  1. Generate question-answer (QA) data

    • Select specific documents from the training data and generate a set of fact-based questions and answers from them.

    • Example: "What team did this person play for?" → "Buffalo Sabres"

  2. Check if the model knows

    • Ask the model the same question three or more times and check whether it answers correctly and consistently.

    • Consistently correct → the model knows

    • Wrong or inconsistent answers → the model doesn't know

  3. Learning to answer "I don't know"

    • Collect the questions the model gets wrong and add training data that answers them with “I don’t know” (a minimal sketch of this probe-and-relabel loop follows this list).

    • In this process, specific neurons form that track the model's uncertainty.

      → When that neuron's activation is high, the model answers “I don’t know.”
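
A minimal sketch of the probe-and-relabel recipe above, in Python. `ask_model`, the QA pair, and the exact-match comparison are hypothetical placeholders (in practice a judge model would compare answers rather than exact string matching).

```python
import random

def ask_model(question: str) -> str:
    """Hypothetical stand-in: sample an answer from the LLM (temperature > 0,
    so repeated asks of an unsure model can disagree)."""
    return random.choice(["Buffalo Sabres", "Boston Bruins"])  # dummy behavior

# Step 1: fact-based QA pairs generated from training documents (hypothetical).
qa_pairs = [
    {"q": "What team did this person play for?", "a": "Buffalo Sabres"},
]

N_TRIES = 3        # step 2: ask each question several times
sft_examples = []  # step 3: new finetuning data

for pair in qa_pairs:
    answers = [ask_model(pair["q"]) for _ in range(N_TRIES)]
    if all(a == pair["a"] for a in answers):
        # Consistently correct -> the model knows; keep the true answer.
        sft_examples.append({"q": pair["q"], "a": pair["a"]})
    else:
        # Wrong or inconsistent -> the model doesn't know; teach it to abstain.
        sft_examples.append({"q": pair["q"], "a": "I don't know."})

print(sft_examples)
```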


(2) Adding a search function

  • Let the model use web search to look up information it does not know.

  • Method (the full search loop is sketched at the end of this section):

    1. When the model needs to search, it emits special tokens, [SEARCH_START] and [SEARCH_END], around its query.

      • Example: [SEARCH_START] Orson Kovats [SEARCH_END]

    2. A search engine (Bing, Google, etc.) runs the query, and the results are inserted into the model's context window.

    3. The model uses the search results to generate the final answer.

  • Training process:

    • Add training data that teaches the model “when to search” and “how to search”.

    • A few thousand such examples are enough for the model to learn this behavior well.
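
A minimal sketch of the search loop described above, assuming the [SEARCH_START]/[SEARCH_END] tokens from the method; `generate` and `web_search` are hypothetical stand-ins for the model and a search-engine API, not a real implementation.

```python
def generate(context: str) -> str:
    """Hypothetical stand-in for the model: it first emits a search request,
    then answers once search results are visible in its context."""
    if "SEARCH RESULTS" not in context:
        return "[SEARCH_START] Orson Kovats [SEARCH_END]"
    return "I could not find a notable person named Orson Kovats."

def web_search(query: str) -> str:
    """Hypothetical stand-in for a real search API (Bing, Google, etc.)."""
    return f"\nSEARCH RESULTS for {query!r}: (no reliable sources found)\n"

def answer_with_search(question: str) -> str:
    context = question
    while True:
        chunk = generate(context)
        context += chunk
        if "[SEARCH_START]" in chunk:
            # Step 1: the model asked for a search -- extract the query
            # between the special tokens.
            query = chunk.split("[SEARCH_START]")[1].split("[SEARCH_END]")[0].strip()
            # Step 2: run the search and paste the results into the context
            # window, so the next generation step can condition on them.
            context += web_search(query)
        else:
            # Step 3: a final answer with no (further) search request.
            return chunk

print(answer_with_search("Who is Orson Kovats?"))
```

In finetuning, the training conversations simply demonstrate this token pattern (question → search tokens → inserted results → final answer), which is why a few thousand examples suffice for the model to pick up the behavior.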