[YouTube Lecture Summary] Andrej Karpathy - Deep Dive into LLMs like ChatGPT

Introduction

Pre-Training

Step 1: Download and preprocess the internet

Step 2: Tokenization

Step 3: Neural network training

Step 4: Inference

Base model

Post-Training: Supervised Finetuning

Conversations

Hallucinations

Knowledge of Self

Models need tokens to think

Things the model cannot do well

Post-Training: Reinforcement Learning

Reinforcement learning

DeepSeek-R1

AlphaGo

Reinforcement learning from human feedback (RLHF)

Preview of things to come

Keeping track of LLMs

Where to find LLMs

Models need tokens to think

1. Basic operating principle of AI models

✔ AI models think by generating words (tokens) sequentially from left to right.
✔ The amount of computation that can be performed while generating a single token is limited.
✔ In other words, if the model tries to solve a complex problem all at once, its accuracy is likely to decrease.
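
As a rough illustration of the points above, here is a toy sketch of left-to-right generation. The lookup table `NEXT_TOKEN` and the function `generate` are hypothetical names made up for this example; a real model replaces the lookup with a neural network that reads all previous tokens, but the loop has the same shape: one bounded step of computation per token.

```python
# Toy illustration only: a fixed lookup table stands in for the neural network.
# NEXT_TOKEN and generate() are hypothetical names, not part of any real library.
NEXT_TOKEN = {
    "<start>": "The",
    "The": "price",
    "price": "is",
    "is": "$3",
    "$3": "<end>",
}

def generate(max_tokens=10):
    tokens = ["<start>"]
    for _ in range(max_tokens):
        # One bounded step per token: a single lookup here,
        # a single forward pass in a real model.
        nxt = NEXT_TOKEN[tokens[-1]]
        if nxt == "<end>":
            break
        tokens.append(nxt)  # the new token becomes context for the next step
    return tokens[1:]  # drop the <start> marker

print(" ".join(generate()))
```

Each pass through the loop does the same small amount of work, no matter how hard the question is, which is why hard problems need to be spread over many tokens.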


2. Good Answers vs. Bad Answers (Example Math Problem)

💡 Problem:
Emily bought 3 apples and 2 oranges. Each orange costs $2, and the total cost is $13. What is the cost of one apple?

🚫 Bad answer ("stating the answer right away")

"The answer is $3."

🔴 Reason:

  • The model must perform all the calculations at once, within a single token, which is more computation than one token can carry.

  • For complex problems, the likelihood of a wrong answer increases.

✅ Good answer ("with step-by-step calculation process")

"The price of two oranges is $4. Subtract $4 from the total price and you get $9. Since three apples cost $9, one apple costs $3."

🟢 Reason:

  • The model is encouraged to think step by step, so each token only has to carry a small piece of the calculation.

  • This helps the model solve complex problems logically.
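
The good answer above can be mirrored as a short sketch in which every line does one small calculation and names its intermediate result. The variable names are chosen for this illustration only.

```python
# Values taken from the problem statement.
total_cost = 13
orange_price = 2
num_oranges = 2
num_apples = 3

# Each step is one small calculation, like one sentence of the good answer.
oranges_cost = num_oranges * orange_price  # step 1: 2 * 2 = 4
apples_cost = total_cost - oranges_cost    # step 2: 13 - 4 = 9
apple_price = apples_cost / num_apples     # step 3: 9 / 3 = 3.0

print(oranges_cost, apples_cost, apple_price)
```

No single step is difficult, which is the property the step-by-step answer gives the model as well.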


3. Why models have difficulty with complex calculations

🚨 The model cannot perform too many calculations in one operation (token prediction).
📉 As the numbers get larger, the possibility of a wrong answer increases.

Solution

  • Prompt the model to generate answers that include a step-by-step calculation process

  • Guide it toward a logical approach that includes intermediate results
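
One possible way to apply this is to wrap the question in an instruction that asks for intermediate results. The helper name `build_step_by_step_prompt` and the wording of the instruction are assumptions made for this sketch, not something prescribed in the lecture.

```python
def build_step_by_step_prompt(problem: str) -> str:
    """Wrap a problem in an instruction asking for intermediate results.

    Hypothetical helper for illustration; the exact wording is an assumption.
    """
    return (
        f"{problem}\n\n"
        "Solve this step by step. Write out each intermediate result "
        "before stating the final answer."
    )

problem = (
    "Emily bought 3 apples and 2 oranges. Each orange costs $2, "
    "and the total cost is $13. What is the cost of one apple?"
)
print(build_step_by_step_prompt(problem))
```

The resulting text can be sent to whichever chat model you use; the point is only that the request itself asks for the intermediate steps.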


cf) You can get more accurate answers by using the code execution feature.

🤖 The ability of AI models to do arithmetic directly is limited, but their ability to write code is excellent.
💡 More accurate calculations are possible by utilizing programming languages such as Python.

Example:
"Write Python code that calculates the price of an apple"
➡ The model can write and run code such as price = (13 - 2*2) / 3 and come up with the correct answer.
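
As a sketch of what such code might look like when written out in full, the function below generalizes the one-line expression. The function name `apple_price` and its parameters are assumptions for this illustration; `Fraction` from the standard library is used so the division stays exact.

```python
from fractions import Fraction

def apple_price(total, num_apples, num_oranges, orange_price):
    """Price of one apple, given the total and the price of the oranges."""
    # Subtract the cost of the oranges, then split the rest across the apples.
    return Fraction(total - num_oranges * orange_price, num_apples)

print(apple_price(13, 3, 2, 2))  # 3
```

Here the arithmetic is done by the Python interpreter rather than by token prediction, so the result does not depend on how large the numbers are.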

📌 Conclusion:

  • Accuracy increases when the model uses the code execution feature instead of calculating it directly.

  • For complex computational problems, it is recommended to actively utilize Python code execution.