Introduction
Pre-Training
Step 1: Download and preprocess the internet
Step 2: Tokenization
Step 3: Neural network training
Step 4: Inference
Base model
Post-Training: Supervised Finetuning
Conversations
Hallucinations
Knowledge of Self
Models need tokens to think
Things the model cannot do well
Post-Training: Reinforcement Learning
Reinforcement learning
DeepSeek-R1
AlphaGo
Reinforcement learning from human feedback (RLHF)
Preview of things to come
Keeping track of LLMs
Where to find LLMs
Two factors to consider when solving a problem:
1️⃣ Deriving the correct answer – This is the top priority
2️⃣ Readability and logical explanation – Explaining in a way that is easy for people to understand
When it comes to deriving the correct answer, humans cannot easily define the best solution method in advance, because humans and LLMs think differently. LLMs must therefore find the most effective solution method on their own through reinforcement learning.
Reinforcement Learning (RL) is the process by which an LLM learns the optimal answer on its own.
The basic learning method is as follows:
1️⃣ The model solves problems in various ways and generates answers.
2️⃣ Each answer is evaluated as correct (✅) or incorrect (❌).
3️⃣ The solutions (token sequences) that led to correct answers are reinforced, and the model is adjusted to avoid the ones that led to incorrect answers.
4️⃣ This process is repeated thousands to millions of times to learn the optimal problem-solving pattern.
As a result, LLMs discover the most effective problem-solving methods through their own experimentation and experience.
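The four steps above can be sketched as a toy loop. This is only an illustration under simplifying assumptions, not how RL for a real LLM is implemented: a handful of "solution strategies" stand in for token sequences, each with a hidden chance of producing a correct answer, and the update is a REINFORCE-style rule with a running-average baseline (a choice made for this sketch, not something the notes specify). The names `train_toy_policy` and `success_rates` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn raw preference scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_policy(success_rates, steps=5000, lr=0.1, seed=0):
    """Learn which strategy tends to yield correct answers.

    success_rates[i] is the (hidden) probability that strategy i
    produces a correct answer. Returns the learned probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(success_rates)  # start with no preference
    baseline = 0.0                       # running average of past rewards

    for t in range(1, steps + 1):
        probs = softmax(logits)

        # Step 1: the model tries one way of solving the problem.
        choice = rng.choices(range(len(probs)), weights=probs)[0]

        # Step 2: the answer is checked: 1.0 if correct, 0.0 if incorrect.
        reward = 1.0 if rng.random() < success_rates[choice] else 0.0

        # Step 3: reinforce the choice if it did better than average,
        # and make it less likely if it did worse.
        advantage = reward - baseline
        for i in range(len(logits)):
            indicator = 1.0 if i == choice else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])

        # Keep the baseline as the mean reward seen so far.
        baseline += (reward - baseline) / t

    # Step 4 is the loop itself: many repetitions shape the final policy.
    return softmax(logits)


if __name__ == "__main__":
    learned = train_toy_policy([0.2, 0.5, 0.9])
    print([round(p, 3) for p in learned])
```

After enough repetitions, most of the probability mass should move to the strategy with the highest success rate, even though nobody told the policy in advance which strategy that was.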
The way an LLM learns is similar to the way a person acquires knowledge:
① Pre-training: Accumulate knowledge by learning from large amounts of text data. This is like learning concepts and theories by reading textbooks.
② Supervised Fine-tuning (SFT): Learn from expert answers (worked solutions). This is like following the example solutions provided by a teacher.
③ Reinforcement Learning (RL): Learn optimal solution methods by solving a variety of problems directly. This is like developing problem-solving skills on your own by working through practice problems.
With supervised fine-tuning (SFT) alone, an LLM simply mimics the expert answers without understanding them deeply.
Through reinforcement learning, it acquires the ability to find the optimal solution on its own.