Introduction
Pre-Training
Step 1: Download and preprocess the internet
Step 2: Tokenization
Step 3: Neural network training
Step 4: Inference
Base model
Post-Training: Supervised Finetuning
Conversations
Hallucinations
Knowledge of Self
Models need tokens to think
Things the model cannot do well
Post-Training: Reinforcement Learning
Reinforcement learning
DeepSeek-R1
AlphaGo
Reinforcement learning from human feedback (RLHF)
Preview of things to come
Keeping track of LLMs
Where to find LLMs
Large language models (LLMs) are typically developed through three main learning stages, applied in sequence:
1️⃣ Pre-training
🔹 Training method: learn from Internet documents by predicting the next word
🔹 Goal: learn language patterns and acquire the ability to use context
🔹 Characteristics
✅ Acquires vast knowledge across many topics
✅ Can generate natural-sounding text
⚠️ However, a model that merely predicts the next word in Internet documents has limited practical use on its own
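Next-word prediction can be illustrated with a deliberately tiny sketch: count which word follows which in a toy corpus, then predict the most frequent follower. Real pre-training does the same job with a neural network over billions of documents; the corpus and function names here are illustrative only.

```python
from collections import Counter, defaultdict

# Toy "pre-training" corpus; real models train on Internet-scale text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each preceding word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` during training."""
    return following[word].most_common(1)[0][0]

print(predict_next("sat"))  # -> on  ("sat" is always followed by "on" above)
```

Even this crude frequency table "acquires knowledge" from its corpus, which is why scaling the same objective to huge networks and datasets produces models that know a lot but are not yet useful assistants.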
2️⃣ Supervised fine-tuning (SFT)
🔹 Training method:
Train on human-written conversation datasets instead of raw Internet documents
A human labeler supplies a question (prompt) together with an ideal answer (the label)
🔹 Features
✅ Conversations become more natural and useful
✅ Improves performance on specific tasks by learning from expert-written answers
⚠️ However, since this is essentially imitation learning, the model remains weak at solving genuinely new problems
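In practice, SFT pipelines flatten each human-written conversation into a single token stream with special role markers, and the model is trained to continue it. A minimal sketch follows; the `<|user|>` / `<|assistant|>` / `<|end|>` markers are illustrative assumptions, not the exact tokens any particular model uses.

```python
# One labeled example: a human-written prompt and an ideal answer.
conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]

def render(messages):
    """Flatten a conversation into one training string, as SFT pipelines do.

    The role markers below are hypothetical; real models each define
    their own special tokens for this purpose.
    """
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>{m['content']}<|end|>")
    return "".join(parts)

text = render(conversation)
print(text)  # -> <|user|>What is 2 + 2?<|end|><|assistant|>2 + 2 = 4.<|end|>
```

Training on many such strings teaches the model to imitate the assistant turns, which is exactly why the result is imitation rather than independent problem solving.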
3️⃣ Reinforcement learning (RL)
🔹 Training method:
The model discovers solutions on its own
The problem (prompt) and the answer (target output) are given, but the model explores the solution process itself
Human feedback can be incorporated via reinforcement learning from human feedback (RLHF)
🔹 Features
✅ More creative problem solving (flexible responses to new questions)
✅ Can reduce hallucinations (less fabricated information)
⚠️ However, training is expensive and time-consuming
SFT models simply imitate experts, whereas reinforcement learning can yield more reliable answers and more creative solutions.
Companies like OpenAI operate specialized teams at each learning stage to improve the model.
1️⃣ Pre-training team → learns from Internet documents and builds the base model
2️⃣ Supervised fine-tuning team → trains on human-provided data, shaping the model into an AI assistant
3️⃣ Reinforcement learning team → improves response quality to make the model more reliable
The training process for large language models resembles the way we study in school.
📖 Pre-training → Reading textbooks
Students acquire background knowledge by reading textbooks
An LLM likewise builds up language knowledge by training on Internet documents
📝 Supervised fine-tuning → Studying worked examples
Students learn problem-solving techniques by studying model answers
An LLM likewise learns from human-provided examples to produce better answers
🎯 Reinforcement learning → Solving practice problems
Students solve practice problems on their own and discover solution methods
An LLM likewise tries multiple solution paths to find the optimal answer