Introduction
Pre-Training
Step 1: Download and preprocess the internet
Step 2: Tokenization
Step 3: Neural network training
Step 4: Inference
Base model
Post-Training: Supervised Finetuning
Conversations
Hallucinations
Knowledge of Self
Models need tokens to think
Things the model cannot do well
Post-Training: Reinforcement Learning
Reinforcement learning
DeepSeek-R1
AlphaGo
Reinforcement learning from human feedback (RLHF)
Preview of things to come
Keeping track of LLMs
Where to find LLMs
🔍 Key Points
OpenAI has not disclosed the training method behind its reinforcement learning (RL) based models.
DeepSeek, on the other hand, has released an RL-trained model (DeepSeek-R1) as open source, allowing researchers to use it directly.
This marks a turning point that accelerates RL-based LLM research across the AI research community.
📈 Changes after applying reinforcement learning
Math problem-solving ability improves significantly
The model tries multiple approaches, and accuracy rises gradually
The model forms its own reasoning process
🧐 "Wait a minute, let me check again."
🤔 "Let's try another way to verify that this approach is correct."
✅ "Now I can be sure of the answer!"
🤯 The key is that the model learns human-like thought processes and develops problem-solving strategies on its own!
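The self-verification behavior above can be illustrated with a toy sketch (assumed setup, not DeepSeek's actual training code): the "model" is just a probability of double-checking its work before answering. Re-checking raises the chance of a correct answer, correct answers are rewarded, and so the check probability drifts upward on its own, mirroring how R1-style models learn "wait, let me verify" without being told to.

```python
import random

random.seed(0)
p_check = 0.1  # initial probability of self-verification (toy parameter)
lr = 0.05      # learning rate for the toy update

def attempt(check: bool) -> float:
    """Toy reward: answers are right 40% of the time, 90% with a re-check."""
    return 1.0 if random.random() < (0.9 if check else 0.4) else 0.0

for _ in range(2000):
    check = random.random() < p_check  # sample a behavior from the "policy"
    reward = attempt(check)
    # Policy-gradient-style nudge: move p_check toward behaviors that
    # earned reward, away from those that did not.
    direction = 1.0 if check else -1.0
    p_check += lr * direction * (reward - 0.5)
    p_check = min(max(p_check, 0.01), 0.99)  # keep it a valid probability

print(round(p_check, 2))  # drifts toward the upper bound: checking wins
```

No one hand-coded "always verify"; the behavior emerges because verified answers collect more reward, which is the core intuition behind RL-trained reasoning models.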
💻 Released as an open source model
Can be downloaded and run locally (⚠️ high-performance hardware required)
☁️ Cloud services available
DeepSeek Official Website
DeepSeek-R1 can be run on Together.ai
🔬 Google's Gemini 2.0 Flash (Thinking Experimental) model also offers similar features
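The cloud options above are typically reached through an OpenAI-compatible chat API. As a hedged sketch, here is what a request payload for DeepSeek-R1 on Together.ai might look like; the endpoint URL and model identifier are assumptions, so check the provider's documentation for current values.

```python
import json

# Assumed OpenAI-compatible endpoint; verify against Together.ai's docs.
ENDPOINT = "https://api.together.xyz/v1/chat/completions"

payload = {
    "model": "deepseek-ai/DeepSeek-R1",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "What is 127 * 13? Think step by step."}
    ],
    "max_tokens": 1024,
}

# Actually sending this would require an API key in the request headers,
# e.g. {"Authorization": f"Bearer {API_KEY}"}, via urllib or an HTTP client.
body = json.dumps(payload)
print(body[:60])
```

The same payload shape works for most hosted reasoning models, since providers widely mirror the OpenAI chat-completions format.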
🎯 Which model should I use in which situation?
📚 General knowledge questions: use a standard LLM (⚡ fast answers)
🧠 Problems requiring math or logical reasoning: use a reasoning model (📈 higher accuracy)