Introduction
Pre-Training
Step 1: Download and preprocess the internet
Step 2: Tokenization
Step 3: Neural network training
Step 4: Inference
Base model
Post-Training: Supervised Finetuning
Conversations
Hallucinations
Knowledge of Self
Models need tokens to think
Things the model cannot do well
Post-Training: Reinforcement Learning
Reinforcement learning
DeepSeek-R1
AlphaGo
Reinforcement learning from human feedback (RLHF)
Preview of things to come
Keeping track of LLMs
Where to find LLMs
Hallucinations happen because an LLM is trained to imitate the style of its training data.
In the training data, a question like "Who is X?" is virtually always followed by a confident, correct answer.
For this reason, when the model is asked a question it doesn't know the answer to, it doesn't say "I don't know";
it instead generates the most statistically plausible-sounding sentence.
Example: "Who is Orson Kovats?" → "He is an American writer" (Not true. Orson Kovats is a fictional name.)
Generate question-answer (QA) data
Select specific documents from the training data and generate a set of fact-based questions and answers from them.
Example: "What team did this person play for?" → "Buffalo Sabres"
Check if the model knows
Ask the same question three or more times and evaluate whether the model gets it right consistently.
✅ Consistently correct → the model knows
❌ Wrong or inconsistent answers → the model doesn't know
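A sketch of the probe, reusing the hypothetical `ask_llm` helper above. The substring check is a simplification; real pipelines often use another LLM to judge correctness.

```python
def model_knows(question: str, correct_answer: str, tries: int = 3) -> bool:
    # Sample the model several times; with temperature sampling the
    # answers can differ from run to run.
    answers = [ask_llm(question) for _ in range(tries)]
    # Only consistently correct answers count as "the model knows".
    return all(correct_answer.lower() in a.lower() for a in answers)
```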
Learning to answer "I don't know"
Collect the questions the model gets wrong and add training examples
whose correct answer is "I don't know."
In this process, specific neurons that track the model's uncertainty get connected to that refusal.
→ When those neurons activate strongly, the model answers "I don't know."
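A sketch of how the failed questions become finetuning examples. The chat schema below is generic, not any particular model's template:

```python
def make_idk_examples(failed_questions: list[str]) -> list[dict]:
    # Each question the model got wrong becomes a conversation whose
    # target response is an honest refusal.
    return [
        {
            "messages": [
                {"role": "user", "content": q},
                {"role": "assistant", "content": "I'm sorry, I don't know."},
            ]
        }
        for q in failed_questions
    ]
```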
Use web search (tool use) so the model can look up information it does not know.
Method:
When the model needs to search, it emits special tokens: [SEARCH_START] and [SEARCH_END].
Example: [SEARCH_START] Orson Kovats [SEARCH_END]
A search engine (Bing, Google, etc.) runs the query, and the results are inserted into the context window.
The model uses the search results to generate the final answer.
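A sketch of that inference-time loop. `generate` and `web_search` are placeholders for the model call and the search backend; the [SEARCH_RESULTS] marker is invented for this sketch.

```python
import re

SEARCH_RE = re.compile(r"\[SEARCH_START\](.*?)\[SEARCH_END\]", re.DOTALL)

def generate(context: str) -> str:
    """Placeholder: have the model continue `context`."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Placeholder: query a search engine and return result snippets."""
    raise NotImplementedError

def answer_with_search(prompt: str) -> str:
    context = prompt
    while True:
        output = generate(context)
        match = SEARCH_RE.search(output)
        if match is None:
            return output  # no tool call: this is the final answer
        # The model asked to search: run the query, paste the results
        # back into the context window, and let it keep generating.
        query = match.group(1).strip()
        results = web_search(query)
        context += output[: match.end()] + f"\n[SEARCH_RESULTS]\n{results}\n"
```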
Learning process:
Add training data so that the model learns "when to search" and "how to search."
The model learns this behavior well from only a few thousand examples.
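One illustrative training example (the format is invented here): it shows the model when to search (an entity it cannot answer about) and how (emit the query between the special tokens).

```python
search_training_example = {
    "messages": [
        {"role": "user", "content": "Who is Orson Kovats?"},
        # Target behavior: instead of guessing, emit a search query.
        {"role": "assistant",
         "content": "[SEARCH_START]Orson Kovats[SEARCH_END]"},
    ]
}
```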