Introduction
Pre-Training
Step 1: Download and preprocess the internet
Step 2: Tokenization
Step 3: Neural network training
Step 4: Inference
Base model
Post-Training: Supervised Finetuning
Conversations
Hallucinations
Knowledge of Self
Models need tokens to think
Things the model cannot do well
Post-Training: Reinforcement Learning
Reinforcement learning
DeepSeek-R1
AlphaGo
Reinforcement learning from human feedback (RLHF)
Preview of things to come
Keeping track of LLMs
Where to find LLMs
The pretrained base model is simply a “text predictor” that has learned the statistical patterns of Internet documents.
That is, if you ask it a specific question, it is more likely to continue with text that resembles an Internet document than to give a meaningful answer.
Using the base model directly is therefore inefficient and often does not behave as desired.
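As a minimal sketch of what “text predictor” means, here is a toy bigram model trained on a tiny made-up corpus (the corpus and function names are illustrative, not from any real system). Like a base LLM at vastly smaller scale, it only samples statistically likely next tokens, so a question prompt gets *continued* in the style of the training data rather than answered:

```python
import random
from collections import defaultdict

# Tiny hypothetical "Internet" corpus (illustrative only).
corpus = (
    "what is the capital of france ? "
    "the capital of france is a popular exam question . "
    "what is the capital of france ? "
    "click here to see the full list of capitals . "
).split()

# "Pretraining": count which token follows which.
counts = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev].append(nxt)

def complete(prompt, n_tokens=6, seed=0):
    """Continue the prompt with statistically likely next tokens."""
    random.seed(seed)
    tokens = prompt.split()
    for _ in range(n_tokens):
        candidates = counts.get(tokens[-1])
        if not candidates:
            break
        tokens.append(random.choice(candidates))
    return " ".join(tokens)

# The model does not "answer" the question -- it emits more
# document-shaped text in the statistics of its training data.
print(complete("what is the capital of france ?"))
```

The continuation varies with the random seed, but it is always built from document fragments, which is exactly why a raw base model makes a poor assistant.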
Post-training is needed to turn it into an “AI assistant” that gives useful answers to user questions, rather than merely generating text.
This is the process of refining the base model so that it can serve as a conversational AI, rather than simply simulating documents.
Through post-training, the model is adjusted to give more logical, consistent responses and to behave in ways that match human expectations.
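One concrete piece of this refinement is how supervised finetuning data is prepared: conversations are flattened into a single token stream with special delimiters marking who is speaking, and the model is trained to predict the assistant's turns. The sketch below uses ChatML-style `<|im_start|>` / `<|im_end|>` markers for illustration; the exact special tokens and the `render_conversation` helper are assumptions, as each model family defines its own template:

```python
def render_conversation(messages):
    """Flatten a list of {role, content} dicts into one training string,
    using ChatML-style delimiters (illustrative; templates vary by model)."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    return "\n".join(parts)

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

print(render_conversation(conversation))
```

After finetuning on many such rendered conversations, the same next-token predictor learns that text following an assistant marker should look like a helpful answer, not like a continuation of an Internet document.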