Explainer
Pretraining vs Fine-tuning
The two-stage recipe behind most chat models.
Pretraining is the expensive first stage: the model learns general language and world structure from a massive, mostly unlabeled corpus, typically by predicting the next token. This is where the bulk of the compute and cost goes.
Fine-tuning is a cheaper second stage that adapts the pretrained model to a narrower goal, such as following instructions, adopting a tone, or specializing in a domain. It typically uses far less data and compute than pretraining.
The mental model: pretraining builds raw capability, fine-tuning shapes behavior. A base model and a chat model can share the same architecture and the same pretrained starting point, differing only in the weight updates made during fine-tuning, and still feel completely different to use.
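To make the two stages concrete, here is a toy sketch in Python. It is an analogy, not how language models are actually trained: a two-parameter linear model stands in for the network, and the helper names (`train`, `mse`), dataset sizes, step counts, and learning rate are all assumptions chosen for illustration. The point it shows is that stage 2 reuses the weights from stage 1 as its starting point and needs far less data and far fewer updates.

```python
import random


def train(w, b, data, steps, lr):
    """Plain gradient descent on mean squared error for y = w*x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(w, b, data):
    """Mean squared error of the model on a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Stage 1 ("pretraining"): random init, large dataset, many steps.
# Hypothetical general task: y = 2x.
pretrain_data = [(x, 2.0 * x) for x in (rng.uniform(-1, 1) for _ in range(1000))]
init_w, init_b = rng.uniform(-1, 1), rng.uniform(-1, 1)
base_w, base_b = train(init_w, init_b, pretrain_data, steps=500, lr=0.1)

# Stage 2 ("fine-tuning"): start from the base weights, small dataset, few steps.
# Hypothetical narrower task: y = 2x + 1.
finetune_data = [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(20))]
tuned_w, tuned_b = train(base_w, base_b, finetune_data, steps=50, lr=0.1)

# Rough cost proxy: number of per-example gradient evaluations in each stage.
pretrain_cost = len(pretrain_data) * 500
finetune_cost = len(finetune_data) * 50

print("base weights: ", base_w, base_b)
print("tuned weights:", tuned_w, tuned_b)
print("error on the narrow task, base vs tuned:",
      mse(base_w, base_b, finetune_data), mse(tuned_w, tuned_b, finetune_data))
print("cost proxy, pretrain vs fine-tune:", pretrain_cost, finetune_cost)
```

In this sketch the fine-tuned model keeps most of what the base model learned (the slope stays near 2) while a short, cheap second pass shifts its behavior toward the narrower task. The tuned weights begin as a copy of the base weights and end up different from them.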