
Pretraining vs Fine-tuning

The two-stage recipe behind most chat models.

Pretraining is the expensive first stage: the model learns general language patterns and broad world knowledge from a massive, mostly unlabeled text corpus, typically by repeatedly predicting the next token. This stage accounts for the bulk of the compute and cost.
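The corpus can stay "mostly unlabeled" because the text supplies its own labels: at each position, the training target is simply the token that comes next. The sketch below illustrates this self-supervised setup under loudly simplifying assumptions: whitespace splitting stands in for a real tokenizer, a smoothed bigram count table stands in for a neural network, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def next_token_pairs(text):
    """Turn raw text into (context, target) pairs; no human labels needed."""
    tokens = text.split()  # assumption: whitespace split stands in for a real tokenizer
    # Each token's target is simply the token that follows it.
    return list(zip(tokens[:-1], tokens[1:]))


def fit_bigram(pairs):
    """'Pretrain' a toy model by counting which token follows which."""
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context][target] += 1
    return counts


def avg_nll(counts, pairs, vocab_size, alpha=1.0):
    """Average negative log-likelihood of each next token (add-alpha smoothing)."""
    total = 0.0
    for context, target in pairs:
        row = counts[context]
        prob = (row[target] + alpha) / (sum(row.values()) + alpha * vocab_size)
        total -= math.log(prob)
    return total / len(pairs)


corpus = "the cat sat on the mat the cat ate"
pairs = next_token_pairs(corpus)
print(pairs[:3])  # [('the', 'cat'), ('cat', 'sat'), ('sat', 'on')]
model = fit_bigram(pairs)
vocab_size = len(set(corpus.split()))
# True: the fitted model predicts the corpus better than a uniform guess
print(avg_nll(model, pairs, vocab_size) < math.log(vocab_size))
```

A real pretraining run replaces the count table with a large neural network and minimizes the same kind of next-token loss by gradient descent over billions of tokens, which is where the compute goes.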

Fine-tuning is a cheaper second stage that adapts the pretrained model to a narrower goal, such as following instructions, adopting a tone, or specializing in a domain. It uses far less data and compute than pretraining.
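One common form of instruction fine-tuning is supervised training on prompt/response pairs. The objective is still next-token prediction, but the loss is often restricted to the response tokens, so the model learns to answer rather than to reproduce prompts. A minimal sketch, assuming whitespace tokens and a made-up template: the `<user>`, `<assistant>`, and `<eos>` markers and `build_sft_example` are hypothetical, and the -100 label mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE = -100  # label a loss function is told to skip (PyTorch CrossEntropyLoss default ignore_index)


def build_sft_example(prompt, response, vocab):
    """Tokenize a prompt/response pair and mask the prompt out of the loss."""
    # Hypothetical chat template; real models each define their own.
    prompt_tokens = ["<user>"] + prompt.split() + ["<assistant>"]
    response_tokens = response.split() + ["<eos>"]
    tokens = prompt_tokens + response_tokens
    # Assign each new token the next free integer id.
    ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens]
    # Position i is trained to predict token i + 1, as in pretraining;
    # the last position has no next token, so it is ignored.
    labels = ids[1:] + [IGNORE]
    # Ignore every target that is itself a prompt token, so only the
    # response (including <eos>) contributes to the loss.
    for i in range(len(prompt_tokens) - 1):
        labels[i] = IGNORE
    return ids, labels


vocab = {}
ids, labels = build_sft_example("Name a color", "Blue", vocab)
print(ids)     # [0, 1, 2, 3, 4, 5, 6]
print(labels)  # [-100, -100, -100, -100, 5, 6, -100]
```

Only two positions carry a training signal here: `<assistant>` predicting "Blue" and "Blue" predicting `<eos>`. Datasets of such examples are tiny compared with a pretraining corpus, which is one reason this stage is cheap.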

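The mental model: pretraining builds raw capability, and fine-tuning shapes how that capability is expressed. A chat model starts from its base model's weights and typically changes them only modestly, so the two share an architecture and most of their learned knowledge. Yet they can feel completely different to use: a base model tends to continue whatever text it is given, while a chat model responds to the request.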
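As a cartoon of this mental model, the sketch below treats a model's behavior after one fixed prompt (say, a quiz question) as a choice between two whole continuations. Everything here is an assumption chosen for illustration: the two-item vocabulary, the target counts, the learning rate, and the step counts. It is not how an LLM's weights are organized. "Pretraining" fits logits by plain gradient descent on cross-entropy; "fine-tuning" runs the same procedure but starts from the pretrained logits instead of from scratch.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (max-shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train(logits, targets, lr, steps):
    """Gradient descent on average cross-entropy over one context's next-token logits."""
    logits = list(logits)  # copy, so the starting weights are left untouched
    n = len(targets)
    for _ in range(steps):
        probs = softmax(logits)
        # d(loss)/d(logit_k) = p_k - (fraction of targets equal to k)
        grad = [p - sum(t == k for t in targets) / n for k, p in enumerate(probs)]
        logits = [x - lr * g for x, g in zip(logits, grad)]
    return logits


# Hypothetical toy "tokens": whole continuations after a quiz question.
VOCAB = ["another quiz question", "a direct answer"]

# Pretraining: web-like data where questions are mostly followed by more questions.
base = train([0.0, 0.0], targets=[0, 0, 0, 1], lr=1.0, steps=200)
# Fine-tuning: a few demonstrations, starting from the pretrained weights.
chat = train(base, targets=[1, 1], lr=1.0, steps=20)

for name, logits in [("base", base), ("chat", chat)]:
    probs = softmax(logits)
    best = max(range(len(VOCAB)), key=lambda k: probs[k])
    print(name, "->", VOCAB[best])
# base -> another quiz question
# chat -> a direct answer
```

Both models have the same parameter shape, and the chat parameters begin as a copy of the base ones; a short, small-data second stage is enough to flip which continuation is preferred.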