Skip to content
Explainer

The Transformer Architecture

The design that made modern AI possible.

The transformer, introduced in 2017, replaced earlier recurrent designs with a mechanism called attention. Attention lets every token look at every other token in the input and decide which ones matter for predicting the next one.

Because attention processes the whole sequence in parallel rather than one step at a time, transformers train far more efficiently on modern hardware. That efficiency is a big part of why scaling them up was even feasible.

Almost every well-known model today — across text, images, and audio — is a transformer variant. The architecture has proven remarkably general.