InstructGPT & Chinchilla: The Optimization


The two 2022 papers that showed raw scale was only half the answer.

By early 2022, GPT-3 had shown what a 175-billion-parameter model could do. It was also deeply frustrating. Ask it to write a recipe and it might write three more questions about recipes. Ask it to summarize a paragraph and it might keep writing tangentially related text. The model completed sequences. It had never been trained to follow instructions.

OpenAI's InstructGPT work, announced in January 2022, attacked that problem directly. The approach was called Reinforcement Learning from Human Feedback (RLHF). First, human labelers wrote examples of good responses, and GPT-3 was fine-tuned on them. Next, the fine-tuned model generated several candidate responses per prompt, and labelers ranked them from best to worst. Those rankings trained a separate reward model, which then guided the LLM through reinforcement learning toward outputs that scored higher. The result was a model that followed instructions far more reliably and produced less toxic output. Human raters preferred the 1.3-billion-parameter version to the 175-billion-parameter GPT-3, a model more than 100 times its size.
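The reward-model step can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: it assumes the reward model has already assigned a scalar score to each candidate response, and it applies the standard pairwise ranking loss, -log(sigmoid(r_preferred - r_rejected)), to every pair implied by a human ranking. The function names `pairwise_loss` and `ranking_loss` are hypothetical.

```python
import math
from itertools import combinations


def pairwise_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response well above the rejected one, and large when it gets the
    order wrong.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ranking_loss(rewards_best_to_worst: list[float]) -> float:
    """Average the pairwise loss over all pairs implied by one ranking.

    `rewards_best_to_worst` holds the reward model's scores for the
    candidate responses, ordered as the human labeler ranked them
    (best first). combinations() keeps that order, so the first item
    of each pair is always the human-preferred response.
    """
    pairs = list(combinations(rewards_best_to_worst, 2))
    return sum(pairwise_loss(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Scores that agree with the human ranking give a lower loss ...
    print(ranking_loss([2.0, 1.0, 0.0]))
    # ... than scores that contradict it.
    print(ranking_loss([0.0, 1.0, 2.0]))
```

Minimizing this loss pushes the reward model to score responses in the same order humans ranked them. The reinforcement learning stage then uses those scores as its reward signal.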

In March 2022, DeepMind published Chinchilla. Its question was different: given a fixed compute budget, how do you split it between parameters and training data? The prevailing answer said parameters (which represent the AI's "brain size") were the key lever. The common practice was to build as large a brain as possible, even if it had relatively little information to study. Think of it as building a giant brain that only reads a few books. DeepMind ran systematic experiments, training more than 400 models of different sizes, and found the field had gotten this badly wrong: parameters and training data should grow in roughly equal proportion. It is much better to build a medium-sized brain that reads a massive, comprehensive library (training data). The optimal ratio was roughly 20 tokens (words or parts of words) per parameter. Chinchilla, with 70 billion parameters and about 1.4 trillion training tokens, used the same compute budget as Gopher, which had 280 billion parameters but saw only about 300 billion tokens. Chinchilla outperformed Gopher across a wide range of benchmarks. It was smaller, cheaper to run, and smarter.
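The 20-tokens-per-parameter rule of thumb turns into simple arithmetic. The sketch below relies on one assumption the article does not state: the common approximation that training costs about 6 FLOPs per parameter per token, so compute C ≈ 6 × N × D for N parameters and D tokens. Setting D = 20 × N gives N ≈ sqrt(C / 120). The function name `compute_optimal_split` is hypothetical, and the result is a rough estimate rather than the paper's fitted scaling law.

```python
import math

# Assumption: training cost is about 6 FLOPs per parameter per token.
FLOPS_PER_PARAM_PER_TOKEN = 6.0
# Rule of thumb from the text: roughly 20 training tokens per parameter.
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops: float) -> tuple[float, float]:
    """Estimate (parameters, tokens) for a given training compute budget.

    Solves C = 6 * N * D with D = 20 * N, i.e. N = sqrt(C / 120).
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_PER_TOKEN * TOKENS_PER_PARAM)
    )
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    # Budget implied by 70B parameters and 1.4T tokens under C = 6 * N * D.
    budget = 6.0 * 70e9 * 1.4e12
    params, tokens = compute_optimal_split(budget)
    print(f"parameters: {params / 1e9:.0f} billion")
    print(f"tokens:     {tokens / 1e12:.1f} trillion")
```

Fed the budget implied by Chinchilla's own configuration, the estimate comes back to roughly 70 billion parameters and 1.4 trillion tokens. Spending the same budget on a 280-billion-parameter model leaves room for only about a quarter as many tokens.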

The two papers together changed the frame. Scale still mattered, but the conversation shifted toward optimization: make a model that is right-sized for its training data and trained to follow human intent. ChatGPT, which launched in November 2022, was trained with the same RLHF methods as InstructGPT, and Chinchilla's data-heavy recipe soon shaped how new models were sized. The optimization era had begun.
