InstructGPT & Chinchilla: The Optimization
The two 2022 papers that proved raw scale was only half the answer.
By early 2022, GPT-3 had shown what a 175-billion-parameter model could do. It was also deeply frustrating. Ask it to write a recipe and it might write three more questions about recipes. Ask it to summarize a paragraph and it might keep writing tangentially related text. The model completed sequences. It did not understand instructions.
OpenAI's InstructGPT paper, published in January 2022, attacked that problem directly. The approach was called Reinforcement Learning from Human Feedback (RLHF), and it ran in three stages. First, human labelers wrote demonstrations of good behavior, and GPT-3 was fine-tuned on them. Second, the model generated several candidate responses to the same prompt, and labelers ranked them from best to worst. Those rankings trained a separate reward model to predict which output a human would prefer. Third, the reward model guided the fine-tuned LLM through reinforcement learning toward outputs that scored higher. The result was a model that followed instructions far more reliably and produced less toxic output. Human raters preferred the 1.3-billion-parameter InstructGPT to the 175-billion-parameter GPT-3, a model more than 100 times its size.
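The reward-model step is easier to picture with a little code. What follows is a minimal sketch, not OpenAI's implementation: it assumes the reward model emits one scalar score per response and is trained with the standard pairwise comparison loss, -log(sigmoid(r_preferred - r_rejected)). The function names ranking_to_pairs and pairwise_loss are made up for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn one best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first item of each pair
    # always ranks above the second.
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Return -log(sigmoid(r_preferred - r_rejected)), computed stably."""
    margin = reward_preferred - reward_rejected
    # Equivalent to log(1 + exp(-margin)), rearranged so exp() never overflows.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical example: a labeler ranked four responses, A best and D worst.
pairs = ranking_to_pairs(["A", "B", "C", "D"])

# Made-up reward scores. The loss is small when the reward model agrees
# with the human ranking and large when it disagrees.
loss_when_agreeing = pairwise_loss(2.0, 0.0)
loss_when_disagreeing = pairwise_loss(0.0, 2.0)
```

A single ranking of four responses yields six comparisons, so each labeling task produces several training signals. Minimizing the loss pushes the reward model to score the human-preferred response higher, and that learned score is what the reinforcement learning stage then optimizes.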
Two months later, in March 2022, DeepMind published Chinchilla. Its question was different: given a fixed compute budget, how do you split it between parameters and training data? The prevailing answer, drawn from earlier scaling-law work, treated parameters as the key lever. You trained a model as big as compute allowed, then found as much data as you could. DeepMind ran systematic experiments, training hundreds of models at different sizes and data budgets, and found the field had gotten this badly wrong. Models were trained on far too little data relative to their size. Parameters and training tokens should grow in equal proportion, which worked out to roughly 20 tokens per parameter. By that rule, a 70-billion-parameter model needs about 1.4 trillion tokens. Chinchilla, at 70 billion parameters, was trained on the same compute budget as Gopher at 280 billion and outperformed it. Smaller, cheaper to run, and more capable.
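The arithmetic behind that rule of thumb fits in a few lines. This is a rough sketch under two assumptions: the 20-tokens-per-parameter ratio quoted above, and the common approximation that training costs about 6 FLOPs per parameter per token. The function names are illustrative, and the outputs are back-of-envelope estimates rather than figures taken from the paper.

```python
import math

TOKENS_PER_PARAM = 20       # rule-of-thumb ratio quoted in the text
FLOPS_PER_PARAM_TOKEN = 6   # assumed approximation: compute ~ 6 * params * tokens


def optimal_tokens(params):
    """Training tokens the 20:1 rule suggests for a given parameter count."""
    return TOKENS_PER_PARAM * params


def training_flops(params, tokens):
    """Approximate training compute in FLOPs."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal_split(flops):
    """Split a FLOP budget into (params, tokens) under both assumptions."""
    # Substituting tokens = 20 * params into flops = 6 * params * tokens
    # gives flops = 120 * params**2, so params = sqrt(flops / 120).
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return params, optimal_tokens(params)


# 70 billion parameters -> 1.4 trillion tokens, matching the figure above.
chinchilla_tokens = optimal_tokens(70e9)

# Going the other way: start from a compute budget and recover the model size.
budget = training_flops(70e9, chinchilla_tokens)
params, tokens = compute_optimal_split(budget)
```

Because the budget grows with the product of parameters and tokens, quadrupling compute under this rule only doubles the model size. The other half of the increase goes to data.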
Together, the two papers changed the frame. Scale still mattered, but the conversation shifted toward optimization: build a model that is right-sized for its training data and trained to follow human intent. ChatGPT, which launched in November 2022, was the visible payoff of the first idea: OpenAI described it as trained with the same RLHF methods as InstructGPT. The optimization era had begun.