The Reasoning Era
How DeepSeek-R1 and s1 moved the AI frontier from training compute to thinking time.
The scaling laws of 2020 made a bold promise. Train bigger models on bigger datasets. Watch capability improve on a predictable curve. Nearly every frontier lab took this as its operating plan.
By 2024, the walls had appeared. High-quality pre-training data was running thin. The frontier models were hitting diminishing returns: each additional unit of compute bought less improvement. The growth machine was stalling.
Then DeepSeek-R1 and s1 arrived in early 2025 and reframed the question.
Both papers asked the same question in different ways: what if intelligence depended less on how much you train and more on how long you think? DeepSeek-R1 came from the Chinese lab DeepSeek and used reinforcement learning to teach a model to reason through problems before answering. s1 came from Stanford and was almost absurdly simple. Take an off-the-shelf model. Force it to spend more tokens thinking. Watch it compete with frontier reasoning models on hard math.
The insight behind both papers echoes a distinction the psychologist Daniel Kahneman popularized. System 1 thinking is fast. You recognize a face instantly. System 2 thinking is slow. You work through a problem step by step, checking as you go. Large language models had been scaled up as System 1 machines: one pass, one answer. DeepSeek-R1 and s1 were about teaching System 2.
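In DeepSeek-R1, the model is trained with reinforcement learning. It generates a chain of intermediate reasoning steps before committing to an answer. The reward signal is deliberately simple: rule-based checks on whether the final answer is correct, plus a check that the output follows the expected format. Nothing specifies how to think. Reasoning that ends in a correct answer gets reinforced; reasoning that ends in a wrong one does not. Over time, the model develops an internal monologue. It backtracks. It catches errors before they reach the output. It rewrites its own reasoning when something looks wrong. The pure-RL variant, DeepSeek-R1-Zero, learned these behaviors with no supervised examples at all; the released R1 was first warmed up on a small set of curated reasoning examples to make its output more readable.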
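To make the reward idea concrete, here is a minimal Python sketch. It assumes a hypothetical output format in which reasoning sits inside `<think>` tags followed by a plain answer, and it uses exact string matching; DeepSeek's real checkers are more involved. The second function illustrates the group-relative scoring behind GRPO (Group Relative Policy Optimization), the RL algorithm DeepSeek reports using: several answers are sampled for the same problem, and each is scored against the group's average. The function names are illustrative, not DeepSeek's code.

```python
import re
from statistics import mean, pstdev

# Assumed format: "<think>reasoning</think> answer". Hypothetical, for illustration.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)


def outcome_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the answer after the reasoning block matches the reference.

    The reasoning text itself is never inspected or scored.
    """
    match = THINK_RE.fullmatch(completion.strip())
    if match is None:
        return 0.0  # malformed output: no separable reasoning block and answer
    answer = match.group(2).strip()
    return 1.0 if answer == reference.strip() else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: each reward normalized by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]  # all right or all wrong: no learning signal
    return [(r - mu) / sigma for r in rewards]


# Several sampled completions for the same problem ("what is 2 + 2?").
samples = [
    "<think>2 + 2 = 4</think> 4",
    "<think>maybe 5?</think> 5",
    "4",  # right answer but no reasoning block, so this sketch scores it 0
]
rewards = [outcome_reward(s, "4") for s in samples]  # [1.0, 0.0, 0.0]
print(group_advantages(rewards))  # positive for the first sample, negative for the rest
```

Note what is absent: nothing grades the text between the tags. Correct answers pull the model toward whatever reasoning produced them, and wrong answers push it away.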
s1 took a lighter approach. A Stanford team collected 1,000 hard problems, each paired with a detailed reasoning trace, and fine-tuned an off-the-shelf 32-billion-parameter open model (Qwen2.5-32B-Instruct) on those examples. Then they added one decoding technique called budget forcing: if the model tried to stop reasoning too early, its end-of-thinking marker was suppressed and the word "Wait" was appended, pushing it to keep thinking. The same lever works in reverse: forcing the end marker caps how long the model thinks. With this extra inference-time compute, s1 matched or beat OpenAI's o1-preview on competition math benchmarks. The fine-tuning run reportedly cost about $50 in cloud compute.
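Budget forcing is a decoding-time rule, not a training change, so it fits in a few lines. The sketch below assumes a hypothetical `next_token` callable standing in for the model and a `</think>` end-of-thinking marker; real implementations operate on token IDs inside an inference engine. It shows both directions: swapping an early stop for "Wait", and forcing the end marker once a maximum budget is reached. The `eager_model` function is a toy stand-in, not a real LLM.

```python
from typing import Callable, List

END_THINK = "</think>"  # hypothetical end-of-thinking marker
WAIT = "Wait"           # string appended when the model stops too early


def budget_forced_thinking(
    next_token: Callable[[List[str]], str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Generate thinking tokens, suppressing early stops and capping the total."""
    trace: List[str] = []
    while len(trace) < max_tokens:
        token = next_token(trace)
        if token == END_THINK:
            if len(trace) >= min_tokens:
                break      # stopped after the minimum budget: accept the stop
            token = WAIT   # stopped too early: replace the stop with "Wait"
        trace.append(token)
    trace.append(END_THINK)  # close the block; at the cap this forces the stop
    return trace


def eager_model(trace: List[str]) -> str:
    """Toy model: reasons for two tokens after the start or the last "Wait", then tries to stop."""
    last_wait = max((i for i, t in enumerate(trace) if t == WAIT), default=-1)
    tokens_since = len(trace) - last_wait - 1
    return END_THINK if tokens_since >= 2 else "step"


print(budget_forced_thinking(eager_model, min_tokens=5, max_tokens=20))
# ['step', 'step', 'Wait', 'step', 'step', '</think>']
```

Left alone, the toy model would stop after two steps; with a minimum budget of five, the injected "Wait" pushes it through a second round of reasoning before the stop is accepted.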
This is the shift. Pre-training hits a data wall because there is only so much text in existence. Inference-time compute does not depend on that supply. You can always let a model think longer, even if the gains eventually taper. Compute applied at test time is also faster to iterate on, since changing it requires no retraining, and it can be spent in proportion to the difficulty of the problem rather than fixed by the size of the model. The scaling laws are not dead. They moved from training to thinking.