Bidirectional Evolutionary Search

Standard AI search is mathematically trapped inside its own probability shell. BES is the escape route.

Scaling up inference has a physics problem. Run a model 100 times and pick the best answer: you have 100 samples, but they all come from the same probability distribution. The search space never expands. You are rolling the same weighted dice 100 times, hoping for a face the weighting makes vanishingly rare, or one that is not on the dice at all.
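The dice analogy can be made concrete with a little arithmetic. The sketch below assumes independent samples and a fixed per-sample probability `p` of producing the answer you want; the function name `hit_probability` and the example values of `p` are illustrative, not taken from the BES paper.

```python
def hit_probability(p: float, n: int) -> float:
    """Chance that at least one of n independent samples lands on an
    answer the model produces with per-sample probability p."""
    return 1.0 - (1.0 - p) ** n


# Illustrative per-sample probabilities: plausible, very rare, impossible.
for p in (1e-2, 1e-6, 0.0):
    print(f"p={p:g}: Best-of-100 hit chance = {hit_probability(p, 100):.6f}")
```

With `p` around one in a hundred, 100 samples help a lot. With `p` around one in a million they barely help, and with `p` equal to zero no sample budget helps at all.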

This is the entropy shell: a mathematical boundary that confines autoregressive generation. Theorem 4.4a in the BES paper establishes that any expansion-only search is bounded within a typical set of size roughly exp(H_T), where H_T is the model's token-level entropy. MCTS, beam search, Best-of-N: none of them escape it. They explore inside the shell more efficiently, but they never break through.
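To get a feel for how small a typical set can be, here is a toy calculation. It assumes, purely for illustration, that each of `T` tokens is drawn independently from the same four-way distribution, so the sequence entropy is `T` times the per-token entropy. Real models are not i.i.d., and the paper's exact definition of H_T may differ; the distribution and the value of `T` are made up.

```python
import math


def entropy_nats(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical per-token distribution over a 4-symbol vocabulary.
probs = [0.7, 0.2, 0.05, 0.05]
T = 20  # sequence length (assumed)

H_token = entropy_nats(probs)
log_typical = T * H_token             # log-size of the typical set under the i.i.d. assumption
log_total = T * math.log(len(probs))  # log-size of the full sequence space

# Share of all possible sequences that fall inside the typical set.
fraction = math.exp(log_typical - log_total)
print(f"per-token entropy: {H_token:.3f} nats")
print(f"typical set covers about {fraction:.1e} of the sequence space")
```

Even in this tiny example the typical set is a sliver of the full space, and that sliver is where nearly all samples land.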

**How BES breaks the shell.**

Bidirectional Evolutionary Search runs two complementary processes at once: a forward search and a backward goal decomposition. The forward search uses four genetic operators to recombine existing reasoning trajectories. Combination takes the suffix of one path and attaches it past a shared prefix. Deletion removes a specific interior segment that failed verification. Translocation transplants a reasoning step from one trajectory into a completely different lineage. Crossover splices the prefix of trajectory A onto the tail of trajectory B.
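A minimal sketch of the four operators, treating a trajectory as a plain list of step strings. The function names, signatures, and cut-point arguments are hypothetical, and `combination` reflects only one reading of the description above (keep trajectory A whole and append what B does after their shared prefix). The paper's actual operators may work differently.

```python
from typing import List

Trajectory = List[str]  # one reasoning step per element (assumed representation)


def crossover(a: Trajectory, b: Trajectory, cut_a: int, cut_b: int) -> Trajectory:
    """Prefix of a up to cut_a, followed by the tail of b from cut_b."""
    return a[:cut_a] + b[cut_b:]


def combination(a: Trajectory, b: Trajectory) -> Trajectory:
    """Append to a the steps of b that come after their shared prefix."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return a + b[k:]


def deletion(t: Trajectory, start: int, end: int) -> Trajectory:
    """Drop the interior segment t[start:end], e.g. steps that failed verification."""
    return t[:start] + t[end:]


def translocation(donor: Trajectory, donor_idx: int,
                  recipient: Trajectory, insert_at: int) -> Trajectory:
    """Copy one step from donor into recipient at position insert_at."""
    return recipient[:insert_at] + [donor[donor_idx]] + recipient[insert_at:]


a = ["given", "a1", "a2", "a3"]
b = ["given", "b1", "b2"]

print(crossover(a, b, 2, 2))
print(combination(a, b))
print(deletion(a, 1, 2))
print(translocation(b, 1, a, 2))
```

None of these functions calls the model. Each one only rearranges steps that already exist, which is why the results need not look like anything the model would sample directly.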

These operations are not sampling. They produce hybrid candidates that can have near-zero probability under the original model distribution. Crossover of two individually unlikely trajectories creates a path that neither lineage could have reached alone. Theorem 4.4b formalizes this: under block total correlation, k-way blockwise evolution yields expected surprise strictly greater than H_T plus a positive correction term. For expansion-only search the shell is a hard limit. The theorem shows that recombination can cross it.

**The backward direction.**

A binary right/wrong signal that arrives only at the end of a long proof says almost nothing about which early steps were good. BES addresses this by building a backward goal tree. Starting from the root goal, the system recursively decomposes it into sub-goals that each have a local verifier.

A node's score blends its own sub-goal verification with the scores of its children, weighted by a parameter alpha. This blending converts a multiplicative search problem (where every step must be correct for the final answer to pass) into an additive evidence collection problem. Partial credit appears immediately, so parent node selection has a graded signal to work from, not just end-of-trajectory noise.
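The blended score can be sketched as a short recursion. The article does not give the exact formula, so the one used here, `alpha * own + (1 - alpha) * mean(children)`, is an assumption chosen for illustration; `GoalNode`, `node_score`, and the example tree are hypothetical as well.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalNode:
    name: str
    own_score: float  # local verifier output in [0, 1] (assumed range)
    children: List["GoalNode"] = field(default_factory=list)


def node_score(node: GoalNode, alpha: float) -> float:
    """Blend a node's own verification with the mean score of its children.

    Assumed form: alpha * own + (1 - alpha) * mean(child scores).
    Leaves simply return their own verifier score.
    """
    if not node.children:
        return node.own_score
    child_mean = sum(node_score(c, alpha) for c in node.children) / len(node.children)
    return alpha * node.own_score + (1 - alpha) * child_mean


# The root goal is not yet verified, but some sub-goals are.
root = GoalNode("theorem", 0.0, [
    GoalNode("lemma 1", 1.0),
    GoalNode("lemma 2", 0.0, [
        GoalNode("case a", 1.0),
        GoalNode("case b", 0.0),
    ]),
])

print(node_score(root, alpha=0.5))
```

An all-or-nothing check on this tree would report failure, since the root goal is unverified. The blended score is nonzero because the verified lemma and the verified case each contribute evidence.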

**What the benchmarks showed.**

Standard post-training algorithms like GRPO and MaxRL collapse on certain reasoning tasks. They reward-hack: the model learns to game the scoring signal rather than solve the underlying problem. On MuSiQue multi-hop reasoning and logical induction benchmarks, both algorithms show flat or negative accuracy change compared to the base model.

BES shows +3.0% on 3B-parameter models and +3.8% on 8B-parameter models, with no reward-hacking pathology. The gains are stable because BES is not trying to optimize a reward signal. It is running structured search over a space that standard algorithms never reach.

**What this means for test-time compute.**

The popular framing of test-time compute scaling treats inference budget as a dial. More compute means more samples or longer chains of thought. BES reframes the question. The bottleneck is not compute volume. It is whether the search process is capable of reaching candidates outside the training distribution.

Sampling more from the same distribution does not scale. Evolving across lineages does. The distinction matters because the next generation of reasoning systems will likely require solutions in probability regions no single model trajectory would land in on its own.

BES provides the first theorem-backed mechanism to guarantee that evolution escapes the entropy shell. It also provides the first practical framework where both directions (forward recombination, backward decomposition) work together on the same search problem.

The shell was always a ceiling. Now there is a way out.
