Mastermind: No Strategy Is Best

March 27, 2026

What's the best strategy for Mastermind? The question has a clean answer: there isn't one. Three reasonable strategies form a Pareto frontier, and which you prefer depends entirely on what you're optimizing for. This post works through the strategies, shows why they're non-dominated, and connects the whole setup to Fallout 3's terminal hacking minigame.

The game

Mastermind is a two-player code-breaking game from 1970. The codemaker picks a secret: 4 positions, each one of 6 colors, repetition allowed. That's $6^4 = 1296$ possible codes.

The codebreaker guesses a code and receives two numbers in response:

- exact matches: positions where the guess and the secret agree
- color matches: additional correct colors sitting in the wrong position

Keep guessing until you hit (4, 0) -- four exact matches, zero misplaced. The question is how many guesses that takes: averaged over all secrets, in the worst case, or both.

The feedback function is the core of the game. Two codes $g$ (guess) and $s$ (secret) produce:

$$\text{exact}(g, s) = \sum_{i=1}^{4} \mathbb{1}[g_i = s_i]$$

$$\text{color}(g, s) = \sum_{c=1}^{6} \min(\text{count}(c, g),\, \text{count}(c, s)) - \text{exact}(g, s)$$

In Python:

from collections import Counter

def grade(guess, secret):
    exact = sum(g == s for g, s in zip(guess, secret))
    g_counts = Counter(guess)
    s_counts = Counter(secret)
    total_color = sum(min(g_counts[c], s_counts[c]) for c in g_counts)
    color = total_color - exact
    return (exact, color)

The 14 possible feedback values (pairs $(e, c)$ with $e + c \leq 4$ , excluding the impossible $(3, 1)$) partition the candidate set after each guess.

Strategy 1: Random consistent

After each guess and response, eliminate all candidates inconsistent with the feedback. Pick the next guess uniformly at random from the remaining candidates.

import random

def random_consistent(candidates):
    return random.choice(candidates)

This is fast -- no lookahead, no partition computation. In expectation it works fine: the average over all secrets is around 5 guesses. But "in expectation" hides the variance. Every consistent guess eliminates at least itself, so the worst case is bounded by the candidate count, but the tail is heavy: with the 1296-code space you'll occasionally need 8 or more guesses.

The strategy has no memory of how it eliminated candidates, only that it did. Each new guess is chosen purely to be consistent with all previous responses, not chosen to be maximally informative.
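The elimination step itself is a one-line filter. A minimal sketch of one random-consistent turn (it repeats `grade` so the snippet runs standalone; `filter_candidates` and `random_consistent_turn` are illustrative names, not from the post's code):

```python
from collections import Counter
import random

def grade(guess, secret):
    # same feedback function as above
    exact = sum(g == s for g, s in zip(guess, secret))
    overlap = sum(min(Counter(guess)[c], Counter(secret)[c]) for c in set(guess))
    return (exact, overlap - exact)

def filter_candidates(candidates, guess, feedback):
    # keep only secrets that would have produced the observed feedback
    return [s for s in candidates if grade(guess, s) == feedback]

def random_consistent_turn(candidates, guess, feedback):
    # eliminate inconsistent candidates, then pick uniformly at random
    remaining = filter_candidates(candidates, guess, feedback)
    return remaining, random.choice(remaining)
```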

Strategy 2: Minimax

Minimax, due to Knuth (1977)[1], asks: for each candidate guess $g$ , what is the worst-case number of candidates that could remain after revealing feedback?

Formally, for a candidate set $S$ and a guess $g$ , the feedback partitions $S$ into groups $S_{(e,c)} = \{s \in S : \text{grade}(g, s) = (e, c)\}$ . The worst-case partition size for $g$ is $\max_{(e,c)} |S_{(e,c)}|$ . Minimax picks the guess that minimizes this maximum:

$$g^* = \arg\min_{g \in \mathcal{G}} \max_{(e,c)} |S_{(e,c)}|$$

where $\mathcal{G}$ is the full code space (not just remaining candidates -- sometimes a code already eliminated is the best partition probe).

from itertools import product

ALL_CODES = list(product(range(6), repeat=4))

def minimax_guess(candidates):
    best_guess = None
    best_max_partition = float('inf')
    for g in ALL_CODES:
        partitions = Counter(grade(g, s) for s in candidates)
        max_partition = max(partitions.values())
        if max_partition < best_max_partition:
            best_max_partition = max_partition
            best_guess = g
    return best_guess

Knuth showed this solves any code in at most 5 guesses. The optimal first guess under minimax is 1122 (colors 1, 1, 2, 2 in some labeling), which reduces the 1296-code space to at most 256 candidates regardless of the response.
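The 256 figure is easy to check directly: grade 1122 (written as (0, 0, 1, 1) with 0-based colors) against all 1296 codes and look at the largest response group (`grade` repeated so the check runs standalone):

```python
from collections import Counter
from itertools import product

ALL_CODES = list(product(range(6), repeat=4))

def grade(guess, secret):
    exact = sum(g == s for g, s in zip(guess, secret))
    overlap = sum(min(Counter(guess)[c], Counter(secret)[c]) for c in set(guess))
    return (exact, overlap - exact)

# partition all codes by their response to the opener 1122
parts = Counter(grade((0, 0, 1, 1), s) for s in ALL_CODES)
print(max(parts.values()))  # 256: the (0, 0) group, codes using none of the guessed colors
```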

The cost: evaluating all 1296 possible guesses against all remaining candidates is $O(1296 \cdot |S| \cdot 4)$ per turn. Manageable for the original game, expensive if you want to run this at scale or in real time.

Strategy 3: Entropy maximizer

Instead of minimizing the worst case, maximize expected information gain. Shannon entropy of the partition distribution is:

$$H(g, S) = -\sum_{(e,c)} p_{(e,c)} \log_2 p_{(e,c)}$$

where $p_{(e,c)} = |S_{(e,c)}| / |S|$ . Pick the guess that maximizes $H$ :

from math import log2

def entropy_guess(candidates):
    best_guess = None
    best_entropy = -1.0
    for g in ALL_CODES:
        partitions = Counter(grade(g, s) for s in candidates)
        n = len(candidates)
        h = -sum((k/n) * log2(k/n) for k in partitions.values())
        if h > best_entropy:
            best_entropy = h
            best_guess = g
    return best_guess

Maximizing entropy means maximizing the expected log-reduction in candidates. If the partition is perfectly uniform across all 14 feedback values, entropy is $\log_2 14 \approx 3.81$ bits per guess. In practice the distribution is never uniform, but entropy maximization pushes toward even splits.
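As a sanity check on the scale, the entropy of an opener's partition lands well below the $\log_2 14$ ceiling, since the response groups are far from uniform. A standalone sketch (`grade` repeated; `entropy_of` and the (0, 0, 1, 2) opener are this sketch's choices):

```python
from collections import Counter
from itertools import product
from math import log2

ALL_CODES = list(product(range(6), repeat=4))

def grade(guess, secret):
    exact = sum(g == s for g, s in zip(guess, secret))
    overlap = sum(min(Counter(guess)[c], Counter(secret)[c]) for c in set(guess))
    return (exact, overlap - exact)

def entropy_of(guess, candidates):
    # Shannon entropy of the feedback partition induced by a guess
    n = len(candidates)
    parts = Counter(grade(guess, s) for s in candidates)
    return -sum((k / n) * log2(k / n) for k in parts.values())

h = entropy_of((0, 0, 1, 2), ALL_CODES)  # a 1123-style opener
print(round(h, 2))  # strictly below the log2(14) ≈ 3.81 ceiling
```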

This tends to produce a lower average number of guesses than minimax -- it's better calibrated to the typical case. But "better on average" is not "better everywhere." Entropy maximization can occasionally leave a larger worst-case partition than minimax does, because it's optimizing the expectation, not the tail.

Why there's no single best strategy

These three strategies trace a curve in the (average guesses, worst-case guesses) plane. Roughly:

| Strategy | Avg guesses | Worst case |
| --- | --- | --- |
| Random consistent | ~5.0 | 8+ |
| Entropy maximizer | ~4.4 | 6 |
| Minimax | ~4.5 | 5 |

(Exact values depend on tie-breaking rules and the first guess.)

A strategy $P$ dominates $Q$ if $P.\text{avg} \leq Q.\text{avg}$ and $P.\text{worst} \leq Q.\text{worst}$ , with at least one strict inequality. On the Pareto frontier, no strategy dominates another.

Minimax and entropy maximizer are non-dominated: minimax wins on worst case, entropy wins on average. You can compute the full frontier explicitly by sweeping a family of mixed or parameterized strategies.

def pareto_frontier(strategies):
    """
    strategies: list of (avg, worst, name)
    Returns the non-dominated subset.
    """
    frontier = []
    for s in strategies:
        dominated = any(
            (t[0] <= s[0] and t[1] <= s[1]) and (t[0] < s[0] or t[1] < s[1])
            for t in strategies if t is not s
        )
        if not dominated:
            frontier.append(s)
    return sorted(frontier, key=lambda s: s[0])

The Pareto frontier makes the tradeoff precise: once you're on the frontier, you can't improve worst-case performance without sacrificing average performance, or vice versa. Random consistent isn't on the frontier -- both minimax and the entropy maximizer beat it on average and worst case alike.
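Feeding the table's rough numbers through the dominance test shows this directly. A standalone sketch (the check is inlined so it runs on its own; the numbers are the approximate ones from the table above):

```python
# (avg guesses, worst case, name) -- approximate values from the table
strategies = [
    (5.0, 8, "random consistent"),
    (4.4, 6, "entropy maximizer"),
    (4.5, 5, "minimax"),
]

def dominated(s, pool):
    # s is dominated if some t is at least as good on both axes, better on one
    return any(
        t[0] <= s[0] and t[1] <= s[1] and (t[0] < s[0] or t[1] < s[1])
        for t in pool if t is not s
    )

frontier = sorted((s for s in strategies if not dominated(s, strategies)),
                  key=lambda s: s[0])
print([name for _, _, name in frontier])  # ['entropy maximizer', 'minimax']
```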

Why does the tradeoff exist at all? Entropy maximization splits the expected remaining candidates most evenly. But some feedback responses (like (0, 0)) can still land on a large partition if the guess happens to share no colors with the secret. Minimax guards against exactly this by refusing to let any partition grow too large, at the cost of sometimes making a guess that's suboptimal on average.

The Fallout 3 connection

Fallout 3's terminal hacking minigame is Mastermind in disguise. The setup: a list of words of equal length, one of which is the password. You guess a word and the terminal reports how many letters are in the correct position. That's it -- exact matches only, no "color" component.

Structurally this is a restriction of Mastermind: the alphabet is 26 letters, words are drawn from a fixed vocabulary (not all $26^n$ possibilities), and feedback is only the exact-match count. The constraint-satisfaction structure is identical. After each guess, eliminate all words inconsistent with the reported match count. The remaining candidates form the search space.

Because the vocabulary is small (typically 5--12 words) and words are long (typically 8--12 characters), information per guess is high. A single guess usually cuts the candidate set drastically. But the core question remains the same: given the remaining candidates, which word is the best probe?

The minimax answer is: guess the word that minimizes the maximum candidates remaining after any possible match-count response. For small word lists this can be computed by inspection. For a list like [ENTRY, GRATE, CRANE, RAVEN, EARTH], you can score each word as a guess by computing the partition sizes across all possible responses $(0, 1, 2, \ldots, n)$ and pick the one with the smallest maximum.

The entropy answer is: guess the word whose match-count distribution is most uniform. Same computation, different objective.
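Both scores take only a few lines for a list this size. A sketch of the minimax version on the example list, using exact-position matches only per the Fallout feedback rule (`matches` and `minimax_word` are illustrative names):

```python
from collections import Counter

WORDS = ["ENTRY", "GRATE", "CRANE", "RAVEN", "EARTH"]

def matches(a, b):
    # Fallout-style feedback: count of letters in the correct position
    return sum(x == y for x, y in zip(a, b))

def minimax_word(words):
    # pick the word whose worst response leaves the fewest candidates
    def worst_case(g):
        return max(Counter(matches(g, s) for s in words).values())
    return min(words, key=worst_case)

print(minimax_word(WORDS))  # GRATE -- no response leaves more than 2 candidates
```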

In practice the word lists are small enough that random-consistent works fine -- the game gives you 4 attempts (with a reset trick), and vocabulary size is 5--12, so even random selection terminates quickly. But the structure is identical to the full Mastermind problem, just restricted to a slice of the feedback space.

Implementation notes

A few details matter when implementing Mastermind solvers:

The first guess. For minimax, 1122 is the canonical optimal opener: it partitions the 1296 codes into at most 256, with an average partition size of about 91. For entropy, 1123 (three distinct colors, one repeat) does slightly better on average entropy. Neither is uniquely optimal -- several first guesses tie.

Guesses outside the candidate set. Minimax and entropy both sometimes select guesses from the full 1296-code space rather than just remaining candidates. A guess outside the candidate set can't be the secret, but it can partition the remaining candidates more efficiently. This is standard and correct -- the solver is choosing the best probe, not the best guess subject to it being consistent.

Tie-breaking. When multiple guesses tie on the objective, prefer candidates in the consistent set (they can be the secret), then break ties arbitrarily. This doesn't affect theoretical bounds but matters for reproducibility.

Complexity. Each turn is $O(|\mathcal{G}| \cdot |S|)$ where $|\mathcal{G}| = 1296$ and $|S|$ shrinks quickly (from 1296 to ~256 to ~50 to ~10 ...). Total work per game is dominated by the first two turns and is fast enough for real-time use.
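Putting these notes together: a compact end-to-end minimax solver, standalone (it redefines `grade`, opens with 1122 written as (0, 0, 1, 1) in 0-based colors, and breaks ties toward consistent guesses as described above; `solve` is this sketch's name, not the post's):

```python
from collections import Counter
from itertools import product

ALL_CODES = list(product(range(6), repeat=4))

def grade(guess, secret):
    exact = sum(g == s for g, s in zip(guess, secret))
    overlap = sum(min(Counter(guess)[c], Counter(secret)[c]) for c in set(guess))
    return (exact, overlap - exact)

def solve(secret, opener=(0, 0, 1, 1)):
    """Play one game with minimax; returns the number of guesses used."""
    candidates, guess, turns = ALL_CODES, opener, 0
    while True:
        turns += 1
        feedback = grade(guess, secret)
        if feedback == (4, 0):
            return turns
        candidates = [s for s in candidates if grade(guess, s) == feedback]
        cand_set = set(candidates)
        # smallest worst-case partition; ties broken toward consistent guesses
        guess = min(ALL_CODES, key=lambda g: (
            max(Counter(grade(g, s) for s in candidates).values()),
            g not in cand_set,
        ))
```

The candidate-preferring tie-break is what guarantees termination: once one candidate remains, every probe ties on partition size, and the tie-break forces the solver to guess the candidate itself.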


  1. Knuth proved that the minimax strategy solves any code in at most 5 guesses, and that no strategy can guarantee 4 guesses for all 1296 codes. The five-guess bound is tight. D. E. Knuth, "The computer as master mind," Journal of Recreational Mathematics, 9(1), 1977. ↩︎