I won 6 out of 10 -- am I actually better?
March 13, 2026
You just beat your friend 6 times out of 10. Are you better than them, or did you just get lucky?
The empirical win rate is $\hat{p} = 0.6$ , but a coin flipped 10 times comes up heads 6 times with probability $\binom{10}{6} (0.5)^{10} \approx 0.21$ . That's not rare enough to feel convincing. The right question isn't "could this happen by chance?" -- it's "what's the probability that my true win rate $p$ exceeds 1/2?"
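That 0.21 is one line of arithmetic:

```python
from math import comb

# Chance a fair coin lands heads exactly 6 times in 10 flips
print(comb(10, 6) * 0.5 ** 10)   # 0.205078125
```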
The frequentist answer
The classical approach gives you confidence intervals. The Chernoff bound says:

$$P\big(|\hat{p} - p| \geq \delta\big) \leq 2 e^{-2 n \delta^2}$$

Inverted: to be $(1 - \gamma)$ confident your win rate estimate is within $\delta$ of the truth, you need at least

$$n \geq \frac{1}{2\delta^2} \ln \frac{2}{\gamma}$$

games. For $\delta = 0.1$ and $\gamma = 0.05$: roughly 185 games. For $\delta = 0.2$: around 47.
This is useful for planning experiments, but it doesn't answer the question. A confidence interval at level $1-\gamma$ tells you that your procedure would contain the true $p$ in $1-\gamma$ fraction of all experiments. It says nothing about the probability that $p > 1/2$ given what you actually observed.
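The inversion is easy to wrap in a helper for planning purposes (a sketch assuming the two-sided Chernoff-Hoeffding form $2e^{-2n\delta^2}$, rounding up to a whole game):

```python
from math import ceil, log

def chernoff_games(delta, gamma):
    """Smallest n with 2 * exp(-2 * n * delta**2) <= gamma."""
    return ceil(log(2 / gamma) / (2 * delta ** 2))

print(chernoff_games(0.1, 0.05))   # 185
print(chernoff_games(0.2, 0.05))   # 47
```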
For that, you need to treat $p$ as a random variable -- which is exactly the Bayesian setup.
Treating skill as unknown
Model the game as a sequence of Bernoulli trials. Each round, you win with probability $p$ independently of the others. We want to say something about $p$ after seeing $k$ wins in $n$ rounds.
Bayesian reasoning requires a prior on $p$ . Here, a uniform prior on $[0, 1]$ is the right choice: it encodes genuine ignorance about relative skill before any games are played. The prior is $\text{Beta}(1, 1)$ , which is just the uniform distribution.[1]
By Bayes' theorem, the posterior is:

$$\pi(p \mid k, n) = \frac{p^{k} (1-p)^{n-k}}{B(k+1,\, n-k+1)}$$
where $B(\cdot, \cdot)$ is the beta function. This is a $\text{Beta}(k+1, n-k+1)$ distribution.[2]
The posterior mean is $(k+1)/(n+2)$ , which is the Laplace-smoothed estimate -- it pulls the raw $k/n$ slightly toward 1/2, reflecting prior uncertainty. For 6 wins in 10 games, the posterior mean is $7/12 \approx 0.583$ rather than $0.6$ .
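The smoothed mean is exact rational arithmetic, easy to check (illustrative helper, not from the post):

```python
from fractions import Fraction

def posterior_mean(k, n):
    """Mean of the Beta(k+1, n-k+1) posterior: the Laplace-smoothed estimate."""
    return Fraction(k + 1, n + 2)

print(posterior_mean(6, 10))   # 7/12
print(posterior_mean(0, 0))    # 1/2: no data, back to the prior mean
```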
Computing the probability
The probability we care about is:

$$P(p > 1/2 \mid k, n) = \int_{1/2}^{1} \frac{p^{k} (1-p)^{n-k}}{B(k+1,\, n-k+1)} \, dp$$

This integral is the complement of the regularized incomplete beta function $I_x(a, b)$ evaluated at $x = 1/2$:

$$P(p > 1/2 \mid k, n) = 1 - I_{1/2}(k+1,\, n-k+1)$$

The regularized incomplete beta function is defined as[3]

$$I_x(a, b) = \frac{1}{B(a, b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$$

There is no closed form for general $(k, n)$, but `scipy.stats.beta` computes it directly:
```python
from scipy.stats import beta

def prob_better(k, n):
    """P(p > 1/2 | k wins in n games), uniform prior."""
    return beta.sf(0.5, k + 1, n - k + 1)
```
`beta.sf` is the survival function, $1 - \text{CDF}$, which is $1 - I_{1/2}(k+1, n-k+1)$.
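For integer $k$ and $n$, the incomplete beta function at $x = 1/2$ reduces to a binomial tail sum, via the classical identity $I_x(a, b) = P(\mathrm{Bin}(a+b-1, x) \geq a)$ for integer $a, b$. That gives an exact cross-check in pure Python, no scipy needed (the helper name is mine):

```python
from math import comb

def prob_better_tail(k, n):
    """P(p > 1/2 | k, n) = P(Bin(n+1, 1/2) <= k), exactly."""
    return sum(comb(n + 1, j) for j in range(k + 1)) / 2 ** (n + 1)

print(prob_better_tail(6, 10))   # 0.7255859375 = 1486/2048
```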
Concrete values
| $k$ | $n$ | $\hat{p}$ | $P(p > 1/2)$ |
|---|---|---|---|
| 5 | 10 | 0.50 | 0.500 |
| 6 | 10 | 0.60 | 0.726 |
| 7 | 10 | 0.70 | 0.887 |
| 8 | 10 | 0.80 | 0.967 |
| 51 | 100 | 0.51 | 0.579 |
| 55 | 100 | 0.55 | 0.840 |
| 60 | 100 | 0.60 | 0.977 |
| 70 | 100 | 0.70 | 1.000 |
| 6 | 6 | 1.00 | 0.992 |
| 10 | 10 | 1.00 | 1.000 |
The symmetry is exact: $P(p > 1/2 \mid k, n) + P(p > 1/2 \mid n-k, n) = 1$ , which falls out of the symmetry of the beta distribution.
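The symmetry can be verified exactly in integer arithmetic, using the binomial-tail form of the posterior probability (helper name is mine, for illustration):

```python
from math import comb

def tail_numerator(k, n):
    # P(p > 1/2 | k, n) = P(Bin(n+1, 1/2) <= k); this is its numerator over 2**(n+1)
    return sum(comb(n + 1, j) for j in range(k + 1))

for k, n in [(6, 10), (8, 10), (60, 100)]:
    # P(p > 1/2 | k, n) + P(p > 1/2 | n-k, n) = 1, exactly
    assert tail_numerator(k, n) + tail_numerator(n - k, n) == 2 ** (n + 1)
print("symmetry holds")
```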
A few things stand out. Six wins out of ten sounds impressive -- $0.60$ vs $0.50$ -- but the posterior probability is only 0.73. Even a perfect 10-for-10 record leaves a $2^{-11} \approx 0.05\%$ chance you're actually worse. And at 51 out of 100, you're barely ahead of a coin flip in terms of what the data can distinguish: 0.58.
A sigmoid approximation
The incomplete beta CDF near its median is well approximated by a logistic sigmoid. Fitting to the exact values across a range of $(k, n)$:

$$P(p > 1/2 \mid k, n) \approx \sigma\!\left(3\, n^{-0.48} \left(k - n/2\right)\right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$
The approximation is accurate to within a few percentage points for $n \geq 5$ and $k$ not too close to 0 or $n$ . It has the right boundary behavior: at $k = n/2$ the probability is exactly 0.5, and it saturates toward 0 and 1 at the extremes.
The exponent $-0.48 \approx -1/2$ reflects how the beta distribution's spread scales as $n^{-1/2}$ : more games compress the posterior, making the sigmoid steeper. Doubling $n$ scales the effective number of standard deviations by $\sqrt{2}$ , shrinking the transition width by the same factor.
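The $n^{-1/2}$ scaling shows up directly in the posterior's standard deviation (a quick illustration; the helper name is mine):

```python
from math import sqrt

def posterior_sd(k, n):
    """Standard deviation of the Beta(k+1, n-k+1) posterior."""
    a, b = k + 1, n - k + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Same win rate, twice the games: the posterior is ~sqrt(2) times narrower
print(posterior_sd(60, 100) / posterior_sd(120, 200))   # ~1.40
```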
```python
import numpy as np
from scipy.stats import beta

def prob_better_exact(k, n):
    return beta.sf(0.5, k + 1, n - k + 1)

def prob_better_approx(k, n):
    return 1 / (1 + np.exp(-3 * n**(-0.48) * (k - n / 2)))

# Compare
for k, n in [(6, 10), (7, 10), (60, 100), (51, 100)]:
    exact = prob_better_exact(k, n)
    approx = prob_better_approx(k, n)
    print(f"k={k:3d}, n={n:3d}: exact={exact:.4f}, approx={approx:.4f}")
```

```
k=  6, n= 10: exact=0.7256, approx=0.7298
k=  7, n= 10: exact=0.8867, approx=0.8794
k= 60, n=100: exact=0.9770, approx=0.9641
k= 51, n=100: exact=0.5788, approx=0.5815
```
The sigmoid is close enough for back-of-envelope reasoning, and it makes the dependence on $n$ and $k - n/2$ transparent.
How many games do you need?
Suppose you want to be 95% confident you are actually better -- $P(p > 1/2 \mid k, n) \geq 0.95$ -- given that your true win rate is $p^*$ . How many games do you need?
This is a frequentist question posed over the Bayesian answer: how large must $n$ be so that, if you win each game with probability $p^*$ , the expected posterior probability exceeds 0.95? There is no closed form, but you can invert it numerically:
```python
from scipy.stats import beta, binom

def games_needed(p_true, confidence=0.95):
    """
    How many games until E[P(p > 1/2 | k, n)] >= confidence,
    given true win rate p_true?
    """
    for n in range(1, 2000):
        # Expected posterior probability, averaging over k ~ Binomial(n, p_true)
        expected = sum(
            binom.pmf(k, n, p_true) * beta.sf(0.5, k + 1, n - k + 1)
            for k in range(n + 1)
        )
        if expected >= confidence:
            return n
    return None

for p_true in [0.55, 0.60, 0.65, 0.70, 0.80]:
    n = games_needed(p_true)
    print(f"p* = {p_true:.2f}: need ~{n} games")
```
The required $n$ grows roughly like $p^*(1 - p^*)/(p^* - 1/2)^2$: on the order of 15 games at a true win rate of 80%, about 30 at 70%, about 60 at 65%, around 130 at 60%, and over 500 at 55%. This matches the intuition from the Chernoff bound -- distinguishing a slight edge takes far more data than a large one.
The practical takeaway: in a short match, even a decisive win like 8-2 leaves a few percent chance you're not actually better. In a long series, a small consistent edge becomes unmistakable.
Gelman, A., Carlin, J. B., Hwang, J., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis, 3rd ed. CRC Press. ↩︎
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley. (Beta-binomial conjugacy, Ch. 6.) ↩︎
Abramowitz, M., & Stegun, I. A. (1972). Handbook of Mathematical Functions. Dover. (Incomplete beta function, §6.6.) ↩︎