I won 6 out of 10 -- am I actually better?
March 13, 2026
You just beat your friend 6 times out of 10. Are you better than them, or did you just get lucky?
The empirical win rate is $\hat{p} = 0.6$ , but a coin flipped 10 times comes up heads 6 times with probability $\binom{10}{6} (0.5)^{10} \approx 0.21$ . That's not rare enough to feel convincing. The right question isn't "could this happen by chance?" -- it's "what's the probability that my true win rate $p$ exceeds 1/2?"
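That 0.21 is one line of arithmetic:

```python
from math import comb

# Chance a fair coin lands heads exactly 6 times in 10 flips
print(comb(10, 6) * 0.5 ** 10)   # 0.205078125
```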
The frequentist answer
The classical approach gives you confidence intervals. The Chernoff bound says:

$$P\big(|\hat{p} - p| \geq \delta\big) \leq 2 e^{-2 n \delta^2}$$

Inverted: to be $(1 - \gamma)$ confident your win rate estimate is within $\delta$ of the truth, you need at least

$$n \geq \frac{1}{2\delta^2} \ln \frac{2}{\gamma}$$

games. For $\delta = 0.1$ and $\gamma = 0.05$: roughly 185 games. For $\delta = 0.2$: around 47.
This is useful for planning experiments, but it doesn't answer the question. A confidence interval at level $1-\gamma$ tells you that your procedure would contain the true $p$ in $1-\gamma$ fraction of all experiments. It says nothing about the probability that $p > 1/2$ given what you actually observed.
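The inversion is easy to wrap in a helper for planning purposes (a sketch assuming the two-sided Chernoff-Hoeffding form $2e^{-2n\delta^2}$, rounding up to a whole game):

```python
from math import ceil, log

def chernoff_games(delta, gamma):
    """Smallest n with 2 * exp(-2 * n * delta**2) <= gamma."""
    return ceil(log(2 / gamma) / (2 * delta ** 2))

print(chernoff_games(0.1, 0.05))   # 185
print(chernoff_games(0.2, 0.05))   # 47
```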
For that, you need to treat $p$ as a random variable -- which is exactly the Bayesian setup.
Treating skill as unknown
Model the game as a sequence of Bernoulli trials. Each round, you win with probability $p$ independently of the others. We want to say something about $p$ after seeing $k$ wins in $n$ rounds.
Bayesian reasoning requires a prior on $p$ . Here, a uniform prior on $[0, 1]$ is the right choice: it encodes genuine ignorance about relative skill before any games are played. The prior is $\text{Beta}(1, 1)$ , which is just the uniform distribution.[1]
By Bayes' theorem, the posterior is:

$$\pi(p \mid k, n) = \frac{p^{k} (1-p)^{n-k}}{B(k+1,\, n-k+1)}$$
where $B(\cdot, \cdot)$ is the beta function. This is a $\text{Beta}(k+1, n-k+1)$ distribution.[2]
The posterior mean is $(k+1)/(n+2)$ , which is the Laplace-smoothed estimate -- it pulls the raw $k/n$ slightly toward 1/2, reflecting prior uncertainty. For 6 wins in 10 games, the posterior mean is $7/12 \approx 0.583$ rather than $0.6$ .
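The smoothed mean is exact rational arithmetic, easy to check (illustrative helper, not from the post):

```python
from fractions import Fraction

def posterior_mean(k, n):
    """Mean of the Beta(k+1, n-k+1) posterior: the Laplace-smoothed estimate."""
    return Fraction(k + 1, n + 2)

print(posterior_mean(6, 10))   # 7/12
print(posterior_mean(0, 0))    # 1/2: no data, back to the prior mean
```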
Computing the probability
The probability we care about is:

$$P(p > 1/2 \mid k, n) = \int_{1/2}^{1} \frac{p^{k} (1-p)^{n-k}}{B(k+1,\, n-k+1)} \, dp$$

This integral is the complement of the regularized incomplete beta function $I_x(a, b)$ evaluated at $x = 1/2$:

$$P(p > 1/2 \mid k, n) = 1 - I_{1/2}(k+1,\, n-k+1)$$

The regularized incomplete beta function is defined as[3]

$$I_x(a, b) = \frac{1}{B(a, b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$$

There is no closed form for general $(k, n)$, but `scipy.stats.beta` computes it directly:
```python
from scipy.stats import beta

def prob_better(k, n):
    """P(p > 1/2 | k wins in n games), uniform prior."""
    return beta.sf(0.5, k + 1, n - k + 1)
```
`beta.sf` is the survival function, $1 - \text{CDF}$, which is $1 - I_{1/2}(k+1, n-k+1)$.
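For integer $k$ and $n$, the incomplete beta function at $x = 1/2$ reduces to a binomial tail sum, via the classical identity $I_x(a, b) = P(\mathrm{Bin}(a+b-1, x) \geq a)$ for integer $a, b$. That gives an exact cross-check in pure Python, no scipy needed (the helper name is mine):

```python
from math import comb

def prob_better_tail(k, n):
    """P(p > 1/2 | k, n) = P(Bin(n+1, 1/2) <= k), exactly."""
    return sum(comb(n + 1, j) for j in range(k + 1)) / 2 ** (n + 1)

print(prob_better_tail(6, 10))   # 0.7255859375 = 1486/2048
```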
Concrete values
| $k$ | $n$ | $\hat{p}$ | $P(p > 1/2)$ |
|---|---|---|---|
| 5 | 10 | 0.50 | 0.500 |
| 6 | 10 | 0.60 | 0.726 |
| 7 | 10 | 0.70 | 0.887 |
| 8 | 10 | 0.80 | 0.967 |
| 51 | 100 | 0.51 | 0.579 |
| 55 | 100 | 0.55 | 0.840 |
| 60 | 100 | 0.60 | 0.977 |
| 70 | 100 | 0.70 | 1.000 |
| 6 | 6 | 1.00 | 0.992 |
| 10 | 10 | 1.00 | 1.000 |
The symmetry is exact: $P(p > 1/2 \mid k, n) + P(p > 1/2 \mid n-k, n) = 1$ , which falls out of the symmetry of the beta distribution.
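The symmetry can be verified exactly in integer arithmetic, using the binomial-tail form of the posterior probability (helper name is mine, for illustration):

```python
from math import comb

def tail_numerator(k, n):
    # P(p > 1/2 | k, n) = P(Bin(n+1, 1/2) <= k); this is its numerator over 2**(n+1)
    return sum(comb(n + 1, j) for j in range(k + 1))

for k, n in [(6, 10), (8, 10), (60, 100)]:
    # P(p > 1/2 | k, n) + P(p > 1/2 | n-k, n) = 1, exactly
    assert tail_numerator(k, n) + tail_numerator(n - k, n) == 2 ** (n + 1)
print("symmetry holds")
```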
A few things stand out. Six wins out of ten sounds impressive -- $0.60$ vs $0.50$ -- but the posterior probability is only 0.73. Even a perfect 10-for-10 record leaves a $2^{-11} \approx 0.05\%$ chance you're actually worse. And at 51 out of 100, you're barely ahead of a coin flip in terms of what the data can distinguish: 0.58.
A sigmoid approximation
The incomplete beta CDF near its median is well approximated by a logistic sigmoid. Fitting to the exact values across a range of $(k, n)$:

$$P(p > 1/2 \mid k, n) \approx \sigma\!\left(3\, n^{-0.48} \left(k - n/2\right)\right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$
The approximation is accurate to within a few percentage points for $n \geq 5$ and $k$ not too close to 0 or $n$ . It has the right boundary behavior: at $k = n/2$ the probability is exactly 0.5, and it saturates toward 0 and 1 at the extremes.
The exponent $-0.48 \approx -1/2$ reflects how the beta distribution's spread scales as $n^{-1/2}$ : more games compress the posterior, making the sigmoid steeper. Doubling $n$ scales the effective number of standard deviations by $\sqrt{2}$ , shrinking the transition width by the same factor.
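The $n^{-1/2}$ scaling shows up directly in the posterior's standard deviation (a quick illustration; the helper name is mine):

```python
from math import sqrt

def posterior_sd(k, n):
    """Standard deviation of the Beta(k+1, n-k+1) posterior."""
    a, b = k + 1, n - k + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Same win rate, twice the games: the posterior is ~sqrt(2) times narrower
print(posterior_sd(60, 100) / posterior_sd(120, 200))   # ~1.40
```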
```python
import numpy as np
from scipy.stats import beta

def prob_better_exact(k, n):
    return beta.sf(0.5, k + 1, n - k + 1)

def prob_better_approx(k, n):
    return 1 / (1 + np.exp(-3 * n**(-0.48) * (k - n / 2)))

# Compare
for k, n in [(6, 10), (7, 10), (60, 100), (51, 100)]:
    exact = prob_better_exact(k, n)
    approx = prob_better_approx(k, n)
    print(f"k={k:3d}, n={n:3d}: exact={exact:.4f}, approx={approx:.4f}")
```

```
k=  6, n= 10: exact=0.7256, approx=0.7298
k=  7, n= 10: exact=0.8867, approx=0.8794
k= 60, n=100: exact=0.9770, approx=0.9641
k= 51, n=100: exact=0.5788, approx=0.5815
```
The sigmoid is close enough for back-of-envelope reasoning, and it makes the dependence on $n$ and $k - n/2$ transparent.
How many games do you need?
Suppose you want to be 95% confident you are actually better -- $P(p > 1/2 \mid k, n) \geq 0.95$ -- given that your true win rate is $p^*$ . How many games do you need?
This is a frequentist question posed over the Bayesian answer: how large must $n$ be so that, if you win each game with probability $p^*$ , the expected posterior probability exceeds 0.95? There is no closed form, but you can invert it numerically:
```python
from scipy.stats import beta, binom

def games_needed(p_true, confidence=0.95):
    """
    How many games until E[P(p > 1/2 | k, n)] >= confidence,
    given true win rate p_true?
    """
    for n in range(1, 2000):
        # Expected posterior probability, averaging over k ~ Binomial(n, p_true)
        expected = sum(
            binom.pmf(k, n, p_true) * beta.sf(0.5, k + 1, n - k + 1)
            for k in range(n + 1)
        )
        if expected >= confidence:
            return n
    return None

for p_true in [0.55, 0.60, 0.65, 0.70, 0.80]:
    n = games_needed(p_true)
    print(f"p* = {p_true:.2f}: need ~{n} games")
```
The required $n$ grows roughly like $p^*(1 - p^*)/(p^* - 1/2)^2$: on the order of 15 games at a true win rate of 80%, about 30 at 70%, about 60 at 65%, around 130 at 60%, and over 500 at 55%. This matches the intuition from the Chernoff bound -- distinguishing a slight edge takes far more data than a large one.
The practical takeaway: in a short match, even a decisive win like 8-2 leaves a few percent chance you're not actually better. In a long series, a small consistent edge becomes unmistakable.
Gelman, A., Carlin, J. B., Hwang, J., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis, 3rd ed. CRC Press. ↩︎
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley. (Beta-binomial conjugacy, Ch. 6.) ↩︎
Abramowitz, M., & Stegun, I. A. (1972). Handbook of Mathematical Functions. Dover. (Incomplete beta function, §6.6.) ↩︎