Passphrases Optimized for Typing Speed

April 3, 2026

What makes a good passphrase? The standard answer is entropy: pick words uniformly at random from a large dictionary, and each word contributes $\log_2 |\text{dict}|$ bits. Diceware gives you 12.9 bits per word from a 7776-word list. Six words, 77 bits, done.

But entropy measures how hard a passphrase is to guess, not how hard it is to type. "correct horse battery staple" and "which music water though" have the same entropy at the same dictionary size, but one takes measurably longer to type than the other. The difference is in the digraphs -- the letter-to-letter transitions your fingers actually execute.

This post describes phrasegen, a passphrase generator that optimizes for entropy per unit of typing time, using a digraph timing model fitted to real keystroke data.

The typing model

Every two-character transition has a characteristic latency. th is fast (common sequence, practiced). zq is slow (awkward reach, rare). If you measure enough keystroke data, you can build a model where each digraph $(a, b)$ has an expected latency $\mu_{a,b}$ in milliseconds.

The predicted total typing time for a string $s = x_1 x_2 \ldots x_L$ is:

$$\widehat{T}(s) = \sum_{i=1}^{L-1} \mu_{x_i,\, x_{i+1}}$$

This is a ranking model, not a physics model. It orders phrases by expected difficulty and spots obviously slow transitions. It does not model error rate, higher-order chunking, or the time lost to corrections.

To give a concrete sense of scale: in the fitted model, e|l (same hand, rolling inward) has a mean of 90 ms, while s|e (pinky to middle finger) comes in at 180 ms -- a 2x difference on a single digraph. A four-word passphrase has 20-30 digraphs, so the cumulative spread between a fast and a slow phrase at the same word count can reach one to two seconds.
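In code, the score is one table lookup per transition. A minimal sketch of that scoring rule -- the function name, `HashMap` table, and fallback value here are illustrative, not phrasegen's actual internals:

```rust
use std::collections::HashMap;

// Sum per-digraph mean latencies; digraphs missing from the table
// fall back to a global mean (values below are illustrative).
fn predicted_time_ms(s: &str, means: &HashMap<(char, char), f64>, global_mean: f64) -> f64 {
    let chars: Vec<char> = s.chars().collect();
    chars
        .windows(2)
        .map(|w| *means.get(&(w[0], w[1])).unwrap_or(&global_mean))
        .sum()
}

fn main() {
    let mut means = HashMap::new();
    means.insert(('e', 'l'), 90.0);  // fast same-hand inward roll
    means.insert(('s', 'e'), 180.0); // slow pinky-to-middle reach
    // "sel" = s|e + e|l = 180 + 90 = 270 ms
    assert_eq!(predicted_time_ms("sel", &means, 150.0), 270.0);
}
```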

The dataset

The base model is fitted from a union of public keystroke-dynamics datasets.

All datasets are converted to a common row format: {"phrase":"hello","digraph_dt_ms":[120,90,110,130]}. Rows with negative or non-finite timings are discarded. Rows containing backspace events are excluded from fitting (they reflect correction behavior, not clean motor latency).

Fitting applies a minimum observation threshold (default: 3) per digraph. Digraphs below that fall back to the global mean.
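A sketch of that fitting step, assuming the rows have already been flattened to (digraph, latency) pairs; names and layout are illustrative:

```rust
use std::collections::HashMap;

// Average latency per digraph, with a minimum-observation threshold;
// digraphs below the threshold are dropped and fall back to the
// global mean at scoring time.
fn fit_digraph_means(
    obs: &[((char, char), f64)],
    min_obs: usize,
) -> (HashMap<(char, char), f64>, f64) {
    let mut acc: HashMap<(char, char), (f64, usize)> = HashMap::new();
    let (mut total, mut count) = (0.0, 0usize);
    for &(dg, dt) in obs {
        let e = acc.entry(dg).or_insert((0.0, 0));
        e.0 += dt;
        e.1 += 1;
        total += dt;
        count += 1;
    }
    let global_mean = total / count as f64;
    let means = acc
        .into_iter()
        .filter(|&(_, (_, n))| n >= min_obs)
        .map(|(dg, (sum, n))| (dg, sum / n as f64))
        .collect();
    (means, global_mean)
}

fn main() {
    let obs = [
        (('t', 'h'), 100.0), (('t', 'h'), 110.0), (('t', 'h'), 120.0),
        (('z', 'q'), 300.0), // single observation: below the threshold
    ];
    let (means, global) = fit_digraph_means(&obs, 3);
    assert_eq!(means.get(&('t', 'h')), Some(&110.0));
    assert!(!means.contains_key(&('z', 'q')));
    assert_eq!(global, 157.5);
}
```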

The optimization problem

Given a dictionary $D$, a passphrase length of $k$ words, and a target entropy of $H$ bits, find a subset $S \subseteq D$ with $|S| = m = \lfloor 2^{H/k} \rfloor$ words that minimizes expected typing time:

$$\min_{S \subseteq D,\; |S| = m} \mathbb{E}_{w \sim \text{Uniform}(S)}\big[\widehat{T}(w)\big]$$

This reduces to: score every word in $D$ with the timing model, sort by predicted time, and take the top $m$. Entropy is preserved because sampling is uniform over the subset.
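The whole reduction fits in a few lines. A sketch with word length standing in for the timing model (the function name is illustrative, not phrasegen's API):

```rust
// Score every word, sort ascending by predicted time, keep the top m.
// Any scoring function works; length is a stand-in here.
fn fastest_subset(dict: &[&str], m: usize, score: impl Fn(&str) -> f64) -> Vec<String> {
    let mut scored: Vec<(f64, &str)> = dict.iter().map(|w| (score(w), *w)).collect();
    scored.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    scored.into_iter().take(m).map(|(_, w)| w.to_string()).collect()
}

fn main() {
    let dict = ["battery", "cat", "staple", "ox"];
    let subset = fastest_subset(&dict, 2, |w| w.len() as f64);
    assert_eq!(subset, vec!["ox", "cat"]);
}
```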

The memorability tradeoff

The speed-optimal wordset produces fast but obscure words. The EFF large wordlist (7776 words, designed to be unambiguous when spoken aloud) trades some speed for memorability:

Wordset         Bits (4 words)   Example
Speed-optimal   60               hebete-nidus-bunny-sarus
EFF large       52               jinx-ligament-banter-jokester

The EFF wordset gives up ~8 bits at 4 words -- recover them by using 5 words instead (65 bits). This is the recommended starting point: the speed-optimal wordset's gains aren't worth the memorability loss for most use cases.

How much faster?

The analyze-generator command runs Monte Carlo sampling to characterize the distribution of typing times. The figures below come from 50,000 samples with 4 words, numbers-symbols style, and --pick-best-of 10 (choosing the fastest of 10 candidates).

For context, a 4-word passphrase from a 4096-word wordset has $4 \times 12 = 48$ bits nominal entropy. Showing 10 alternatives and picking the fastest costs ~1.3 bits -- a modest penalty.
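Best-of-n selection can be sketched as a draw loop over the uniform sampler; the sampler and scorer below are stand-ins. The comment states the worst-case bound of $\log_2 n$ bits, consistent with the smaller ~1.3 bits measured here:

```rust
// Draw n candidates from the uniform sampler and keep the one the
// timing model ranks fastest. Showing n alternatives costs at most
// log2(n) bits of entropy.
fn pick_best_of<F, G>(n: usize, mut draw: F, score: G) -> String
where
    F: FnMut() -> String,
    G: Fn(&str) -> f64,
{
    (0..n)
        .map(|_| draw())
        .min_by(|a, b| score(a).partial_cmp(&score(b)).unwrap())
        .unwrap()
}

fn main() {
    // Stand-in sampler cycling fixed phrases; length as the score.
    let phrases = ["which-music-water-though", "ox-cat-elm-ivy", "some-medium-phrase"];
    let mut i = 0;
    let best = pick_best_of(
        3,
        || { let p = phrases[i].to_string(); i += 1; p },
        |s| s.len() as f64,
    );
    assert_eq!(best, "ox-cat-elm-ivy");
}
```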

Under tight --max-chars 16 constraints, rejection sampling concentrates the distribution further: only 4% of draws pass the length filter, with a further 322 ms reduction in mean typing time. The shorter strings are faster, but most of the gain comes from the length constraint, not the speed model.
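The rejection step itself is just a redraw loop. A sketch (the sampler is a stand-in; conditioning on the filter keeps sampling uniform over the phrases that pass, so entropy is log2 of the passing count):

```rust
// Redraw until the phrase fits the character cap.
fn sample_with_max_chars<F>(mut draw: F, max_chars: usize) -> String
where
    F: FnMut() -> String,
{
    loop {
        let p = draw();
        if p.chars().count() <= max_chars {
            return p;
        }
    }
}

fn main() {
    let phrases = ["which-music-water-though", "ox-cat-elm-ivy"];
    let mut i = 0;
    let p = sample_with_max_chars(|| { let p = phrases[i].to_string(); i += 1; p }, 16);
    assert_eq!(p, "ox-cat-elm-ivy"); // first draw (24 chars) is rejected
}
```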

Fitting the model to your hands

The base model averages over hundreds of people. A left-handed Dvorak user has systematically different fast transitions than a right-handed QWERTY touch-typist.

The adaptation uses a Bayesian posterior-mean update. For each digraph $(a, b)$, if the base model has prior mean $\mu_0$ and the user provides $n$ observations with sum $S$:

$$\mu' = \frac{c_0 \mu_0 + S}{c_0 + n}$$

where $c_0 = 50$ is the default pseudo-count (the strength of the base prior). Digraphs absent from the base model are added if the user has at least 3 observations.
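The update is a one-liner: the prior acts like $c_0$ pseudo-observations at $\mu_0$. A sketch with the default $c_0 = 50$ (function name is illustrative):

```rust
// Posterior mean for one digraph: prior mean mu0 with pseudo-count c0,
// blended with n user observations summing to s.
fn posterior_mean(mu0: f64, c0: f64, s: f64, n: f64) -> f64 {
    (c0 * mu0 + s) / (c0 + n)
}

fn main() {
    // Base prior 150 ms; 10 user observations averaging 100 ms.
    let updated = posterior_mean(150.0, 50.0, 10.0 * 100.0, 10.0);
    assert!((updated - 141.666_67).abs() < 1e-3);
    // No user data: the prior is returned unchanged.
    assert_eq!(posterior_mean(150.0, 50.0, 0.0, 0.0), 150.0);
}
```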

The personalization loop:

# Record 20 typing samples (prompts you to type phrases)
just record 20

# Build personalized model from base + your recordings
just personalize data/base data/user

# Sample with your model
cargo run -- sample-passphrases \
  --model data/user/model_personalized.json \
  --wordlist data/user/wordset_user.txt \
  --words 5 --style hyphens

Security

Does optimizing for typing speed weaken security? Only if the attacker knows your personalized dictionary and can exploit it.

The attacker model: the attacker knows you used phrasegen, your style preset, and your wordset size. The attacker does not know your RNG draws.

Under that model, sampling $k$ words uniformly from a wordset of $N$ words gives $H = k \log_2 N$ bits. This is the same combinatorial guarantee as standard Diceware -- the typing model affects word ranking, not sampling uniformity.

Three scenarios:

  1. Attacker doesn't know your subset: they search the full dictionary. Full entropy.
  2. Attacker knows your subset $S$ of size $N$: they search $N^k$ combinations. Exactly the entropy you designed for.
  3. Attacker has your timing model: they know which words are fast for you, so they could try those first. But the search space is still $N^k$ -- knowing the rank ordering doesn't reduce the number of guesses needed.
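The guarantee in all three scenarios is the same arithmetic. A sketch (illustrative helper, not part of phrasegen):

```rust
// k uniform draws from an N-word set give k * log2(N) bits,
// regardless of how the words in the set were ranked.
fn entropy_bits(n: u32, k: u32) -> f64 {
    k as f64 * (n as f64).log2()
}

fn main() {
    assert_eq!(entropy_bits(4096, 4), 48.0);             // 4 * 12 bits
    assert!((entropy_bits(7776, 5) - 64.6).abs() < 0.1); // EFF large, 5 words
}
```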

Two practices that actually reduce security: using --seed for real passwords (makes generation deterministic) and using --pick-best-of without accounting for the entropy penalty. The tool warns about both.

Using it

# Build the base timing model from public data
just base data/base

# Fetch the EFF large wordlist
just eff-fetch

# Hyphen-separated, 5 words, ~65 bits
just eff-sample
# e.g. jinx-ligament-banter-jokester-glory

# Login-box style (TitleCase + 2 trailing digits)
just eff-sample-login
# e.g. JinxLigamentBanterJokesterGlory42

# Score an existing phrase
cargo run -- score \
  --model data/base/model_union.json \
  --phrase "correct-horse-battery-staple"