Which Boggle Words Are Actually Worth Memorizing?
April 17, 2026
The competitive Boggle player’s question is not “what words are on this board?” but “which words should I have memorized before the game started?” Monte Carlo sampling over 15,000 random boards gives frequency estimates accurate to one percentage point (by Hoeffding). Rescoring by TF-IDF against the American National Corpus surfaces words like toea (Papua New Guinea’s monetary unit) that appear on 1 in 24 boards and zero times in ordinary English.
The two questions have different answers. A word like toes appears on a lot of boards, but every other player finds it too, so the marginal value of knowing it is small. The words worth memorizing are the ones that appear often and that most opponents won’t find.
# The search problem
A standard Boggle set has 16 cubes, each with six faces. The cube faces are fixed by the manufacturer – the standard distribution is:
```python
CUBES = [
    ('a','a','e','e','g','n'), ('a','b','b','j','o','o'),
    ('a','c','h','o','p','s'), ('a','f','f','k','p','s'),
    ('a','o','o','t','t','w'), ('c','i','m','o','t','u'),
    ('d','e','i','l','r','x'), ('d','e','l','r','v','y'),
    ('d','i','s','t','t','y'), ('e','e','g','h','n','w'),
    ('e','e','i','n','s','u'), ('e','h','r','t','v','w'),
    ('e','i','o','s','s','t'), ('e','l','r','t','t','y'),
    ('h','i','m','n','u','q'), ('h','l','n','n','r','z'),
]
```
To enumerate every possible board exactly, you would need to account for all arrangements of 16 cubes in 16 positions ( $16!$ orderings), and, for each arrangement, one of 6 faces showing per cube ( $6^{16}$ face combinations). That product is roughly $6 \times 10^{25}$. Exhaustive enumeration is not feasible. The alternative is to treat board generation as a random process and sample.
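A quick sanity check on that count (the symmetry-reduced number of distinct boards is smaller, but still astronomically large):

```python
import math

# Orderings of the 16 cubes, times 6 face choices per cube.
arrangements = math.factorial(16) * 6**16
print(f"{arrangements:.1e}")  # on the order of 10^25
```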
# Finding words on a board
A Boggle board is a 4x4 grid where adjacent cells share an edge or corner (8-connectivity). A word is valid if its letters can be traced as a path through the grid with no cell visited twice. The standard approach is depth-first search guided by a trie built from the dictionary.[1]
```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root
```
The trie prunes the DFS aggressively: if no word in the dictionary starts with the prefix accumulated so far, the search backtracks immediately rather than continuing deeper into the grid. Without this pruning, the DFS explores $O(8^k)$ paths for words of length $k$ – and counting self-avoiding walks on 2D grid subgraphs is #P-complete (Liśkiewicz, Ogihara & Toda 2003), so no polynomial-time exact count exists. Production Boggle solvers typically use a DAWG (directed acyclic word graph) rather than a trie: a DAWG merges common suffixes as well as prefixes, making it 5–10× smaller on a standard tournament word list.
```python
def find_words(board, trie_root, min_length=3):
    found = set()
    rows, cols = 4, 4

    def dfs(r, c, node, path, visited):
        ch = board[r][c]
        if ch not in node.children:
            return
        node = node.children[ch]
        path.append(ch)
        visited.add((r, c))
        if len(path) >= min_length and node.is_word:
            found.add(''.join(path))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                    dfs(nr, nc, node, path, visited)
        path.pop()
        visited.remove((r, c))

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, trie_root, [], set())
    return found
```
To generate a random board, shuffle the 16 cubes into a random order, then pick a random face from each:
```python
import random

def random_board(cubes):
    order = list(range(16))
    random.shuffle(order)
    faces = [random.choice(cubes[i]) for i in order]
    # reshape the 16 faces into the 4x4 grid that find_words expects
    return [faces[r * 4:r * 4 + 4] for r in range(4)]
```
# How many boards to sample
The frequency of a word across boards is a probability $p$ estimated from $n$ samples. Each board is an independent Bernoulli trial: the word either appears or it does not. We want the estimate $\hat{p}$ to be within $\epsilon$ of the true frequency $p$ with high probability.
Hoeffding’s inequality gives a bound for bounded random variables. For $n$ independent Bernoulli observations with mean $\hat{p}$:

$$\Pr\left(|\hat{p} - p| \ge \epsilon\right) \le 2e^{-2n\epsilon^2}$$
Setting this bound equal to $\alpha$ and solving for $n$:

$$n = \left\lceil \frac{\ln(2/\alpha)}{2\epsilon^2} \right\rceil$$
```python
import math

def hoeffding_n(epsilon=0.01, alpha=0.1):
    return math.ceil(math.log(2 / alpha) / (2 * epsilon**2))

n = hoeffding_n(epsilon=0.01, alpha=0.1)
print(n)  # 14979
```
With $\epsilon = 0.01$ (frequency estimates accurate to one percentage point) and $\alpha = 0.1$ (90% confidence), the bound requires 14,979 boards. This is an overestimate in practice – Hoeffding assumes the worst case – but it is easy to compute and gives a principled stopping criterion.[2] Rounding up to 15,000 boards is safe. For rare words (say $p \approx 0.04$), Bernstein’s inequality, which accounts for the actual variance $p(1-p)$, tightens the required sample size by roughly 6×; Hoeffding remains the simpler choice here since it requires no estimate of $p$ up front.
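The 6× figure can be checked numerically. A sketch, using one common form of the Bernstein bound, $\Pr(|\hat{p}-p| \ge \epsilon) \le 2\exp\left(-\frac{n\epsilon^2}{2(p(1-p) + \epsilon/3)}\right)$, solved for $n$:

```python
import math

def bernstein_n(p, epsilon=0.01, alpha=0.1):
    # smallest n with 2*exp(-n*eps^2 / (2*(p(1-p) + eps/3))) <= alpha
    variance = p * (1 - p)
    return math.ceil(2 * (variance + epsilon / 3) * math.log(2 / alpha) / epsilon**2)

hoeffding = math.ceil(math.log(2 / 0.1) / (2 * 0.01**2))  # 14979
bernstein = bernstein_n(p=0.04)
print(bernstein, round(hoeffding / bernstein, 1))
```

For $p = 0.04$ this gives roughly 2,500 boards – about six times fewer than the distribution-free bound.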
# Raw results
Running the Monte Carlo with 15,000 boards against a standard dictionary[1] produces frequency estimates for every word that appeared on at least one board. The most common words are short, vowel-rich, and phonetically obvious:
| Word | Frequency | Notes |
|---|---|---|
| teen | 6.4% | |
| tees | 6.4% | |
| toes | 5.8% | |
| note | 5.8% | |
| teat | 5.7% | |
| tone | 5.6% | |
| eons | 5.5% | |
| ones | 5.4% | |
| nose | 5.3% | |
| rote | 5.1% | |
These words appear often because their component letters – T, E, S, O, N – appear on many faces across the standard cube set. Teen requires T, E, E, N; each of those letters is common enough across the cubes that all four are usually present, and adjacency conditions are frequently satisfied.
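This can be quantified directly from the cube distribution. Each cube shows each of its six faces with probability 1/6, independently, so the chance that a letter appears somewhere on a random board is one minus the product of the per-cube miss probabilities (CUBES is repeated here so the snippet runs standalone):

```python
CUBES = [
    ('a','a','e','e','g','n'), ('a','b','b','j','o','o'),
    ('a','c','h','o','p','s'), ('a','f','f','k','p','s'),
    ('a','o','o','t','t','w'), ('c','i','m','o','t','u'),
    ('d','e','i','l','r','x'), ('d','e','l','r','v','y'),
    ('d','i','s','t','t','y'), ('e','e','g','h','n','w'),
    ('e','e','i','n','s','u'), ('e','h','r','t','v','w'),
    ('e','i','o','s','s','t'), ('e','l','r','t','t','y'),
    ('h','i','m','n','u','q'), ('h','l','n','n','r','z'),
]

def p_letter_on_board(letter):
    # P(letter appears at all) = 1 - prod over cubes of P(cube misses it)
    miss = 1.0
    for cube in CUBES:
        miss *= 1 - cube.count(letter) / 6
    return 1 - miss

for ch in 'teson':
    print(ch, round(p_letter_on_board(ch), 3))
```

E, for instance, sits on eight of the sixteen cubes and lands on roughly 88% of boards, while a single-cube letter like Z appears on only 1 in 6.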
The problem with this list for competitive purposes is obvious: every player in the room knows toes and note. Knowing them gives no advantage.
# The insight: common != useful
What we actually want is a word that satisfies two conditions simultaneously: it appears often enough that it is reliably present on boards, and it is obscure enough that most opponents will not find it. TF-IDF captures both conditions: a word scores high when it is frequent within the “document” (Boggle boards) but rare across the “corpus” (general English usage).[3] A word’s Boggle term frequency is its board appearance rate. Its inverse document frequency is derived from its frequency in ordinary written English – a word that appears rarely in the American National Corpus is one that most opponents will not recognize or recall mid-game.
For each word $w$, define:

$$\mathrm{tf}(w) = \text{fraction of sampled boards on which } w \text{ appears}, \qquad \mathrm{idf}(w) = \log \frac{N}{f(w)}$$
where $N$ is the total token count in the American National Corpus and $f(w)$ is the count of $w$ in the corpus. The TF-IDF score is their product:

$$\mathrm{score}(w) = \mathrm{tf}(w) \cdot \mathrm{idf}(w)$$
Words with zero corpus count – valid dictionary words that never appear in the ANC – receive the maximum IDF value ( $\log N$). These are legal Boggle words that the ANC, representing ordinary written English, never uses. They are exactly the words worth memorizing.
```python
import math

def tfidf_scores(board_freqs, corpus_counts, corpus_total):
    scores = {}
    for word, tf in board_freqs.items():
        f = corpus_counts.get(word, 0)
        if f == 0:
            # zero-count words get the maximum IDF, log(N)
            idf = math.log(corpus_total)
        else:
            idf = math.log(corpus_total / f)
        scores[word] = tf * idf
    return scores
```
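The scoring is easy to see on two words. In this toy illustration only toea’s 4.2% board rate and zero ANC count come from the actual results; the ANC size is an order-of-magnitude assumption (~22 million tokens) and the count for toes is invented:

```python
import math

N = 22_000_000                      # ANC token count: an assumption
board_freqs = {'toes': 0.058, 'toea': 0.042}
corpus_counts = {'toes': 40_000}    # invented for illustration; toea absent

scores = {}
for word, tf in board_freqs.items():
    f = corpus_counts.get(word, 0)
    idf = math.log(N) if f == 0 else math.log(N / f)
    scores[word] = tf * idf

print({w: round(s, 2) for w, s in scores.items()})
```

Despite its lower board frequency, toea outranks toes: the zero-count word takes the maximum IDF of $\log N$, which more than compensates.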
# Results after rescoring
The top words by TF-IDF score are a different list entirely:
| Word | Board freq | Corpus freq | Score | Meaning |
|---|---|---|---|---|
| toea | 4.2% | 0 | highest | monetary unit of Papua New Guinea |
| seta | 3.8% | rare | high | bristle or stiff hair (biology) |
| nett | 3.6% | rare | high | variant spelling of “net” (chiefly British) |
| teat | 5.7% | low | high | nipple |
| stet | 3.4% | rare | high | proofreading mark: “let it stand” |
| tret | 2.9% | 0 | high | historical allowance for waste in weighing goods |
| rete | 3.1% | rare | high | anatomical network of nerve fibers or blood vessels |
| sett | 3.2% | rare | high | badger’s burrow; also a pattern in tartan |
Toea is the standout result. It appears on roughly 1 in 24 boards – a high rate, reflecting that T, O, E, and A are among the most common letters in the standard cube set. But it appears zero times in the ANC. It is a valid Scrabble dictionary word[4] that almost no English speaker outside Papua New Guinea knows. On any board where it appears, a player who knows it scores unopposed.
The rest of the list has a consistent character: technical terms (rete, seta), archaic commercial vocabulary (tret), British variant spellings (nett, sett), and proofreading jargon (stet). These words are not obscure because they are unusual letter combinations – they are obscure because the concepts they name are domain-specific or historical. The letters appear all the time; the words do not.
A word like teat appears high on both lists – it scores well on TF-IDF because its corpus frequency is genuinely low relative to its board frequency, even though the word is not obscure. It is worth knowing regardless.
# The 40% vowel claim
A persistent piece of Boggle folk wisdom holds that boards with roughly 40% vowels produce the most scoreable words. The claim is testable from the same Monte Carlo data.
For each sampled board, count the number of vowels (A, E, I, O, U) among the 16 tiles and record how many valid words were found. Averaging across boards with the same vowel count gives the expected word count as a function of vowel density.
The data confirms the claim. The distribution of word counts peaks at 6 vowels out of 16 tiles – 37.5%, close to the 40% figure cited informally. At that count, the average board yields approximately 80–90 findable words of three or more letters. Boards with fewer than 4 vowels or more than 9 produce substantially fewer words, as the letter balance becomes too skewed to allow the short vowel-consonant alternations that English words require.
The peak at 6 rather than 7 or 8 is consistent with how English words are distributed. Common consonant clusters (ST, TR, TS, NT) require consonant adjacency; too many vowels crowd out the consonant-heavy tiles needed to complete those clusters. The 37–40% range is where the two requirements balance. Dan Vanderkam’s 2025 exhaustive search for the globally optimal Boggle board independently confirms this range: the highest-scoring boards cluster in the same vowel-fraction window.
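The cube set itself is tuned near this window. Counting vowel faces across the 16 cubes (33 of the 96 faces, taking vowels as A, E, I, O, U) gives an expected vowel count per board of exactly 5.5 – just below the observed peak at 6. CUBES is repeated so the snippet runs standalone:

```python
CUBES = [
    ('a','a','e','e','g','n'), ('a','b','b','j','o','o'),
    ('a','c','h','o','p','s'), ('a','f','f','k','p','s'),
    ('a','o','o','t','t','w'), ('c','i','m','o','t','u'),
    ('d','e','i','l','r','x'), ('d','e','l','r','v','y'),
    ('d','i','s','t','t','y'), ('e','e','g','h','n','w'),
    ('e','e','i','n','s','u'), ('e','h','r','t','v','w'),
    ('e','i','o','s','s','t'), ('e','l','r','t','t','y'),
    ('h','i','m','n','u','q'), ('h','l','n','n','r','z'),
]

VOWELS = set('aeiou')

# Each cube shows each face with probability 1/6, so the expected number of
# vowels on a board is the sum over cubes of (vowel faces on that cube) / 6.
expected = sum(sum(f in VOWELS for f in cube) / 6 for cube in CUBES)
print(round(expected, 3))  # 5.5
```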
# References
[1] The standard Boggle dictionary used in competitive play is SOWPODS (the combined Official Scrabble Players Dictionary and Official Scrabble Words list). For this analysis, the TWL06 (Tournament Word List 2006) was used, restricted to words of 3 to 8 letters, since most Boggle scoring caps at 8 letters.

[2] The Hoeffding bound used here is tight in the sense that no distribution-free bound can do better at this level of generality. In practice, word frequencies are far from worst-case Bernoulli (many words appear with probability near 0 or near some small $p \ll 0.5$), so the actual sampling variance is lower and fewer boards would suffice for most estimates. 15,000 is conservative.

[3] Salton, G. and McGill, M.J., Introduction to Modern Information Retrieval, McGraw-Hill, 1983. IDF was introduced by Spärck Jones (1972); Salton and Yang (1973) subsequently combined it with term frequency to form TF-IDF as used here.

[4] Toea is valid in the Official Scrabble Players Dictionary (OSPD5) and TWL06. Worth 7 points in Scrabble before position bonuses.