What Differences Reveal

December 9, 2024

Repeatedly subtracting consecutive terms of an integer sequence produces a difference table that functions as exact discrete calculus: the forward difference operator Δ\Delta mirrors d/dxd/dx, falling factorials replace monomials, and Newton’s formula reconstructs any polynomial from the table’s left edge using binomial coefficients. The binomial transform of the left-edge values sends Fibonacci to itself with alternating signs – a fixed-point property that falls out of φ1=1/φ\varphi - 1 = 1/\varphi.

Take any sequence of integers. Subtract each term from the next. You get a new sequence. Do it again. And again. What emerges is a difference table, and it reveals far more about your sequence than you might expect.

This is not numerical differentiation. There is no approximation, no truncation error, no floating-point noise (on integer sequences). What follows is an exact discrete calculus where Δ\Delta plays the role of d/dxd/dx, falling factorials replace monomials, and every theorem of continuous calculus has a discrete twin.

# A Simple Game

Let’s play with the triangular numbers: 1, 3, 6, 10, 15, 21, …

These are figurate numbers: they count dots in regular geometric arrangements. The triangular numbers count dots in triangles:

    •          •          •
              • •        • •
                        • • •
    1          3          6

Now build the difference table. Write the sequence, then below each pair write their difference:

Difference table of the triangular numbers
The triangular numbers differenced: first differences are the counting numbers, second differences are constant at 1, third differences are zero. The left edge reconstructs the sequence via Newton

The first differences are the counting numbers. The second differences are all 1. The third differences are all 0.

This is no accident. The triangular numbers are given by Tn=n(n+1)2T_n = \frac{n(n+1)}{2}, a quadratic polynomial – and every quadratic has constant second differences.

# Background

The idea of systematically differencing a sequence is old. Newton used it in the 1660s for interpolation, but the roots go back further, to James Gregory (1668) and Henry Briggs (1624), who built logarithm tables by the method of finite differences. Boole7 gave the first systematic treatment in 1860; Jordan2 wrote the classical reference that most 20th-century work builds on. Thomas Harriot had the essential idea even earlier, in unpublished manuscripts from the 1610s (recovered in Beery and Stedall, 200912).

The practical motive was always numerical tables. Briggs computed his Arithmetica Logarithmica (1624) by exploiting the fact that consecutive values of a polynomial can be computed by addition alone (no multiplication needed) if you maintain a running difference table. The same idea powered the Nautical Almanac computations for two centuries, and it’s what Babbage later mechanized in his Difference Engine.6

Newton’s contribution was to see beyond the computational trick: he formulated the general interpolation formula and recognized the connection to the binomial series. In a 1676 letter to Leibniz (the Epistola Prior), Newton described how any function tabulated at integer points could be approximated by its differences, extending finite differences from a computational device to a theoretical tool.1

# Operators: Δ\Delta, EE, and DD

The analogy with continuous calculus is not a coincidence. Define the forward difference operator Δ\Delta by

Δf(x)=f(x+1)f(x)\Delta f(x) = f(x+1) - f(x)

and the shift operator EE by

Ef(x)=f(x+1).E f(x) = f(x+1).

Then Δ=EI\Delta = E - I, where II is the identity operator. This is the discrete analogue of the derivative. Where ddx\frac{d}{dx} measures instantaneous rate of change, Δ\Delta measures the change over a unit step.

The parallel runs deep. In continuous calculus, ddxxn=nxn1\frac{d}{dx} x^n = n x^{n-1}. In discrete calculus, Δ\Delta acts cleanly not on ordinary powers xnx^n but on falling factorials,

xn=x(x1)(x2)(xn+1)x^{\underline{n}} = x(x-1)(x-2)\cdots(x-n+1)

The rule is Δxn=nxn1\Delta x^{\underline{n}} = n \cdot x^{\underline{n-1}}, a perfect mirror of the power rule. The falling factorial plays the role of xnx^n in the discrete world, and the binomial coefficient (xn)=xnn!\binom{x}{n} = \frac{x^{\underline{n}}}{n!} plays the role of xnn!\frac{x^n}{n!}.

Once you accept falling factorials as the natural basis, every theorem of continuous calculus has an exact discrete twin. The correspondence is not a loose analogy. It is a dictionary:

Continuous Discrete
Derivative ddx\frac{d}{dx} Forward difference Δ\Delta
xnx^n Falling factorial xnx^{\underline{n}}
ddxxn=nxn1\frac{d}{dx} x^n = n x^{n-1} Δxn=nxn1\Delta x^{\underline{n}} = n \cdot x^{\underline{n-1}}
xndx=xn+1n+1+C\int x^n dx = \frac{x^{n+1}}{n+1} + C Δ1xn=xn+1n+1+C\Delta^{-1} x^{\underline{n}} = \frac{x^{\underline{n+1}}}{n+1} + C
Taylor series: f(x)=f(k)(0)k!xkf(x) = \sum \frac{f^{(k)}(0)}{k!} x^k Newton series: f(n)=Δkf(0)(nk)f(n) = \sum \Delta^k f(0) \cdot \binom{n}{k}
xnn!\frac{x^n}{n!} (xn)=xnn!\binom{x}{n} = \frac{x^{\underline{n}}}{n!}
exe^x (eigenfunction of ddx\frac{d}{dx}) 2x2^x (eigenfunction of Δ\Delta: Δ2x=2x\Delta 2^x = 2^x)

The last row explains why powers of 2 have self-similar difference tables: Δ\Delta acts on 2x2^x the way ddx\frac{d}{dx} acts on exe^x: it returns the function unchanged. More precisely, any axa^x is an eigenfunction of Δ\Delta with eigenvalue a1a - 1; the choice a=2a = 2 is special because it gives eigenvalue 1.

Higher-order differences compose exactly as you’d expect:

Δnf(x)=k=0n(nk)(1)nkf(x+k)\Delta^n f(x) = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} f(x+k)

This follows from expanding Δn=(EI)n\Delta^n = (E - I)^n by the binomial theorem – which works because EE and II commute. The formula says: to compute the nn-th difference, take an alternating weighted sum of n+1n+1 consecutive values. This is exactly what building the difference table row by row computes, just expressed in closed form.

The operator-theoretic viewpoint extends further.1 The formal relation between Δ\Delta and the continuous derivative D=ddxD = \frac{d}{dx} is Δ=eD1\Delta = e^D - 1, since Ef(x)=eDf(x)=f(x+1)Ef(x) = e^D f(x) = f(x+1) by Taylor’s theorem. Inverting: D=ln(1+Δ)=ΔΔ22+Δ33D = \ln(1 + \Delta) = \Delta - \frac{\Delta^2}{2} + \frac{\Delta^3}{3} - \cdots. These formal power series in operators converge when applied to polynomials (which are annihilated by sufficiently high powers of Δ\Delta or DD), and they form the basis of the operator method in combinatorics.

The backward difference operator f(x)=f(x)f(x1)\nabla f(x) = f(x) - f(x-1) and the central difference operator δf(x)=f(x+12)f(x12)\delta f(x) = f(x + \tfrac{1}{2}) - f(x - \tfrac{1}{2}) give rise to their own families of formulas (Stirling’s interpolation, Gauss’s formulas), but the forward difference is the most natural starting point.

# The General Pattern

For any polynomial of degree dd, the dd-th differences are constant, and the (d+1)(d+1)-th differences are zero.

Try the cubes: 0, 1, 8, 27, 64, 125…

Difference table of cubes with left edge and constant row highlighted
The difference table of n^3. Blue: left edge (Newton coefficients 0, 1, 6, 6, 0). Yellow: constant third differences at 6 = 3!. Fourth differences are zero – confirming degree 3.

Third differences constant at 6. Fourth differences zero. The cubes are cubic polynomials – degree 3.

The figurate number pattern continues: the square pyramidal numbers 1, 5, 14, 30, 55, 91 count balls stacked in a square pyramid. Each layer is a square number, so the pyramidal numbers are partial sums of squares: Pn=k=1nk2P_n = \sum_{k=1}^{n} k^2. Their third differences are constant at 2, confirming degree 3 (the sum of a degree-2 sequence is degree 3, exactly as the discrete antidifference predicts).

This works in reverse too. If someone hands you a mystery sequence and its third differences are constant, you know it’s generated by a cubic polynomial. You can even reconstruct the formula.

Why does the dd-th difference of a degree-dd polynomial come out constant? Because the operator Δ\Delta reduces degree by exactly 1. To see this: if f(x)=cxd+lower termsf(x) = cx^d + \text{lower terms}, then Δf(x)=c[(x+1)dxd]=c[dxd1+lower terms]\Delta f(x) = c[(x+1)^d - x^d] = c[dx^{d-1} + \text{lower terms}], so deg(Δf)=d1\deg(\Delta f) = d - 1 with leading coefficient cdcd. Applying this dd times, Δdf(x)=cd!\Delta^d f(x) = c \cdot d! , a constant. One more application gives Δd+1f(x)=0\Delta^{d+1} f(x) = 0. For the cubes, f(x)=x3f(x) = x^3 with c=1c = 1, so Δ3f(x)=3!=6\Delta^3 f(x) = 3! = 6.

This gives a practical test: if the kk-th row of a difference table is constant, the sequence is generated by a polynomial of degree exactly kk. The converse holds too: non-polynomial sequences (exponentials, factorials, primes) never terminate. (For primes: the nn-th prime pnp_n grows as nlnnn \ln n by the prime number theorem, which is not a polynomial in nn, so no finite-degree polynomial can reproduce the prime sequence.)

The calculus analogy extends to summation. Just as integration undoes differentiation, the antidifference Δ1\Delta^{-1} (also called the indefinite sum) undoes Δ\Delta. Since Δxn=nxn1\Delta x^{\underline{n}} = n \cdot x^{\underline{n-1}}, we get Δ1xn=xn+1n+1+C\Delta^{-1} x^{\underline{n}} = \frac{x^{\underline{n+1}}}{n+1} + C – the discrete analogue of xndx=xn+1n+1+C\int x^n dx = \frac{x^{n+1}}{n+1} + C. This gives closed forms for discrete sums: k=0n1k2=k=0n1k(k1)=n(n1)(n2)3\sum_{k=0}^{n-1} k^{\underline{2}} = \sum_{k=0}^{n-1} k(k-1) = \frac{n(n-1)(n-2)}{3}, which you can verify is x330n\frac{x^{\underline{3}}}{3}\big|_0^n.

Here is a short Python function that computes the difference table:

def difference_table(seq):
    """Return the difference table as a list of rows."""
    table = [list(seq)]
    while len(table[-1]) > 1:
        row = table[-1]
        table.append([row[i+1] - row[i] for i in range(len(row) - 1)])
    return table

# Triangular numbers
for row in difference_table([1, 3, 6, 10, 15, 21, 28]):
    print(row)
# [1, 3, 6, 10, 15, 21, 28]
# [2, 3, 4, 5, 6, 7]
# [1, 1, 1, 1, 1]
# [0, 0, 0, 0]

# Newton’s Forward Difference Formula

Any sequence can be reconstructed from its difference table. Specifically:

an=k=0n(nk)Δka0a_n = \sum_{k=0}^{n} \binom{n}{k} \Delta^k a_0

where Δka0\Delta^k a_0 is the kk-th entry on the left edge of the difference table.

This is Newton’s forward difference formula. It says: to reconstruct a sequence, take the left edge of its difference table and weight by binomial coefficients.

# Derivation from the operator identity

The formula follows directly from the relation E=I+ΔE = I + \Delta. Applying EnE^n to ff at x=0x = 0:

f(n)=Enf(0)=(I+Δ)nf(0)=k=0n(nk)Δkf(0)f(n) = E^n f(0) = (I + \Delta)^n f(0) = \sum_{k=0}^{n} \binom{n}{k} \Delta^k f(0)

That’s it. The binomial theorem for operators gives Newton’s interpolation formula in one line.

# Example: reconstruct a polynomial

For the triangular numbers, the left edge is 1, 2, 1, 0, 0, … So:

Tn=(n0)1+(n1)2+(n2)1=1+2n+n(n1)2T_n = \binom{n}{0} \cdot 1 + \binom{n}{1} \cdot 2 + \binom{n}{2} \cdot 1 = 1 + 2n + \frac{n(n-1)}{2}

Simplify and you get n2+3n+22=(n+1)(n+2)2\frac{n^2 + 3n + 2}{2} = \frac{(n+1)(n+2)}{2}, which is Tn+1T_{n+1}. The +1+1 shift is because our sequence starts at a0=T1=1a_0 = T_1 = 1, not T0=0T_0 = 0.

For the cubes, the table starts at a0=1=13a_0 = 1 = 1^3, so our sequence is really f(n)=(n+1)3f(n) = (n+1)^3. The left edge is 1, 7, 12, 6, 0, 0, … and Newton’s formula gives:

f(n)=1(n0)+7(n1)+12(n2)+6(n3)f(n) = 1 \cdot \binom{n}{0} + 7 \cdot \binom{n}{1} + 12 \cdot \binom{n}{2} + 6 \cdot \binom{n}{3}

Expanding: 1+7n+6n(n1)+n(n1)(n2)=1+7n+6n26n+n33n2+2n=n3+3n2+3n+1=(n+1)31 + 7n + 6n(n-1) + n(n-1)(n-2) = 1 + 7n + 6n^2 - 6n + n^3 - 3n^2 + 2n = n^3 + 3n^2 + 3n + 1 = (n+1)^3. Correct.

# Connection to polynomial interpolation

Newton’s formula is an interpolation formula: given d+1d+1 values of a degree-dd polynomial, it reconstructs the polynomial exactly. The representation is in the Newton basis {(x0),(x1),(x2),}\{\binom{x}{0}, \binom{x}{1}, \binom{x}{2}, \ldots\} rather than the monomial basis {1,x,x2,}\{1, x, x^2, \ldots\}.

The Newton basis has a computational advantage: adding one more data point only requires computing one more difference, appending one more term. With the monomial basis (Vandermonde system), you’d need to re-solve the entire system. This is why Newton’s formula was the workhorse of numerical tables for three centuries, from Briggs’s logarithms to the Nautical Almanac.

There is a subtlety: Newton’s formula interpolates exactly through the given points a0,a1,,ada_0, a_1, \ldots, a_d using a polynomial of degree dd. But if the sequence is not polynomial, the formula still produces a value for ana_n at non-integer nn; it just gives the unique polynomial that passes through the first d+1d+1 data points. For non-polynomial sequences, the approximation degrades as nn moves far from the data. This is the discrete version of the Runge phenomenon in polynomial interpolation: high-degree fits to equispaced points can oscillate wildly between the data points, and more data can make the interpolant worse.Switching to the Newton basis does not cure this. Lagrange, Newton, and Hermite interpolation all produce the same polynomial through the same points; Runge instability is a property of the interpolating polynomial itself, driven by node spacing and degree, not by which basis is used to represent it. See Trefethen, Approximation Theory and Approximation Practice, ch. 11.

For polynomial sequences, the formula is exact for all nn, not just the tabulated values. This is the basis of the difference table test: if the dd-th differences are constant, Newton’s formula with d+1d+1 terms reproduces the sequence exactly and extends it to all integers (and even to non-integer arguments).

Here is a Python function that reconstructs a sequence from its left edge:

from math import comb

def newton_interpolate(left_edge, n):
    """Evaluate Newton's forward difference formula at n."""
    return sum(c * comb(n, k) for k, c in enumerate(left_edge))

# Recover triangular numbers from left edge [1, 2, 1]
print([newton_interpolate([1, 2, 1], n) for n in range(7)])
# [1, 3, 6, 10, 15, 21, 28]

# The Binomial Transform

The operation of “take all the differences and read down the left edge” has a name: the binomial transform.

Given a sequence (a0,a1,a2,)(a_0, a_1, a_2, \ldots), its binomial transform is the sequence of values at the left edge of the difference table:

bn=Δna0=k=0n(nk)(1)nkakb_n = \Delta^n a_0 = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} a_k

# Formal definition and inverse

The binomial transform can be written in matrix form. Let b=Ba\mathbf{b} = B \mathbf{a}, where BB is the lower-triangular matrix with entries Bnk=(nk)(1)nkB_{nk} = \binom{n}{k}(-1)^{n-k}. The inverse is clean: to recover the original sequence, apply the unsigned binomial transform:

an=k=0n(nk)bka_n = \sum_{k=0}^{n} \binom{n}{k} b_k

This is Newton’s forward difference formula: the left edge of the difference table (the bkb_k) reconstructs the original sequence when weighted by binomial coefficients. The forward (signed) and inverse (unsigned) transforms are related by Bnk1=(nk)B^{-1}_{nk} = \binom{n}{k}. To verify the inversion, compose the two. Apply the signed transform to the unsigned:

k(nk)(1)nkj(kj)cj=jcjk(nk)(kj)(1)nkδnj\sum_k \binom{n}{k}(-1)^{n-k}\sum_j \binom{k}{j} c_j = \sum_j c_j \underbrace{\sum_k \binom{n}{k}\binom{k}{j}(-1)^{n-k}}_{\delta_{nj}}

The inner sum vanishes for njn \ne j because k(njkj)(1)nk\sum_k \binom{n-j}{k-j}(-1)^{n-k} is the alternating row sum of Pascal’s triangle, which is zero unless the row has length 1.

(Some authors define the binomial transform without the signs: bn=k(nk)akb_n = \sum_k \binom{n}{k} a_k. Under that convention, the signed version is the inverse. The two conventions are conjugates of each other, differing by whether you call the “forward” direction signed or unsigned.)

# Examples

Powers of 2. The sequence an=2na_n = 2^n gives bn=k=0n(nk)(1)nk2k=(21)n=1b_n = \sum_{k=0}^{n} \binom{n}{k}(-1)^{n-k} 2^k = (2-1)^n = 1. So the binomial transform of (1,2,4,8,16,)(1, 2, 4, 8, 16, \ldots) is (1,1,1,1,1,)(1, 1, 1, 1, 1, \ldots).

Inversely, (1,1,1,)(1, 1, 1, \ldots) transforms to (1,2,4,8,)(1, 2, 4, 8, \ldots). The all-ones sequence and the powers of 2 are binomial transform pairs.

Powers of 3. Similarly, bn=(31)n=2nb_n = (3-1)^n = 2^n. So the transform of (1,3,9,27,)(1, 3, 9, 27, \ldots) is (1,2,4,8,)(1, 2, 4, 8, \ldots). Chains form: all-ones \leftrightarrow powers of 2 \leftrightarrow powers of 3 \leftrightarrow … where each application of the binomial transform subtracts 1 from the base. More generally, the binomial transform of ((a+1)n)((a+1)^n) is (an)(a^n), since k(nk)(1)nk(a+1)k=((a+1)1)n=an\sum_k \binom{n}{k}(-1)^{n-k}(a+1)^k = ((a+1)-1)^n = a^n.

Bell numbers. The Bell numbers BnB_n (1, 1, 2, 5, 15, 52, 203, …) count the total number of set partitions of {1,,n}\lbrace 1, \ldots, n \rbrace. They satisfy Bn+1=k=0n(nk)BkB_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k – the unsigned binomial transform of (Bn)(B_n) is the shifted sequence (Bn+1)(B_{n+1}). This recurrence says: to partition {1,,n+1}\lbrace 1, \ldots, n+1 \rbrace, choose which kk elements share a block with n+1n+1 (the (nk)\binom{n}{k} factor) and partition the remaining nkn-k elements (the BnkB_{n-k} factor, which after reindexing gives BkB_k). The same recurrence emerges from the exponential generating function: the Bell EGF is B(x)=eex1B(x) = e^{e^x - 1}, and differentiating gives B(x)=exB(x)B'(x) = e^x B(x); expanding ex=xk/k!e^x = \sum x^k/k! and equating coefficients recovers exactly the binomial-transform recurrence.

Catalan numbers. The Catalan numbers Cn=1n+1(2nn)C_n = \frac{1}{n+1}\binom{2n}{n} (1, 1, 2, 5, 14, 42, …) have a binomial transform that gives the central Delannoy numbers, which count lattice paths with diagonal steps allowed. Riordan4 is the working reference for these identities and their combinatorial proofs.

# Connection to generating functions

If A(x)=anxnA(x) = \sum a_n x^n is the ordinary generating function (OGF) of (an)(a_n), and (bn)(b_n) is its binomial transform, then

bnxn=11+xA ⁣(x1+x).\sum b_n x^n = \frac{1}{1+x} A\!\left(\frac{x}{1+x}\right).

For exponential generating functions (EGFs), the relationship is cleaner. If A^(x)=anxnn!\hat{A}(x) = \sum a_n \frac{x^n}{n!}, then

bnxnn!=exA^(x).\sum b_n \frac{x^n}{n!} = e^{-x} \hat{A}(x).

Multiplication by exe^{-x} in the EGF world corresponds to the binomial transform in the coefficient world. This is one reason EGFs are so natural for combinatorics: many transforms that look complicated on OGFs become simple operations on EGFs.

In the OGF world, the forward transform is the substitution xx1+xx \mapsto \frac{x}{1+x} combined with a factor of 11+x\frac{1}{1+x}, and the inverse substitution xx1xx \mapsto \frac{x}{1-x} with factor 11x\frac{1}{1-x} undoes it.

# Fibonacci’s Self-Similarity

Most sequences lose their identity under the binomial transform: powers of 2 become all-ones, cubes become something unrecognizable. The Fibonacci sequence does something stranger:

Fibonacci difference table with left edge highlighted
The left edge of Fibonacci

Look at the left edge: 0, 1, -1, 2, -3, 5, -8, 13, -21…

It’s the Fibonacci numbers again, with alternating signs! The binomial transform of Fibonacci is (essentially) Fibonacci.

Why? The Fibonacci recurrence Fn=Fn1+Fn2F_n = F_{n-1} + F_{n-2} has characteristic roots ϕ=1+52\phi = \frac{1+\sqrt{5}}{2} and ψ=152\psi = \frac{1-\sqrt{5}}{2}, with the closed form Fn=ϕnψn5F_n = \frac{\phi^n - \psi^n}{\sqrt{5}}. The binomial transform sends αn(α1)n\alpha^n \mapsto (\alpha - 1)^n. The key step is computing ϕ1=512=ψ\phi - 1 = \frac{\sqrt{5}-1}{2} = -\psi and ψ1=512=ϕ\psi - 1 = \frac{-\sqrt{5}-1}{2} = -\phi. (Both follow from the defining property ϕ+ψ=1\phi + \psi = 1.) So:

bn=(ϕ1)n(ψ1)n5=(ψ)n(ϕ)n5=(1)n+1ϕnψn5=(1)n+1Fnb_n = \frac{(\phi-1)^n - (\psi-1)^n}{\sqrt{5}} = \frac{(-\psi)^n - (-\phi)^n}{\sqrt{5}} = (-1)^{n+1} \frac{\phi^n - \psi^n}{\sqrt{5}} = (-1)^{n+1} F_n

The binomial transform of FnF_n is (1)n+1Fn(-1)^{n+1} F_n. Applying the transform twice returns the original sequence: Fn(1)n+1Fn(1)n+1(1)n+1Fn=FnF_n \mapsto (-1)^{n+1} F_n \mapsto (-1)^{n+1}(-1)^{n+1} F_n = F_n. The Fibonacci sequence is a fixed point of the double binomial transform – a self-inverse up to sign alternation. This algebraic self-similarity is a shadow of the golden ratio’s defining property ϕ2=ϕ+1\phi^2 = \phi + 1, which gives ϕ1=1/ϕ\phi - 1 = 1/\phi.

More precisely: any sequence whose Binet-style closed form involves roots α\alpha satisfying α(α1)=±1\alpha(\alpha - 1) = \pm 1 will be an eigensequence (or near-eigensequence) of the binomial transform. The golden ratio satisfies ϕ(ϕ1)=1\phi(\phi - 1) = 1, which is why Fibonacci works. The Lucas numbers Ln=ϕn+ψnL_n = \phi^n + \psi^n have the same property: their binomial transform is (1)nLn(-1)^n L_n.

# OEIS Treasures

The On-Line Encyclopedia of Integer Sequences catalogs over 380,000 sequences. Many entries note when one sequence is the binomial transform of another.10

One favorite: A000079 (Powers of 2), 1, 2, 4, 8, 16, 32…

1    2    4    8   16   32
   1    2    4    8   16
      1    2    4    8
         1    2    4
            1    2
               1

Every row is the same sequence. The difference table is self-similar because 2n2n1=2n12^n - 2^{n-1} = 2^{n-1} – differencing just shifts the sequence. This is the table-level manifestation of the binomial transform pair we derived above: the all-ones sequence and the powers of 2 are transform partners.

# Stirling Numbers

The connection between ordinary powers xnx^n and the Newton basis (xk)\binom{x}{k} runs through the Stirling numbers.9

# Stirling numbers of the second kind

The Stirling number of the second kind, written S(n,k)S(n,k) or {nk}\lbrace{n \atop k}\rbrace, counts the number of ways to partition a set of nn elements into exactly kk non-empty subsets.

For example, S(4,2)=7S(4,2) = 7: the set {a,b,c,d}\lbrace a,b,c,d \rbrace can be split into two non-empty parts in 7 ways ({abcd}\lbrace a | bcd \rbrace, {bacd}\lbrace b | acd \rbrace, {cabd}\lbrace c | abd \rbrace, {dabc}\lbrace d | abc \rbrace, {abcd}\lbrace ab | cd \rbrace, {acbd}\lbrace ac | bd \rbrace, {adbc}\lbrace ad | bc \rbrace).

The connection to difference tables: applying Δk\Delta^k to xnx^n and evaluating at x=0x = 0 gives

Δkxnx=0=j=0k(kj)(1)kjjn=S(n,k)k!\Delta^k x^n \big|_{x=0} = \sum_{j=0}^{k} \binom{k}{j}(-1)^{k-j} j^n = S(n,k) \cdot k!

The left side is the kk-th entry on the left edge of the difference table of the sequence 0n,1n,2n,3n,0^n, 1^n, 2^n, 3^n, \ldots. Why does the right side equal k!S(n,k)k! \cdot S(n,k)? Count surjections from [n][n] to [k][k] by inclusion-exclusion: j=0k(kj)(1)kjjn\sum_{j=0}^{k} \binom{k}{j}(-1)^{k-j} j^n counts functions minus those missing at least one target, plus those missing at least two, and so on. Each surjection is an ordered partition into kk non-empty blocks, and there are k!k! orderings per partition. So the alternating sum equals k!S(n,k)k! \cdot S(n,k), and the difference table of nn-th powers encodes Stirling numbers.

This is why the change-of-basis formula works:

xn=k=0nS(n,k)k!(xk)=k=0nS(n,k)xkx^n = \sum_{k=0}^{n} S(n,k) \cdot k! \cdot \binom{x}{k} = \sum_{k=0}^{n} S(n,k) \cdot x^{\underline{k}}

Taking differences of a polynomial converts from the “power basis” to the “falling factorial basis,” and the conversion coefficients are exactly the Stirling numbers of the second kind.

# Stirling numbers of the first kind

The Stirling numbers of the first kind, written s(n,k)s(n,k) or [nk]\left[{n \atop k}\right] (unsigned), perform the reverse conversion.Notation is not standardized: Stanley’s Enumerative Combinatorics uses signed s(n,k)s(n,k) with the sign encoded in the change-of-basis formula; Knuth’s Concrete Mathematics uses the unsigned bracket [nk]\left[{n \atop k}\right], absorbing the sign into the convention. Knuth documents this fragmentation in “Two Notes on Notation” (1992).

xn=k=0ns(n,k)(1)nkxkx^{\underline{n}} = \sum_{k=0}^{n} s(n,k) \cdot (-1)^{n-k} \cdot x^k

Their combinatorial meaning: [nk]\left[{n \atop k}\right] counts the number of permutations of nn elements with exactly kk cycles.

For instance, [42]=11\left[{4 \atop 2}\right] = 11: among the 24 permutations of {1,2,3,4}\lbrace 1,2,3,4 \rbrace, exactly 11 have two cycles. (Examples: (1)(234)(1)(234), (12)(34)(12)(34), etc.)

The two kinds of Stirling numbers are inverses as change-of-basis matrices:

jS(n,j)s(j,k)(1)jk=δnk\sum_{j} S(n,j) \cdot s(j,k) \cdot (-1)^{j-k} = \delta_{nk}

This is the discrete analogue of the fact that differentiation and integration are inverse operations. The first kind converts from falling factorials to powers; the second kind converts from powers to falling factorials. Together they mediate between the two natural bases for polynomial sequences.

For fixed kk, S(n,k)knk!S(n,k) \sim \frac{k^n}{k!} as nn \to \infty. The number of ways to partition a large set into kk blocks is approximately the number of surjections from nn to kk divided by k!k! (correcting for block ordering). The Stirling numbers of the first kind have a more delicate asymptotic: [nk]n!n(lnn)k1(k1)!\left[{n \atop k}\right] \sim \frac{n!}{n} \cdot \frac{(\ln n)^{k-1}}{(k-1)!} for fixed kk, reflecting the logarithmic growth of the number of cycles in a random permutation.

# A small table

The Stirling numbers of the second kind for small nn, kk:

n\k   0   1   2   3   4   5
 0    1
 1    0   1
 2    0   1   1
 3    0   1   3   1
 4    0   1   7   6   1
 5    0   1  15  25  10   1

Each entry satisfies the recurrence S(n,k)=kS(n1,k)+S(n1,k1)S(n,k) = k \cdot S(n-1,k) + S(n-1,k-1): either the nn-th element joins one of the kk existing subsets (kk choices, hence the factor of kk), or it starts a new subset of its own.

# Worked example: x3x^3 in the Newton basis

Let’s verify the change-of-basis formula for x3x^3. We need the Stirling numbers S(3,1)=1S(3,1) = 1, S(3,2)=3S(3,2) = 3, S(3,3)=1S(3,3) = 1 from the table above. The formula gives:

x3=S(3,1)1!(x1)+S(3,2)2!(x2)+S(3,3)3!(x3)x^3 = S(3,1) \cdot 1! \cdot \binom{x}{1} + S(3,2) \cdot 2! \cdot \binom{x}{2} + S(3,3) \cdot 3! \cdot \binom{x}{3}

=1x+6x(x1)2+6x(x1)(x2)6=x+3x(x1)+x(x1)(x2)= 1 \cdot x + 6 \cdot \frac{x(x-1)}{2} + 6 \cdot \frac{x(x-1)(x-2)}{6} = x + 3x(x-1) + x(x-1)(x-2)

Expanding: x+3x23x+x33x2+2x=x3x + 3x^2 - 3x + x^3 - 3x^2 + 2x = x^3. Correct.

Notice that the coefficients 1!S(3,1)=11! \cdot S(3,1) = 1, 2!S(3,2)=62! \cdot S(3,2) = 6, 3!S(3,3)=63! \cdot S(3,3) = 6 are exactly the left edge values of the difference table of 03,13,23,33,0^3, 1^3, 2^3, 3^3, \ldots (which is 0,1,6,6,0,0, 1, 6, 6, 0, \ldots starting from a0=03=0a_0 = 0^3 = 0). The Stirling numbers are encoded in the difference table, waiting to be read off.

# Applications

# Gregory-Newton interpolation for numerical tables

Before computers, mathematical tables (logarithms, trigonometric functions, actuarial tables) were computed by hand and checked by differencing. If you have a table of sin(x)\sin(x) at equally spaced points and the fourth differences are nearly constant, you know the interpolation is accurate to the corresponding degree. Any entry that produces an anomalous difference is a transcription error.

Charles Babbage designed his Difference Engine (1822) specifically to automate this: it computed polynomial functions by repeated addition, implementing Newton’s formula mechanically. The machine had no multiplication unit: it didn’t need one, because the difference method reduces polynomial evaluation to addition.

The method was still in active use well into the 20th century. The Handbook of Mathematical Functions (Abramowitz and Stegun, 1964) includes extensive difference tables for standard functions, and its introduction describes differencing as the primary method for verifying table accuracy. The British Nautical Almanac Office employed human “computers” who differenced tables by hand until the 1950s (Croarken, 199013).

A single transcription error in a table entry produces a localized spike in the difference table – the error propagates through subsequent differences but with a characteristic signature (binomial coefficients with alternating signs). An experienced table-maker could spot and localize an error by inspecting the difference columns, a technique that predates error-correcting codes by two centuries.

# Detecting polynomial sequences

The most immediate application: if you encounter a sequence and want to know whether a polynomial generates it, compute the difference table. If the kk-th row is constant, the sequence is a polynomial of degree kk and Newton’s formula gives you the closed form immediately.

This is not limited to pure mathematics. Physical quantities that depend polynomially on a parameter (projectile height vs. time, area vs. length) can be identified by differencing a table of measurements. The method is more numerically stable than fitting a polynomial by least squares when the data points are equally spaced, because it avoids solving a Vandermonde system.

A concrete use case: the sum of kk-th powers, Sk(n)=1k+2k++nkS_k(n) = 1^k + 2^k + \cdots + n^k, is always a polynomial of degree k+1k+1 in nn. You can discover this experimentally by computing Sk(n)S_k(n) for n=0,1,2,n = 0, 1, 2, \ldots and differencing. For k=2k = 2: the sequence 0,1,5,14,30,55,910, 1, 5, 14, 30, 55, 91 has constant third differences (equal to 2), confirming degree 3. The left edge is 0,1,3,20, 1, 3, 2, giving S2(n)=(n1)+3(n2)+2(n3)=n+3n(n1)2+n(n1)(n2)3=n(n+1)(2n+1)6S_2(n) = \binom{n}{1} + 3\binom{n}{2} + 2\binom{n}{3} = n + \frac{3n(n-1)}{2} + \frac{n(n-1)(n-2)}{3} = \frac{n(n+1)(2n+1)}{6}.

In competitive programming and puzzle mathematics, the “difference table test” is a standard first move when encountering an unfamiliar integer sequence.

# Euler-Maclaurin summation

The operator identity Δ=eD1\Delta = e^D - 1 bridges discrete and continuous calculus directly. Expanding Δ1\Delta^{-1} as a formal series in DD yields the Euler-Maclaurin formula,11 which converts discrete sums into integrals plus computable corrections involving Bernoulli numbers. It gives the asymptotic expansion of the harmonic numbers (1/klnn+γ+12n\sum 1/k \approx \ln n + \gamma + \frac{1}{2n} - \cdots), Stirling’s approximation for n!n!, and sharp estimates of partial sums of ζ(s)\zeta(s).

# The Norlund-Rice integral

Many alternating sums in combinatorics have the form k=0n(nk)(1)kf(k)\sum_{k=0}^{n} \binom{n}{k} (-1)^k f(k), which is exactly Δnf(0)\Delta^n f(0). When ff extends to a meromorphic function with polynomial growth, the Norlund-Rice integral converts the discrete sum into a contour integral.A meromorphic function is analytic (has a convergent Taylor series) everywhere except at isolated poles – rational functions, Γ(z)\Gamma(z), and 1/(z+1)1/(z+1) are typical examples.

k=0n(nk)(1)kf(k)=(1)n2πin!f(z)z(z1)(zn)dz\sum_{k=0}^{n} \binom{n}{k} (-1)^k f(k) = \frac{(-1)^n}{2\pi i} \oint \frac{n!\, f(z)}{z(z-1)\cdots(z-n)}\,dz

The contour encloses 0,1,,n0, 1, \ldots, n. Besides these poles (which reconstruct the original sum by residues), the integrand may have other poles from f(z)f(z). Deforming the contour to infinity and picking up only those other poles often gives a closed form; the polynomial growth condition on ff is exactly what ensures the integral over the large contour vanishes as the radius grows. For f(k)=1/(k+1)f(k) = 1/(k+1), the extra pole at z=1z = -1 yields (nk)(1)kk+1=1n+1\sum \binom{n}{k} \frac{(-1)^k}{k+1} = \frac{1}{n+1} directly.

Flajolet and Sedgewick use this extensively in Analytic Combinatorics5; average-case complexity of search trees, hashing, and digital structures often reduces to alternating sums over binomial coefficients, which are entries on the left edge of the difference table of (f(0),f(1),f(2),)(f(0), f(1), f(2), \ldots).

# Umbral Calculus

There is a strange algebraic trick that has been making mathematicians uncomfortable since the 19th century. Write BnB^n as a formal symbol, and impose the rule that after expanding any expression, you “lower the index”: replace BnB^n with BnB_n, the nn-th Bernoulli number. Then identities like

(B+1)n=Bn(n2)(B+1)^n = B^n \quad (n \geq 2)

hold, where both sides are evaluated by the lowering rule. This is the umbral calculus: treat sequence indices as if they were exponents. The name comes from the Latin umbra (shadow) – coined by Sylvester in the 1850s because the technique seemed like dark magic. You pretend indices are exponents, apply the binomial theorem as if it were legitimate, lower the indices back, and the answer is correct. For a century it worked without anyone knowing why. Blissard formalized the notation in 1861; Rota finally explained the mechanism in the 1970s.

To see the lowering rule in action, verify the identity (B+1)3=B3(B+1)^3 = B^3. Expand the left side by the binomial theorem: (B+1)3=B3+3B2+3B1+B0(B+1)^3 = B^3 + 3B^2 + 3B^1 + B^0. Now lower indices: B3+3B2+3B1+B0B_3 + 3B_2 + 3B_1 + B_0. The Bernoulli numbers are B0=1B_0 = 1, B1=1/2B_1 = -1/2, B2=1/6B_2 = 1/6, B3=0B_3 = 0. So the left side becomes 0+316+3(12)+1=1232+1=0=B30 + 3 \cdot \frac{1}{6} + 3 \cdot (-\frac{1}{2}) + 1 = \frac{1}{2} - \frac{3}{2} + 1 = 0 = B_3. It works because the Bernoulli numbers satisfy the recurrence k=0n(nk)Bk=Bn\sum_{k=0}^{n} \binom{n}{k} B_k = B_n for n2n \ge 2 – which is exactly what (B+1)n=Bn(B+1)^n = B^n says after lowering.

Why does this work systematically? Rota, Kahaner, and Odlyzko14 gave the foundational treatment of the operator-theoretic framework (delta operators, Sheffer sequences) that explains why; Roman3 made it fully rigorous. Rota observed that the lowering rule is a linear functional LL on polynomials defined by L[xn]=BnL[x^n] = B_n. The “umbral identity” (B+1)n=Bn(B+1)^n = B^n is really L[(x+1)n]=L[xn]L[(x+1)^n] = L[x^n], which holds because the Bernoulli numbers are defined by exactly this shift relation. The trick generalizes: any sequence (an)(a_n) defines a linear functional L[xn]=anL[x^n] = a_n, and identities among the ana_n become polynomial identities under LL.

The connection to difference tables is direct. The forward difference operator Δ\Delta satisfies Δxn=nxn1\Delta x^{\underline{n}} = n \cdot x^{\underline{n-1}}, mirroring ddxxn=nxn1\frac{d}{dx} x^n = n x^{n-1}. Newton’s forward difference formula is the expansion of a polynomial in the basis (xn)=xn/n!\binom{x}{n} = x^{\underline{n}}/n!, just as Taylor’s theorem expands in the basis xn/n!x^n/n!. This is not a coincidence: both Δ\Delta and d/dxd/dx are examples of operators that reduce polynomial degree by 1 and commute with shifts, and any such operator produces its own “Taylor expansion” with its own natural basis. The difference table is the discrete instance of this structure.

# A Puzzle

Here’s a sequence: 2, 5, 10, 17, 26, 37, …

Build its difference table. What degree polynomial generates it? Can you find the formula?

Solution
2    5   10   17   26   37
   3    5    7    9   11
      2    2    2    2
         0    0    0

Second differences are constant, so it’s quadratic.

Left edge: 2, 3, 2. Using Newton’s formula:

an=2(n0)+3(n1)+2(n2)=2+3n+n(n1)=n2+2n+2a_n = 2\binom{n}{0} + 3\binom{n}{1} + 2\binom{n}{2} = 2 + 3n + n(n-1) = n^2 + 2n + 2

Check: a0=2a_0 = 2, a1=5a_1 = 5, a2=10a_2 = 10. Correct.

# Open Connections

# p-adic analysis and Mahler’s theorem

Mahler (1958)8 proved that every continuous function f:ZpQpf: \mathbb{Z}_p \to \mathbb{Q}_p on the p-adic integersThe pp-adic integers Zp\mathbb{Z}_p are a completion of the ordinary integers under a different notion of “closeness”: two numbers are pp-adically close if their difference is divisible by a high power of pp. So 11 and 1+2101 + 2^{10} are very close 2-adically, even though they’re far apart on the number line. has a unique expansion

f(x)=n=0Δnf(0)(xn)f(x) = \sum_{n=0}^{\infty} \Delta^n f(0) \cdot \binom{x}{n}

and this series converges for all xZpx \in \mathbb{Z}_p if and only if Δnf(0)0\Delta^n f(0) \to 0 in the pp-adic absolute value. This is the same Newton series formula, but the convergence condition is pp-adic rather than archimedean.

A concrete example: consider f(x)=(1)xf(x) = (-1)^x as a function on Z2\mathbb{Z}_2 (the 2-adic integers). Its difference table has Δnf(0)=k(nk)(1)nk(1)k=(1)nk(nk)=(1)n2n=(2)n\Delta^n f(0) = \sum_k \binom{n}{k}(-1)^{n-k}(-1)^k = (-1)^n \sum_k \binom{n}{k} = (-1)^n \cdot 2^n = (-2)^n. Concretely: Δf(0)=f(1)f(0)=2\Delta f(0) = f(1) - f(0) = -2, Δ2f(0)=f(2)2f(1)+f(0)=1+2+1=4\Delta^2 f(0) = f(2) - 2f(1) + f(0) = 1 + 2 + 1 = 4, Δ3f(0)=8\Delta^3 f(0) = -8, and in general Δnf(0)=(2)n\Delta^n f(0) = (-2)^n. In the real numbers, (2)n|(-2)^n| \to \infty, so the Newton series diverges. But 2-adically, (2)n2=2n0|(-2)^n|_2 = 2^{-n} \to 0, so the series converges. The function (1)x(-1)^x, which makes no sense for non-integer real arguments, extends continuously to all of Z2\mathbb{Z}_2 via its difference table.

# Difference tables over finite fields

Over Fp\mathbb{F}_p, difference tables behave differently. Since (pk)0(modp)\binom{p}{k} \equiv 0 \pmod{p} for 0<k<p0 < k < p, the pp-th difference operator reduces to Δp=EpI\Delta^p = E^p - I: a single shift. Difference tables over finite fields are eventually periodic rather than eventually zero, which means the “constant differences imply polynomial” test breaks down at the wrap-around scale.

# Why differencing amplifies noise

Under the discrete Fourier transform, Δ\Delta is multiplication by ωk1\omega^k - 1 in frequency space (where ω=e2πi/N\omega = e^{2\pi i/N}). Low-frequency modes have ωk1<1|\omega^k - 1| < 1 and decay under differencing – this is why polynomials (dominated by low frequencies) have terminating difference tables. But at the Nyquist frequency k=N/2k = N/2, the factor is ωN/21=2|\omega^{N/2} - 1| = 2. After nn applications of Δ\Delta, the highest-frequency component is amplified by 2n2^n.

This is why differencing a noisy empirical sequence more than a few times produces garbage – and why Babbage’s Difference Engine was designed for exact integer arithmetic, not floating-point approximation.

# References

[1] Graham, R.L., Knuth, D.E., and Patashnik, O. Concrete Mathematics: A Foundation for Computer Science. 2nd ed. Addison-Wesley, 1994. Chapters 2 and 5 cover difference operators, Stirling numbers, and the Newton basis systematically.

[2] Jordan, C. Calculus of Finite Differences. 3rd ed. Chelsea, 1965. The classical reference. Encyclopedic coverage of difference operators, interpolation, summation, and their applications.

[3] Roman, S. The Umbral Calculus. Academic Press, 1984. Rota’s program made rigorous. Sheffer sequences, delta operators, and the algebraic foundations of the calculus of finite differences.

[4] Riordan, J. Combinatorial Identities. Wiley, 1968. The working reference for binomial transform identities and their combinatorial proofs.

[5] Flajolet, P. and Sedgewick, R. Analytic Combinatorics. Cambridge University Press, 2009. Chapter III covers the Norlund-Rice integral and its applications to the analysis of algorithms.

[6] Knuth, D.E. The Art of Computer Programming, Vol. 1. 3rd ed. Addison-Wesley, 1997. Section 1.2.6 covers finite differences and their use in numerical computation.

[7] Boole, G. A Treatise on the Calculus of Finite Differences. 2nd ed. Macmillan, 1872. The first systematic treatment. Available on the Internet Archive.

[8] Mahler, K. “An interpolation series for continuous functions of a pp-adic variable.” Journal fur die reine und angewandte Mathematik, 199:23–34, 1958.

[9] Stanley, R.P. Enumerative Combinatorics, Vol. 1. 2nd ed. Cambridge University Press, 2012. Chapter 1 covers Stirling numbers and their combinatorial interpretations in full generality.

[10] OEIS Wiki: Transforms. Living reference for binomial transforms and their relationships across the OEIS.

[11] Spivey, M.Z. “The Euler-Maclaurin formula and sums of powers.” Mathematics Magazine, 79(1):61–65, 2006. A clean derivation of Euler-Maclaurin via the operator formalism.

[12] Beery, J. & Stedall, J. Thomas Harriot’s Doctrine of Triangular Numbers: The “Magisteria Magna”. European Mathematical Society, 2009.

[13] Croarken, M. Early Scientific Computing in Britain. Oxford University Press, 1990.

[14] Rota, G.-C., Kahaner, D., and Odlyzko, A. “Finite Operator Calculus.” Journal of Mathematical Analysis and Applications, 42:684–760, 1973. The foundational paper introducing delta operators and Sheffer sequences, explaining the algebraic structure underlying umbral calculus.