What Differences Reveal

December 9, 2024

Repeatedly subtracting consecutive terms of an integer sequence produces a difference table that functions as exact discrete calculus: the forward difference operator $\Delta$ mirrors $d/dx$ , falling factorials replace monomials, and Newton’s formula reconstructs any polynomial from the table’s left edge using binomial coefficients. The binomial transform of the left-edge values sends Fibonacci to itself with alternating signs – a fixed-point property that falls out of $\varphi - 1 = 1/\varphi$ .

Take any sequence of integers. Subtract each term from the next. You get a new sequence. Do it again. And again. What emerges is a difference table, and it reveals far more about your sequence than you might expect.

This is not numerical differentiation. There is no approximation, no truncation error, no floating-point noise (on integer sequences). What follows is an exact discrete calculus where $\Delta$ plays the role of $d/dx$ , falling factorials replace monomials, and every theorem of continuous calculus has a discrete twin.

# A Simple Game

Let’s play with the triangular numbers: 1, 3, 6, 10, 15, 21, …

These are figurate numbers: they count dots in regular geometric arrangements. The triangular numbers count dots in triangles:

    •          •          •
              • •        • •
                        • • •
    1          3          6

Now build the difference table. Write the sequence, then below each pair write their difference:

Difference table of the triangular numbers — The triangular numbers differenced: first differences are the counting numbers, second differences are constant at 1, third differences are zero. The left edge reconstructs the sequence via Newton

The first differences are the counting numbers. The second differences are all 1. The third differences are all 0.

This is no accident. The triangular numbers are given by $T_n = \frac{n(n+1)}{2}$ , a quadratic polynomial – and every quadratic has constant second differences.

# Background

The idea of systematically differencing a sequence is old. Newton used it in the 1660s for interpolation, but the roots go back further, to James Gregory (1668) and Henry Briggs (1624), who built logarithm tables by the method of finite differences. Boole7 gave the first systematic treatment in 1860; Jordan2 wrote the classical reference that most 20th-century work builds on. Thomas Harriot had the essential idea even earlier, in unpublished manuscripts from the 1610s (recovered in Beery and Stedall, 200912).

The practical motive was always numerical tables. Briggs computed his Arithmetica Logarithmica (1624) by exploiting the fact that consecutive values of a polynomial can be computed by addition alone (no multiplication needed) if you maintain a running difference table. The same idea powered the Nautical Almanac computations for two centuries, and it’s what Babbage later mechanized in his Difference Engine.6

Newton’s contribution was to see beyond the computational trick: he formulated the general interpolation formula and recognized the connection to the binomial series. In a 1676 letter to Leibniz (the Epistola Prior), Newton described how any function tabulated at integer points could be approximated by its differences, extending finite differences from a computational device to a theoretical tool.1

# Operators: $\Delta$ , $E$ , and $D$

The analogy with continuous calculus is not a coincidence. Define the forward difference operator $\Delta$ by

\Delta f(x) = f(x+1) - f(x)

and the shift operator $E$ by

E f(x) = f(x+1).

Then $\Delta = E - I$ , where $I$ is the identity operator. This is the discrete analogue of the derivative. Where $\frac{d}{dx}$ measures instantaneous rate of change, $\Delta$ measures the change over a unit step.

The parallel runs deep. In continuous calculus, $\frac{d}{dx} x^n = n x^{n-1}$ . In discrete calculus, $\Delta$ acts cleanly not on ordinary powers $x^n$ but on falling factorials,

x^{\underline{n}} = x(x-1)(x-2)\cdots(x-n+1)

The rule is $\Delta x^{\underline{n}} = n \cdot x^{\underline{n-1}}$ , a perfect mirror of the power rule. The falling factorial plays the role of $x^n$ in the discrete world, and the binomial coefficient $\binom{x}{n} = \frac{x^{\underline{n}}}{n!}$ plays the role of $\frac{x^n}{n!}$ .

Once you accept falling factorials as the natural basis, every theorem of continuous calculus has an exact discrete twin. The correspondence is not a loose analogy. It is a dictionary:

Continuous	Discrete
Derivative $\frac{d}{dx}$	Forward difference $\Delta$
$x^n$	Falling factorial $x^{\underline{n}}$
$\frac{d}{dx} x^n = n x^{n-1}$	$\Delta x^{\underline{n}} = n \cdot x^{\underline{n-1}}$
$\int x^n dx = \frac{x^{n+1}}{n+1} + C$	$\Delta^{-1} x^{\underline{n}} = \frac{x^{\underline{n+1}}}{n+1} + C$
Taylor series: $f(x) = \sum \frac{f^{(k)}(0)}{k!} x^k$	Newton series: $f(n) = \sum \Delta^k f(0) \cdot \binom{n}{k}$
$\frac{x^n}{n!}$	$\binom{x}{n} = \frac{x^{\underline{n}}}{n!}$
$e^x$ (eigenfunction of $\frac{d}{dx}$ )	$2^x$ (eigenfunction of $\Delta$ : $\Delta 2^x = 2^x$ )

The last row explains why powers of 2 have self-similar difference tables: $\Delta$ acts on $2^x$ the way $\frac{d}{dx}$ acts on $e^x$ : it returns the function unchanged. More precisely, any $a^x$ is an eigenfunction of $\Delta$ with eigenvalue $a - 1$ ; the choice $a = 2$ is special because it gives eigenvalue 1.

Higher-order differences compose exactly as you’d expect:

\Delta^n f(x) = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} f(x+k)

This follows from expanding $\Delta^n = (E - I)^n$ by the binomial theorem – which works because $E$ and $I$ commute. The formula says: to compute the $n$ -th difference, take an alternating weighted sum of $n+1$ consecutive values. This is exactly what building the difference table row by row computes, just expressed in closed form.

The operator-theoretic viewpoint extends further.1 The formal relation between $\Delta$ and the continuous derivative $D = \frac{d}{dx}$ is $\Delta = e^D - 1$ , since $Ef(x) = e^D f(x) = f(x+1)$ by Taylor’s theorem. Inverting: $D = \ln(1 + \Delta) = \Delta - \frac{\Delta^2}{2} + \frac{\Delta^3}{3} - \cdots$ . These formal power series in operators converge when applied to polynomials (which are annihilated by sufficiently high powers of $\Delta$ or $D$ ), and they form the basis of the operator method in combinatorics.

The backward difference operator $\nabla f(x) = f(x) - f(x-1)$ and the central difference operator $\delta f(x) = f(x + \tfrac{1}{2}) - f(x - \tfrac{1}{2})$ give rise to their own families of formulas (Stirling’s interpolation, Gauss’s formulas), but the forward difference is the most natural starting point.

# The General Pattern

For any polynomial of degree $d$ , the $d$ -th differences are constant, and the $(d+1)$ -th differences are zero.

Try the cubes: 0, 1, 8, 27, 64, 125…

Difference table of cubes with left edge and constant row highlighted — The difference table of n^3. Blue: left edge (Newton coefficients 0, 1, 6, 6, 0). Yellow: constant third differences at 6 = 3!. Fourth differences are zero – confirming degree 3.

Third differences constant at 6. Fourth differences zero. The cubes are cubic polynomials – degree 3.

The figurate number pattern continues: the square pyramidal numbers 1, 5, 14, 30, 55, 91 count balls stacked in a square pyramid. Each layer is a square number, so the pyramidal numbers are partial sums of squares: $P_n = \sum_{k=1}^{n} k^2$ . Their third differences are constant at 2, confirming degree 3 (the sum of a degree-2 sequence is degree 3, exactly as the discrete antidifference predicts).

This works in reverse too. If someone hands you a mystery sequence and its third differences are constant, you know it’s generated by a cubic polynomial. You can even reconstruct the formula.

Why does the $d$ -th difference of a degree- $d$ polynomial come out constant? Because the operator $\Delta$ reduces degree by exactly 1. To see this: if $f(x) = cx^d + \text{lower terms}$ , then $\Delta f(x) = c[(x+1)^d - x^d] = c[dx^{d-1} + \text{lower terms}]$ , so $\deg(\Delta f) = d - 1$ with leading coefficient $cd$ . Applying this $d$ times, $\Delta^d f(x) = c \cdot d!$ , a constant. One more application gives $\Delta^{d+1} f(x) = 0$ . For the cubes, $f(x) = x^3$ with $c = 1$ , so $\Delta^3 f(x) = 3! = 6$ .

This gives a practical test: if the $k$ -th row of a difference table is constant, the sequence is generated by a polynomial of degree exactly $k$ . The converse holds too: non-polynomial sequences (exponentials, factorials, primes) never terminate. (For primes: the $n$ -th prime $p_n$ grows as $n \ln n$ by the prime number theorem, which is not a polynomial in $n$ , so no finite-degree polynomial can reproduce the prime sequence.)

The calculus analogy extends to summation. Just as integration undoes differentiation, the antidifference $\Delta^{-1}$ (also called the indefinite sum) undoes $\Delta$ . Since $\Delta x^{\underline{n}} = n \cdot x^{\underline{n-1}}$ , we get $\Delta^{-1} x^{\underline{n}} = \frac{x^{\underline{n+1}}}{n+1} + C$ – the discrete analogue of $\int x^n dx = \frac{x^{n+1}}{n+1} + C$ . This gives closed forms for discrete sums: $\sum_{k=0}^{n-1} k^{\underline{2}} = \sum_{k=0}^{n-1} k(k-1) = \frac{n(n-1)(n-2)}{3}$ , which you can verify is $\frac{x^{\underline{3}}}{3}\big|_0^n$ .

Here is a short Python function that computes the difference table:

def difference_table(seq):
    """Return the difference table as a list of rows."""
    table = [list(seq)]
    while len(table[-1]) > 1:
        row = table[-1]
        table.append([row[i+1] - row[i] for i in range(len(row) - 1)])
    return table

# Triangular numbers
for row in difference_table([1, 3, 6, 10, 15, 21, 28]):
    print(row)
# [1, 3, 6, 10, 15, 21, 28]
# [2, 3, 4, 5, 6, 7]
# [1, 1, 1, 1, 1]
# [0, 0, 0, 0]

# Newton’s Forward Difference Formula

Any sequence can be reconstructed from its difference table. Specifically:

a_n = \sum_{k=0}^{n} \binom{n}{k} \Delta^k a_0

where $\Delta^k a_0$ is the $k$ -th entry on the left edge of the difference table.

This is Newton’s forward difference formula. It says: to reconstruct a sequence, take the left edge of its difference table and weight by binomial coefficients.

# Derivation from the operator identity

The formula follows directly from the relation $E = I + \Delta$ . Applying $E^n$ to $f$ at $x = 0$ :

f(n) = E^n f(0) = (I + \Delta)^n f(0) = \sum_{k=0}^{n} \binom{n}{k} \Delta^k f(0)

That’s it. The binomial theorem for operators gives Newton’s interpolation formula in one line.

# Example: reconstruct a polynomial

For the triangular numbers, the left edge is 1, 2, 1, 0, 0, … So:

T_n = \binom{n}{0} \cdot 1 + \binom{n}{1} \cdot 2 + \binom{n}{2} \cdot 1 = 1 + 2n + \frac{n(n-1)}{2}

Simplify and you get $\frac{n^2 + 3n + 2}{2} = \frac{(n+1)(n+2)}{2}$ , which is $T_{n+1}$ . The $+1$ shift is because our sequence starts at $a_0 = T_1 = 1$ , not $T_0 = 0$ .

For the cubes, the table starts at $a_0 = 1 = 1^3$ , so our sequence is really $f(n) = (n+1)^3$ . The left edge is 1, 7, 12, 6, 0, 0, … and Newton’s formula gives:

f(n) = 1 \cdot \binom{n}{0} + 7 \cdot \binom{n}{1} + 12 \cdot \binom{n}{2} + 6 \cdot \binom{n}{3}

Expanding: $1 + 7n + 6n(n-1) + n(n-1)(n-2) = 1 + 7n + 6n^2 - 6n + n^3 - 3n^2 + 2n = n^3 + 3n^2 + 3n + 1 = (n+1)^3$ . Correct.

# Connection to polynomial interpolation

Newton’s formula is an interpolation formula: given $d+1$ values of a degree- $d$ polynomial, it reconstructs the polynomial exactly. The representation is in the Newton basis $\{\binom{x}{0}, \binom{x}{1}, \binom{x}{2}, \ldots\}$ rather than the monomial basis $\{1, x, x^2, \ldots\}$ .

The Newton basis has a computational advantage: adding one more data point only requires computing one more difference, appending one more term. With the monomial basis (Vandermonde system), you’d need to re-solve the entire system. This is why Newton’s formula was the workhorse of numerical tables for three centuries, from Briggs’s logarithms to the Nautical Almanac.

There is a subtlety: Newton’s formula interpolates exactly through the given points $a_0, a_1, \ldots, a_d$ using a polynomial of degree $d$ . But if the sequence is not polynomial, the formula still produces a value for $a_n$ at non-integer $n$ ; it just gives the unique polynomial that passes through the first $d+1$ data points. For non-polynomial sequences, the approximation degrades as $n$ moves far from the data. This is the discrete version of the Runge phenomenon in polynomial interpolation: high-degree fits to equispaced points can oscillate wildly between the data points, and more data can make the interpolant worse.Switching to the Newton basis does not cure this. Lagrange, Newton, and Hermite interpolation all produce the same polynomial through the same points; Runge instability is a property of the interpolating polynomial itself, driven by node spacing and degree, not by which basis is used to represent it. See Trefethen, Approximation Theory and Approximation Practice, ch. 11.

For polynomial sequences, the formula is exact for all $n$ , not just the tabulated values. This is the basis of the difference table test: if the $d$ -th differences are constant, Newton’s formula with $d+1$ terms reproduces the sequence exactly and extends it to all integers (and even to non-integer arguments).

Here is a Python function that reconstructs a sequence from its left edge:

from math import comb

def newton_interpolate(left_edge, n):
    """Evaluate Newton's forward difference formula at n."""
    return sum(c * comb(n, k) for k, c in enumerate(left_edge))

# Recover triangular numbers from left edge [1, 2, 1]
print([newton_interpolate([1, 2, 1], n) for n in range(7)])
# [1, 3, 6, 10, 15, 21, 28]

# The Binomial Transform

The operation of “take all the differences and read down the left edge” has a name: the binomial transform.

Given a sequence $(a_0, a_1, a_2, \ldots)$ , its binomial transform is the sequence of values at the left edge of the difference table:

b_n = \Delta^n a_0 = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} a_k

# Formal definition and inverse

The binomial transform can be written in matrix form. Let $\mathbf{b} = B \mathbf{a}$ , where $B$ is the lower-triangular matrix with entries $B_{nk} = \binom{n}{k}(-1)^{n-k}$ . The inverse is clean: to recover the original sequence, apply the unsigned binomial transform:

a_n = \sum_{k=0}^{n} \binom{n}{k} b_k

This is Newton’s forward difference formula: the left edge of the difference table (the $b_k$ ) reconstructs the original sequence when weighted by binomial coefficients. The forward (signed) and inverse (unsigned) transforms are related by $B^{-1}_{nk} = \binom{n}{k}$ . To verify the inversion, compose the two. Apply the signed transform to the unsigned:

\sum_k \binom{n}{k}(-1)^{n-k}\sum_j \binom{k}{j} c_j = \sum_j c_j \underbrace{\sum_k \binom{n}{k}\binom{k}{j}(-1)^{n-k}}_{\delta_{nj}}

The inner sum vanishes for $n \ne j$ because $\sum_k \binom{n-j}{k-j}(-1)^{n-k}$ is the alternating row sum of Pascal’s triangle, which is zero unless the row has length 1.

(Some authors define the binomial transform without the signs: $b_n = \sum_k \binom{n}{k} a_k$ . Under that convention, the signed version is the inverse. The two conventions are conjugates of each other, differing by whether you call the “forward” direction signed or unsigned.)

# Examples

Powers of 2. The sequence $a_n = 2^n$ gives $b_n = \sum_{k=0}^{n} \binom{n}{k}(-1)^{n-k} 2^k = (2-1)^n = 1$ . So the binomial transform of $(1, 2, 4, 8, 16, \ldots)$ is $(1, 1, 1, 1, 1, \ldots)$ .

Inversely, $(1, 1, 1, \ldots)$ transforms to $(1, 2, 4, 8, \ldots)$ . The all-ones sequence and the powers of 2 are binomial transform pairs.

Powers of 3. Similarly, $b_n = (3-1)^n = 2^n$ . So the transform of $(1, 3, 9, 27, \ldots)$ is $(1, 2, 4, 8, \ldots)$ . Chains form: all-ones $\leftrightarrow$ powers of 2 $\leftrightarrow$ powers of 3 $\leftrightarrow$ … where each application of the binomial transform subtracts 1 from the base. More generally, the binomial transform of $((a+1)^n)$ is $(a^n)$ , since $\sum_k \binom{n}{k}(-1)^{n-k}(a+1)^k = ((a+1)-1)^n = a^n$ .

Bell numbers. The Bell numbers $B_n$ (1, 1, 2, 5, 15, 52, 203, …) count the total number of set partitions of $\lbrace 1, \ldots, n \rbrace$ . They satisfy $B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k$ – the unsigned binomial transform of $(B_n)$ is the shifted sequence $(B_{n+1})$ . This recurrence says: to partition $\lbrace 1, \ldots, n+1 \rbrace$ , choose which $k$ elements share a block with $n+1$ (the $\binom{n}{k}$ factor) and partition the remaining $n-k$ elements (the $B_{n-k}$ factor, which after reindexing gives $B_k$ ). The same recurrence emerges from the exponential generating function: the Bell EGF is $B(x) = e^{e^x - 1}$ , and differentiating gives $B'(x) = e^x B(x)$ ; expanding $e^x = \sum x^k/k!$ and equating coefficients recovers exactly the binomial-transform recurrence.

Catalan numbers. The Catalan numbers $C_n = \frac{1}{n+1}\binom{2n}{n}$ (1, 1, 2, 5, 14, 42, …) have a binomial transform that gives the central Delannoy numbers, which count lattice paths with diagonal steps allowed. Riordan4 is the working reference for these identities and their combinatorial proofs.

# Connection to generating functions

If $A(x) = \sum a_n x^n$ is the ordinary generating function (OGF) of $(a_n)$ , and $(b_n)$ is its binomial transform, then

\sum b_n x^n = \frac{1}{1+x} A\!\left(\frac{x}{1+x}\right).

For exponential generating functions (EGFs), the relationship is cleaner. If $\hat{A}(x) = \sum a_n \frac{x^n}{n!}$ , then

\sum b_n \frac{x^n}{n!} = e^{-x} \hat{A}(x).

Multiplication by $e^{-x}$ in the EGF world corresponds to the binomial transform in the coefficient world. This is one reason EGFs are so natural for combinatorics: many transforms that look complicated on OGFs become simple operations on EGFs.

In the OGF world, the forward transform is the substitution $x \mapsto \frac{x}{1+x}$ combined with a factor of $\frac{1}{1+x}$ , and the inverse substitution $x \mapsto \frac{x}{1-x}$ with factor $\frac{1}{1-x}$ undoes it.

# Fibonacci’s Self-Similarity

Most sequences lose their identity under the binomial transform: powers of 2 become all-ones, cubes become something unrecognizable. The Fibonacci sequence does something stranger:

Fibonacci difference table with left edge highlighted — The left edge of Fibonacci

Look at the left edge: 0, 1, -1, 2, -3, 5, -8, 13, -21…

It’s the Fibonacci numbers again, with alternating signs! The binomial transform of Fibonacci is (essentially) Fibonacci.

Why? The Fibonacci recurrence $F_n = F_{n-1} + F_{n-2}$ has characteristic roots $\phi = \frac{1+\sqrt{5}}{2}$ and $\psi = \frac{1-\sqrt{5}}{2}$ , with the closed form $F_n = \frac{\phi^n - \psi^n}{\sqrt{5}}$ . The binomial transform sends $\alpha^n \mapsto (\alpha - 1)^n$ . The key step is computing $\phi - 1 = \frac{\sqrt{5}-1}{2} = -\psi$ and $\psi - 1 = \frac{-\sqrt{5}-1}{2} = -\phi$ . (Both follow from the defining property $\phi + \psi = 1$ .) So:

b_n = \frac{(\phi-1)^n - (\psi-1)^n}{\sqrt{5}} = \frac{(-\psi)^n - (-\phi)^n}{\sqrt{5}} = (-1)^{n+1} \frac{\phi^n - \psi^n}{\sqrt{5}} = (-1)^{n+1} F_n

The binomial transform of $F_n$ is $(-1)^{n+1} F_n$ . Applying the transform twice returns the original sequence: $F_n \mapsto (-1)^{n+1} F_n \mapsto (-1)^{n+1}(-1)^{n+1} F_n = F_n$ . The Fibonacci sequence is a fixed point of the double binomial transform – a self-inverse up to sign alternation. This algebraic self-similarity is a shadow of the golden ratio’s defining property $\phi^2 = \phi + 1$ , which gives $\phi - 1 = 1/\phi$ .

More precisely: any sequence whose Binet-style closed form involves roots $\alpha$ satisfying $\alpha(\alpha - 1) = \pm 1$ will be an eigensequence (or near-eigensequence) of the binomial transform. The golden ratio satisfies $\phi(\phi - 1) = 1$ , which is why Fibonacci works. The Lucas numbers $L_n = \phi^n + \psi^n$ have the same property: their binomial transform is $(-1)^n L_n$ .

# OEIS Treasures

The On-Line Encyclopedia of Integer Sequences catalogs over 380,000 sequences. Many entries note when one sequence is the binomial transform of another.10

One favorite: A000079 (Powers of 2), 1, 2, 4, 8, 16, 32…

1    2    4    8   16   32
   1    2    4    8   16
      1    2    4    8
         1    2    4
            1    2
               1

Every row is the same sequence. The difference table is self-similar because $2^n - 2^{n-1} = 2^{n-1}$ – differencing just shifts the sequence. This is the table-level manifestation of the binomial transform pair we derived above: the all-ones sequence and the powers of 2 are transform partners.

# Stirling Numbers

The connection between ordinary powers $x^n$ and the Newton basis $\binom{x}{k}$ runs through the Stirling numbers.9

# Stirling numbers of the second kind

The Stirling number of the second kind, written $S(n,k)$ or $\lbrace{n \atop k}\rbrace$ , counts the number of ways to partition a set of $n$ elements into exactly $k$ non-empty subsets.

The connection to difference tables: applying $\Delta^k$ to $x^n$ and evaluating at $x = 0$ gives

\Delta^k x^n \big|_{x=0} = \sum_{j=0}^{k} \binom{k}{j}(-1)^{k-j} j^n = S(n,k) \cdot k!

The left side is the $k$ -th entry on the left edge of the difference table of the sequence $0^n, 1^n, 2^n, 3^n, \ldots$ . Why does the right side equal $k! \cdot S(n,k)$ ? Count surjections from $[n]$ to $[k]$ by inclusion-exclusion: $\sum_{j=0}^{k} \binom{k}{j}(-1)^{k-j} j^n$ counts functions minus those missing at least one target, plus those missing at least two, and so on. Each surjection is an ordered partition into $k$ non-empty blocks, and there are $k!$ orderings per partition. So the alternating sum equals $k! \cdot S(n,k)$ , and the difference table of $n$ -th powers encodes Stirling numbers.

This is why the change-of-basis formula works:

x^n = \sum_{k=0}^{n} S(n,k) \cdot k! \cdot \binom{x}{k} = \sum_{k=0}^{n} S(n,k) \cdot x^{\underline{k}}

Taking differences of a polynomial converts from the “power basis” to the “falling factorial basis,” and the conversion coefficients are exactly the Stirling numbers of the second kind.

# Stirling numbers of the first kind

The Stirling numbers of the first kind, written $s(n,k)$ or $\left[{n \atop k}\right]$ (unsigned), perform the reverse conversion.Notation is not standardized: Stanley’s Enumerative Combinatorics uses signed $s(n,k)$ with the sign encoded in the change-of-basis formula; Knuth’s Concrete Mathematics uses the unsigned bracket $\left[{n \atop k}\right]$ , absorbing the sign into the convention. Knuth documents this fragmentation in “Two Notes on Notation” (1992).

x^{\underline{n}} = \sum_{k=0}^{n} s(n,k) \cdot (-1)^{n-k} \cdot x^k

Their combinatorial meaning: $\left[{n \atop k}\right]$ counts the number of permutations of $n$ elements with exactly $k$ cycles.

For instance, $\left[{4 \atop 2}\right] = 11$ : among the 24 permutations of $\lbrace 1,2,3,4 \rbrace$ , exactly 11 have two cycles. (Examples: $(1)(234)$ , $(12)(34)$ , etc.)

The two kinds of Stirling numbers are inverses as change-of-basis matrices:

\sum_{j} S(n,j) \cdot s(j,k) \cdot (-1)^{j-k} = \delta_{nk}

This is the discrete analogue of the fact that differentiation and integration are inverse operations. The first kind converts from falling factorials to powers; the second kind converts from powers to falling factorials. Together they mediate between the two natural bases for polynomial sequences.

For fixed $k$ , $S(n,k) \sim \frac{k^n}{k!}$ as $n \to \infty$ . The number of ways to partition a large set into $k$ blocks is approximately the number of surjections from $n$ to $k$ divided by $k!$ (correcting for block ordering). The Stirling numbers of the first kind have a more delicate asymptotic: $\left[{n \atop k}\right] \sim \frac{n!}{n} \cdot \frac{(\ln n)^{k-1}}{(k-1)!}$ for fixed $k$ , reflecting the logarithmic growth of the number of cycles in a random permutation.

# A small table

The Stirling numbers of the second kind for small $n$ , $k$ :

n\k   0   1   2   3   4   5
 0    1
 1    0   1
 2    0   1   1
 3    0   1   3   1
 4    0   1   7   6   1
 5    0   1  15  25  10   1

Each entry satisfies the recurrence $S(n,k) = k \cdot S(n-1,k) + S(n-1,k-1)$ : either the $n$ -th element joins one of the $k$ existing subsets ( $k$ choices, hence the factor of $k$ ), or it starts a new subset of its own.

# Worked example: $x^3$ in the Newton basis

Let’s verify the change-of-basis formula for $x^3$ . We need the Stirling numbers $S(3,1) = 1$ , $S(3,2) = 3$ , $S(3,3) = 1$ from the table above. The formula gives:

x^3 = S(3,1) \cdot 1! \cdot \binom{x}{1} + S(3,2) \cdot 2! \cdot \binom{x}{2} + S(3,3) \cdot 3! \cdot \binom{x}{3}

= 1 \cdot x + 6 \cdot \frac{x(x-1)}{2} + 6 \cdot \frac{x(x-1)(x-2)}{6} = x + 3x(x-1) + x(x-1)(x-2)

Expanding: $x + 3x^2 - 3x + x^3 - 3x^2 + 2x = x^3$ . Correct.

Notice that the coefficients $1! \cdot S(3,1) = 1$ , $2! \cdot S(3,2) = 6$ , $3! \cdot S(3,3) = 6$ are exactly the left edge values of the difference table of $0^3, 1^3, 2^3, 3^3, \ldots$ (which is $0, 1, 6, 6, 0, \ldots$ starting from $a_0 = 0^3 = 0$ ). The Stirling numbers are encoded in the difference table, waiting to be read off.

# Applications

# Gregory-Newton interpolation for numerical tables

Before computers, mathematical tables (logarithms, trigonometric functions, actuarial tables) were computed by hand and checked by differencing. If you have a table of $\sin(x)$ at equally spaced points and the fourth differences are nearly constant, you know the interpolation is accurate to the corresponding degree. Any entry that produces an anomalous difference is a transcription error.

Charles Babbage designed his Difference Engine (1822) specifically to automate this: it computed polynomial functions by repeated addition, implementing Newton’s formula mechanically. The machine had no multiplication unit: it didn’t need one, because the difference method reduces polynomial evaluation to addition.

The method was still in active use well into the 20th century. The Handbook of Mathematical Functions (Abramowitz and Stegun, 1964) includes extensive difference tables for standard functions, and its introduction describes differencing as the primary method for verifying table accuracy. The British Nautical Almanac Office employed human “computers” who differenced tables by hand until the 1950s (Croarken, 199013).

A single transcription error in a table entry produces a localized spike in the difference table – the error propagates through subsequent differences but with a characteristic signature (binomial coefficients with alternating signs). An experienced table-maker could spot and localize an error by inspecting the difference columns, a technique that predates error-correcting codes by two centuries.

# Detecting polynomial sequences

The most immediate application: if you encounter a sequence and want to know whether a polynomial generates it, compute the difference table. If the $k$ -th row is constant, the sequence is a polynomial of degree $k$ and Newton’s formula gives you the closed form immediately.

This is not limited to pure mathematics. Physical quantities that depend polynomially on a parameter (projectile height vs. time, area vs. length) can be identified by differencing a table of measurements. The method is more numerically stable than fitting a polynomial by least squares when the data points are equally spaced, because it avoids solving a Vandermonde system.

A concrete use case: the sum of $k$ -th powers, $S_k(n) = 1^k + 2^k + \cdots + n^k$ , is always a polynomial of degree $k+1$ in $n$ . You can discover this experimentally by computing $S_k(n)$ for $n = 0, 1, 2, \ldots$ and differencing. For $k = 2$ : the sequence $0, 1, 5, 14, 30, 55, 91$ has constant third differences (equal to 2), confirming degree 3. The left edge is $0, 1, 3, 2$ , giving $S_2(n) = \binom{n}{1} + 3\binom{n}{2} + 2\binom{n}{3} = n + \frac{3n(n-1)}{2} + \frac{n(n-1)(n-2)}{3} = \frac{n(n+1)(2n+1)}{6}$ .

In competitive programming and puzzle mathematics, the “difference table test” is a standard first move when encountering an unfamiliar integer sequence.

# Euler-Maclaurin summation

The operator identity $\Delta = e^D - 1$ bridges discrete and continuous calculus directly. Expanding $\Delta^{-1}$ as a formal series in $D$ yields the Euler-Maclaurin formula,11 which converts discrete sums into integrals plus computable corrections involving Bernoulli numbers. It gives the asymptotic expansion of the harmonic numbers ( $\sum 1/k \approx \ln n + \gamma + \frac{1}{2n} - \cdots$ ), Stirling’s approximation for $n!$ , and sharp estimates of partial sums of $\zeta(s)$ .

# The Norlund-Rice integral

Many alternating sums in combinatorics have the form $\sum_{k=0}^{n} \binom{n}{k} (-1)^k f(k)$ , which is exactly $\Delta^n f(0)$ . When $f$ extends to a meromorphic function with polynomial growth, the Norlund-Rice integral converts the discrete sum into a contour integral.A meromorphic function is analytic (has a convergent Taylor series) everywhere except at isolated poles – rational functions, $\Gamma(z)$ , and $1/(z+1)$ are typical examples.

\sum_{k=0}^{n} \binom{n}{k} (-1)^k f(k) = \frac{(-1)^n}{2\pi i} \oint \frac{n!\, f(z)}{z(z-1)\cdots(z-n)}\,dz

The contour encloses $0, 1, \ldots, n$ . Besides these poles (which reconstruct the original sum by residues), the integrand may have other poles from $f(z)$ . Deforming the contour to infinity and picking up only those other poles often gives a closed form; the polynomial growth condition on $f$ is exactly what ensures the integral over the large contour vanishes as the radius grows. For $f(k) = 1/(k+1)$ , the extra pole at $z = -1$ yields $\sum \binom{n}{k} \frac{(-1)^k}{k+1} = \frac{1}{n+1}$ directly.

Flajolet and Sedgewick use this extensively in Analytic Combinatorics5; average-case complexity of search trees, hashing, and digital structures often reduces to alternating sums over binomial coefficients, which are entries on the left edge of the difference table of $(f(0), f(1), f(2), \ldots)$ .

# Umbral Calculus

There is a strange algebraic trick that has been making mathematicians uncomfortable since the 19th century. Write $B^n$ as a formal symbol, and impose the rule that after expanding any expression, you “lower the index”: replace $B^n$ with $B_n$ , the $n$ -th Bernoulli number. Then identities like

(B+1)^n = B^n \quad (n \geq 2)

hold, where both sides are evaluated by the lowering rule. This is the umbral calculus: treat sequence indices as if they were exponents. The name comes from the Latin umbra (shadow) – coined by Sylvester in the 1850s because the technique seemed like dark magic. You pretend indices are exponents, apply the binomial theorem as if it were legitimate, lower the indices back, and the answer is correct. For a century it worked without anyone knowing why. Blissard formalized the notation in 1861; Rota finally explained the mechanism in the 1970s.

To see the lowering rule in action, verify the identity $(B+1)^3 = B^3$ . Expand the left side by the binomial theorem: $(B+1)^3 = B^3 + 3B^2 + 3B^1 + B^0$ . Now lower indices: $B_3 + 3B_2 + 3B_1 + B_0$ . The Bernoulli numbers are $B_0 = 1$ , $B_1 = -1/2$ , $B_2 = 1/6$ , $B_3 = 0$ . So the left side becomes $0 + 3 \cdot \frac{1}{6} + 3 \cdot (-\frac{1}{2}) + 1 = \frac{1}{2} - \frac{3}{2} + 1 = 0 = B_3$ . It works because the Bernoulli numbers satisfy the recurrence $\sum_{k=0}^{n} \binom{n}{k} B_k = B_n$ for $n \ge 2$ – which is exactly what $(B+1)^n = B^n$ says after lowering.

Why does this work systematically? Rota, Kahaner, and Odlyzko14 gave the foundational treatment of the operator-theoretic framework (delta operators, Sheffer sequences) that explains why; Roman3 made it fully rigorous. Rota observed that the lowering rule is a linear functional $L$ on polynomials defined by $L[x^n] = B_n$ . The “umbral identity” $(B+1)^n = B^n$ is really $L[(x+1)^n] = L[x^n]$ , which holds because the Bernoulli numbers are defined by exactly this shift relation. The trick generalizes: any sequence $(a_n)$ defines a linear functional $L[x^n] = a_n$ , and identities among the $a_n$ become polynomial identities under $L$ .

The connection to difference tables is direct. The forward difference operator $\Delta$ satisfies $\Delta x^{\underline{n}} = n \cdot x^{\underline{n-1}}$ , mirroring $\frac{d}{dx} x^n = n x^{n-1}$ . Newton’s forward difference formula is the expansion of a polynomial in the basis $\binom{x}{n} = x^{\underline{n}}/n!$ , just as Taylor’s theorem expands in the basis $x^n/n!$ . This is not a coincidence: both $\Delta$ and $d/dx$ are examples of operators that reduce polynomial degree by 1 and commute with shifts, and any such operator produces its own “Taylor expansion” with its own natural basis. The difference table is the discrete instance of this structure.

# A Puzzle

Here’s a sequence: 2, 5, 10, 17, 26, 37, …

Build its difference table. What degree polynomial generates it? Can you find the formula?

Solution

2    5   10   17   26   37
   3    5    7    9   11
      2    2    2    2
         0    0    0

Second differences are constant, so it’s quadratic.

Left edge: 2, 3, 2. Using Newton’s formula:

a_n = 2\binom{n}{0} + 3\binom{n}{1} + 2\binom{n}{2} = 2 + 3n + n(n-1) = n^2 + 2n + 2

Check: $a_0 = 2$ , $a_1 = 5$ , $a_2 = 10$ . Correct.

# Open Connections

# p-adic analysis and Mahler’s theorem

Mahler (1958)8 proved that every continuous function $f: \mathbb{Z}_p \to \mathbb{Q}_p$ on the p-adic integersThe $p$ -adic integers $\mathbb{Z}_p$ are a completion of the ordinary integers under a different notion of “closeness”: two numbers are $p$ -adically close if their difference is divisible by a high power of $p$ . So $1$ and $1 + 2^{10}$ are very close 2-adically, even though they’re far apart on the number line. has a unique expansion

f(x) = \sum_{n=0}^{\infty} \Delta^n f(0) \cdot \binom{x}{n}

and this series converges for all $x \in \mathbb{Z}_p$ if and only if $\Delta^n f(0) \to 0$ in the $p$ -adic absolute value. This is the same Newton series formula, but the convergence condition is $p$ -adic rather than archimedean.

A concrete example: consider $f(x) = (-1)^x$ as a function on $\mathbb{Z}_2$ (the 2-adic integers). Its difference table has $\Delta^n f(0) = \sum_k \binom{n}{k}(-1)^{n-k}(-1)^k = (-1)^n \sum_k \binom{n}{k} = (-1)^n \cdot 2^n = (-2)^n$ . Concretely: $\Delta f(0) = f(1) - f(0) = -2$ , $\Delta^2 f(0) = f(2) - 2f(1) + f(0) = 1 + 2 + 1 = 4$ , $\Delta^3 f(0) = -8$ , and in general $\Delta^n f(0) = (-2)^n$ . In the real numbers, $|(-2)^n| \to \infty$ , so the Newton series diverges. But 2-adically, $|(-2)^n|_2 = 2^{-n} \to 0$ , so the series converges. The function $(-1)^x$ , which makes no sense for non-integer real arguments, extends continuously to all of $\mathbb{Z}_2$ via its difference table.

# Difference tables over finite fields

Over $\mathbb{F}_p$ , difference tables behave differently. Since $\binom{p}{k} \equiv 0 \pmod{p}$ for $0 < k < p$ , the $p$ -th difference operator reduces to $\Delta^p = E^p - I$ : a single shift. Difference tables over finite fields are eventually periodic rather than eventually zero, which means the “constant differences imply polynomial” test breaks down at the wrap-around scale.

# Why differencing amplifies noise

Under the discrete Fourier transform, $\Delta$ is multiplication by $\omega^k - 1$ in frequency space (where $\omega = e^{2\pi i/N}$ ). Low-frequency modes have $|\omega^k - 1| < 1$ and decay under differencing – this is why polynomials (dominated by low frequencies) have terminating difference tables. But at the Nyquist frequency $k = N/2$ , the factor is $|\omega^{N/2} - 1| = 2$ . After $n$ applications of $\Delta$ , the highest-frequency component is amplified by $2^n$ .

This is why differencing a noisy empirical sequence more than a few times produces garbage – and why Babbage’s Difference Engine was designed for exact integer arithmetic, not floating-point approximation.

# References

[1] Graham, R.L., Knuth, D.E., and Patashnik, O. Concrete Mathematics: A Foundation for Computer Science. 2nd ed. Addison-Wesley, 1994. Chapters 2 and 5 cover difference operators, Stirling numbers, and the Newton basis systematically. ↩

[2] Jordan, C. Calculus of Finite Differences. 3rd ed. Chelsea, 1965. The classical reference. Encyclopedic coverage of difference operators, interpolation, summation, and their applications. ↩

[3] Roman, S. The Umbral Calculus. Academic Press, 1984. Rota’s program made rigorous. Sheffer sequences, delta operators, and the algebraic foundations of the calculus of finite differences. ↩

[4] Riordan, J. Combinatorial Identities. Wiley, 1968. The working reference for binomial transform identities and their combinatorial proofs. ↩

[5] Flajolet, P. and Sedgewick, R. Analytic Combinatorics. Cambridge University Press, 2009. Chapter III covers the Norlund-Rice integral and its applications to the analysis of algorithms. ↩

[6] Knuth, D.E. The Art of Computer Programming, Vol. 1. 3rd ed. Addison-Wesley, 1997. Section 1.2.6 covers finite differences and their use in numerical computation. ↩

[7] Boole, G. A Treatise on the Calculus of Finite Differences. 2nd ed. Macmillan, 1872. The first systematic treatment. Available on the Internet Archive. ↩

[8] Mahler, K. “An interpolation series for continuous functions of a $p$ -adic variable.” Journal fur die reine und angewandte Mathematik, 199:23–34, 1958. ↩

[9] Stanley, R.P. Enumerative Combinatorics, Vol. 1. 2nd ed. Cambridge University Press, 2012. Chapter 1 covers Stirling numbers and their combinatorial interpretations in full generality. ↩

[10] OEIS Wiki: Transforms. Living reference for binomial transforms and their relationships across the OEIS. ↩

[11] Spivey, M.Z. “The Euler-Maclaurin formula and sums of powers.” Mathematics Magazine, 79(1):61–65, 2006. A clean derivation of Euler-Maclaurin via the operator formalism. ↩

[12] Beery, J. & Stedall, J. Thomas Harriot’s Doctrine of Triangular Numbers: The “Magisteria Magna”. European Mathematical Society, 2009. ↩

[13] Croarken, M. Early Scientific Computing in Britain. Oxford University Press, 1990. ↩

[14] Rota, G.-C., Kahaner, D., and Odlyzko, A. “Finite Operator Calculus.” Journal of Mathematical Analysis and Applications, 42:684–760, 1973. The foundational paper introducing delta operators and Sheffer sequences, explaining the algebraic structure underlying umbral calculus. ↩