Your Hardware Has Been Lying About Sine Since 1983

April 7, 2026

In 1983, William Kahan sat down with an Intel 8087 floating-point coprocessor and asked it for $\sin(\pi \times 10^{14})$ .

The 8087 answered: $+0.79905\ldots$

The correct answer is $-0.78387\ldots$

Not a rounding error. Wrong sign. Off by more than 1.5 in absolute terms, for a function bounded between $-1$ and $1$ . Kahan documented this in his paper "Mathematics Written in Sand", a systematic catalog of what the early floating-point coprocessors were actually doing versus what their designers claimed. The 8087 manual said the sine instruction had less than 1 ULP of error. The actual error on this single input was not a rounding anomaly -- it was a fundamental design choice that Intel would not fully acknowledge for another three decades.

This is the story of that choice, why it persists, and what exact arithmetic actually looks like for trigonometric functions.

The 66-bit lie

The core of the problem is argument reduction.

FSIN on the 8087, and on every x87-compatible processor since, does not compute $\sin(x)$ directly for arbitrary $x$ . It first reduces $x$ modulo $\pi/2$ to a small argument in $[-\pi/4, \pi/4]$ , then evaluates a polynomial approximation on that small argument. The polynomial evaluation is fine -- that part really is accurate to nearly 1 ULP.

The reduction is the problem.

To reduce $x$ modulo $\pi/2$ , you need to compute $x \cdot (2/\pi)$ and split it into its integer part (the quadrant) and its fractional part (the reduced argument). For this, you need a high-precision representation of $2/\pi$ . Intel's x87 uses 66 bits. This was a reasonable choice for 1980: 66 bits is more than the 64-bit extended-precision significand the coprocessor normally works with, so for small arguments the reduction is accurate.

For large arguments it is a catastrophe. When $x = \pi \times 10^{14} \approx 3.14 \times 10^{14}$ , the computation $x \cdot (2/\pi)$ produces a number around $2 \times 10^{14}$ . The integer part of that number -- which is irrelevant to the sine -- occupies 48 bits on its own. In double precision (53-bit mantissa), that leaves 53 - 48 = 5 bits for the fractional part, the only part that matters. Five bits. Your reduction error is not in the last bit; it is in the first handful of significant bits of the reduced argument.
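One way to see this budget directly is to ask the machine for the spacing of doubles near $2 \times 10^{14}$ . A tiny helper (ours, not part of any libm):

```rust
/// One ulp of a positive finite f64: the gap to the next representable value.
fn ulp(x: f64) -> f64 {
    f64::from_bits(x.to_bits() + 1) - x
}
```

Here `ulp(2.0e14)` returns $2^{-5} = 0.03125$ : the fractional part of anything stored near $2 \times 10^{14}$ is quantized to steps of 1/32, which is exactly the handful of bits the reduction has left to work with.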

Bruce Dawson analyzed this systematically in 2014 and found the actual worst case. An Intel employee on the Intel community forum confirmed in 2010: "For large arguments, the FSIN/FCOS/FTAN instructions in the x87 FPU... may give incorrect results." The recommendation was to use libm instead. The hardware documentation claiming "less than 1 ULP" error was, to be precise, wrong about its own error bound by a factor of approximately 1.37 quintillion.

That is the number Dawson computed for the worst-case ULP error of x87 FSIN: about $1.37 \times 10^{18}$ ULPs. The manual said $< 1$ .

How argument reduction actually works

The correct approach to computing $\sin(x)$ for large $x$ is known, well-understood, and implemented in every quality math library. The problem is not unsolvable -- it was just not solved in the 8087.

For small arguments ( $|x| \lesssim 10^4$ or so), a technique called Cody-Waite reduction works: represent $\pi/2$ as a sum of two or three floating-point numbers whose sum is accurate to well beyond working precision, then subtract them from $x$ in sequence. Each subtraction peels off the exact high bits, leaving the fractional part with full precision intact.
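A minimal two-piece version of this idea in Rust (a sketch under simplifying assumptions: we use the full-precision FRAC_PI_2 plus its well-known residual, rather than fdlibm's chunks with trailing zero bits that make each multiplication by $n$ exact, so it is honest only for modest $n$ ):

```rust
use std::f64::consts::FRAC_PI_2;

// The part of pi/2 that an f64 cannot hold: pi/2 - FRAC_PI_2.
// Together the two values represent pi/2 to roughly 107 bits.
const PIO2_TAIL: f64 = 6.123233995736766e-17;

/// Cody-Waite-style sin: valid for modest |x| only -- a sketch, not libm.
fn cody_waite_sin(x: f64) -> f64 {
    let n = (x / FRAC_PI_2).round();
    // Peel off n * pi/2 in two pieces: the big subtraction cancels almost
    // exactly, and the tail restores bits of pi/2 beyond f64 precision.
    let r = (x - n * FRAC_PI_2) - n * PIO2_TAIL;
    // x = n * pi/2 + r, so the quadrant n mod 4 picks the formula.
    match (n as i64).rem_euclid(4) {
        0 => r.sin(),
        1 => r.cos(),
        2 => -r.sin(),
        _ => -r.cos(),
    }
}
```

Without the tail term this is just "x mod π/2 in doubles", and the $n \cdot 6.12 \times 10^{-17}$ error the tail repairs is precisely the 66-bit-style truncation error, in miniature.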

For large arguments, Cody-Waite is not enough. The correct approach is Payne-Hanek reduction: precompute the bits of $2/\pi$ to very high precision (production tables carry on the order of a thousand bits), and for a given $x$ , use its exponent to select the window of roughly 190 bits that can actually affect the result. Multiply, extract the integer and fractional parts, and the fractional part comes out with full working precision regardless of how large $x$ is.

This is what glibc implements in k_rem_pio2.c. It is what musl implements in __rem_pio2_large.c. The algorithm is in Payne and Hanek's 1983 technical report and has been standard practice in quality libm implementations for forty years. It is not exotic. It is not expensive. The 8087 just did not do it.

The key insight: those extra bits of $2/\pi$ only need to exist in the reduction table, not in the floating-point registers. You carry 190 bits through the argument reduction and then discard them, keeping only the 53-bit result. The hardware never needs to be modified.
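To make the windowing concrete, here is a deliberately miniature Payne-Hanek reduction in Rust (structure and names are ours, not glibc's): it carries only the first 216 bits of $2/\pi$ , handles positive $x$ between $2^{54}$ and $2^{79}$ , and leans on u128 for the multiply. A real implementation covers every exponent and keeps far more bits.

```rust
// The first 216 bits of 2/pi after the binary point, 24 bits per entry
// (the opening entries of the `ipio2` table in fdlibm and musl).
const TWO_OVER_PI: [u32; 9] = [
    0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD,
    0xC0DB62, 0x95993C, 0x439041, 0xFE5163,
];

// Bit i of the fractional expansion of 2/pi (i = 0 is the first bit
// after the binary point).
fn bit(i: usize) -> u64 {
    ((TWO_OVER_PI[i / 24] >> (23 - i % 24)) & 1) as u64
}

/// Miniature Payne-Hanek sin for positive x in [2^54, 2^79) -- a sketch.
fn payne_hanek_sin(x: f64) -> f64 {
    let b = x.to_bits();
    let e = ((b >> 52) & 0x7ff) as i64 - 1023; // unbiased exponent
    let m = (b & ((1u64 << 52) - 1)) | (1u64 << 52); // 53-bit significand
    assert!((54..=78).contains(&e), "sketch handles 2^54 <= x < 2^79 only");

    // x = m * 2^(e-52). A bit of 2/pi at depth j contributes m * 2^(e-52-j)
    // to x*(2/pi); the first e-54 bits contribute exact multiples of 4,
    // which sine cannot see. Skip them, keep the next 192 bits.
    let skip = (e - 54) as usize;
    let mut w = [0u64; 3]; // w[0] holds the most significant 64 bits
    for i in 0..192 {
        w[i / 64] = (w[i / 64] << 1) | bit(skip + i);
    }

    // p = m * w, keeping the low 192 bits; x*(2/pi) == p * 2^-190 (mod 4).
    let p0 = (m as u128) * (w[2] as u128);
    let p1 = (m as u128) * (w[1] as u128) + (p0 >> 64);
    let p2 = (m as u128) * (w[0] as u128) + (p1 >> 64);
    let (s1, s2) = (p1 as u64, p2 as u64);

    // Bits 191..190 of p are the quadrant; everything below is the
    // fraction of x*(2/pi). (The 2^-190-scale word is negligible here.)
    let mut q = (s2 >> 62) & 3;
    let mut f = ((s2 & ((1u64 << 62) - 1)) as f64) * 2f64.powi(-62)
        + (s1 as f64) * 2f64.powi(-126);
    if f >= 0.5 {
        f -= 1.0;
        q = (q + 1) & 3;
    }
    let r = f * std::f64::consts::FRAC_PI_2; // reduced to [-pi/4, pi/4]
    match q {
        0 => r.sin(),
        1 => r.cos(),
        2 => -r.sin(),
        _ => -r.cos(),
    }
}
```

The point survives miniaturization: all the extra bits live in an integer table, all the arithmetic on them is exact integer arithmetic, and only the final 53-bit result ever touches a floating-point register.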

When the fix gets bypassed

Modern glibc and libm on reasonably current Linux use Payne-Hanek. For most programs on most systems, the hardware FSIN instruction is not used directly -- the software library wraps it and does its own reduction. But there are several common paths that bypass this fix silently.

-ffast-math: GCC's -ffast-math flag (and MSVC's loosely analogous /fp:fast) tells the compiler that IEEE conformance is negotiable; when targeting x87, this permits substituting the FSIN instruction directly for calls to sin(). This is documented, but the documentation does not mention that you are trading away the argument reduction. It produces measurably wrong results for large arguments, with no diagnostic.

Windows MSVCRT: The Microsoft Visual C++ runtime historically used x87 FSIN without Payne-Hanek. The behavior improved in later releases but the history of Windows programs computing visibly wrong trigonometric values is long.

NumPy with MKL/SVML: Intel's Math Kernel Library ships a vectorized math library called SVML. On systems where NumPy is installed via Anaconda with MKL, array operations exceeding approximately 8192 elements are routed through SVML rather than libm. SVML uses its own argument reduction, which differs from glibc in the last few digits for some inputs. NumPy issue #12499 documents cases where np.sin() produces different last digits depending on array size: below the threshold, you get one answer; above it, you get another. Both are within spec. Neither matches the other. No warning is emitted.

In each case, the same source code produces different numerical results depending on which build flags were set, which runtime is installed, or how many elements are in the array. There is no compiler warning. There is no runtime error.

The unfixable part

Everything above is, in principle, fixable. Better argument reduction, correct libm, avoid -ffast-math. But there is a category of failure that no algorithm can address, and Kahan identified it in the same 1983 paper.

Consider $\sin(0.5 + 2\pi \times 10^{15})$ . The mathematically correct answer is $\sin(0.5) \approx 0.4794...$ , because $\sin$ has period $2\pi$ and $2\pi \times 10^{15}$ is an exact multiple of the period.

The problem: IEEE 754 double precision cannot represent this input. The closest representable double to $0.5 + 2\pi \times 10^{15}$ is not the number you wrote down. The value $0.5$ is lost in representational rounding before sin is ever called. A perfect Payne-Hanek implementation, given this input, will compute a correct sine of the wrong number.

This is not an algorithm failure. The argument to sin is not $0.5 + 2\pi \times 10^{15}$ . The argument to sin is some nearby double, and the sine of that nearby double is computed correctly. The error happened at the representation boundary, not inside sin.
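You can watch the offset disappear without calling sin at all. Near $2\pi \times 10^{15} \approx 6.3 \times 10^{15}$ , the spacing between adjacent doubles is 1.0, so an offset of 0.5 cannot survive construction of the argument (helper name ours):

```rust
use std::f64::consts::PI;

/// The offset that actually survives when `0.5 + 2*pi*1e15` is formed in
/// f64. Near 6.3e15 one ulp is 1.0, so the result is 0.0 or 1.0 -- never
/// the 0.5 that was written down.
fn surviving_offset() -> f64 {
    let big = 2.0 * PI * 1e15; // itself already a rounded value
    let x = big + 0.5;         // the intended argument
    x - big                    // exact: x and big are within a factor of 2
}
```

Whatever this returns, it is not 0.5, and a subsequent sin sees only the double that remains.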

Kahan made this distinction carefully in 1983. The numpy maintainer's response in issue #12499 restates it: the function did not fail; the input failed to be representable. These are different problems with different solutions.

Feynman described the same phenomenon from a different angle: "The tangent of 10 to the 100th. I was sunk: you have to divide by $\pi$ to 100 decimal places!" He was not complaining about his calculator's implementation of tangent. He was observing that even asking the question requires more precision than any fixed-width float can carry.

For inputs of this form -- small offset plus large multiple of $\pi$ -- no floating-point sine function can help you. The precision budget for representing the argument is exhausted before any computation begins. If you need $\sin(x)$ for $x$ near a large multiple of $\pi$ , you need to carry the input in a higher-precision representation, or restructure the problem to avoid large cancellation.

The cross-language landscape

The practical situation across common environments:

| Environment | Behavior | Notes |
| --- | --- | --- |
| Python (math.sin) | Delegates to C runtime | OS-dependent; CPython on Linux with glibc gets Payne-Hanek |
| Python with -ffast-math | May use x87 FSIN | Depends on how the extension was compiled |
| JavaScript Math.sin | Implementation-defined | ECMAScript spec sets no precision requirement |
| Java Math.sin | Requires 1 ULP | Compliance took years; early JVMs varied |
| NumPy (small array) | libm | Platform libm via glibc on Linux |
| NumPy (large array, MKL) | SVML | Different last digits; no warning |
| C with gcc -ffast-math | May use FSIN | Hardware instruction with 66-bit $\pi$ |

The JavaScript case is worth pausing on. The ECMAScript specification does not specify the required precision of Math.sin. Different JavaScript engines have historically returned different results for the same input. Tom MacWright documented cases where Math.sin produced different last digits between Node.js versions on the same machine, because V8 changed its implementation. A program that relied on the exact bit pattern of a Math.sin result would break silently on an engine upgrade.

A concrete example. On a Linux system with glibc versus a system using MSVCRT, the same C code:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    printf("%.20f\n", sin(1e18));
    return 0;
}
```

can produce different results in the last several digits. Both implementations satisfy their documented contracts. The contracts say nothing useful about agreement.

Coda: what exact arithmetic looks like

There is a context where sine can be computed exactly for divisibility questions, and it produces a qualitatively different picture.

The identity $\sin(\pi n / d) = 0$ if and only if $d \mid n$ encodes divisibility as a zero-finding problem. For integer inputs, we want to know whether the result is exactly zero or nonzero -- not approximately zero up to floating-point noise.

The fix is modular reduction before the call to sin. Write $n = qd + r$ where $r = n \bmod d$ . Then:

$$\sin\left(\frac{\pi n}{d}\right) = \sin\left(\pi q + \frac{\pi r}{d}\right) = (-1)^q \sin\left(\frac{\pi r}{d}\right)$$

When $d \mid n$ , $r = 0$ and the result is $\sin(0) = 0.0$ -- exactly, regardless of the magnitude of $n$ . No argument reduction, no precision budget, no ULP error. The integer remainder does the work that floating-point cannot.

When $d \nmid n$ , the argument $\pi r / d$ lies in $(0, \pi)$ , a small well-conditioned range, and the sine is computed on that reduced argument.

```rust
pub fn sin_pi_n_over_d(n: u64, d: u64) -> f64 {
    let q = n / d;
    let r = n % d;
    if r == 0 {
        return 0.0; // exact
    }
    let val = (std::f64::consts::PI * r as f64 / d as f64).sin();
    if q % 2 == 0 { val } else { -val }
}
```

For $n = 10^{17} + 3$ (prime) and $d = 7$ , the naive (PI * n as f64 / 7.0).sin() returns garbage -- the argument has magnitude $\sim 4.5 \times 10^{16}$ and the reduced argument has essentially no precision. The modular version computes $r = (10^{17} + 3) \bmod 7 = 1$ , then evaluates $\sin(\pi/7)$ on a small, well-conditioned argument.
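The exactness claim is mechanically checkable: sweep $d$ and keep the values where the result is bit-for-bit zero. (sin_pi_n_over_d is repeated from above so the snippet stands alone; divisors_via_sin is our name for the sweep.)

```rust
pub fn sin_pi_n_over_d(n: u64, d: u64) -> f64 {
    let q = n / d;
    let r = n % d;
    if r == 0 {
        return 0.0; // exact
    }
    let val = (std::f64::consts::PI * r as f64 / d as f64).sin();
    if q % 2 == 0 { val } else { -val }
}

/// The divisors of n, recovered as exactly the d where the sine is
/// bit-for-bit zero -- no epsilon comparison anywhere.
pub fn divisors_via_sin(n: u64) -> Vec<u64> {
    (1..=n).filter(|&d| sin_pi_n_over_d(n, d) == 0.0).collect()
}
```

divisors_via_sin(30) yields exactly {1, 2, 3, 5, 6, 10, 15, 30}, with a strict equality test against 0.0 doing the work an epsilon tolerance normally fudges.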

This is what it looks like. Fix $n$ and plot $\sin(\pi n / d)$ for $d = 1 \ldots n$ , computed with modular reduction:

[Figure: plot of $\sin(\pi \cdot 30/d)$ for $d = 1 \ldots 30$ , with zeros marked at the divisors of 30.]
The zeros of $\sin(\pi \cdot 30/d)$ land exactly on the divisors of 30: $\{1, 2, 3, 5, 6, 10, 15, 30\}$ . Every true divisor produces exactly 0.0 -- not floating-point noise near zero, but the bit pattern for zero.
[Figure: plot of $\sin(\pi \cdot 97/d)$ for $d = 1 \ldots 97$ , with zeros only at $d = 1$ and $d = 97$ .]
97 is prime, so the curve touches zero only at the two endpoints. Every $d$ in $(1, 97)$ is a non-divisor, and every non-divisor produces a strictly nonzero value. The structure is exact -- there are no false near-zeros.

The composite case produces a thicket of zero crossings. The prime case produces a long uninterrupted oscillation. The pattern is exact because the arithmetic is exact: every zero is the bit pattern 0x0000000000000000, not a small float rounded down by an epsilon test.

This is not a trick, and it is not restricted to small $n$ . The modular reduction works at any scale where u64 arithmetic does not overflow. It is the proof that the floating-point problem Kahan identified in 1983 is not inherent to the question -- it is inherent to naive argument passing. Carry the right information in integer form, reduce before you evaluate, and sine is a perfectly well-behaved function.

The hardware has been lying. But you can stop listening to it for inputs of this form.