Probability and Statistics: Making Sense of Randomness
What probability means, how the bell curve works, what z-scores and confidence intervals actually tell you, and why sample size matters.
Open any maps app, drop two pins, and you'll see a distance. That number looks simple, but calculating it is surprisingly subtle. The Earth is not flat, not a perfect sphere, and not even a perfect ellipsoid. Yet a formula published in 1805 — the Haversine formula — still gives answers accurate to within 0.5% for most practical purposes, using nothing more than basic trigonometry.
On a flat surface, the shortest path between two points is a straight line, and the Pythagorean theorem gives you the distance. But the Earth's surface is curved. A "straight line" between London and Tokyo would tunnel through the planet's interior — not very useful for navigation.
What we actually want is the great-circle distance: the shortest path along the surface of a sphere. Imagine stretching a string taut between two points on a globe — it follows the arc of a "great circle," a circle whose centre coincides with the centre of the Earth. Every great circle divides the sphere into two equal halves, which is why these paths are the shortest possible surface routes.
The word "haversine" is short for "half versed sine." The versine of an angle θ is 1 − cos(θ), and the haversine is half of that: hav(θ) = (1 − cos(θ)) / 2. This function was popular in navigation because it avoids the subtraction of nearly-equal numbers that plagued earlier cosine-based formulas on mechanical calculators.
Given two points with latitude and longitude in radians — (φ₁, λ₁) and (φ₂, λ₂) — and Earth's radius R ≈ 6,371 km, the distance is:
a = sin²((φ₂ − φ₁) / 2)
+ cos(φ₁) · cos(φ₂) · sin²((λ₂ − λ₁) / 2)
c = 2 · atan2(√a, √(1 − a))
distance = R · cThe intermediate value a is the square of half the chord length between the two points. The value c is the angular distance in radians. Multiplying by Earth's radius converts that angle into a surface distance in kilometres.
Let's calculate the distance from New York (40.7128°N, 74.0060°W) to London (51.5074°N, 0.1278°W). First, convert to radians:
φ₁ = 40.7128° × π/180 = 0.7106 rad
φ₂ = 51.5074° × π/180 = 0.8989 rad
Δφ = 0.8989 − 0.7106 = 0.1883 rad
Δλ = (−0.1278 − (−74.006)) × π/180 = 1.2892 rad
a = sin²(0.1883/2)
+ cos(0.7106) · cos(0.8989) · sin²(1.2892/2)
= 0.00886 + 0.7597 · 0.6253 · 0.3538
= 0.00886 + 0.1681
= 0.1769
c = 2 · atan2(√0.1769, √0.8231) = 0.8667 rad
distance = 6371 × 0.8667 ≈ 5,524 kmThe actual great-circle distance is about 5,570 km. Our result is within 1% — the small error comes from the Earth being an oblate spheroid rather than a perfect sphere.
The spherical law of cosines gives the same result in a single line:
d = R · acos(sin(φ₁)·sin(φ₂) + cos(φ₁)·cos(φ₂)·cos(Δλ))This is simpler, but it has a numerical stability problem. When two points are very close together, the value inside acos() approaches 1.0, and floating-point rounding errors can push it above 1.0 — which makes acos() return NaN. The Haversine formula avoids this because it uses atan2 and squared sines, which remain numerically stable even for tiny distances.
The Haversine formula was originally designed for navigators doing arithmetic by hand with logarithm tables. Its numerical stability turned out to be just as valuable for computers doing IEEE 754 floating-point math two centuries later.
Modern GPS receivers don't use the Haversine formula directly — they use the WGS 84 ellipsoid model, which accounts for the Earth being flattened at the poles (the equatorial radius is about 21 km longer than the polar radius). Vincenty's formulae and the more modern Karney algorithm handle this ellipsoidal geometry with sub-millimetre accuracy.
But the underlying idea is identical: you have two points defined by latitude and longitude, and you need the shortest surface path between them. The Haversine is the spherical approximation, and it remains the go-to choice when you need a fast distance estimate and sub-kilometre precision is acceptable.
WHERE clause.A formula from the age of sextants and star charts remains one of the most practical tools in modern software engineering — a testament to the durability of good math.
What probability means, how the bell curve works, what z-scores and confidence intervals actually tell you, and why sample size matters.
Area and volume formulas for every common shape, the Pythagorean theorem, Law of Sines and Cosines, and slope of a line.
How fractions work, why prime factorisation matters, the GCF and LCM connection, ratios, proportions, and percentages as fractions of 100.