Diff Algorithms and How Computers Compare Text
How diff tools find the minimum set of changes between two files: the longest common subsequence problem, Myers' O(ND) algorithm, patience diff, and the unified output format.
Every time you run git diff, review a pull request, or compare two versions of a document, an algorithm is solving one of computer science's classic problems: given two sequences of text, what is the minimum set of changes that transforms one into the other? The answer is more subtle than it sounds — and the algorithms behind it shaped the tools developers use every day.
At the heart of every diff tool is the Longest Common Subsequence (LCS) problem. Given two sequences, find the longest subsequence present in both. Unlike a substring, a subsequence doesn't need to be contiguous — it just needs to appear in the same relative order.
Sequence A: A B C D F
Sequence B: A C B C F
LCS: A B C F (length 4)
The characters A, B, C, F appear in both
sequences in the same relative order.

Once you know the LCS, everything not in it represents a change. Characters in A but not in the LCS were deleted. Characters in B but not in the LCS were inserted. This is why diff output shows lines prefixed with - (removed) and + (added).
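That relationship can be sketched directly: given any common subsequence, walking both inputs against it yields the - and + lines. A minimal illustration (the helper name edits_from_lcs is our own, not from any diff tool):

```python
def edits_from_lcs(a, b, common):
    """Turn a known common subsequence into an edit script.

    '-' items exist only in a (deleted), '+' only in b (inserted),
    ' ' items belong to the common subsequence.
    """
    out, i, j = [], 0, 0
    for c in common:
        while i < len(a) and a[i] != c:   # everything before the next match in a
            out.append("-" + a[i]); i += 1
        while j < len(b) and b[j] != c:   # everything before the next match in b
            out.append("+" + b[j]); j += 1
        out.append(" " + c); i += 1; j += 1
    out.extend("-" + x for x in a[i:])    # trailing deletions
    out.extend("+" + x for x in b[j:])    # trailing insertions
    return out

# The example from above: LCS("ABCDF", "ACBCF") = "ABCF"
print(edits_from_lcs("ABCDF", "ACBCF", "ABCF"))
# → [' A', '+C', ' B', ' C', '-D', ' F']
```

Applying that script to "ABCDF" (keep A, insert C, keep B, keep C, delete D, keep F) reproduces "ACBCF".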
The textbook approach uses dynamic programming. Build an (m+1) × (n+1) table where m and n are the lengths of the two sequences. Each cell dp[i][j] stores the length of the LCS of the first i characters of A and the first j characters of B. The time complexity is O(mn), and so is the space.
For two files with 10,000 lines each, that's a 100-million-cell table. It works, but it's slow. Real-world diff tools need something faster.
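As a sketch, the textbook table plus a backtrack to recover one LCS might look like this in Python (illustrative only; real diff tools operate on lines rather than characters):

```python
def lcs(a, b):
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack through the table to recover one LCS
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCDF", "ACBCF"))  # → ABCF
```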
In 1986, Eugene Myers published “An O(ND) Difference Algorithm and Its Variations” — the algorithm that powers diff, Git, and most modern diff tools. Instead of filling an entire m × n grid, Myers' algorithm works in terms of the edit distance D (the number of insertions plus deletions).
Myers reframes diffing as a shortest-path problem on an edit graph. Imagine a grid where moving right is a deletion, moving down is an insertion, and moving diagonally (when characters match) is free. The goal is to reach the bottom-right corner with the fewest non-diagonal moves.
The key insight: Myers' algorithm is O(ND) where D is the edit distance. When two files are mostly similar (small D), the algorithm is nearly linear. Only when files are completely different does it degrade to O(N²). Since most diffs in practice involve small changes to large files, this is exactly the right trade-off.
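A compact sketch of Myers' greedy forward pass, following the pseudocode in the 1986 paper with a dictionary standing in for the V array. This version returns only the edit distance D; a full tool would also record the path taken in order to emit the edit script:

```python
def myers_distance(a, b):
    """Return the minimum number of insertions plus deletions (D)."""
    n, m = len(a), len(b)
    v = {1: 0}  # v[k]: furthest x reached on diagonal k = x - y
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Extend from whichever neighbouring diagonal reaches further
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # move down: an insertion
            else:
                x = v[k - 1] + 1      # move right: a deletion
            y = x - k
            # Follow the "snake": diagonal moves on matches are free
            while x < n and y < m and a[x] == b[y]:
                x += 1; y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m

# The worked example from Myers' 1986 paper:
print(myers_distance("ABCABBA", "CBABAC"))  # → 5
```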
Myers' algorithm finds a shortest edit script, but not necessarily the most readable one. When two functions are reordered in a file, Myers might match closing braces from one function with opening braces of another, producing a confusing diff.
Patience diff (invented by Bram Cohen of BitTorrent fame) takes a different approach:
1. Find the lines that appear exactly once in both files; these unique lines make reliable anchor points.
2. Match those anchors by computing the longest increasing subsequence of their positions, using the patience sorting technique that gives the algorithm its name.
3. Recurse on the regions between matched anchors, falling back to a standard diff for what remains.
The result is structurally cleaner. Git supports it via git diff --patience, and many developers set it as their default because the output aligns with how humans think about code changes: function-by-function, not line-by-line.
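The core of patience diff, matching lines that are unique to both files via patience sorting, can be sketched as follows (the helper name patience_anchors is our own):

```python
import bisect
from collections import Counter

def patience_anchors(a_lines, b_lines):
    """Match lines unique to both files, keeping them in order.

    Returns (index_in_a, index_in_b) pairs: the longest increasing
    subsequence of unique-line positions, found via patience sorting.
    """
    ca, cb = Counter(a_lines), Counter(b_lines)
    b_pos = {line: i for i, line in enumerate(b_lines)}
    pairs = [(i, b_pos[line]) for i, line in enumerate(a_lines)
             if ca[line] == 1 and cb.get(line) == 1]
    piles, tops, back = [], [], []  # pile tails, pair index atop each pile, backpointers
    for idx, (_, bj) in enumerate(pairs):
        p = bisect.bisect_left(piles, bj)
        if p == len(piles):
            piles.append(bj); tops.append(idx)
        else:
            piles[p] = bj; tops[p] = idx
        back.append(tops[p - 1] if p > 0 else -1)
    # Walk backpointers from the top card of the last pile
    out, idx = [], tops[-1] if tops else -1
    while idx != -1:
        out.append(pairs[idx]); idx = back[idx]
    return out[::-1]

# Reusing the earlier sequences as "files" of one-character lines;
# C is excluded because it repeats in the second sequence.
print(patience_anchors(list("ABCDF"), list("ACBCF")))  # → [(0, 0), (1, 2), (4, 4)]
```

A real patience diff would then recurse on the gaps between those anchor pairs.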
The most common format, used by git diff and patch files. Lines starting with - are removed, + are added, and a space prefix means unchanged context. The header shows line ranges:
--- a/hello.txt
+++ b/hello.txt
@@ -1,4 +1,4 @@
Hello
-World
+Universe
This is
a test

The @@ -1,4 +1,4 @@ means: starting at line 1 of the old file (showing 4 lines) and line 1 of the new file (showing 4 lines). This compact format is what makes git apply and patch workflows possible.
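Python's standard library can reproduce this format: difflib.unified_diff emits the same headers, hunk markers, and line prefixes for the example above.

```python
import difflib

# The two versions of hello.txt from the example above
old = ["Hello\n", "World\n", "This is\n", "a test\n"]
new = ["Hello\n", "Universe\n", "This is\n", "a test\n"]

diff_text = "".join(difflib.unified_diff(
    old, new, fromfile="a/hello.txt", tofile="b/hello.txt"))
print(diff_text)
```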
Shows the old and new versions in parallel columns. Easier for humans to read when reviewing large changes, but uses twice the horizontal space. Most code review tools (GitHub, GitLab) offer both views because neither is universally better — unified is more compact, side-by-side is more visual.
Computing the exact minimum edit distance is fundamentally expensive. The key complexity classes: the textbook dynamic-programming solution costs O(mn) in both time and space, while Myers' algorithm costs O(ND), which is near-linear when the edit distance D is small but degrades to O(N²) when the files share little in common.
Histogram diff, available via git diff --histogram, is an optimisation of patience diff that handles repeated lines better. As of 2023, the Bram Cohen and Myers approaches remain dominant. No fundamentally faster general algorithm has been found: the fastest known exact LCS algorithms run in roughly O(N² / log N) time, and lower bounds under certain computational models indicate that dramatically faster exact solutions are unlikely.
Diff algorithms are a beautiful example of theory meeting practice. The textbook solution is too slow, the fast solution only works because real-world changes are small, and the “best” algorithm depends on whether you optimise for speed or readability.