Unix Timestamps and How Computers Track Time
What January 1, 1970 means, why counting seconds beats calendars, the Y2K38 problem, and how time zones complicate everything.
Regular expressions (regex) are often described as a "write-only" language. You can write a complex pattern in seconds, but reading it back a week later feels like deciphering an ancient, cryptic script. Yet, behind the wall of backslashes and brackets lies one of the most powerful tools in a developer's arsenal.
At its core, a regular expression is a domain-specific language for describing patterns in text. Instead of searching for a literal string like "hello", regex allows you to search for "a word that starts with 'h', ends with 'o', and has exactly three letters in between."
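As a rough illustration of that description, the sketch below writes the "starts with 'h', three letters, ends with 'o'" idea as a JavaScript pattern. The variable names and sample words are made up for this example, and the pattern assumes plain ASCII letters.

```javascript
// Hypothetical pattern: 'h', exactly three letters, then 'o', as a whole word.
// \b marks a word boundary; the i flag makes the match case-insensitive.
const hThreeLettersO = /\bh[a-z]{3}o\b/i;

const samples = ["hello", "hippo", "halo", "hero", "say hello there"];
const results = samples.map((text) => hThreeLettersO.test(text));

console.log(results); // [ true, true, false, false, true ]
```

"halo" and "hero" fail because only two letters sit between the 'h' and the 'o'.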
Regex is based on formal language theory, specifically regular languages. Most modern regex engines, however, have evolved far beyond the original mathematical definition, adding features like backreferences and lookarounds that make them significantly more powerful (and computationally expensive).
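When you run a regex, a "regex engine" takes your pattern and your text and tries to find a match. There are two primary types of engines: DFA (deterministic finite automaton) engines, which read the text once and never backtrack, and backtracking NFA (nondeterministic finite automaton) engines, which follow one path through the pattern at a time and back up to try another when that path fails.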
Regex patterns are built from a mix of literal characters and metacharacters. Here are the most common ones:
| Symbol | Meaning | Example |
|---|---|---|
| . | Any single character (except newline) | a.c matches abc |
| \d | Any digit (0-9) | \d\d matches 42 |
| + | One or more of the preceding element | \d+ matches 123 |
| * | Zero or more of the preceding element | ab* matches a, ab, abb |
| ? | Zero or one (optional) | colou?r matches color/colour |
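The rows of the table can be checked directly in JavaScript. This is a minimal sketch; `firstMatch` is a helper invented for this example, not a built-in.

```javascript
// Hypothetical helper: returns the first matched substring, or null.
function firstMatch(pattern, input) {
  const match = input.match(pattern);
  return match ? match[0] : null;
}

console.log(firstMatch(/a.c/, "abc"));         // "abc"
console.log(firstMatch(/\d\d/, "42"));         // "42"
console.log(firstMatch(/\d+/, "order 123"));   // "123"
console.log(firstMatch(/^ab*$/, "abb"));       // "abb"
console.log(firstMatch(/colou?r/, "color"));   // "color"
console.log(firstMatch(/colou?r/, "colour"));  // "colour"
```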
Parentheses () are used to create capturing groups. This allows you to treat a part of the pattern as a single unit and "remember" what it matched.
// Pattern to match repeated words
const pattern = /(\w+)\s+\1/;
"hello hello".match(pattern); // Matches "hello hello"

In the example above, (\w+) captures a word, and \1 is a backreference that says "match exactly what the first group matched."
Sometimes you want to match something only if it is followed (or preceded) by something else, without including that "something else" in the match. These are called lookarounds.
Lookarounds are incredibly useful for complex validation, like ensuring a password contains at least one number and one special character.
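A minimal sketch of both ideas follows. The password rule (at least 8 characters, one digit, one character that is neither a letter, a digit, nor whitespace) is an assumption chosen for illustration, not a recommended policy, and the names are hypothetical.

```javascript
// A lookahead checks what follows without consuming it:
// the match is "100", and "USD" is not part of the result.
const amount = "100USD".match(/\d+(?=USD)/)[0];
console.log(amount); // "100"

// Assumed rule for this sketch: 8+ characters, at least one digit,
// and at least one character that is not a letter, digit, or whitespace.
// Each (?=...) is tested from the start of the string and consumes nothing.
const passwordPattern = /^(?=.*\d)(?=.*[^A-Za-z0-9\s]).{8,}$/;

function isValidPassword(candidate) {
  return passwordPattern.test(candidate);
}

console.log(isValidPassword("hunter2!x")); // true
console.log(isValidPassword("password"));  // false (no digit, no special character)
console.log(isValidPassword("passw0rd"));  // false (no special character)
console.log(isValidPassword("p4ss!"));     // false (too short)
```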
By default, regex quantifiers like + and * are greedy. They will match as much text as possible.
Consider the string <em>Hello</em> World and the pattern <.*>. A greedy match returns <em>Hello</em> rather than just <em>, because .* runs from the first < all the way to the last > in the string.
To make a quantifier lazy (or non-greedy), you add a ? after it. The pattern <.*?> will match only <em>.
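The difference is easy to see side by side. This sketch uses the same string and patterns as the text; the variable names are only for illustration.

```javascript
const html = "<em>Hello</em> World";

// Greedy: .* grabs as much as it can, then backs off to the last ">".
const greedy = html.match(/<.*>/)[0];
console.log(greedy); // "<em>Hello</em>"

// Lazy: .*? grabs as little as it can, stopping at the first ">".
const lazy = html.match(/<.*?>/)[0];
console.log(lazy); // "<em>"

// With the g flag, the lazy pattern finds each tag separately.
const allTags = html.match(/<.*?>/g);
console.log(allTags); // [ "<em>", "</em>" ]
```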
Because NFA engines use a trial-and-error approach, certain patterns can cause the engine to try an exponential number of combinations when a match fails. This is known as catastrophic backtracking.
A classic example is (a+)+b against the string "aaaaaaaaaaaaaaaaX". The engine will try every possible way to group those "a"s before finally giving up, which can freeze your application or even crash a server (a "ReDoS" attack).
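The sketch below times the failing match for a few small input sizes on a backtracking engine, next to a flat pattern, a+b, that accepts the same strings without the nested quantifier. The function name and the chosen sizes are assumptions for this example. Sizes are kept deliberately small, and actual timings depend on the engine and machine.

```javascript
const risky = /(a+)+b/; // nested quantifier: many ways to split the run of a's
const safer = /a+b/;    // accepts the same strings, with one way to match the a's

// Hypothetical helper: builds "aaa...aX" and times a failing match.
function timeFailure(pattern, n) {
  const input = "a".repeat(n) + "X";
  const start = Date.now();
  const matched = pattern.test(input);
  return { n, matched, ms: Date.now() - start };
}

// Keep n small: on a backtracking engine the work for the risky
// pattern roughly doubles with each extra "a".
for (const n of [10, 14, 18]) {
  console.log("risky", timeFailure(risky, n));
  console.log("safer", timeFailure(safer, n));
}
```

Removing the nesting is usually the simplest fix. Where the pattern comes from user input, capping input length or using an engine without backtracking are other options.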
While the core syntax is similar, regex "flavors" vary between programming languages.
| Flavor | Environment | Key Characteristic |
|---|---|---|
| PCRE | PHP, Apache, R | The "gold standard" for features. |
| JavaScript | Browsers, Node.js | Fast, but historically lacked lookbehinds. |
| Python (re) | Python Standard Lib | Strict, readable, supports named groups. |
The most famous advice regarding regex comes from Jamie Zawinski: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
You should generally avoid regex for:
To keep your regex from becoming a nightmare for your future self:
Instead of \1, use named groups like (?<year>\d{4}) to make your code more readable.

"Regex is a scalpel. In the right hands, it's a precision instrument. In the wrong hands, it's a mess."
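The named-group advice might look like this in JavaScript (ES2018 and later). The date format, the `parseIsoDate` helper, and the group names are assumptions for illustration. The pattern checks shape only, not whether the date is real.

```javascript
// Named groups document what each part of the pattern is for.
const isoDate = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: returns numeric parts, or null if the shape is wrong.
function parseIsoDate(text) {
  const match = isoDate.exec(text);
  if (!match) return null;
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate("2038-01-19")); // { year: 2038, month: 1, day: 19 }
console.log(parseIsoDate("19/01/2038")); // null

// Backreferences can use the name too: \k<word> instead of \1.
const repeatedWord = /\b(?<word>\w+)\s+\k<word>\b/;
console.log(repeatedWord.test("hello hello")); // true
```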