Markdown: The Writing Format That Took Over the Web
How John Gruber invented a plain-text syntax that converts to HTML, the flavors that evolved, and why developers love it.
Have you ever embedded an image directly in a CSS file? Or looked at a JWT and wondered what that long string of letters was? Both use Base64, an encoding scheme that converts binary data into plain text. It's one of the most widely used encodings on the internet, yet most developers use it without knowing how it actually works.
Computers store everything as binary — sequences of 1s and 0s. A photo, a PDF, an encryption key: they're all just bytes. But many communication channels were designed for text only. Email (SMTP) was built for 7-bit ASCII. JSON can't contain raw binary. HTML attributes expect text strings. If you tried to embed a raw JPEG in an email, the binary bytes would get corrupted by text-processing systems that strip high bits or interpret control characters.
Base64 solves this by translating binary data into a set of 64 “safe” characters that survive any text-based channel intact.
Base64 uses exactly 64 characters to represent data:
A-Z (26 characters) → values 0–25
a-z (26 characters) → values 26–51
0-9 (10 characters) → values 52–61
+ (1 character) → value 62
/ (1 character) → value 63
= (padding character, not part of the 64)
Every character in this alphabet is printable ASCII and safe in email headers. URLs are the one catch: + and / carry special meaning there, so a variant alphabet is used instead. That's the entire point: no control bytes, nothing a text system would misinterpret.
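As a quick check of the table above, the alphabet can be rebuilt in Python from the standard `string` constants. This is only a sketch for looking up values; the name `ALPHABET` is ours, not part of any library.

```python
import string

# Build the 64-character alphabet in the order listed above:
# A-Z, a-z, 0-9, then "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(ALPHABET))         # 64
print(ALPHABET[0])           # A  (value 0)
print(ALPHABET[26])          # a  (value 26)
print(ALPHABET[52])          # 0  (value 52)
print(ALPHABET[62:])         # +/ (values 62 and 63)
print(ALPHABET.index("k"))   # 36
```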
Base64 takes every 3 bytes (24 bits) of input and splits them into 4 groups of 6 bits. Each 6-bit group maps to one of the 64 characters. Since 6 bits can represent 0–63, the mapping is exact.
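The 3-bytes-to-4-characters step can be sketched with plain integer arithmetic. The helper name `encode_group` and the sample input "Man" are illustrative choices, not something from a library; the sketch handles only a full 3-byte group and ignores padding.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Peel off four 6-bit values, most significant first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))  # TWFu
```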
Input text: "Hi"
ASCII bytes: 72, 105
Binary: 01001000 01101001
Split into 6-bit groups:
010010 000110 1001xx
Pad remaining bits with zeros:
010010 000110 100100
Look up each value in the Base64 alphabet:
18 → S 6 → G 36 → k
Add padding (input was 2 bytes, not 3):
Result: "SGk="
Three input bytes become four output characters. That's a 4/3 ratio, so Base64 output is about 33% larger than the original binary, plus up to two padding characters. A 1 MB image becomes roughly 1.33 MB when Base64-encoded. This is the cost of text-safety: you trade size for compatibility.
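The walkthrough above can be reproduced with a small hand-written encoder. This is a teaching sketch, not a replacement for Python's built-in `base64` module; the function name `b64encode_manual` is ours, and the last lines use the standard library to check the size overhead.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_manual(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Fill the short group with zero bytes so it is always 24 bits.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that come entirely from the zero fill become "=".
        if missing:
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)

print(b64encode_manual(b"Hi"))  # SGk=

# Size overhead, measured with the standard library encoder.
raw = bytes(1_000_000)
encoded = base64.b64encode(raw)
print(len(encoded) / len(raw))  # roughly 1.33
```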
When the input length isn't divisible by 3, Base64 adds = padding characters to make the output length a multiple of 4. One leftover byte produces two characters followed by ==, and two leftover bytes produce three characters followed by =. Some variants (Base64url as used in JWTs, for example) drop the padding entirely, since the decoder can infer it from the length of the encoded text.
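A short experiment with Python's standard `base64` module shows both padding cases, and one way a decoder might restore padding that was dropped. The helper `decode_unpadded` is a hypothetical name used for illustration.

```python
import base64

for raw in (b"abc", b"ab", b"a"):
    print(raw, base64.b64encode(raw))
# b'abc' b'YWJj'
# b'ab' b'YWI='
# b'a' b'YQ=='

def decode_unpadded(text: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were dropped."""
    # -len(text) % 4 is the number of characters needed to reach
    # the next multiple of 4 (0, 1, 2, or 3).
    return base64.b64decode(text + "=" * (-len(text) % 4))

print(decode_unpadded("YQ"))   # b'a'
print(decode_unpadded("YWI"))  # b'ab'
```

An encoded length that leaves a remainder of 1 when divided by 4 can never be valid, so a stricter version would reject that case before decoding.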
Instead of linking to an external image file, you can embed it directly in your markup:
<img src="data:image/png;base64,iVBORw0KGgo..." />
/* Or in CSS */
background-image: url(data:image/svg+xml;base64,PHN2Zy...);
This eliminates an HTTP request but increases the HTML/CSS file size. It's worth it for tiny icons and SVGs but counterproductive for large images.
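Generating such a data URI is mostly string formatting. The sketch below assumes the bytes and MIME type are already known; `to_data_uri` is a hypothetical helper and the tiny SVG is a stand-in for a real asset.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a data: URI using Base64."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"

# A minimal SVG document used as sample input.
svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
print(to_data_uri(svg, "image/svg+xml"))

# Every PNG file starts with the same 8-byte signature, which is why
# PNG data URIs all begin with "iVBORw0KGgo".
print(base64.b64encode(b"\x89PNG\r\n\x1a\n"))  # b'iVBORw0KGgo='
```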
When you attach a file to an email, your mail client encodes it as Base64 and embeds it in the message body using MIME (Multipurpose Internet Mail Extensions). The receiving client decodes it back to the original file. This is why a message with attachments is noticeably larger than the files themselves: every attachment grows by about a third in transit.
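Python's standard `email` package makes the same choice a mail client does. In this sketch the addresses, subject, and file name are placeholders, and the attachment is arbitrary binary data; nothing is actually sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"   # placeholder address
msg["To"] = "bob@example.com"       # placeholder address
msg["Subject"] = "Report"
msg.set_content("See attached.")

attachment = bytes(range(256))      # arbitrary binary data
msg.add_attachment(
    attachment,
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

part = next(msg.iter_attachments())
print(part["Content-Transfer-Encoding"])  # base64
# Reading the attachment back decodes the Base64 body.
print(part.get_content() == attachment)   # True
```

Printing `msg.as_string()` shows the attachment body as lines of Base64 text between the MIME boundaries.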
JWTs consist of three Base64url-encoded segments separated by dots. The header and payload are just JSON objects encoded in Base64url so they can travel safely in HTTP headers and URLs.
eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoiYWxpY2UifQ.signature
│                    │                      │
│                    │                      └─ signature
│                    └─ payload (Base64url)
└─ header (Base64url)
Standard Base64 uses + and /, which are special characters in URLs. Base64url swaps them for - and _ to make the output URL-safe without percent-encoding. JWTs and many modern APIs use Base64url by default.
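Both points can be tried with the standard library: the first lines compare the two alphabets on bytes chosen to produce values 62 and 63, and the rest decodes the example token shown above. The helper `b64url_decode` is a hypothetical name. Note that this only reads the token; it does not verify the signature.

```python
import base64
import json

# Bytes chosen so the output contains alphabet values 62 and 63.
print(base64.b64encode(b"\xfb\xff"))          # b'+/8='
print(base64.urlsafe_b64encode(b"\xfb\xff"))  # b'-_8='

def b64url_decode(segment: str) -> bytes:
    """Decode a Base64url segment, restoring any dropped padding."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)

token = "eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoiYWxpY2UifQ.signature"
header_b64, payload_b64, _signature = token.split(".")

print(json.loads(b64url_decode(header_b64)))   # {'alg': 'HS256'}
print(json.loads(b64url_decode(payload_b64)))  # {'user': 'alice'}
```

Because anyone can decode the header and payload this way, a JWT should never carry secrets in those segments.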
Standard Base64: +, /, and = padding
Base64url: -, _, and often omits padding
Base64 is the duct tape of the internet: it was never meant to be elegant, but it holds everything together when binary data needs to travel through text-only channels.