How PDF Works: Pages, Fonts, and Security

PDF was designed to solve a deceptively hard problem: make a document look identical on every screen, printer, and operating system. To do that, Adobe created a file format that embeds its own fonts, defines exact page geometry, and can even carry its own encryption. Here's what happens inside the file.


PDF's internal structure

A PDF file is built from four sections, each serving a distinct role:

  1. Header — A version identifier like %PDF-1.7 that tells readers which features to expect.
  2. Body — A collection of numbered objects: text streams, images, fonts, annotations, and page descriptions. Each object has a unique ID.
  3. Cross-reference table (xref) — An index that maps each object ID to its byte offset in the file. This lets readers jump directly to any object without scanning the entire file — critical for fast random access in large documents.
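  4. Trailer — Names the root catalog object and, together with the startxref offset that follows it, tells the reader where the xref table begins. This is the reader's entry point, which is why PDF readers start parsing at the end of the file.

%PDF-1.7                          ← Header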

1 0 obj                           ← Object 1 (catalog)
<< /Type /Catalog /Pages 2 0 R >>
endobj

2 0 obj                           ← Object 2 (page tree)
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj

3 0 obj                           ← Object 3 (page)
<< /Type /Page /Parent 2 0 R
   /MediaBox [0 0 612 792]        ← Letter size in points
   /Contents 4 0 R >>
endobj

4 0 obj                           ← Object 4 (content stream)
<< /Length 37 >>                  ← Byte count of the stream data below
stream
BT /F1 24 Tf 100 700 Td (Hello) Tj ET
endstream
endobj

xref                              ← Cross-reference table
0 5
0000000000 65535 f
0000000009 00000 n
0000000058 00000 n
0000000115 00000 n
0000000206 00000 n

trailer << /Size 5 /Root 1 0 R >> ← Trailer
startxref
293                               ← Byte offset of the xref table
%%EOF
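
To make the role of the xref table concrete, here is a minimal Python sketch that writes the same four objects and records each one's byte offset as it goes. The function name build_pdf and its argument layout are illustrative assumptions, not part of any library. Like the listing above, the page declares no font resources, so a viewer may open the file without drawing the text.

```python
def build_pdf(objects):
    """Serialize object bodies into PDF bytes with an xref table.

    `objects` is a list of bytes, one per object body; objects are
    numbered from 1 in list order. (Hypothetical helper for illustration.)
    """
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))            # where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)                      # where the "xref" keyword starts
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"          # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off     # fixed-width, 20 bytes per entry
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


content = b"BT /F1 24 Tf 100 700 Td (Hello) Tj ET"
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R\n  /MediaBox [0 0 612 792]\n  /Contents 4 0 R >>",
    b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
]
pdf, offsets = build_pdf(objects)
print(offsets)
```

Because every xref entry has a fixed width, a reader can find the entry for object N with simple arithmetic and then seek straight to the recorded offset.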

How fonts work in PDF

PDF solves the “missing font” problem by carrying the font inside the file, usually trimmed down to the glyphs the document needs:

Embedded fonts

The font program itself (or a subset of it) is stored inside the PDF as a binary stream object. The reader doesn't need the font installed — it uses the embedded copy. This is why PDFs look identical everywhere, but also why they can be surprisingly large. A single embedded font can add 50–500 KB to the file.

Font subsetting

Instead of embedding the entire font (which might contain 5,000+ glyphs), modern PDF creators embed only the glyphs actually used in the document. A document using just “Hello World” in a 500 KB font might embed only 2 KB of glyph data — a 250× reduction.
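
The arithmetic behind that estimate can be sketched in a few lines of Python. Everything here is a simplifying assumption: glyph data is treated as evenly sized, and overhead_kb is an invented allowance for the tables a subset font still has to carry. Real subset sizes depend on the font format and on how complex each glyph is.

```python
def glyphs_needed(text):
    """Distinct characters a subset font must cover for this text."""
    return sorted(set(text))


def estimate_subset_kb(text, font_kb, font_glyphs, overhead_kb=1.0):
    """Rough subset size: proportional glyph data plus fixed overhead.

    overhead_kb is a hypothetical allowance, not a measured value.
    """
    per_glyph_kb = font_kb / font_glyphs
    return overhead_kb + per_glyph_kb * len(set(text))


needed = glyphs_needed("Hello World")
size_kb = estimate_subset_kb("Hello World", font_kb=500, font_glyphs=5000)
print(len(needed), "glyphs, about", round(size_kb, 1), "KB")
```

Note that “Hello World” needs only eight distinct glyphs, including the space, because repeated letters share one glyph.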

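Why copy-paste from PDFs sometimes produces garbage: When fonts are subsetted or re-encoded, the character codes in the page content no longer correspond to Unicode; they are just indexes into the embedded font. Text extraction relies on a separate mapping from those codes back to Unicode (the font's ToUnicode table), and some PDF creators omit it or write it incorrectly. The PDF still renders perfectly, because rendering only needs the glyph shapes, but your clipboard receives codes with no meaning.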
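
The effect is easy to simulate. The Python sketch below is a toy model, not how any particular PDF producer works: subset_encode renumbers characters in order of first use, and extract_text either consults the code-to-Unicode mapping or, when it is missing, falls back to treating the codes as raw characters. Both function names are made up for this illustration.

```python
def subset_encode(text):
    """Renumber characters as codes 1, 2, 3... in order of first use.

    Returns the code sequence and the code-to-Unicode mapping that a
    producer would need to store for text extraction to work.
    """
    code_of = {}
    to_unicode = {}
    codes = []
    for ch in text:
        if ch not in code_of:
            code_of[ch] = len(code_of) + 1
            to_unicode[code_of[ch]] = ch
        codes.append(code_of[ch])
    return codes, to_unicode


def extract_text(codes, to_unicode=None):
    """Recover text from codes; without a mapping, guess from raw codes."""
    if to_unicode is None:
        return "".join(chr(c) for c in codes)   # meaningless control characters
    return "".join(to_unicode[c] for c in codes)


codes, mapping = subset_encode("Hello")
print(codes)                                 # the page stores only these codes
print(repr(extract_text(codes, mapping)))    # mapping present
print(repr(extract_text(codes)))             # mapping missing
```

Rendering never needs the mapping, since each code already selects the right glyph shape; only copy, search, and screen readers do.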

Why PDFs look the same everywhere

PDF achieves visual consistency through three mechanisms:

  • Absolute positioning — Every text character, line, and image is placed at exact coordinates in points (1/72 of an inch). There's no reflow, no responsive layout. The page is a fixed canvas.
  • Embedded resources — Fonts, images, and color profiles travel inside the file. No external dependencies.
  • Defined page geometry — The MediaBox specifies exact page dimensions. A4 is always 595.28 × 841.89 points.
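
The page dimensions quoted above follow directly from the definition of a point. A short Python sketch of the conversion (the helper names mm_to_pt and media_box are illustrative):

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def mm_to_pt(mm):
    """Convert millimetres to PDF points (1 pt = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def media_box(width_mm, height_mm):
    """Build a MediaBox array for a page of the given physical size."""
    return [0, 0, round(mm_to_pt(width_mm), 2), round(mm_to_pt(height_mm), 2)]


a4 = media_box(210, 297)          # ISO A4
letter = media_box(215.9, 279.4)  # US Letter, 8.5 x 11 inches
print(a4, letter)
```

A4 comes out as 595.28 × 841.89 points and Letter as 612 × 792, matching the MediaBox in the earlier listing.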

This is also PDF's biggest limitation for the web. Unlike HTML, a PDF can't adapt to different screen sizes. It's a digital piece of paper — by design.


How PDF encryption works

PDF supports two levels of password protection, and understanding the difference is critical:

User password vs owner password

Password type    Purpose                          Can open?                Permissions
User password    Required to open the document    Yes, with restrictions   Set by owner
Owner password   Grants full admin access         Yes, full access         All permissions

Permission flags

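The owner can set granular permission flags that control what users can do: printing, copying text, editing content, filling forms, and extracting pages. These flags are enforced only by compliant PDF readers. Anyone who can open the document already holds the key that decrypts its content, so software that chooses to ignore the flags can still print or copy. Treat the flags as a statement of intent, not as a security boundary.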
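
In the standard password-based security handler, these flags are stored as a single 32-bit integer in the /P entry of the encryption dictionary, with one bit per permission. The Python sketch below decodes such a value. The bit positions follow the PDF specification's numbering, where bit 1 is the lowest; the permission labels are informal names chosen for this example.

```python
# Bit position (1 = least significant) -> informal permission label
PERMISSION_BITS = {
    3: "print",
    4: "modify",
    5: "copy",
    6: "annotate",
    9: "fill_forms",
    10: "extract_for_accessibility",
    11: "assemble",
    12: "print_high_quality",
}


def decode_permissions(p):
    """Return the set of permissions granted by a /P value.

    /P is written as a signed integer, so mask it to 32 bits first.
    """
    unsigned = p & 0xFFFFFFFF
    return {
        name
        for bit, name in PERMISSION_BITS.items()
        if unsigned & (1 << (bit - 1))
    }


print(sorted(decode_permissions(-44)))    # printing and copying, no editing
print(sorted(decode_permissions(-3904)))  # nothing granted
```

Because unused high bits are set to 1, /P values usually appear as negative numbers in the file.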

PDF encryption uses either RC4 (older, weaker) or AES-128/AES-256 (modern, strong). The encryption key is derived from the password, so a strong password is essential for meaningful security.


PDF/A: archiving for eternity

PDF/A is a strict subset of PDF designed for long-term archiving. It mandates:

  • All fonts must be embedded (no system font references)
  • No JavaScript, audio, or video (they depend on external software)
  • No encryption (must be permanently readable)
  • Color profiles must be embedded (no device-dependent colors)
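
Some of these rules leave visible traces in the raw file, which allows a very rough first check. The Python sketch below scans the bytes for a few PDF names associated with forbidden features. It is a heuristic under strong assumptions: the function name and labels are invented here, names inside compressed object streams will be missed, and a match does not prove a violation. It is no substitute for a real PDF/A validator.

```python
# PDF name -> human-readable label (labels are our own wording)
RED_FLAG_MARKERS = {
    b"/Encrypt": "encryption dictionary",
    b"/JavaScript": "JavaScript action",
    b"/Sound": "sound object or action",
    b"/Movie": "movie annotation or action",
}


def pdfa_red_flags(data):
    """List features found in raw PDF bytes that PDF/A forbids.

    Naive substring scan: misses compressed content and can
    over-report if a marker appears inside ordinary text.
    """
    return sorted(
        label for marker, label in RED_FLAG_MARKERS.items() if marker in data
    )


sample = (
    b"%PDF-1.7\n1 0 obj\n"
    b"<< /Type /Catalog /OpenAction << /S /JavaScript /JS (app.alert(1)) >> >>\n"
    b"endobj\ntrailer << /Root 1 0 R /Encrypt 5 0 R >>\n"
)
print(pdfa_red_flags(sample))
```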

Government agencies, legal firms, and libraries use PDF/A because it removes every dependency on outside resources. The goal is a document that can still be rendered correctly in 50 years, regardless of which software or operating system exists then.

PDF isn't just a file format — it's a self-contained rendering engine in a box. Everything needed to display the document exactly as intended is sealed inside the file itself.
