
How OCR Reads Text from Images

You snap a photo of a receipt, and seconds later the text is on your screen, editable, searchable, and copy-pasteable. Behind that simple interaction is decades of research in Optical Character Recognition — the technology that teaches computers to read. How does a machine look at pixels and see letters?


A brief history

OCR began in the 1950s, when postal services and banks needed to sort mail and process checks automatically. Early systems could read only specially designed fonts, such as the blocky magnetic-ink numbers still printed on the bottom of checks (a relic of this era). By the 1990s, scanners and desktop OCR software made document digitisation practical. Today, OCR runs in real time on phone cameras, reading signs, menus, and license plates.


How modern OCR works

Modern OCR pipelines break the problem into four stages:

  1. Image preprocessing — convert to grayscale, remove noise, correct skew, normalise contrast. This step has the biggest impact on accuracy.
  2. Text detection — locate regions of the image that contain text. Modern systems use neural networks to draw bounding boxes around text lines and words.
  3. Character segmentation — isolate individual characters (or in modern systems, process entire words at once using sequence models).
  4. Recognition — classify each character using pattern matching or neural networks, then apply language models to correct ambiguous characters.
Image → Preprocess → Detect text regions → Segment characters → Recognise → Output text
            │                │                     │                │
        Grayscale       Bounding boxes        Split, or        Neural net
        Deskew          around lines          sequence         + language
        Denoise                               model            model
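The stages above can be sketched end to end in miniature. The glyph templates, the sample "scan", and the word it spells are all invented for illustration, and detection is trivial here because the image contains a single text line; real engines replace each stub with a trained model.

```python
# Toy OCR pipeline: binarise, segment at blank columns, match templates.
# Templates and the sample image below are invented for illustration.

TEMPLATES = {
    "I": ((1,), (1,), (1,)),
    "L": ((1, 0), (1, 0), (1, 1)),
}

def preprocess(gray, threshold=128):
    """Binarise: dark pixels (below the threshold) become ink (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment(binary):
    """Split the text line into glyphs at blank (ink-free) columns."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    glyphs, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel flushes last glyph
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            glyphs.append(tuple(tuple(row[start:x]) for row in binary))
            start = None
    return glyphs

def recognise(glyph):
    """Nearest-template classification by fraction of agreeing pixels."""
    def score(t):
        matches = sum(a == b for gr, tr in zip(glyph, t) for a, b in zip(gr, tr))
        cells = max(len(t) * len(t[0]), len(glyph) * len(glyph[0]))
        return matches / cells
    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))

# A grayscale "scan" of the word "LI": ink is 0, paper is 255.
image = [
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0,   0, 255, 0],
]
text = "".join(recognise(g) for g in segment(preprocess(image)))
print(text)  # LI
```

A production recogniser would classify glyphs with a neural network rather than pixel templates, and a language model would then rescore ambiguous outputs, but the data flow is the same.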

The Tesseract engine

Tesseract is the most widely used open-source OCR engine. Originally developed by Hewlett-Packard in the 1980s, it was released as open source in 2005, and its development was later sponsored by Google. Since version 4, Tesseract has used an LSTM (Long Short-Term Memory) neural network for recognition, which dramatically improved accuracy over the older pattern-matching approach.

Tesseract supports over 100 languages and scripts, including Chinese, Arabic, and Devanagari. It can run in the browser via WebAssembly (through libraries like Tesseract.js), which means OCR can happen entirely on the client side without uploading images to a server.
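From Python, Tesseract is commonly driven through the pytesseract wrapper. A minimal sketch, assuming the Tesseract binary plus the pytesseract and Pillow packages are installed; the guard lets the module load even when they are not:

```python
# Hedged pytesseract sketch: degrades cleanly when the optional
# dependencies (pytesseract, Pillow, the Tesseract binary) are absent.
try:
    import pytesseract
    from PIL import Image

    def image_to_text(path, lang="eng"):
        """Run Tesseract on an image file and return the recognised text."""
        return pytesseract.image_to_string(Image.open(path), lang=lang)
except ImportError:
    def image_to_text(path, lang="eng"):
        raise RuntimeError("install pytesseract and Pillow to enable OCR")
```

Called as `image_to_text("receipt.png")` (a hypothetical file name), this returns whatever text Tesseract extracts; the `lang` argument selects which traineddata models to apply, and combinations like `"deu+eng"` are allowed.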


What makes OCR hard

  • Handwriting — hard because of infinite variation between writers; mitigated by specialised handwriting recognition (HTR) models
  • Curved text — hard because characters distort along arcs; mitigated by text-rectification preprocessing
  • Low contrast — hard because light text sits on light backgrounds; mitigated by adaptive thresholding and histogram equalisation
  • Non-Latin scripts — hard because of more glyphs and connected characters; mitigated by language-specific models
  • Complex layouts — hard because of tables, columns, and mixed content; mitigated by layout analysis before recognition

Preprocessing is half the battle. Before feeding an image to an OCR engine, try increasing the resolution to at least 300 DPI, converting to grayscale, and applying sharpening. These steps alone can improve accuracy from 70% to 95%+ on clean printed text.
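The thresholding step mentioned above can be made concrete with Otsu's method, a classic algorithm that picks the grayscale cutoff best separating ink from background automatically. A pure-Python sketch on a flat list of pixels (real implementations operate on 2D images):

```python
# Otsu's method: choose the threshold maximising between-class variance,
# i.e. the cutoff that best splits pixels into "ink" and "paper" classes.

def otsu_threshold(pixels):
    """Return the grayscale cutoff (0-255) maximising between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg, weight_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]              # pixels at or below t: "ink" class
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg     # pixels above t: "paper" class
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A noisy bimodal "image": dark ink around 30, light paper around 220.
pixels = [28, 30, 32, 25, 31] * 4 + [218, 220, 222, 215, 225] * 6
t = otsu_threshold(pixels)
binary = [1 if p <= t else 0 for p in pixels]  # 1 = ink
```

Because the two pixel populations are well separated, the chosen cutoff lands right at the top of the ink cluster, and every ink pixel survives binarisation. Adaptive variants apply the same idea per image tile to cope with uneven lighting.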

Accuracy factors

OCR accuracy depends far more on input quality than on the choice of engine:

  • Resolution — higher DPI means more pixels per character, giving the model more data to work with. 300 DPI is the standard for scanned documents.
  • Contrast — dark text on a white background is ideal. Coloured backgrounds, gradients, and watermarks all reduce accuracy.
  • Font clarity — standard fonts (Arial, Times) are recognised with near-perfect accuracy. Decorative, script, or heavily stylised fonts cause errors.
  • Image noise — dust, wrinkles, and JPEG compression artifacts confuse character boundaries.
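The resolution point is easy to quantify: one typographic point is 1/72 inch, so a glyph set at a given point size spans roughly dpi × point_size / 72 pixels of height. A back-of-the-envelope helper:

```python
# Approximate glyph height in pixels at a given scan resolution.
# One typographic point is 1/72 inch, so height ≈ dpi * point_size / 72.

def char_height_px(dpi, point_size=12):
    """Approximate pixel height of a character at a given scan resolution."""
    return dpi * point_size / 72

print(char_height_px(300))  # 50.0 px at 300 DPI: plenty of detail
print(char_height_px(72))   # 12.0 px at 72 DPI: too coarse for reliable OCR
```

At 300 DPI a 12-point character is about 50 pixels tall, which gives the recogniser ample detail; at screen resolution (72 DPI) the same character is only 12 pixels tall, which is why photos of screens and low-resolution scans fare poorly.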

Real-world uses

  • Receipt scanning — expense tracking apps extract totals, dates, and vendor names
  • Document digitisation — libraries convert books and archives to searchable text
  • License plate recognition — toll systems and parking garages read plates in real time
  • Accessibility — screen readers use OCR to describe text in images for visually impaired users

OCR doesn't just read text — it bridges the physical and digital worlds. Every scanned form, photographed whiteboard, and translated sign relies on a machine that learned to see letters in pixels.

Try it yourself

Put what you learned into practice with our Image to Text (OCR) tool.