
How OCR Reads Text from Images

You snap a photo of a receipt, and seconds later the text is on your screen, editable, searchable, and copy-pasteable. Behind that simple interaction is decades of research in Optical Character Recognition — the technology that teaches computers to read. How does a machine look at pixels and see letters?


A brief history

OCR began in the 1950s, when postal services and banks needed to sort mail and process checks automatically. Early systems could read only specially designed fonts, such as the blocky magnetic-ink numbers still printed on the bottom of checks (a relic of this era). By the 1990s, scanners and desktop OCR software made document digitisation practical. Today, OCR runs in real time on phone cameras, reading signs, menus, and license plates.


How modern OCR works

Modern OCR pipelines break the problem into four stages:

  1. Image preprocessing — convert to grayscale, remove noise, correct skew, normalise contrast. This step has the biggest impact on accuracy.
  2. Text detection — locate regions of the image that contain text. Modern systems use neural networks to draw bounding boxes around text lines and words.
  3. Character segmentation — isolate individual characters (or in modern systems, process entire words at once using sequence models).
  4. Recognition — classify each character using pattern matching or neural networks, then apply language models to correct ambiguous characters.
Image → Preprocess → Detect text regions → Segment characters → Recognise → Output text
            │                │                     │                │
        Grayscale       Bounding boxes        Split, or        Neural net
        Deskew          around lines          sequence         + language
        Denoise                               model            model
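The stages above can be sketched end to end in miniature. The glyph templates, the sample "scan", and the word it spells are all invented for illustration, and detection is trivial here because the image contains a single text line; real engines replace each stub with a trained model.

```python
# Toy OCR pipeline: binarise, segment at blank columns, match templates.
# Templates and the sample image below are invented for illustration.

TEMPLATES = {
    "I": ((1,), (1,), (1,)),
    "L": ((1, 0), (1, 0), (1, 1)),
}

def preprocess(gray, threshold=128):
    """Binarise: dark pixels (below the threshold) become ink (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment(binary):
    """Split the text line into glyphs at blank (ink-free) columns."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    glyphs, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel flushes last glyph
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            glyphs.append(tuple(tuple(row[start:x]) for row in binary))
            start = None
    return glyphs

def recognise(glyph):
    """Nearest-template classification by fraction of agreeing pixels."""
    def score(t):
        matches = sum(a == b for gr, tr in zip(glyph, t) for a, b in zip(gr, tr))
        cells = max(len(t) * len(t[0]), len(glyph) * len(glyph[0]))
        return matches / cells
    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))

# A grayscale "scan" of the word "LI": ink is 0, paper is 255.
image = [
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0,   0, 255, 0],
]
text = "".join(recognise(g) for g in segment(preprocess(image)))
print(text)  # LI
```

A production recogniser would classify glyphs with a neural network rather than pixel templates, and a language model would then rescore ambiguous outputs, but the data flow is the same.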

The Tesseract engine

Tesseract is the most widely used open-source OCR engine. Originally developed by Hewlett-Packard in the 1980s, it was released as open source in 2005, and its development was later sponsored by Google. Since version 4, Tesseract has used an LSTM (Long Short-Term Memory) neural network for recognition, which dramatically improved accuracy over the older pattern-matching approach.

Tesseract supports over 100 languages and scripts, including Chinese, Arabic, and Devanagari. It can run in the browser via WebAssembly (through libraries like Tesseract.js), which means OCR can happen entirely on the client side without uploading images to a server.
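From Python, Tesseract is commonly driven through the pytesseract wrapper. A minimal sketch, assuming the Tesseract binary plus the pytesseract and Pillow packages are installed; the guard lets the module load even when they are not:

```python
# Hedged pytesseract sketch: degrades cleanly when the optional
# dependencies (pytesseract, Pillow, the Tesseract binary) are absent.
try:
    import pytesseract
    from PIL import Image

    def image_to_text(path, lang="eng"):
        """Run Tesseract on an image file and return the recognised text."""
        return pytesseract.image_to_string(Image.open(path), lang=lang)
except ImportError:
    def image_to_text(path, lang="eng"):
        raise RuntimeError("install pytesseract and Pillow to enable OCR")
```

Called as `image_to_text("receipt.png")` (a hypothetical file name), this returns whatever text Tesseract extracts; the `lang` argument selects which traineddata models to apply, and combinations like `"deu+eng"` are allowed.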


What makes OCR hard

  • Handwriting — hard because of infinite variation between writers; mitigated by specialised handwriting recognition (HTR) models
  • Curved text — hard because characters distort along arcs; mitigated by text-rectification preprocessing
  • Low contrast — hard because light text sits on light backgrounds; mitigated by adaptive thresholding and histogram equalisation
  • Non-Latin scripts — hard because of more glyphs and connected characters; mitigated by language-specific models
  • Complex layouts — hard because of tables, columns, and mixed content; mitigated by layout analysis before recognition

Preprocessing is half the battle. Before feeding an image to an OCR engine, try increasing the resolution to at least 300 DPI, converting to grayscale, and applying sharpening. These steps alone can improve accuracy from 70% to 95%+ on clean printed text.
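The thresholding step mentioned above can be made concrete with Otsu's method, a classic algorithm that picks the grayscale cutoff best separating ink from background automatically. A pure-Python sketch on a flat list of pixels (real implementations operate on 2D images):

```python
# Otsu's method: choose the threshold maximising between-class variance,
# i.e. the cutoff that best splits pixels into "ink" and "paper" classes.

def otsu_threshold(pixels):
    """Return the grayscale cutoff (0-255) maximising between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg, weight_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]              # pixels at or below t: "ink" class
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg     # pixels above t: "paper" class
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A noisy bimodal "image": dark ink around 30, light paper around 220.
pixels = [28, 30, 32, 25, 31] * 4 + [218, 220, 222, 215, 225] * 6
t = otsu_threshold(pixels)
binary = [1 if p <= t else 0 for p in pixels]  # 1 = ink
```

Because the two pixel populations are well separated, the chosen cutoff lands right at the top of the ink cluster, and every ink pixel survives binarisation. Adaptive variants apply the same idea per image tile to cope with uneven lighting.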

Accuracy factors

OCR accuracy depends far more on input quality than on the choice of engine:

  • Resolution — higher DPI means more pixels per character, giving the model more data to work with. 300 DPI is the standard for scanned documents.
  • Contrast — dark text on a white background is ideal. Coloured backgrounds, gradients, and watermarks all reduce accuracy.
  • Font clarity — standard fonts (Arial, Times) are recognised with near-perfect accuracy. Decorative, script, or heavily stylised fonts cause errors.
  • Image noise — dust, wrinkles, and JPEG compression artifacts confuse character boundaries.
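The resolution point is easy to quantify: one typographic point is 1/72 inch, so a glyph set at a given point size spans roughly dpi × point_size / 72 pixels of height. A back-of-the-envelope helper:

```python
# Approximate glyph height in pixels at a given scan resolution.
# One typographic point is 1/72 inch, so height ≈ dpi * point_size / 72.

def char_height_px(dpi, point_size=12):
    """Approximate pixel height of a character at a given scan resolution."""
    return dpi * point_size / 72

print(char_height_px(300))  # 50.0 px at 300 DPI: plenty of detail
print(char_height_px(72))   # 12.0 px at 72 DPI: too coarse for reliable OCR
```

At 300 DPI a 12-point character is about 50 pixels tall, which gives the recogniser ample detail; at screen resolution (72 DPI) the same character is only 12 pixels tall, which is why photos of screens and low-resolution scans fare poorly.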

Real-world uses

  • Receipt scanning — expense tracking apps extract totals, dates, and vendor names
  • Document digitisation — libraries convert books and archives to searchable text
  • License plate recognition — toll systems and parking garages read plates in real time
  • Accessibility — screen readers use OCR to describe text in images for visually impaired users

OCR doesn't just read text — it bridges the physical and digital worlds. Every scanned form, photographed whiteboard, and translated sign relies on a machine that learned to see letters in pixels.

Try it yourself

Put what you learned into practice with our Image to Text (OCR) tool.