
source: ocr-su-scansioni-qualita-revisione.md

category: dataQuality

published: August 5, 2025

read_time: 11m

OCR on scans: DPI, skew and the human review queue

Scans impose different constraints than digital PDFs. Image quality, stamps, skew: how to set up the workflow and when human review is needed.

Not all PDFs are alike. An invoice received by email as a native file behaves differently from the same document printed, signed, stamped and scanned in the office. For the latter, OCR is the only path — and image quality determines much of the outcome, regardless of how «intelligent» the downstream engine is.

Resolution and DPI: the operational minimum

For standard administrative text, 300 dpi is a good minimum. Below that, small characters (footnotes, item codes) become ambiguous. Above it, the marginal gain must be weighed against upload time and storage. For smartphone photos, check focus and lighting: a blurry image stays blurry, and saving it at 600 dpi does not bring the detail back.

  • Prefer black-and-white or greyscale scanning for text — colour rarely helps OCR
  • Avoid aggressive compression: JPEG artefacts look like pen strokes
  • On multi-page documents, one crooked page in a long delivery note can corrupt the whole table
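As a rough illustration of the 300 dpi floor, the sketch below estimates the effective resolution of a scan from its pixel width and the physical paper width. The function names, the A4 default and the 1 dpi rounding tolerance are assumptions made for this example, not part of any specific product or scanner API.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels_wide: int, paper_width_mm: float = 210.0) -> float:
    """Estimate scan resolution from image width and paper width (A4 assumed by default)."""
    return pixels_wide / (paper_width_mm / MM_PER_INCH)


def meets_minimum(pixels_wide: int, min_dpi: float = 300.0,
                  paper_width_mm: float = 210.0) -> bool:
    """Return True if the scan reaches the minimum resolution.

    A 1 dpi tolerance (an assumption) absorbs rounding: an A4 page scanned
    at a nominal 300 dpi is typically 2480 px wide, slightly under 300 dpi.
    """
    return effective_dpi(pixels_wide, paper_width_mm) >= min_dpi - 1.0


if __name__ == "__main__":
    for width in (1654, 2480, 4960):
        print(width, round(effective_dpi(width), 1), meets_minimum(width))
```

A check like this can run at upload time, before any OCR, so that under-resolved scans are sent back to the office rather than into the review queue.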

Skew, stamps and visual noise

Even a slight rotation is hard on tables: columns drift out of alignment and the OCR merges text from neighbouring cells. Stamps and signatures over amounts or VAT numbers are the classic review case: no engine should guess a number that is 40% covered. Creases, stains and low-quality faxes belong here too. In all these cases it is better to flag uncertainty than to invent digits.
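To see why a small rotation matters for tables, a short calculation helps. The sketch below computes how far a vertical column edge drifts horizontally over the height of a page for a given skew angle. The function name and the A4-at-300-dpi page height of 3508 px are assumptions chosen for illustration.

```python
import math


def column_drift_px(skew_degrees: float, page_height_px: int) -> float:
    """Horizontal drift of a vertical line that spans the page height,
    when the page is tilted by skew_degrees from the vertical."""
    return page_height_px * math.tan(math.radians(abs(skew_degrees)))


if __name__ == "__main__":
    # A4 at 300 dpi is roughly 3508 px tall (assumed page size).
    for angle in (0.5, 1.0, 2.0):
        print(angle, round(column_drift_px(angle, 3508), 1))
```

At 1 degree of skew the drift over a full page is roughly 61 px, which at 300 dpi is about 5 mm: enough to push the bottom rows of a narrow column into the neighbouring one.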

An OCR that never admits doubt on uncertain fields is more dangerous than one that asks for a second look.

The human review queue

In a mature workflow, review is not a fallback for total failure: it is a targeted filter. The system marks low-confidence fields, unreconciled totals and anomalous codes. The operator sees the document and the extracted values side by side and fixes only the exceptions; the rest passes through untouched. Human time then scales with the share of «dirty» documents, not with total volume.
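The routing described above can be sketched in a few lines. This is a minimal illustration, not LOCRAI's implementation: the `Field` type, the 0.90 confidence threshold and the reason strings are all hypothetical, and the anomalous-code check is left out for brevity.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


def review_reasons(fields, line_totals, declared_total, min_confidence=0.90):
    """Return the reasons a document needs human review (empty list = passes)."""
    reasons = []
    # Flag each field individually so the operator fixes only the exception.
    for field in fields:
        if field.confidence < min_confidence:
            reasons.append(f"low confidence: {field.name}")
    # Decimal avoids floating-point noise when comparing money amounts.
    if sum(line_totals, Decimal("0")) != declared_total:
        reasons.append("totals do not reconcile")
    return reasons


if __name__ == "__main__":
    fields = [
        Field("vat_number", "IT0123", 0.55),  # e.g. partly covered by a stamp
        Field("date", "2025-08-05", 0.99),
    ]
    lines = [Decimal("10.00"), Decimal("5.50")]
    print(review_reasons(fields, lines, Decimal("15.50")))
```

Returning a list of reasons rather than a single yes/no keeps the queue targeted: the review screen can highlight exactly the fields that triggered it.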

Useful metrics — without marketing numbers

  • Share of documents in review by type (invoice vs delivery note vs order)
  • Average review time per exception — not just «minutes saved»
  • First-pass correct fields on digital vs scanned documents — two different curves
  • Downstream errors (accounting, warehouse) found after extraction
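The first two metrics in the list are simple to compute once review events are logged. The sketch below assumes a hypothetical log format, with `(doc_type, needs_review)` pairs and per-exception durations in seconds; the names and the data are illustrative only.

```python
from collections import defaultdict


def review_share_by_type(documents):
    """documents: iterable of (doc_type, needs_review) pairs.

    Returns the fraction of documents sent to review for each type.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for doc_type, needs_review in documents:
        totals[doc_type] += 1
        if needs_review:
            flagged[doc_type] += 1
    return {doc_type: flagged[doc_type] / totals[doc_type] for doc_type in totals}


def average_review_seconds(exception_durations):
    """Mean handling time per exception; 0.0 when nothing was reviewed."""
    if not exception_durations:
        return 0.0
    return sum(exception_durations) / len(exception_durations)


if __name__ == "__main__":
    docs = [("invoice", True), ("invoice", False),
            ("delivery_note", True), ("order", False)]
    print(review_share_by_type(docs))
    print(average_review_seconds([30, 60]))
```

Keeping the same breakdown separately for digital and scanned sources gives the two curves mentioned in the list.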

Prevent upstream

Standardising how scanning is done in the office (same resolution, same format, no «photo of the document on the desk») shrinks the queue more than any engine tuning. Where possible, ask suppliers for the native PDF: it costs nothing and avoids scan quality problems altogether.

LOCRAI treats scans and digital PDFs on distinct paths and highlights fields to verify, so data quality stays under control even when the source document is not.
