source: ocr-su-scansioni-qualita-revisione.md
category: dataQuality
published: August 5, 2025
read_time: 11m
OCR on scans: DPI, skew and the human review queue
Scans impose different constraints than digital PDFs. Image quality, stamps, skew: how to set up the workflow and when human review is needed.
Not all PDFs are alike. An invoice received by email as a native file behaves differently from the same document printed, signed, stamped and scanned in the office. For the latter, OCR is the only path, and image quality determines much of the outcome, however "intelligent" the downstream engine is.
Resolution and DPI: the operational minimum
For standard administrative text, 300 dpi is a good minimum. Below that, small characters (footnotes, item codes) become ambiguous. Above that, the marginal gain must be weighed against upload time and storage. For smartphone photos, check focus and lighting: a blurry image stays blurry even at 600 dpi.
- Prefer black-and-white or greyscale scanning for text — colour rarely helps OCR
- Avoid aggressive compression: JPEG artefacts look like pen strokes
- Multi-page: one crooked page in a long delivery note can corrupt the whole table
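The 300 dpi minimum can be checked before a page ever reaches the OCR engine. The following is a minimal sketch, assuming the scan covers a full A4 sheet and that only the pixel dimensions of the image are known; the names `check_scan_resolution` and `ScanCheck` are illustrative, not part of any product or library.

```python
from dataclasses import dataclass
from typing import Tuple

MIN_DPI = 300  # minimum suggested in the text for standard administrative documents

# Assumed paper size: A4 is 210 x 297 mm, converted here to inches.
A4_INCHES: Tuple[float, float] = (210 / 25.4, 297 / 25.4)


@dataclass
class ScanCheck:
    effective_dpi: float
    ok: bool


def check_scan_resolution(width_px: int, height_px: int,
                          page_inches: Tuple[float, float] = A4_INCHES,
                          min_dpi: int = MIN_DPI) -> ScanCheck:
    """Estimate the effective DPI from pixel size and an assumed paper size."""
    # Normalise orientation so a landscape scan is compared with a landscape page.
    long_px, short_px = max(width_px, height_px), min(width_px, height_px)
    long_in, short_in = max(page_inches), min(page_inches)
    # The weaker axis decides: take the lower of the two pixel densities.
    effective = min(long_px / long_in, short_px / short_in)
    # Round before comparing, because scanners emit whole pixels
    # (A4 at 300 dpi is 2480 px wide, which is 299.96 dpi exactly).
    return ScanCheck(effective_dpi=effective, ok=round(effective) >= min_dpi)


if __name__ == "__main__":
    print(check_scan_resolution(2480, 3508))  # full A4 page scanned at 300 dpi
    print(check_scan_resolution(1240, 1754))  # same page scanned at 150 dpi
```

A check like this only covers pixel density. It says nothing about focus, lighting or compression artefacts, which need their own controls.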
Skew, stamps and visual noise
Even a slight rotation hurts tables: columns misalign and the OCR mixes cells. Stamps and signatures over amounts or VAT numbers are the classic review case: no engine should force a reading of a number that is 40% covered. Creases, stains and low-quality faxes belong in the same category: better to flag uncertainty than to invent digits.
An OCR that never admits doubt on uncertain fields is more dangerous than one that asks for a second look.
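To see why a small rotation is enough to mix table cells, it helps to put numbers on it. The sketch below uses plain trigonometry: a row of width W on a page rotated by an angle θ ends W·tan(θ) pixels higher or lower than it started. The page width (A4 at 300 dpi) and the 50 px line pitch are assumptions chosen for illustration.

```python
import math


def row_drift_px(row_width_px: float, skew_degrees: float) -> float:
    """Vertical offset between the two ends of a text row on a rotated page."""
    return row_width_px * math.tan(math.radians(abs(skew_degrees)))


def rows_crossed(row_width_px: float, skew_degrees: float,
                 line_pitch_px: float) -> float:
    """How many text lines the far end of a row has drifted into."""
    return row_drift_px(row_width_px, skew_degrees) / line_pitch_px


if __name__ == "__main__":
    # Assumed example: A4 width at 300 dpi (2480 px) and a 50 px line pitch
    # (12 pt between baselines at 300 dpi: 12 / 72 * 300 = 50).
    for angle in (0.25, 0.5, 1.0, 2.0):
        drift = row_drift_px(2480, angle)
        lines = rows_crossed(2480, angle, 50)
        print(f"{angle} degrees: {drift:.1f} px drift, {lines:.2f} lines")
```

Under these assumptions a 1 degree rotation moves the far end of a full-width row by roughly 43 px, close to a whole text line, which is the kind of drift that leads an engine to merge neighbouring cells.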
The human review queue
In a mature workflow, review is not a fallback for total failure: it is a targeted filter. The system marks low-confidence fields, unreconciled totals and anomalous codes. The operator sees the document and the extracted values side by side and fixes only the exceptions; the rest passes through. Human time then scales with the share of "dirty" documents, not with total volume.
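The routing logic described here can be sketched in a few lines. This is an illustration under stated assumptions, not how any specific engine works: it assumes the OCR step returns a confidence between 0 and 1 for each field, and the 0.90 threshold, the one-cent tolerance and the names `Field` and `review_reasons` are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

CONFIDENCE_THRESHOLD = 0.90  # hypothetical cut-off; tune it on your own documents


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


def review_reasons(fields: List[Field],
                   line_totals: List[Decimal],
                   stated_total: Decimal,
                   threshold: float = CONFIDENCE_THRESHOLD,
                   tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return the reasons a document needs human review (empty list = passes)."""
    # Signal 1: fields the engine itself is unsure about.
    reasons = [f"low confidence: {f.name}" for f in fields
               if f.confidence < threshold]
    # Signal 2: the line amounts do not add up to the total printed on the document.
    computed = sum(line_totals, Decimal("0"))
    if abs(computed - stated_total) > tolerance:
        reasons.append("total not reconciled")
    return reasons


if __name__ == "__main__":
    fields = [Field("vat_number", "IT0123", 0.62),
              Field("date", "2025-08-05", 0.99)]
    lines = [Decimal("100.00"), Decimal("22.00")]
    print(review_reasons(fields, lines, Decimal("122.00")))
```

Only documents with a non-empty list reach the operator, and the list says which fields to look at. A check for anomalous codes would be a third signal of the same shape; it is left out here because it depends on each company's own code formats.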
Useful metrics — without marketing numbers
- Share of documents in review by type (invoice vs delivery note vs order)
- Average review time per exception, not just "minutes saved"
- First-pass correct fields on digital vs scanned documents — two different curves
- Downstream errors (accounting, warehouse) found after extraction
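The first three metrics can be computed from a simple per-document log. The sketch below assumes such a log exists with the fields shown; the record layout (`DocRecord`) and the function names are illustrative, and the sample values in the usage example are made up purely to exercise the code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DocRecord:
    doc_type: str                   # e.g. "invoice", "delivery_note", "order"
    source: str                     # "digital" or "scan"
    fields_total: int               # fields extracted from the document
    fields_correct_first_pass: int  # fields that needed no correction
    exceptions: int                 # fields sent to human review
    review_seconds: float           # operator time spent on this document


def review_share_by_type(records: List[DocRecord]) -> Dict[str, float]:
    """Share of documents with at least one exception, per document type."""
    total: Dict[str, int] = defaultdict(int)
    reviewed: Dict[str, int] = defaultdict(int)
    for r in records:
        total[r.doc_type] += 1
        if r.exceptions > 0:
            reviewed[r.doc_type] += 1
    return {t: reviewed[t] / total[t] for t in total}


def avg_review_seconds_per_exception(records: List[DocRecord]) -> float:
    """Operator time divided by the number of exceptions handled."""
    exceptions = sum(r.exceptions for r in records)
    seconds = sum(r.review_seconds for r in records)
    return seconds / exceptions if exceptions else 0.0


def first_pass_accuracy_by_source(records: List[DocRecord]) -> Dict[str, float]:
    """First-pass correct fields, kept separate for digital and scanned input."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for r in records:
        correct[r.source] += r.fields_correct_first_pass
        total[r.source] += r.fields_total
    return {s: correct[s] / total[s] for s in total if total[s]}


if __name__ == "__main__":
    sample = [  # made-up records, for illustration only
        DocRecord("invoice", "digital", 10, 10, 0, 0.0),
        DocRecord("invoice", "scan", 10, 8, 2, 60.0),
        DocRecord("delivery_note", "scan", 10, 7, 3, 90.0),
    ]
    print(review_share_by_type(sample))
    print(avg_review_seconds_per_exception(sample))
    print(first_pass_accuracy_by_source(sample))
```

Downstream errors are not covered, because they have to be reported back from accounting or the warehouse and cannot be derived from the extraction log alone.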
Prevent upstream
Standardising how scanning is done in the office (same resolution, same format, no "photo of the document on the desk") shrinks the queue more than any engine tuning. Where possible, ask suppliers for the native PDF: it improves quality at no cost.
LOCRAI treats scans and digital PDFs on distinct paths and highlights fields to verify, so data quality stays under control even when the source document is not.
