OCR, text layer, visual analysis: what really reads a document

A no-frills guide to the different ways of reading a document and when each one is worth it.

When you upload a document you take it for granted that it gets «read». But reading a PDF isn't a single operation: depending on how the file is made, there are very different ways to pull the text out of it. Understanding them helps explain why some documents are processed in a flash and others take a few seconds more — and why some vendors promise «AI everywhere» when often much less would do.

The already-structured document

The best case of all. Some documents, chief among them the electronic invoice, are already data: behind the graphical appearance there's a file with the fields labeled one by one. Here there's nothing to «read»; you just interpret the structure. As long as the file is valid, the values come out exactly as the issuer entered them, and no advanced analysis is required.

Many «native digital» PDFs generated by business systems also have metadata or repeatable structures. A good engine recognizes these patterns and doesn't waste OCR where it isn't needed.
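To make "just interpret the structure" concrete, here is a minimal Python sketch. The XML layout and the field names (Number, IssueDate, Supplier, Total) are invented for illustration; real electronic invoice schemas are far richer, but the principle is the same: every value is read from a labeled field, not guessed from pixels.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical, heavily simplified invoice (not a real e-invoice schema).
SAMPLE_INVOICE = """
<Invoice>
  <Number>2024-0042</Number>
  <IssueDate>2024-03-15</IssueDate>
  <Supplier>Example Supplies Ltd</Supplier>
  <Total currency="EUR">1220.00</Total>
</Invoice>
"""

def read_invoice_fields(xml_text: str) -> dict:
    """Map labeled XML elements straight to fields: no OCR involved."""
    root = ET.fromstring(xml_text.strip())
    total = root.find("Total")
    return {
        "number": root.findtext("Number"),
        "issue_date": root.findtext("IssueDate"),
        "supplier": root.findtext("Supplier"),
        "total": Decimal(total.text),  # Decimal avoids float rounding on money
        "currency": total.get("currency"),
    }

fields = read_invoice_fields(SAMPLE_INVOICE)
print(fields["number"], fields["total"], fields["currency"])
```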

The native text layer

Many PDFs are born digital: the text is already inside the file, selectable and copyable. In these cases there's no need to «look at» the image; you just extract the text that's already there. Fast, reliable, cheap. The trouble starts when this layer is missing or incomplete.

Watch out for a false friend: sometimes the text layer exists but its order is scrambled, so that in the extraction stream the total at the bottom of the page appears before the header. Here you need layout intelligence, not just a raw copy of the text.
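One simple form of layout intelligence is to ignore the order of the stream and rebuild the reading order from the position of each text fragment. The sketch below is an assumption-laden illustration, not how any particular engine works: the Fragment type is hypothetical, y is assumed to be measured from the top of the page, and the page is assumed to have a single column.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    x: float  # left edge, in points
    y: float  # distance from the top of the page, in points (assumption)

def reading_order(fragments, line_tolerance=3.0):
    """Group fragments into lines by vertical position, then read left to right."""
    lines = []
    for frag in sorted(fragments, key=lambda f: f.y):
        # Same line if the fragment sits within the tolerance of the line's first fragment.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]

# Scrambled stream: the total comes out before the header.
stream = [
    Fragment("1,220.00", 400, 700),
    Fragment("Total", 300, 701),
    Fragment("INVOICE", 50, 40),
    Fragment("No. 2024-0042", 300, 41),
]
ordered = reading_order(stream)
print(ordered)  # ['INVOICE No. 2024-0042', 'Total 1,220.00']
```

Multi-column pages and tables need more than this, which is exactly where the heavier methods described later earn their cost.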

OCR on scanned documents

An invoice printed and then scanned, or a photographed receipt, is just an image: there's no «real» text in it. This is where OCR comes in, recognizing characters starting from the pixels. It works well, but it's sensitive to scan quality: crooked, faded, or low-resolution images make life harder.

300 dpi is a good minimum for small text
Tight tables punish crooked scans
Stamps and signatures over numbers are the classic case for review
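The first two rules of thumb can be turned into a quick check before OCR runs. This is a sketch under stated assumptions: the skew angle is taken as already measured by some other step, and the 2 degree limit is an illustrative threshold, not a figure from any standard. The resolution is derived from the pixel width of the image and the physical width of the page.

```python
MM_PER_INCH = 25.4

def effective_dpi(pixel_width: int, page_width_mm: float) -> int:
    """Resolution implied by the image width and the physical page width."""
    return round(pixel_width * MM_PER_INCH / page_width_mm)

def preflight(pixel_width, page_width_mm, skew_degrees, min_dpi=300, max_skew=2.0):
    """Return a list of warnings; an empty list means the scan looks fine for OCR."""
    warnings = []
    dpi = effective_dpi(pixel_width, page_width_mm)
    if dpi < min_dpi:
        warnings.append(f"low resolution: {dpi} dpi (minimum {min_dpi})")
    if abs(skew_degrees) > max_skew:  # max_skew is a hypothetical threshold
        warnings.append(f"skewed by {skew_degrees} degrees: deskew before OCR")
    return warnings

print(preflight(2480, 210, 0.5))  # A4 page (210 mm wide) at 300 dpi, nearly straight
print(preflight(1240, 210, 4.0))  # same page at half the resolution, crooked
```

A scan that fails the check is not necessarily unreadable; it is a candidate for rescanning, deskewing, or targeted review.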

Visual analysis with AI, as a last resort

When the layout is complex or the scan is truly difficult, an extra level is needed: the AI looks at the document as a whole, understands where everything is, interprets tables and blocks. It's the most powerful weapon, but also the most expensive — which is why it makes sense to use it only when the other methods aren't enough.

Multimodal models excel on «messy» or never-seen documents. But using them on every structured electronic invoice would be like taking a jet to the corner shop: fast, yes, but the cost doesn't make sense.

The «deterministic first» cascade

The best logic is a cascade: try structure, then the text layer, then OCR, then visual analysis. Every document takes the shortest path that leads to a correct result. You pay only for the effort needed, and average processing times stay low even when around 10% of the documents are difficult scans.
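A little arithmetic shows why the average stays low. All the numbers below are hypothetical (the document mix and the seconds per document are made up for illustration), and each time is assumed to be the total for a document resolved at that level, including any cheaper attempts that failed first.

```python
# Hypothetical mix: method -> (share of documents, seconds per document).
mix = {
    "structured": (0.50, 0.1),
    "text_layer": (0.30, 0.2),
    "ocr":        (0.10, 2.0),
    "visual_ai":  (0.10, 8.0),  # the 10% of difficult scans
}

def average_time(mix):
    """Weighted average: share of documents times seconds per document."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must add up to 1")
    return sum(share * seconds for share, seconds in mix.values())

cascade = average_time(mix)
vision_only = 8.0  # every document goes through the slowest method
print(f"cascade: {cascade:.2f} s, vision-only: {vision_only:.2f} s")
```

With these made-up figures the cascade averages about 1.1 seconds per document against 8 seconds for a vision-only approach; with your own measurements the gap will differ, but the calculation is the same.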

The rule is simple: try the cheapest and most accurate way first, and step up a level only if you really need to.
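In code, the rule boils down to a loop over methods ordered from cheapest to most expensive. The sketch below uses stand-in extractors and a hypothetical document dictionary (the keys structured_fields, text_layer, scan_quality and ocr_text are invented here); it illustrates the control flow only and is not LOCRAI's implementation.

```python
def from_structure(doc):
    # Already-structured document: the fields are simply there, or not.
    return doc.get("structured_fields")

def from_text_layer(doc):
    text = doc.get("text_layer", "")
    return {"text": text} if text.strip() else None

def from_ocr(doc):
    # Stand-in for a real OCR call; accepts only scans marked as good quality.
    if doc.get("scan_quality") == "good":
        return {"text": doc.get("ocr_text", "")}
    return None

def from_visual_ai(doc):
    # Stand-in for a multimodal model: the last resort, assumed to always answer.
    return {"text": doc.get("ocr_text", ""), "needs_review": True}

# Ordered from cheapest to most expensive.
CASCADE = [
    ("structure", from_structure),
    ("text_layer", from_text_layer),
    ("ocr", from_ocr),
    ("visual_ai", from_visual_ai),
]

def extract(doc):
    """Try each method in order of cost; stop at the first one that succeeds."""
    for name, extractor in CASCADE:
        result = extractor(doc)
        if result is not None:
            return name, result
    raise ValueError("no extractor could handle the document")

print(extract({"structured_fields": {"total": "1220.00"}})[0])           # structure
print(extract({"text_layer": "INVOICE No. 2024-0042"})[0])               # text_layer
print(extract({"scan_quality": "good", "ocr_text": "Receipt 12.50"})[0])  # ocr
print(extract({"scan_quality": "poor"})[0])                               # visual_ai
```

A production cascade would also validate each result (for example, checking that the line items add up to the total) before accepting it, so that a cheap but wrong answer still triggers the next level.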

What it means for you in practice

If your flow is mostly digital PDFs, you don't need a «vision-only» engine. If you receive many delivery notes photographed in the warehouse, robust OCR and targeted review matter more than flawless handling of electronic invoices. Always ask: what's the mix of my documents, and how does the system choose the method for each one?

LOCRAI follows this philosophy: structure first, AI when needed. Lower cost, less variability, more explainability — and results that hold up when volume rises.

Want to see it on your documents?

We'll show you LOCRAI at work on one of your real workflows, in a short, concrete demo.
