Can tesseract report some measure of OCR quality?

0 votes
0 answers
17 views
I am on Ubuntu. Most of my scanned documents are in German, English or French. This question is related to my other question at https://unix.stackexchange.com/questions/792095/is-there-an-option-to-let-pdfsandwich-try-90-rotations-automatically-for-scanne

Is there a way to make tesseract report how well its OCR worked? I am thinking of a quality measure such as: x% of everything that looks like a character could be identified clearly, and y% was identified as characters but with a doubtful distance to the traineddata.

If something like this existed, one could run tesseract on a page (possibly with a time limit per page), run it again on the same page rotated by 180°, and check whether OCR works better for the upside-down orientation. Alternatively, one could try the page at 0°, 90°, 180° and 270° and then do the full OCR only for the orientation that works best.
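To make the idea concrete, here is a rough Python sketch of what I have in mind. It assumes that tesseract's TSV output (`tesseract image stdout -l deu+eng+fra tsv`) carries a per-word `conf` column, with `-1` on rows that are not words, and uses the mean of that column as the quality measure. The helper names (`run_tesseract_tsv`, `mean_word_confidence`, `best_rotation`) are made up for illustration, the mean confidence is only a heuristic and may not be the best possible measure, and the rotated page images would have to be produced separately.

```python
import csv
import io
import subprocess


def run_tesseract_tsv(image_path, lang="deu+eng+fra", timeout=60):
    """Run the tesseract CLI on one image and return its TSV output as text.

    Assumes tesseract and the language data are installed; `timeout`
    (seconds) is the per-page time limit mentioned in the question.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang, "tsv"],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


def mean_word_confidence(tsv_text):
    """Return (mean confidence, word count) over recognized words.

    Rows with conf < 0 or empty text are structural (page, block, line)
    and are skipped.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    confs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        raw_conf = row.get("conf")
        if raw_conf is None or not text:
            continue
        conf = float(raw_conf)  # newer versions print fractional values
        if conf >= 0:
            confs.append(conf)
    if not confs:
        return 0.0, 0
    return sum(confs) / len(confs), len(confs)


def best_rotation(tsv_by_rotation):
    """Pick the rotation angle whose TSV output scores highest.

    `tsv_by_rotation` maps an angle (e.g. 0, 90, 180, 270) to the TSV text
    produced for the page rotated by that angle. Ties on mean confidence
    are broken by the number of recognized words.
    """
    return max(
        tsv_by_rotation,
        key=lambda angle: mean_word_confidence(tsv_by_rotation[angle]),
    )
```

With something like this, one would call `run_tesseract_tsv` once per rotated copy of the page, collect the results in a dict keyed by angle, and pass that dict to `best_rotation` before doing the full OCR run.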
Asked by Adalbert Hanßen (303 rep)
Mar 7, 2025, 04:20 PM