Is there a way to make tesseract report an OCR quality measure?
I am on Ubuntu. Most of my scanned documents are German, English or French.
This question is related to my other question at https://unix.stackexchange.com/questions/792095/is-there-an-option-to-let-pdfsandwich-try-90-rotations-automatically-for-scanne
Is there a way to make tesseract report how well its OCR worked, i.e. some quality measure such as: x% of everything that looks like a character could be identified clearly, and y% was identified as characters but at a doubtful distance from the traineddata?
If there were something like this, one could run tesseract on a page (possibly with a time limit per page), run it again on the same page rotated by 180°, and check whether OCR works better for the upside-down orientation.
Alternatively, one could run it with the page rotated by 0°, 90°, 180° and 270° and then do the full OCR only for the orientation that works best.
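To make the idea concrete, here is a minimal sketch of what such a measure could look like. It assumes that tesseract's `tsv` output config is available (its `conf` column holds a per-word confidence, with -1 on rows that are not words) and that the page has already been rotated into separate image files by some other tool. The function names, the threshold of 60 and the language string `deu+eng+fra` are my own illustrative choices, not anything tesseract prescribes.

```python
import csv
import io
import subprocess


def run_tesseract_tsv(image_path, languages="deu+eng+fra", timeout=None):
    """Run the tesseract CLI on one image and return its TSV output as text.

    Assumes `tesseract` is on PATH and the named language packs are installed.
    `timeout` (seconds) gives the per-page time limit; subprocess.run raises
    subprocess.TimeoutExpired when it is exceeded.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", languages, "tsv"],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


def confidence_summary(tsv_text, doubtful_below=60.0):
    """Summarise per-word confidences found in tesseract TSV output.

    `doubtful_below` is an arbitrary illustrative threshold, not a
    tesseract default.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    confs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        try:
            conf = float(row.get("conf") or -1)
        except ValueError:
            continue  # skip malformed rows
        if conf < 0 or not text:
            continue  # rows for pages, blocks, lines carry conf = -1
        confs.append(conf)
    if not confs:
        return {"words": 0, "mean_conf": 0.0, "doubtful_share": 0.0}
    doubtful = sum(1 for c in confs if c < doubtful_below)
    return {
        "words": len(confs),
        "mean_conf": sum(confs) / len(confs),
        "doubtful_share": doubtful / len(confs),
    }


def pick_best_rotation(summaries):
    """Given {angle: summary}, return the angle with the highest mean confidence."""
    return max(summaries, key=lambda angle: summaries[angle]["mean_conf"])


# Hypothetical usage, with page_0.png ... page_270.png being pre-rotated copies:
# summaries = {
#     angle: confidence_summary(run_tesseract_tsv(f"page_{angle}.png", timeout=30))
#     for angle in (0, 90, 180, 270)
# }
# best = pick_best_rotation(summaries)
```

Mean word confidence is only a rough proxy: a wrongly oriented page can still yield a few confidently recognised "words", so comparing the word count alongside the mean may be more robust. Tesseract also has an orientation and script detection mode (`--psm 0`, which needs the `osd` traineddata) that may answer the rotation question directly, though I have not verified how reliable it is on scans like these.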
Asked by Adalbert Hanßen
(303 rep)
Mar 7, 2025, 04:20 PM