I've got some large document scans with embedded OCR text on Internet Archive that I'd like to read. Unfortunately, the PDF pages render very slowly in my document readers (Okular, Evince, Zathura). I previously used the DjVu files for this reason, but since the Internet Archive stopped creating them I am out of options. I have tried to convert to DjVu myself with pdf2djvu, djvudigital, some online tools, and even by first converting to JPEG. Each time I got very large files, as the programs seem to have trouble separating the foreground and background. So, two questions:
1. How did the Internet Archive team previously produce their DjVu files? Can their process be replicated or approximated?
2. The second link suggests slow PDF rendering has been an issue for a while (at least on Linux). Are there any workarounds, such as faster rendering backends? I tried linearizing the files, but that didn't help.
To reproduce the issue, consider this volume of Poincaré's collected works.
Asked by Kariuki, Nov 29, 2022, 01:43 PM