How does `pdfimages` differ from `pdftoppm`?

4 votes
1 answer
1465 views
I want to process PDFs with Scan Tailor, either to remove the background of photographed pages or to split pages. Scan Tailor only accepts images as input, not PDF files, so I converted a low-quality PDF with a command like `pdftoppm MY_PDF NAME_OF_IMAGE -png`. The resulting images were worse than the original PDF.

With the `pdfimages` tool from poppler-utils, however, the results are as good as the original. This holds whether I pass `-png`, a different format option, or no format option at all (in which case the output is ppm).

I concluded that `pdfimages` was the better solution for my purpose, but then I noticed that for many other PDF files it does not work well at all: it produces fragments of images or text where `pdftoppm` produces normal text, as expected.

Bad images, extracted from a PDF with `pdfimages` and viewed in Dolphin: [image]

Correct images, extracted from the same PDF with `pdftoppm` and viewed in Dolphin: [image]

Why these differences?
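
For reference, the two invocations being compared can be sketched as small shell functions. This is only a sketch: the function names, `input.pdf`, the output prefixes `page` and `img`, and the 300 DPI value are assumptions, not taken from the question. Per the poppler man pages, `pdftoppm` converts whole PDF pages to image files (at 150 DPI unless `-r` is given), `pdfimages` saves the images contained in the PDF, and `pdfimages -list` prints a table of those images instead of extracting them.

```shell
# render_pages PDF PREFIX [DPI]: rasterize every page with pdftoppm.
# 150 DPI is pdftoppm's default; a higher -r value gives larger page images.
render_pages() {
    pdftoppm -r "${3:-150}" -png "$1" "$2"
}

# extract_images PDF PREFIX: save the image objects stored in the PDF.
extract_images() {
    pdfimages -png "$1" "$2"
}

# list_images PDF: show which image objects the PDF contains, without extracting.
list_images() {
    pdfimages -list "$1"
}

# Example run, only if poppler-utils is installed and the
# hypothetical file input.pdf exists in the current directory.
if command -v pdftoppm >/dev/null 2>&1 && [ -f input.pdf ]; then
    list_images input.pdf
    render_pages input.pdf page 300
    extract_images input.pdf img
fi
```

Running `list_images` first on a problem file may help show whether its pages are stored as single scanned images or as many small image objects mixed with text.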
Asked by cipricus (1779 rep)
Oct 22, 2022, 07:20 PM
Last activity: Oct 22, 2022, 11:58 PM