How does `pdfimages` differ from `pdftoppm`?
4 votes · 1 answer · 1465 views
I want to process pdf files with Scan Tailor (to remove the background of photographed pdf pages, or to split pdf pages). Since that program takes images as input and cannot read pdf directly, I used a command like `pdftoppm MY_PDF NAME_OF_IMAGE -png` to process a low-quality pdf, and the resulting images were worse than the original pdf.

But with the `pdfimages` tool from poppler-utils, the results are as good as the original. This stays true if an option other than `-png` is used (or if no option is used and the output is ppm).

I thought that from then on `pdfimages` would be the better solution for my purpose, but then I noticed that for many other pdf files it does not work well at all: it gives fragments of images or text where `pdftoppm` gives normal text as expected.

Bad images extracted from a pdf with `pdfimages`, viewed in Dolphin: (screenshot)

Correct images extracted from the same pdf with `pdftoppm`, viewed in Dolphin: (screenshot)

Why these differences?

Asked by cipricus (1779 rep) on Oct 22, 2022, 07:20 PM
Last activity: Oct 22, 2022, 11:58 PM