Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
5
votes
4
answers
869
views
Eliminate duplicate pages from pdf
I have a pdf document with over 200 duplicate pages among the total 900 of the document. When there is a duplicate, it appears immediately after the original.
Maybe `pdftk` can do the job, but I need some way to find the duplicates...
fich
(340 rep)
Jun 20, 2021, 07:50 PM
• Last activity: May 25, 2025, 05:25 AM
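One possible way to approach the detection step, sketched under assumptions: if every page can be reduced to a checksum (for example by rendering each page to an image and hashing it, which is not shown here), then, because a duplicate always directly follows its original, it is enough to compare each checksum with the previous one. The helper name `pages_to_keep` and the sample checksums are hypothetical.

```bash
#!/bin/bash
# Print the 1-based page numbers to keep, given one checksum per page as
# arguments. A page is dropped when its checksum equals that of the page
# immediately before it. (Hypothetical helper, not part of pdftk.)
pages_to_keep() {
    local prev="" n=0 out="" sum
    for sum in "$@"; do
        n=$((n + 1))
        if [ "$n" -eq 1 ] || [ "$sum" != "$prev" ]; then
            out="$out${out:+ }$n"
        fi
        prev=$sum
    done
    echo "$out"
}

# Made-up checksums: pages 2 and 5 repeat the page before them.
keep=$(pages_to_keep aa aa bb cc cc dd)
echo "$keep"

# The list could then be handed to pdftk, left unquoted so that each
# number becomes its own argument:
#   pdftk input.pdf cat $keep output deduplicated.pdf
```

Whether two visually identical pages really produce identical checksums depends on how the pages are rendered, so the result is worth spot-checking before deleting anything.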
0
votes
1
answers
41
views
Splitting PDF while keeping index in the new file
I have a PDF file with many tomes in it. Because it contains a lot of pages (>5,000), I want to split it. I have used `pdftk` like this:
```bash
pdftk input.pdf cat 487-2987 output second_tome.pdf
```
It works, but `pdftk` doesn't put the index in the output file. Because the content has many chapters, I would like it to keep the index so I can quickly skip to a chapter in my PDF viewer.
I tried `gs`, but it behaves like `pdftk`: it doesn't write the index, and it is very slow.
I tried `qpdf`, which **does** keep the index, but it copies the *entire* index of the input PDF, so the output file carries information about all of the old contents. Also, if (as in the example above) I extract a range of pages, the "first" page in the output PDF will not start from 1.
Is there any way to do a split with an index?
Felix.leg
(103 rep)
May 24, 2025, 10:35 AM
• Last activity: May 24, 2025, 12:54 PM
0
votes
1
answers
2582
views
How to convert a directory of jpg files to a pdf with filenames as bookmarks?
I have a directory of jpg files that are scans of my handwritten notes. How do I convert them to a single pdf file that has the filenames as bookmarks?
(I eventually also want to add OCR. As mentioned online, we can convert the `.jpg` files to a `.pdf` using `img2pdf .jp --output combined.pdf`, and then we can add OCR using `ocrmypdf combined.pdf combined_ocr.pdf`. My question is mainly about how to also make sure the pdf file has bookmarks (created from filenames) so that the document is easy to navigate.)
jm jm
(1 rep)
Sep 23, 2021, 04:04 AM
• Last activity: Apr 17, 2025, 03:03 AM
7
votes
4
answers
20351
views
Proper way to convert PDF to word from bash command-line
I need to convert 1K pdf files to doc on a debian server. I can convert a PDF to word using libreoffice commandline:
libreoffice --headless --invisible --convert-to doc Sample-doc-file-100kb.pdf
Or using soffice:
soffice --nocrashreport --nologo --nolockcheck --nofirststartwizard --invisible --headless --convert-to doc Sample-doc-file-100kb.pdf
The main problem with the above two commands, is that the doc file doesn't include images in the pages, it only contains the formatted text. Is there a better way to convert pdf to doc, including also the images present in the pdf? I am not interested in web services like zamzam, I need to do that from command-line on the server. Thank you.
user2972081
(171 rep)
Jun 21, 2016, 07:45 PM
• Last activity: Mar 5, 2025, 03:27 AM
206
votes
11
answers
178676
views
Command line: How do you rotate a PDF file 90 degrees?
When I scan documents that are landscape-oriented, the output PDF files are portrait and so all the PDF viewers display the scanned documents in portrait.
**From the command line, how do you rotate a PDF file 90 degrees?**
I tried searching and found a bunch of solutions but I had trouble finding what looked like an authoritative solution that uses a stable and robust Linux/Unix tool.
----
footnote
For example, here is a sampling of some of the haphazard solutions I found:
- "just use Adobe Acrobat Pro to rotate the file and then save the file"
- "use pdfjam"
- "use PDFtk"
- "use ${PROGRAM_NAME} from Poppler"
- "use ImageMagick's convert"
-- but then all the comments were very negative and stating "the image quality is ruined"
- "open the file in a PDF viewer, then rotate, then print using a PDF printer like cutePDF or PDF printer or etc"
- "use ${PROGRAM_NAME}", then I searched for "${PROGRAM_NAME}" and there is something about "Fedora removed ${PROGRAM_NAME} because of licensing issues"
Trevor Boyd Smith
(4181 rep)
Sep 24, 2017, 12:19 AM
• Last activity: Feb 22, 2025, 04:36 PM
5
votes
1
answers
7398
views
How to remove pages from a PDF while leaving the document otherwise unchanged
I have a PDF book that I want to remove a few pages from to reduce the file size. My normal solution to this didn't work, and when I tried others they introduced new problems:
- I usually use PDF Arranger for this, which is normally a great tool. However, when I try it on this particular document I get an error I've never seen before (invalid literal for int() with base 8: b'228')
- I can use pdftk to remove the pages, but the file size of the resultant document is more than double that of the original, which defeats the purpose of removing the pages in the first place
- I can also use the Print to File command to remove the correct pages, but then I get a huge margin around the pages, with a smaller font and more whitespace, making the file harder to read
As you can see, it's surprisingly tricky to remove pages while otherwise leaving the document the same. Any advice on other solutions, or on figuring out what's going on with these, would be much appreciated!
pez
(51 rep)
Apr 15, 2020, 07:14 PM
• Last activity: Aug 27, 2024, 01:58 PM
0
votes
2
answers
106
views
Create pdf output file alternating pages between two files
I have a pdf file, let's call it A, composed of a lot of pages.
Then I have a second pdf file, let's call it B, composed of a single page.
My goal is to have an output file, let's call it O, with the following pattern:
O[1] = A[1]
O[2] = B[1]
O[3] = A[2]
O[4] = B[1]
O[5] = A[3]
O[6] = B[1]
O[7] = A[4]
O[8] = B[1]
...
In other words, I want to interleave the pages of A with page B.
To give you a background, file A contains the slides of a course I've attended, I created file B with a single page filled by horizontal lines.
The resulting output file will allow me to place my notes on the right side of the slides (since I'm going to print two pages per sheet).
I'm able to do this importing every single page in LibreOffice, but I'm looking for a script (perhaps pdftk?) to easily run against different files.
Mark
(815 rep)
Apr 15, 2024, 06:34 AM
• Last activity: Jun 9, 2024, 02:42 PM
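The pattern described in this question maps fairly directly onto pdftk's file handles, where `A1` means page 1 of the file bound to handle `A`. A minimal sketch, assuming the page count of A is already known; `interleave_args` is a hypothetical helper name, and `slides.pdf` and `lines.pdf` are placeholder file names:

```bash
#!/bin/bash
# Build the pdftk page list "A1 B1 A2 B1 ..." for an n-page file A and a
# single-page file B. (interleave_args is a hypothetical helper name.)
interleave_args() {
    local n=$1 i out=""
    for ((i = 1; i <= n; i++)); do
        out="$out${out:+ }A$i B1"
    done
    echo "$out"
}

interleave_args 3   # prints: A1 B1 A2 B1 A3 B1

# Possible use, assuming both files exist and pdftk is installed:
#   n=$(pdftk slides.pdf dump_data | awk '/^NumberOfPages/ {print $2}')
#   pdftk A=slides.pdf B=lines.pdf cat $(interleave_args "$n") output O.pdf
```

The command substitution is left unquoted on purpose in the usage comment, so that every page reference reaches pdftk as a separate argument.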
0
votes
1
answers
46
views
How to concatenate various jpgs, pngs and pdfs into a single pdf, leaving their sizes as they were?
I am trying to make a kind of dossier out of various pngs, jpgs and single-page pdf images. It has to be a single pdf file, but it doesn't matter whether the result is conveniently printable. As such, each page can be of any size and format. I have seen answers here that show how to make them into nice same-size pages, or center them, or such, but my interest is exactly the opposite. The order is already solved, as the filenames are nicely numbered. If I try with `pdfunite` (after a `convert`), it makes them fit into letter paper pages. Using `convert` directly also makes them fit into letter paper pages, but it crops images that are too wide.
wpkzz
(113 rep)
May 28, 2024, 09:35 PM
• Last activity: May 28, 2024, 11:00 PM
0
votes
0
answers
39
views
Remove PDF pages if content is subset of next page
I have a lot of uni-slides where a list of bullet points gets shown page by page until one slide contains all points.
For the sake of explaining a topic this is quite nice but for learning I would like to remove all the pages leading up to the overview. Then I wouldn't have to mindlessly scroll the pdf up and down when searching something.
Are there any tools that support this operation, or has someone written a script that can delete those pages?
**TLDR:**
*PDF pages often only add one piece of additional information and remove none. What can I use to keep the page with all information?*
I tried the answer from this [question](https://unix.stackexchange.com/questions/204040/extract-completed-slides-of-a-slide-show-pdf) from 2015, but it failed on all pdfs I tried with "Input Errors".
BillGatesPriv
(11 rep)
May 12, 2024, 09:46 AM
8
votes
4
answers
5544
views
How can I rasterize all of the text in a PDF?
You know when you have a pdf, which is a scan of a document and it's a really huge file, because it just stores the picture of the scanned document?
And there are OCR tools which can help you to make a proper document which just stores the text?
Well, I need the reverse of that! Let's say I have a perfect pdf document generated with `pdflatex` and I need to turn it into such a "huge" pdf, which looks exactly the same when printed on paper (with a certain dpi value), but is just a picture of the original.
My initial idea is to turn the pdf into a series of JPGs and then back into a PDF, but perhaps there is some canonical way for that?
---
In case you wonder why I would want to do such a thing: I'm currently stuck with a network printer, which is not maintained by me, and which randomly drops characters in printed files! So until someone figures out what's wrong there, I want this as workaround.
Dimitri Schachmann
(183 rep)
Apr 26, 2015, 02:09 PM
• Last activity: Feb 18, 2024, 01:40 PM
10
votes
3
answers
5452
views
How to concatenate pdf files with different frame sizes
For concatenating presentations of the same topic I use `pdftk` (e.g. `pdftk stones\ in\ england.pdf stones\ from\ namibia.pdf cat output nice\ stones.pdf`).
Files with different frame sizes but the same aspect ratio are just strung together without any regard to the frame sizes.
How can I concatenate multiple `*.pdf` files with one resulting size for all frames (in the same aspect ratio)?
muggi
(759 rep)
Mar 20, 2018, 09:31 AM
• Last activity: Dec 29, 2023, 01:27 PM
6
votes
4
answers
2623
views
Concatenate PDFs but extend pdf's to be even number of pages
I want to concatenate a bunch of PDFs but for printing purposes I would prefer that empty pages are added to each document that have an odd number of pages. Can I do this with PDFTK?
ase
(203 rep)
Mar 29, 2016, 02:17 PM
• Last activity: Nov 7, 2023, 11:15 AM
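One way the padding logic might look with pdftk, sketched under assumptions: a separate one-page blank PDF is bound to handle `Z`, the page count of each input is already known, and there are at most ten inputs. The helper name `padded_cat_list` and all file names are hypothetical.

```bash
#!/bin/bash
# Given the page counts of documents A, B, C, ... (in order), print a pdftk
# page list that appends the blank page Z1 after every odd-length document.
# (padded_cat_list is a hypothetical helper; it handles up to ten inputs.)
padded_cat_list() {
    local handles=(A B C D E F G H I J) i=0 out="" count
    for count in "$@"; do
        out="$out${out:+ }${handles[i]}"
        if (( count % 2 == 1 )); then
            out="$out Z1"
        fi
        i=$((i + 1))
    done
    echo "$out"
}

padded_cat_list 3 4 5   # prints: A Z1 B C Z1

# Possible use, assuming the files exist and pdftk is installed:
#   pdftk A=a.pdf B=b.pdf C=c.pdf Z=blank.pdf \
#       cat $(padded_cat_list 3 4 5) output all.pdf
```

A bare handle such as `A` selects the whole document, so only the blank page needs an explicit page number.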
31
votes
4
answers
28139
views
How do I insert a blank page into a PDF with ghostscript or pdftk?
I have a PDF file that needs a blank page inserted into it every so often. The pattern is unpredictable, so I need a command that will allow me to fit one in wherever necessary.
How can I do this?
ixtmixilix
(13520 rep)
Jul 2, 2011, 11:49 PM
• Last activity: Nov 7, 2023, 11:08 AM
8
votes
4
answers
2429
views
How to pdftk end minus 1?
I want to extract pages from a pdf document such that all pages are extracted except the first one and the last one.
My attempt is below, but `(end-1)` does not work, nor does `2-end-1`:
pdftk 1.pdf cat 2-(end-1) output output.pdf
Léo Léopold Hertz 준영
(7138 rep)
Mar 31, 2017, 08:44 AM
• Last activity: Oct 18, 2023, 09:23 PM
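For context on why the attempt above fails: unquoted parentheses are shell syntax, and pdftk's page ranges do not accept arithmetic on `end`. pdftk does, however, let you count pages from the end with an `r` prefix. The sketch below shows that form in a comment, plus a small fallback that computes the range from a known page count; `inner_range` is a hypothetical helper name.

```bash
#!/bin/bash
# pdftk counts pages from the end with an "r" prefix: r1 is the last page,
# r2 the one before it. So "2-r2" selects everything except the first and
# the last page:
#   pdftk 1.pdf cat 2-r2 output output.pdf

# Alternative: compute the range from a known page count.
# (inner_range is a hypothetical helper name.)
inner_range() {
    local n=$1
    if [ "$n" -lt 3 ]; then
        echo "need at least 3 pages" >&2
        return 1
    fi
    echo "2-$((n - 1))"
}

inner_range 10   # prints: 2-9
```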
0
votes
1
answers
119
views
Combine PDF forms including filled-in information
I have been trying to file my taxes by filling out the individual forms, which are fillable PDFs on the IRS's website, and concatenating them using the `pdfunite` command:
```
pdfunite f1040.pdf f1040s1.pdf f1040s3.pdf f1040sa.pdf Fed_2022_all.pdf
```
However, that results in the saved form values from each form being saved only on the first page. For instance, Form 1040 is two pages, and the form fields on page 2 are rendered uneditable. How can I prevent this?
Mirlan
(101 rep)
Oct 14, 2023, 07:14 PM
• Last activity: Oct 14, 2023, 07:21 PM
0
votes
2
answers
222
views
How to define multiple page ranges for pdftk with a bash variable
I'm using Arch Linux, the Openbox window manager, and bash.
Everything is up to date with the latest versions.
Can anyone tell me why I can't get the `"$page_range"` variable to work within `pdftk` when I specify a couple of page ranges as `3-5 7-9`?
When I specify only one page range, `3-5`, in my yad pop-up box, everything works as it should.
pdftk does allow more than one page range to be defined within the command. Indeed, when I type the command out on the command line without using bash variables, pdftk works as expected, taking the page ranges `3-5 7-9`. It just doesn't work when I hold this value in the variable `"$page_range"`.
All I want to do is extract page ranges 3-5 and 7-9 from the file `/home/$USER/my_file.pdf` into another pdf file, using the variable `$page_range` to define my ranges.
Here is my simple script.
#!/bin/bash
# collect the values with yad
extract_values=$(yad --form --width=200 \
--title="Enter the page ranges you wish to extract" \
--text="\n\n Enter the page ranges you wish to extract\n as eg 301-302\n or 301-302 305-306\n for grouping" \
--field="Page range":text "11-13 21-23" \
--button="Cancel!gtk-close":2 \
--button="Edit script":1 \
--button="Submit":0)
# strip out the values from the string
page_range=$(echo $extract_values | cut -d '|' -f 1)
echo $page_range
# produce a unique file extender
page_range_slugify="$(echo "$page_range" | sed 's/ /_/g')"
echo;echo $page_range_slugify
echo
# specify the filename
f=/home/$USER/my_file.pdf
# get path and file name without pdf extension
fz="${f%.*}"
# check everything is as it should be
yad --text="\n page range = $page_range\n page_range_slugify = $page_range_slugify\n file + path without file extension = $fz\n\n"
# below works only for one range but will not expand for two page ranges
pdftk "$f" cat "$page_range" output "$fz"_"$page_range_slugify".pdf
# below takes one range only as above
#pdftk "$f" cat "$(printf %s "$page_range")" output "$fz"_"$page_range_slugify".pdf
# below takes both ranges when ranges are directly placed within the command
#pdftk "$f" cat 3-5 7-9 output "$fz"_"$page_range_slugify".pdf
Kes
(909 rep)
Sep 13, 2023, 10:06 AM
• Last activity: Sep 15, 2023, 09:14 AM
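A likely explanation for the behaviour in the script above: quoting `"$page_range"` hands pdftk the single argument `3-5 7-9`, whereas typing the ranges directly gives it two arguments. One way to keep the quoting safe and still get separate arguments is to split the string into a bash array first. This is a minimal sketch; the hard-coded `page_range` value stands in for the yad output, and `ranges` is a name introduced here for illustration.

```bash
#!/bin/bash
page_range="3-5 7-9"   # stand-in for the value returned by yad

# Split on whitespace into an array, one element per range.
read -r -a ranges <<< "$page_range"
echo "${#ranges[@]} ranges: ${ranges[*]}"   # prints: 2 ranges: 3-5 7-9

# Each array element then expands to its own argument:
#   pdftk "$f" cat "${ranges[@]}" output "${fz}_${page_range// /_}.pdf"
```

Leaving `$page_range` unquoted would also split it, but the array form avoids accidental filename globbing on the value.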
18
votes
4
answers
16266
views
Printing two pages per sheet from the command line
Say I start off from a PDF document, say of 12 pages, viewed with **evince**.
To produce another PDF of 6 sheets, with a page setup of two pages per side,
I normally use the "Print to File" device listed in the ^P dialogue window.
This works out pretty neatly.
I would like to translate this operation for the command line.
- To my understanding, this is not an operation that **pdftk** can do. Please cross check.
- The command `lp`, which would accept the option `-o number-up=2`, does not recognize any device called "Print to File", which indeed does not show up in `lpstat -p -d`.
- I am aware of the post What is “Print to File” and can it be used from command line?. I have installed **cups-pdf** whereby a new printer named PDF is acknowledged. However, the print quality of a simple text file is way too raw (for example, no print margins to start with). Moreover, if I reprint an existing PDF file on this device, say `lp -p PDF existing.pdf`, evince can't even manage to open that copied output, while this is not the case with the "Print to File" way.
- I had a look at `man evince`. At the bottom, it touches upon a few print preview options and redirects to a GNOME developer project page. Admittedly I am not able to make sense or use of it.
Is there actually a way to combine the flexibility of the command line with the print quality that I obtain from that "Print to File" option in the GUI evince?
My test case, again, would be to create from the command line a PDF out of a source document printed with two pages per sheet.
Thanks for thinking along.
XavierStuvw
(1179 rep)
Jan 9, 2016, 09:45 PM
• Last activity: Sep 9, 2023, 11:47 AM
5
votes
3
answers
2307
views
pdfimages won't extract all images
I'm using pdfimages to extract images from a PDF file.
I've counted at least 10 images.
But the program will only extract 4.
pdfimages -all file.pdf i
Generates
-rw-rw-r-- 1 victor victor 61389 Jul 14 21:48 i-000.png
-rw-rw-r-- 1 victor victor 88 Jul 14 21:48 i-001.png
-rw-rw-r-- 1 victor victor 5226 Jul 14 21:48 i-002.png
-rw-rw-r-- 1 victor victor 95657 Jul 14 21:48 i-003.png
Am I missing some setting?
How can I extract all images?
Victor Ribeiro
(151 rep)
Jul 15, 2016, 12:55 AM
• Last activity: Jul 18, 2023, 02:30 PM
2
votes
0
answers
319
views
Combine PDFs to new PDF with filenames as bookmarks
I have a directory *foo* containing .pdf files named with pattern *X01, X02, ...*, each two pages long. I want to combine them to a new .pdf, named "_all_YY-MM-DDTHHMMSS.pdf_" that will contain the file names as bookmarks.
I used these two commands. While the first one works well,
$ gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dAutoRotatePages=/None -sOutputFile=all_$(date +"%FT%H%M%S").pdf X*.pdf
the second one, based on [this answer](https://unix.stackexchange.com/a/72457/372935) fails.
$ pdftk all_2023-07-12T094706.pdf update_info {ls | grep X*} output out.pdf
grep: X}: No such file or directory
grep: output: No such file or directory
grep: out.pdf: No such file or directory
Done. Input errors, so no output created.
I was trying to `grep` the `ls` output for the filenames starting with _X*_, in order to exclude the new combined .pdf names _all..._.
How do I get this to work, preferably by adding `update_info` to the first command?
I'm aware of solutions like [this](https://unix.stackexchange.com/a/709643/372935) , but they look rather tedious.
jay.sf
(237 rep)
Jul 12, 2023, 08:13 AM
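Two things seem to go wrong in the second command above: `{ls | grep X*}` is not command substitution (bash uses `$(...)`), and pdftk's `update_info` expects a single text file in the same record format that `dump_data` prints, not a list of file names. A minimal sketch that generates such bookmark records, assuming every input file contributes the same number of pages (two, per the question); `make_bookmarks` and `bookmarks.txt` are hypothetical names:

```bash
#!/bin/bash
# Emit one pdftk bookmark record per input file, assuming each file
# contributes the same number of pages to the combined PDF.
# (make_bookmarks is a hypothetical helper name.)
make_bookmarks() {
    local pages_per_file=$1 page=1 f
    shift
    for f in "$@"; do
        printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: 1\nBookmarkPageNumber: %d\n' \
            "${f%.pdf}" "$page"
        page=$((page + pages_per_file))
    done
}

make_bookmarks 2 X01.pdf X02.pdf

# Possible use (file names are placeholders):
#   make_bookmarks 2 X*.pdf > bookmarks.txt
#   pdftk all.pdf update_info bookmarks.txt output out.pdf
```

Using the glob `X*.pdf` directly also sidesteps the `ls | grep` step, since the combined `all_...pdf` file does not match it.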
0
votes
1
answers
165
views
Correcting typo in PDF, other solutions aren't working
I have a job interview on Tuesday, and I need to correct a typo "obestiy" [sic] to "obesity" in a PDF from LaTeX that I cannot recompile due to missing images and tables.
I have tried
1. `qpdf general.audience.pdf --object-streams=disable expanded.pdf`, as suggested by https://unix.stackexchange.com/questions/17220/how-to-view-and-edit-the-code-of-a-pdf-file/109177#109177 , but when I try to edit the file, "obestiy" doesn't appear, so this method didn't work.
2. https://askubuntu.com/questions/803850/find-and-replace-with-on-pdf-file-from-command-line suggests that I can use pdftk thus: `pdftk general.audience.pdf output uncompressed.pdf uncompress`, but then the word "obestiy" [sic] never shows up. Perhaps there are some characters in between the letters, explaining why "obestiy" doesn't show up? How can I edit the PDF to correct the typo?
3. LibreOffice Draw distorts and ruins all of the text in the file, making the PDF unusable. Maybe there is a way for LibreOffice Draw to *not* alter the fonts?
con
(109 rep)
Apr 21, 2023, 09:00 PM
• Last activity: Apr 21, 2023, 09:44 PM