Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

5 votes

4 answers

869 views

Eliminate duplicate pages from pdf

I have a PDF document of 900 pages, more than 200 of which are duplicates. A duplicate always appears immediately after its original.

Maybe `pdftk` can do the job, but I first need some way to find the duplicates.
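Because a duplicate always directly follows its original, one possible approach is to compute a checksum per page and keep only the pages whose checksum differs from the previous page's. The sketch below assumes the checksums already exist, one per line in page order (for instance from hashing a rendering of each page, which is an assumption and would only catch pages that render identically). `keep_pages` is a hypothetical helper name, not part of any tool.

```bash
#!/bin/bash
# keep_pages: read one checksum per line on stdin (line N = page N) and
# print the numbers of the pages to keep, separated by spaces.
# A page is dropped when its checksum equals that of the page just before it.
keep_pages() {
    awk 'NR == 1 || $0 != prev { printf "%s%s", sep, NR; sep = " " }
         { prev = $0 }
         END { if (NR) print "" }'
}

# Example with made-up checksums: pages 2 and 5 repeat their predecessor.
printf '%s\n' aaa aaa bbb ccc ccc | keep_pages   # prints: 1 3 4
```

The resulting list could then be handed to `pdftk`, for example `pdftk input.pdf cat $(keep_pages < sums.txt) output deduped.pdf`, where `sums.txt` is a hypothetical file holding the per-page checksums.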

fich (340 rep)

Jun 20, 2021, 07:50 PM • Last activity: May 25, 2025, 05:25 AM

0 votes

1 answers

41 views

Splitting a PDF while keeping the index in the new file

pdf split pdftk

I have a PDF file that contains many tomes. Because it has a lot of pages (>5,000), I want to split it. I have used `pdftk` like this:

    pdftk input.pdf cat 487-2987 output second_tome.pdf

It works, but `pdftk` doesn't put an index in the output file. Because the content has many chapters, I would like to keep the index so that I can quickly jump to a chapter in my PDF viewer. I tried `gs`, but it behaves similarly to `pdftk`: it doesn't write the index, and it is very slow. I tried `qpdf`, which **does** keep the index, but it copies the *entire* index of the input PDF, so the output file carries entries for all the old contents. Also, if (as in the example above) I extract a range of pages, the "first" page in the output PDF does not start from 1. Is there any way to do a split that keeps the index?

Felix.leg (103 rep)

May 24, 2025, 10:35 AM • Last activity: May 24, 2025, 12:54 PM

0 votes

1 answers

2582 views

How to convert a directory of jpg files to a pdf with filenames as bookmarks?

pdf pdftk bookmarks

I have a directory of jpg files that are scans of my handwritten notes. How do I convert them to a single pdf file that has the filenames as bookmarks?

(I eventually also want to add OCR. As mentioned online, we can convert the `.jpg` files to a `.pdf` using `img2pdf *.jp* --output combined.pdf`, and then add OCR using `ocrmypdf combined.pdf combined_ocr.pdf`. My question is mainly about how to make sure the pdf file also has bookmarks (created from the filenames) so that the document is easy to navigate.)

jm jm (1 rep)

Sep 23, 2021, 04:04 AM • Last activity: Apr 17, 2025, 03:03 AM

7 votes

4 answers

20351 views

Proper way to convert PDF to word from bash command-line

text-processing pdf libreoffice pdftk

I need to convert 1K pdf files to doc on a debian server. I can convert a PDF to word using libreoffice commandline:

    libreoffice --headless --invisible --convert-to doc Sample-doc-file-100kb.pdf

Or using soffice:

    soffice --nocrashreport --nologo --nolockcheck --nofirststartwizard --invisible --headless --convert-to doc Sample-doc-file-100kb.pdf

The main problem with both commands is that the resulting doc file doesn't include the images from the pages; it only contains the formatted text. Is there a better way to convert PDF to doc that also includes the images present in the PDF? I am not interested in web services like zamzam; I need to do this from the command line on the server. Thank you.

user2972081 (171 rep)

Jun 21, 2016, 07:45 PM • Last activity: Mar 5, 2025, 03:27 AM

206 votes

11 answers

178676 views

Command line: How do you rotate a PDF file 90 degrees?

command-line pdf pdftk poppler

When I scan documents that are landscape-oriented, the output PDF files are portrait and so all the PDF viewers display the scanned documents in portrait.

**From the command line, how do you rotate a PDF file 90 degrees?**

I tried searching and found a bunch of solutions but I had trouble finding what looked like an authoritative solution that uses a stable and robust Linux/Unix tool.

----

footnote 

For example, here is a sampling of some of the haphazard solutions I found:

- "just use Adobe Acrobat Pro to rotate the file and then save the file"
- "use pdfjam"
- "use PDFtk"
- "use ${PROGRAM_NAME} from Poppler"
- "use ImageMagick's convert"
  - but all the comments were very negative, stating that "the image quality is ruined"
- "open the file in a PDF viewer, then rotate, then print using a PDF printer like cutePDF or PDF printer or etc"
- "use ${PROGRAM_NAME}", then I searched for "${PROGRAM_NAME}" and there is something about "Fedora removed ${PROGRAM_NAME} because of licensing issues"

                                

Trevor Boyd Smith (4181 rep)

Sep 24, 2017, 12:19 AM • Last activity: Feb 22, 2025, 04:36 PM

5 votes

1 answers

7398 views

How to remove pages from a PDF while leaving the document otherwise unchanged

pdf pdftk

I have a PDF book that I want to remove a few pages from to reduce the file size. My normal solution to this didn't work, and when I tried others they introduced new problems:

 - I usually use PDF Arranger for this, which is normally a great tool. However, when I try it on this particular document I get an error I've never seen before (invalid literal for int() with base 8: b'228')
 - I can use pdftk to remove the pages, but the file size of the resultant document is more than double that of the original, which defeats the purpose of removing the pages in the first place
 - I can also use the Print to File command to remove the correct pages, but then I get a huge margin around the pages, with a smaller font and more whitespace, making the file harder to read

As you can see, it's surprisingly tricky to remove pages while otherwise leaving the document the same. Any advice on other solutions, or on figuring out what's going on with these, would be much appreciated!

pez (51 rep)

Apr 15, 2020, 07:14 PM • Last activity: Aug 27, 2024, 01:58 PM

0 votes

2 answers

106 views

Create pdf output file alternating pages between two files

pdf pdftk

I have a pdf file, let's call it A, composed of a lot of pages.
Then I have a second pdf file, let's call it B, composed of a single page.

My goal is to have an output file, let's call it O, with the following pattern:

    O[1] = A[1]
    O[2] = B[1]
    O[3] = A[2]
    O[4] = B[1]
    O[5] = A[3]
    O[6] = B[1]
    O[7] = A[4]
    O[8] = B[1]
    ...

In other words, I want to interleave the pages of A with page B.
To give you a background, file A contains the slides of a course I've attended, I created file B with a single page filled by horizontal lines.

The resulting output file will allow me to place my notes on the right side of the slides (since I'm going to print two pages per sheet).

I'm able to do this importing every single page in LibreOffice, but I'm looking for a script (perhaps pdftk?) to easily run against different files.
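The pattern above maps fairly directly onto `pdftk`'s input handles, where each file gets a letter and pages are addressed as `A1`, `B1`, and so on. A minimal sketch, assuming the page count of A is already known; `interleave_spec` is a hypothetical helper name:

```bash
#!/bin/bash
# interleave_spec N: print the pdftk page list that follows every page of
# handle A (which has N pages) with page 1 of handle B.
interleave_spec() {
    local n=$1 i
    local spec=()
    for ((i = 1; i <= n; i++)); do
        spec+=("A$i" "B1")
    done
    echo "${spec[*]}"
}

interleave_spec 3   # prints: A1 B1 A2 B1 A3 B1
```

It might then be used as `pdftk A=slides.pdf B=lines.pdf cat $(interleave_spec 40) output O.pdf`, where the file names and the page count 40 are placeholders. The substitution is left unquoted on purpose so that each page becomes its own argument.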


                                

Mark (815 rep)

Apr 15, 2024, 06:34 AM • Last activity: Jun 9, 2024, 02:42 PM

0 votes

1 answers

46 views

How to concatenate various JPGs, PNGs and PDFs into a single PDF, leaving their sizes as they were?

pdf imagemagick pdftk

I am trying to make a kind of dossier out of various PNGs, JPGs and single-page PDF images. It has to be a single PDF file, but it doesn't matter whether the result is conveniently printable, so each page can be any size and format. I have seen answers here that show how to make them into nice same-size pages, or center them, and so on, but my interest is exactly the opposite. The order is already solved, as the filenames are nicely numbered. If I try `pdfunite` (after a `convert`), it makes them fit into letter-size pages. Using `convert` directly also fits them into letter-size pages, but it crops images that are too wide.

wpkzz (113 rep)

May 28, 2024, 09:35 PM • Last activity: May 28, 2024, 11:00 PM

0 votes

0 answers

39 views

Remove PDF pages if content is subset of next page

scripting pdf pdftk

I have a lot of university slides where a list of bullet points is revealed page by page until one slide contains all the points.  
For the sake of explaining a topic this is quite nice, but for studying I would like to remove all the pages leading up to the overview. Then I wouldn't have to mindlessly scroll up and down the PDF when searching for something.

Are there any tools that support this operation, or has someone written a script that can delete those pages?

**TLDR:**  
*PDF pages often only add one piece of additional information and remove none. What can I use to keep the page with all information?*

I tried the answer from this [question](https://unix.stackexchange.com/questions/204040/extract-completed-slides-of-a-slide-show-pdf)  from 2015, but it failed on all pdfs I tried with "Input Errors".

BillGatesPriv (11 rep)

May 12, 2024, 09:46 AM

8 votes

4 answers

5544 views

How can I rasterize all of the text in a PDF?

linux pdf pdftk ocr

You know when you have a pdf, which is a scan of a document and it's a really huge file, because it just stores the picture of the scanned document?

And there are OCR tools which can help you to make a proper document which just stores the text?

Well, I need the reverse of that! Let's say I have a perfect pdf document generated with pdflatex and I need to turn it into such a "huge" pdf, which looks exactly the same when printed on paper (with a certain dpi value), but is just a picture of the original.

My initial idea is to turn the pdf into a series of JPGs and then back into a PDF, but perhaps there is some canonical way for that?

---
In case you wonder why I would want to do such a thing: I'm currently stuck with a network printer, which is not maintained by me, and which randomly drops characters in printed files! So until someone figures out what's wrong there, I want this as workaround.

Dimitri Schachmann (183 rep)

Apr 26, 2015, 02:09 PM • Last activity: Feb 18, 2024, 01:40 PM

10 votes

3 answers

5452 views

How to concatenate pdf files with different frame sizes

pdftk

For concatenating presentations on the same topic I use `pdftk` (e.g. `pdftk stones\ in\ england.pdf stones\ from\ namibia.pdf cat output nice\ stones.pdf`).

Files with different frame sizes but the same aspect ratio are simply strung together, without any regard to the frame sizes.

How can I concatenate multiple `*.pdf` files so that all frames end up with one size (in the same aspect ratio)?

muggi (759 rep)

Mar 20, 2018, 09:31 AM • Last activity: Dec 29, 2023, 01:27 PM

6 votes

4 answers

2623 views

Concatenate PDFs but extend pdf's to be even number of pages

pdf pdftk

I want to concatenate a bunch of PDFs, but for printing purposes I would prefer that an empty page is added to each document that has an odd number of pages. Can I do this with PDFTK?
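One way to think about this is to build the list of input files first and slot a one-page blank PDF in after every document with an odd page count. The sketch below only builds that list; it assumes the page counts are already known and that `blank.pdf` is a one-page empty PDF prepared beforehand (both the file and the helper name `pad_list` are hypothetical). File names containing spaces are not handled.

```bash
#!/bin/bash
# pad_list: read "file pagecount" pairs on stdin and print the files in
# order, adding blank.pdf after every file with an odd number of pages.
pad_list() {
    local file pages
    while read -r file pages; do
        printf '%s\n' "$file"
        if (( pages % 2 == 1 )); then
            printf '%s\n' "blank.pdf"
        fi
    done
}

# a.pdf (3 pages) and c.pdf (1 page) are odd, so each gets a blank after it.
printf '%s\n' 'a.pdf 3' 'b.pdf 4' 'c.pdf 1' | pad_list
```

The output could then feed a concatenation such as `pdftk $(pad_list < counts.txt) cat output all.pdf`, with `counts.txt` being a hypothetical file of name and page-count pairs.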
                                

ase (203 rep)

Mar 29, 2016, 02:17 PM • Last activity: Nov 7, 2023, 11:15 AM

31 votes

4 answers

28139 views

How do I insert a blank page into a PDF with ghostscript or pdftk?

pdf ghostscript pdftk

I have a PDF file that needs a blank page inserted into it every so often. The pattern is unpredictable, so I need a command that will allow me to fit one in wherever necessary.

How can I do this?

ixtmixilix (13520 rep)

Jul 2, 2011, 11:49 PM • Last activity: Nov 7, 2023, 11:08 AM

8 votes

4 answers

2429 views

How to pdftk end minus 1?

pdf pdftk

I want to extract pages from a PDF document such that all pages are extracted except the first one and the last one.
My attempt with `(end-1)` does not work, nor does `2-end-1`:

    pdftk 1.pdf cat 2-(end-1) output output.pdf
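`pdftk` page ranges do no arithmetic, and unquoted parentheses are also special to the shell, so one workaround is to let the shell compute the last page to keep. A minimal sketch, assuming the page count is already known; `inner_range` is a hypothetical helper name:

```bash
#!/bin/bash
# inner_range N: print the page range that skips the first and the last
# page of an N-page document.
inner_range() {
    local n=$1
    if (( n < 3 )); then
        echo "need at least 3 pages" >&2
        return 1
    fi
    echo "2-$((n - 1))"
}

inner_range 10   # prints: 2-9

# pdftk versions that support reverse page numbers (r1 = last page,
# r2 = next to last) should not need the count at all:
#   pdftk 1.pdf cat 2-r2 output output.pdf
```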

Léo Léopold Hertz 준영 (7138 rep)

Mar 31, 2017, 08:44 AM • Last activity: Oct 18, 2023, 09:23 PM

0 votes

1 answers

119 views

Combine PDF forms including filled-in information

pdf pdftk poppler

I have been trying to file my taxes by filling out the individual forms, which are fillable PDFs on the IRS's website, and concatenating them using the pdfunite command:

    pdfunite f1040.pdf f1040s1.pdf f1040s3.pdf f1040sa.pdf Fed_2022_all.pdf

However, that results in the saved form values from each form being saved only on the first page. For instance, Form 1040 is two pages, and the form fields on page 2 are rendered uneditable. How can I prevent this?

Mirlan (101 rep)

Oct 14, 2023, 07:14 PM • Last activity: Oct 14, 2023, 07:21 PM

0 votes

2 answers

222 views

How to define multiple page ranges for pdftk with a bash variable

bash pdftk

I'm using Arch Linux, the Openbox window manager, and bash.  
Everything is up to date with the latest versions.   

Can anyone tell me why I can't get the "$page_range" variable to show up within pdftk when I specify a couple of page ranges as 3-5 7-9?  

When I specify only one page range 3-5 in my yad pop-up box everything works as it should.  

pdftk does allow more than one page range to be defined within the command. Indeed when I type the command out on the command line without using bash variables within it, pdftk works as expected taking the page ranges 3-5 7-9. Just not when I contain this value within the variable "$page_range".  
 
All I want to do is extract page ranges 3-5 and 7-9 from file  
/home/$USER/my_file.pdf    
into another pdf file  
using the variable $page_range to define my ranges.  

Here is my simple script.

    #!/bin/bash
    
    # collect the values with yad
    
    extract_values=$(yad --form --width=200 \
    --title="Enter the page ranges you wish to extract" \
    --text="\n\n  Enter the page ranges you wish to extract\n    as eg 301-302\n    or 301-302 305-306\n     for grouping" \
    --field="Page range":text "11-13 21-23" \
    --button="Cancel!gtk-close":2 \
    --button="Edit script":1 \
    --button="Submit":0)
    
    # strip out the values from the string
    page_range=$(echo $extract_values | cut -d '|' -f  1)
    echo $page_range
    
    # produce a unique file extender 
    page_range_slugify="$(echo "$page_range" | sed 's/ /_/g')" 
    echo;echo $page_range_slugify
    echo
    
    # specify the filename
    f=/home/$USER/my_file.pdf
    
    # get path and file name without pdf extension
    fz="${f%.*}"
    
    # check everything is as it should be
    yad --text="\n page range = $page_range\n page_range_slugify = $page_range_slugify\n file + path without file extension = $fz\n\n"
    
    # below works only for one range but will not expand for two page ranges
    pdftk "$f" cat "$page_range" output "$fz"_"$page_range_slugify".pdf
    
    # below takes one range only as above 
    #pdftk "$f" cat "$(printf %s "$page_range")" output "$fz"_"$page_range_slugify".pdf
    
    # below takes both ranges when ranges are directly placed within the command
    #pdftk "$f" cat 3-5 7-9 output "$fz"_"$page_range_slugify".pdf
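A likely explanation is that the double quotes around `"$page_range"` hand `3-5 7-9` to `pdftk` as one single argument, whereas typing the ranges directly yields two arguments. One possible fix is to split the string into a bash array first. The sketch below uses a fixed sample value in place of the yad output and leaves the real `pdftk` call commented out:

```bash
#!/bin/bash
page_range="3-5 7-9"    # sample value, as it would come back from yad

# Split on whitespace into an array: one element per page range.
read -r -a ranges <<< "$page_range"

printf 'pdftk would get %d range arguments:\n' "${#ranges[@]}"
printf '  [%s]\n' "${ranges[@]}"

# The real call would then expand the array, one argument per element:
#   pdftk "$f" cat "${ranges[@]}" output "${fz}_${page_range_slugify}.pdf"
```

Leaving `$page_range` unquoted would also split it, but the array keeps the splitting explicit and avoids accidental glob expansion.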

Kes (909 rep)

Sep 13, 2023, 10:06 AM • Last activity: Sep 15, 2023, 09:14 AM

18 votes

4 answers

16266 views

Printing two pages per sheet from the command line

pdf cups evince pdftk lp

Say I start off from a PDF document, say of 12 pages, viewed with **evince**.
To produce another PDF of 6 sheets, with a page setup of two pages per side, 
I normally use the "Print to File" device listed in the ^P dialogue window.
This works out pretty neatly.

I would like to translate this operation for the command line. 

 - To my understanding, this is not an operation that **pdftk** can do. Please cross check.
 - The  command lp, which would accept the option -o number-up=2, does not recognize any device called "Print to File", which indeed does not show up in lpstat -p -d.
 - I am aware of the post What is “Print to File” and can it be used from command line? . I have installed **cups-pdf** whereby a new printer named PDF is acknowledged. However, the print quality of a simple text file is way too raw (for example, no print margins to start with). Moreover, if I reprint an existing PDF file on this device, say lp -p PDF existing.pdf, evince can't even manage to open that copycatted output, while this is not the case with the "Print to File" way.
 - I had a look at man evince. At the bottom, it touches upon a few print preview options and redirects to a GNOME-developer project page . Admittedly I am not able to make sense and use of it. 

Is there actually a way to combine the flexibility of the command line with the print quality that I obtain from that "Print to File" option in the GUI  evince? 

My test case, again, would be to create from the command line a PDF out of a source document printed with two pages per sheet.

Thanks for thinking along.

                                

XavierStuvw (1179 rep)

Jan 9, 2016, 09:45 PM • Last activity: Sep 9, 2023, 11:47 AM

5 votes

3 answers

2307 views

pdfimages won't extract all images

pdf images pdftk

I'm using `pdfimages` to extract images from a PDF file.
I've counted at least 10 images,
but the program only extracts 4.

    pdfimages -all file.pdf i

Generates

    -rw-rw-r--    1 victor victor   61389 Jul 14 21:48 i-000.png
    -rw-rw-r--    1 victor victor      88 Jul 14 21:48 i-001.png
    -rw-rw-r--    1 victor victor    5226 Jul 14 21:48 i-002.png
    -rw-rw-r--    1 victor victor   95657 Jul 14 21:48 i-003.png

Am I missing some setting?

How can I extract all images?
                                

Victor Ribeiro (151 rep)

Jul 15, 2016, 12:55 AM • Last activity: Jul 18, 2023, 02:30 PM

2 votes

0 answers

319 views

Combine PDFs to new PDF with filenames as bookmarks

linux pdf pdftk ghostscript bookmarks

I have a directory *foo* containing .pdf files named with pattern *X01, X02, ...*, each two pages long. I want to combine them to a new .pdf, named "_all_YY-MM-DDTHHMMSS.pdf_" that will contain the file names as bookmarks.

I used these two commands. While the first one works well, 

    $ gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dAutoRotatePages=/None -sOutputFile=all_$(date +"%FT%H%M%S").pdf X*.pdf

the second one, based on [this answer](https://unix.stackexchange.com/a/72457/372935)  fails.

    $ pdftk all_2023-07-12T094706.pdf update_info {ls | grep X*} output out.pdf
    grep: X}: No such file or directory
    grep: output: No such file or directory
    grep: out.pdf: No such file or directory
    Done.  Input errors, so no output created.

I was trying to grep the ls for the filenames starting with _X*_, in order to exclude the new combined .pdf names _all..._.

How do I get this to work, preferably by adding update_info to the first command?

I'm aware of solutions like [this](https://unix.stackexchange.com/a/709643/372935) , but they look rather tedious.
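Two things seem to go wrong in the failing command: `{ls | grep X*}` is not command substitution (the shell runs it as a pipeline, which is why `grep` complains about `output` and `out.pdf`), and `update_info` expects a file in `pdftk`'s `dump_data` format rather than a list of names. A minimal sketch that generates such bookmark records, assuming every input file has the same number of pages; `bookmark_info` is a hypothetical helper name:

```bash
#!/bin/bash
# bookmark_info PAGES_PER_FILE FILE...: print one pdftk bookmark record per
# input file, assuming every file has exactly PAGES_PER_FILE pages.
bookmark_info() {
    local per_file=$1 page=1 f
    shift
    for f in "$@"; do
        printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: 1\nBookmarkPageNumber: %d\n' \
            "${f%.pdf}" "$page"
        page=$((page + per_file))
    done
}

# Two-page files: X01 starts on page 1, X02 on page 3.
bookmark_info 2 X01.pdf X02.pdf
```

With a `pdftk` version that can update bookmarks, the output might be used as `bookmark_info 2 X*.pdf > bookmarks.txt` followed by `pdftk all.pdf update_info bookmarks.txt output out.pdf`; `bookmarks.txt` and `all.pdf` are placeholder names.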
                                

jay.sf (237 rep)

Jul 12, 2023, 08:13 AM

0 votes

1 answers

165 views

Correcting typo in PDF, other solutions aren't working

sed pdf pdftk qpdf

I have a job interview on Tuesday, and I need to correct a typo "obestiy" [sic] to "obesity" in a PDF from LaTeX that I cannot recompile due to missing images and tables.

I have tried

1. `qpdf general.audience.pdf --object-streams=disable expanded.pdf`,

as suggested by https://unix.stackexchange.com/questions/17220/how-to-view-and-edit-the-code-of-a-pdf-file/109177#109177 , but when I try to edit the file, "obestiy" doesn't appear, so this method didn't work.

2. https://askubuntu.com/questions/803850/find-and-replace-with-on-pdf-file-from-command-line  suggests that I can use pdftk thus:

    pdftk general.audience.pdf output uncompressed.pdf uncompress

but then the word "obestiy" [sic] never shows up.

Perhaps there is/are some characters in between the letters, explaining why "obestiy" doesn't show up?

How can I edit the PDF to correct the typo?

3. LibreOffice Draw distorts and ruins all of the text in the file, making the PDF unusable. Maybe there is a way for LibreOffice Draw to *not* alter the fonts?

con (109 rep)

Apr 21, 2023, 09:00 PM • Last activity: Apr 21, 2023, 09:44 PM

Showing page 1 of 20 total questions