Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes

0 answers

26 views

What is the correct usage of $1 in a pdfgrep script?

pdfgrep

I'm trying to create a simple scripted command to search for specific words within a directory of PDF files.
I presently use the command:

    pdfgrep -Ri -C 0 '\<TERM\>' /media/files/pdf-all

to search for TERM, and it works fine.
To avoid having to type the whole command, I created the following script named pdfx:

    #!/bin/bash
    term=$1
    pdfgrep -Ri -C 0 '\<$term\>' /media/files/pdf-all

so that I can simply type pdfx TERM, or whatever word I need to search for.

However it does not work :(

What am I doing wrong?
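
One likely cause, assuming the script is meant to wrap the term in `\<` and `\>` word-boundary markers the way the one-off command does: inside single quotes bash does not expand `$term`, so pdfgrep receives the literal text instead of the word passed as `$1`. A minimal sketch using double quotes instead (the helper name `word_pattern` is made up for illustration):

```shell
#!/bin/bash
# Sketch of pdfx; word_pattern is a hypothetical helper name.

# Wrap a term in \< and \> so that only whole words match.
word_pattern() {
    printf '\\<%s\\>' "$1"
}

pdfx() {
    # Double quotes let the shell substitute the pattern; single quotes would not.
    pdfgrep -Ri -C 0 "$(word_pattern "$1")" /media/files/pdf-all
}

# When run as a script with an argument (for example: pdfx TERM), search for it.
if [ "$#" -gt 0 ]; then
    pdfx "$1"
fi
```

Writing the pattern directly as `"\<$term\>"` should behave the same way, since a backslash before `<` or `>` stays literal inside double quotes.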

black-clover (383 rep)

Jul 18, 2025, 10:18 PM • Last activity: Jul 18, 2025, 11:45 PM

3 votes

3 answers

5814 views

Regex search in PDF reader

regular-expression pdf zathura documents pdfgrep

I am using zathura, as I enjoy its minimalist approach, but I would also switch to mupdf or anything else if this would solve my problem.

I need to highlight every word (in PDF and epub documents) one by one from start to finish, because I can concentrate better on the text if there is some kind of motion in it. My approach would have been to perform a regex search that matches every word, but neither zathura nor mupdf supports regex in searches. Is there a way to do this?

I would try to fork zathura, but to be honest I don't really want to spend that amount of time if there is another minimal GNU/Linux-compatible document viewer that does what I need. If there is any way to use terminal tools like pdfgrep to highlight the results in zathura, that would also do the job.

luca (152 rep)

Mar 29, 2020, 03:38 PM • Last activity: Jun 6, 2025, 09:30 AM

1 votes

1 answers

79 views

pdfgrep doesn't work with Arabic language strings

unicode pdfgrep

I want to use pdfgrep and it works. When I want to search for an Arabic text or string, it shows nothing. However, it works properly when I search for an English string. Does anyone have a solution or even an alternative? This is the code I used:

    pdfgrep -in 'احمد' name.pdf

VANMEN (11 rep)

Jul 14, 2022, 10:20 PM • Last activity: May 26, 2025, 11:11 AM

5 votes

2 answers

757 views

Are there PDF files that pdfgrep cannot search yet display with xpdf?

pdf version pdfgrep

I am on a Chromebook running Debian with pdfgrep v2.1.2. I have a PDF file of the full Mueller Report that I occasionally want to search for particular references. Pdfgrep of the file for any pattern *quickly* returns nothing. Pdftotext cannot seem to handle it and produces a very short file of junk...

    Title:
    Creator:         RICOH MP C6502
    Producer:        RICOH MP C6502
    CreationDate:    Wed Apr 17 15:23:21 2019 PDT
    ModDate:         Wed Apr 17 15:59:41 2019 PDT
    Custom Metadata: no
    Metadata Stream: yes
    Tagged:          no
    UserProperties:  no
    Suspects:        no
    Form:            AcroForm
    JavaScript:      no
    Pages:           448
    Encrypted:       no
    Page size:       792 x 612 pts (letter)
    Page rot:        270
    File size:       145509756 bytes
    Optimized:       yes
    PDF version:     1.6

Could pdfgrep not be compatible with the PDF version of the file?

David Masterson (51 rep)

Nov 3, 2024, 04:18 AM • Last activity: Nov 16, 2024, 12:59 AM

4 votes

2 answers

413 views

pdfgrep How to locate the pages that contain multiple strings and print the page numbers?

pdfgrep

In a PDF file, there are some pages that contain both string1 and string2. I would like to locate those pages and print the page numbers.
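
One way to approach this, sketched under the assumption that a single PDF is searched (so `pdfgrep -n` prefixes every matching line with `page:`): collect the distinct pages for each string, then keep the pages that appear in both lists. The function names below are hypothetical.

```shell
# Print the distinct page numbers on which a pattern matches.
# Assumes a single PDF, so `pdfgrep -n` prefixes each line with "page:".
pages_with() {
    pdfgrep -n "$1" "$2" | cut -d: -f1 | sort -u
}

# Read two deduplicated page lists, one after the other, on stdin and
# print the pages that occur in both (they show up twice when combined).
common_pages() {
    sort -n | uniq -d
}

# Hypothetical wrapper: pages_with_both string1 string2 file.pdf
pages_with_both() {
    { pages_with "$1" "$3"; pages_with "$2" "$3"; } | common_pages
}
```

The two strings may appear in any order on the page; only the page numbers are compared.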
                                

Glenn (43 rep)

Jul 4, 2024, 04:16 PM • Last activity: Jul 6, 2024, 08:16 AM

0 votes

1 answers

99 views

pdfgrep multiple files with different passwords

pdf cygwin pdfgrep

I am trying to grep strings in password-protected PDFs (credit card statements). There are multiple files with different passwords. The [manpage(?)](https://pdfgrep.org/pdfgrep.html) says --password=Value can be specified multiple times and each password would be tried against every PDF file to be grepped. However, I find that only the last password gets used.

    pdfgrep -P "[0-9] [JFMASOND][aepuco][nbrylgptv] [0-9].+[0-9,]+\.[0-9][0-9] *([cC][rR])?" --password=password1 --password=password2 *.pdf

Only password2 is being used, so only the files with that password are grepped. It is the other way round if password1 is given last.

Couple of questions:

 1. how to provide multiple passwords to pdfgrep?
 2. any other simpler way of grepping (or getting a list of) credit card transactions from the monthly statements?

Not sure if it matters, I'm trying on cygwin.
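
Whatever the cause of the `--password` behaviour on a particular build, one possible workaround is to try the passwords one at a time per file and stop at the first one that yields matches. The sketch below relies on pdfgrep's grep-like exit status (0 when something matched); the function names and the placeholder passwords are assumptions for illustration.

```shell
# Hypothetical helper: grep_with_passwords PATTERN FILE PASSWORD...
grep_with_passwords() {
    pattern=$1
    file=$2
    shift 2
    for pw in "$@"; do
        # pdfgrep exits with 0 only when it printed at least one match.
        if pdfgrep -H --password="$pw" -P "$pattern" "$file" 2>/dev/null; then
            return 0
        fi
    done
    return 1
}

# Hypothetical driver: try both passwords against every PDF in the directory.
grep_all_statements() {
    for f in *.pdf; do
        grep_with_passwords "$1" "$f" password1 password2
    done
}
```

A file that opens with the first password but contains no match simply gets retried with the remaining passwords, which costs time but should not produce wrong output.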

tpb261 (135 rep)

Dec 25, 2023, 12:23 PM • Last activity: Dec 31, 2023, 01:30 AM

3 votes

1 answers

195 views

Is there a tool for searching keywords super fast in many pdfs files?

pdf search cache pdfgrep

I have a bunch of technical books, and I have been using pdfgrep for a while, but searching all of them takes a substantial amount of time.

Can somebody recommend a CLI tool for searching in PDF files super fast?

It should have an underlying database for caching purposes, similar to the locate command but for keywords in PDFs.

Thank you all! :)
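
Not a dedicated tool, but the caching idea can be approximated by hand: extract each PDF's text once and grep the plain-text copies afterwards. A rough sketch, assuming `pdftotext` (from poppler-utils) is installed; the cache location and function names are made up.

```shell
# Hypothetical cache location; override by setting CACHE_DIR.
CACHE_DIR=${CACHE_DIR:-$HOME/.cache/pdftext}

# Extract the text of every PDF under directory $1 into the cache.
build_cache() {
    mkdir -p "$CACHE_DIR"
    find "$1" -type f -name '*.pdf' | while IFS= read -r pdf; do
        # Flatten the path into a single file name for the cached text.
        txt="$CACHE_DIR/$(printf '%s' "$pdf" | tr '/' '_').txt"
        # Re-extract only if there is no cached text or the PDF is newer.
        if [ ! -e "$txt" ] || [ "$pdf" -nt "$txt" ]; then
            pdftotext "$pdf" "$txt"
        fi
    done
}

# List the cached text files that contain the keyword, ignoring case.
search_cache() {
    grep -ril -- "$1" "$CACHE_DIR"
}
```

After an initial `build_cache ~/books`, repeated calls such as `search_cache 'keyword'` only read text files, which is where the speed-up would come from. Page numbers are lost in this form.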

JammingThebBits (426 rep)

Aug 15, 2018, 11:24 AM • Last activity: Oct 25, 2022, 10:21 PM

1 votes

1 answers

62 views

Deep search of several pdf files with pdfgrep, ignoring counts less than

pdfgrep

I am doing a "deep search" within several PDF files with "pdfgrep", trying to find a word and get a per-document count like this:

    # pdfgrep -ric PATTERN

    ./Example1.pdf:0
    ./Example2.pdf:10

Any idea how I can suppress the output for files whose count is at or below a defined number? Like 0 or less than...?
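
Since each output line has the form `file:count`, the filtering can be done after the fact. A small sketch with awk, where `min_count` is a hypothetical helper name:

```shell
# Keep only "file:count" lines whose count is at least $1.
# The count is taken as the last colon-separated field.
min_count() {
    awk -F: -v min="$1" '$NF + 0 >= min + 0'
}

# Example (not run here): pdfgrep -ric PATTERN | min_count 1
```

Using the last field rather than the second keeps this working for file names that themselves contain a colon.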

Nils (113 rep)

May 27, 2022, 07:50 AM • Last activity: May 27, 2022, 08:07 AM

0 votes

1 answers

199 views

Is it possible to integrate pdfgrep into nemo search?

ubuntu search nemo indexing pdfgrep

I often find myself looking for PDF documents. Luckily, I found pdfgrep, which really does a great job at finding PDF documents by content. The following command lets me search for documents that have my search word on the first page:

    pdfgrep -irl --page-range=1 2>/dev/null 'mysearchword'

Is it possible to integrate this command into the Nemo file manager search?

Charles David Mupende (3 rep)

Dec 8, 2021, 01:22 PM • Last activity: Dec 8, 2021, 02:54 PM

1 votes

1 answers

1694 views

How do I pdfgrep using a specific pattern (Syntax?)

regular-expression pdfgrep

I'm trying to use pdfgrep to find each occurrence of a specific pattern (it MUST start with E OR S, followed by exactly 5 digits) and THEN execute a command afterwards (most likely a mv command).

So far, I have the following command :

    pdfgrep -e '[E-S]\d{5,}$' filename.pdf

But for the life of me, I am unable to find anything in that PDF. Searching for a specific term (pdfgrep "term" filename.pdf) does return the term in question so I know pdfgrep is able to find it.

I am guessing my issue is the syntax of the command or regex but I cannot find where exactly...
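
A few things in that pattern may explain the empty result: `[E-S]` is the range E through S rather than "E or S"; `\d` is Perl syntax, whereas pdfgrep uses POSIX extended regular expressions unless `-P` is given; `{5,}` means five or more digits; and `$` only matches when the code sits at the very end of a line. A hedged sketch of an alternative (the helper name and the `matched/` destination are made up):

```shell
# E or S, then exactly five digits, not embedded in a longer
# alphanumeric run. POSIX extended syntax, so no \d is needed.
pattern='(^|[^[:alnum:]])[ES][0-9]{5}($|[^0-9])'

# Hypothetical helper: succeed if the PDF contains such a code.
has_code() {
    pdfgrep -q "$pattern" "$1"
}

# Example of acting on the result (the destination is made up):
# has_code filename.pdf && mv filename.pdf matched/
```

The same expression can be tried on plain text with `grep -E` to check it before pointing it at the PDF.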

ATragicEnding (11 rep)

Feb 3, 2021, 07:32 PM • Last activity: Feb 4, 2021, 03:18 PM

2 votes

1 answers

859 views

Is there any ligature-aware alternative for "pdfgrep" in command line?

text-processing command-line pdf character-encoding pdfgrep

I always use "pdfgrep" to search inside multiple PDF files from the command line, but I ran into a problem with the ligature character "ﬁ" (see https://www.compart.com/en/unicode/U+FB01).
"ﬁ" is in the word "fixed", so I could not search for the term "fixed point operator" with pdfgrep -iR 'fixed point operator'. However, when I open the file with PDF readers such as Foxit Reader and Evince, "ﬁ" is split into "f" and "i", and is thus searchable. Is there a more reliable alternative to "pdfgrep"? Or is there an option in "pdfgrep" to expand such characters?

The PDF file is http://direct.mit.edu/books/chapter-pdf/238450/9780262321037_can.pdf.

Ubuntu 20.04, amd64, kernel version Linux 5.6.0-1018-oem. pdfgrep has an option --unac, but if I install pdfgrep with sudo apt-get install pdfgrep, using --unac reports "pdfgrep: UNAC support disabled at compile time!"

    pdfgrep:
      Installed: 2.1.2-1build1
      Candidate: 2.1.2-1build1
      Version table:
     *** 2.1.2-1build1 500
            500 http://mirrors.huaweicloud.com/ubuntu  focal/universe amd64 Packages
            100 /var/lib/dpkg/status
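
Without UNAC support, one stopgap is to make the search pattern itself accept either spelling. The sketch below rewrites a phrase so that "fi" and "fl" also match their single-character ligatures; it deliberately ignores other ligatures such as "ffi", and both function names are hypothetical.

```shell
# Rewrite a search phrase so "fi" and "fl" also match their ligatures.
# Only these two ligatures are handled in this sketch.
ligature_pattern() {
    printf '%s\n' "$1" | sed -e 's/fi/(fi|ﬁ)/g' -e 's/fl/(fl|ﬂ)/g'
}

# Hypothetical wrapper: search the current directory tree for the phrase.
lig_grep() {
    pdfgrep -iR "$(ligature_pattern "$1")" .
}
```

For example, `ligature_pattern 'fixed point operator'` produces `(fi|ﬁ)xed point operator`, which is a valid extended regular expression.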

                                

la la (21 rep)

Aug 29, 2020, 04:05 AM • Last activity: Dec 28, 2020, 11:56 PM

-4 votes

2 answers

171 views

Can we search in a pdf file for pages containing several words in no particular order?

pdf search pdfgrep

I would like to search in a PDF file for all the pages that each contain several given words in no particular order. For example, I want to find all the pages which contain both "hello" and "world" in no particular order.

I am not sure if pdfgrep can do it.

I am trying to do something similar to how we can search for several words in a book shown in Google Books.  

Thanks.

Tim (106422 rep)

Apr 20, 2019, 02:15 AM • Last activity: Apr 20, 2019, 05:33 AM

0 votes

0 answers

391 views

Split pdf based on keyword

pdf pdfgrep

Is there a utility that would split a PDF file based on a keyword? I can only find tools that split by pages (e.g. QPDF). I can also see pdfgrep, but I don't know whether the two have already been combined in some other utility. I can write the bash script, but how do I get the pages to split at from pdfgrep?
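
As a sketch of how the two tools might be glued together, assuming "split based on keyword" means that every page containing the keyword starts a new part: `pdfgrep -n` supplies the page numbers, a little awk turns them into ranges, and qpdf cuts the file. Function names and the `part-N.pdf` output names are made up.

```shell
# Distinct pages on which the keyword occurs, in ascending order.
keyword_pages() {
    pdfgrep -n "$1" "$2" | cut -d: -f1 | sort -nu
}

# Turn start pages on stdin into "start-end" ranges; $1 is the last page.
page_ranges() {
    awk -v last="$1" '
        NR > 1 { print prev "-" ($1 - 1) }
               { prev = $1 }
        END    { if (NR > 0) print prev "-" last }'
}

# Hypothetical driver: split_by_keyword KEYWORD INPUT.pdf writes part-N.pdf.
split_by_keyword() {
    last=$(qpdf --show-npages "$2")
    n=0
    keyword_pages "$1" "$2" | page_ranges "$last" | while IFS= read -r range; do
        n=$((n + 1))
        qpdf --empty --pages "$2" "$range" -- "part-$n.pdf"
    done
}
```

Pages before the first keyword hit are not written anywhere in this version; whether that is acceptable depends on the documents.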
                                

Tomas Greif (379 rep)

Feb 27, 2019, 01:16 AM

3 votes

2 answers

4665 views

Is there a way to search (grep/find) a specific word within multiple pdf files located on a specific drive?

grep find pdf search pdfgrep

I am trying to locate a client's PDF file that was saved on an external backup drive, which contains a little over 8000 PDF files and hundreds of folders.

For example, if I want to search all PDF files on drive X: that contain my client's name "Sequoia Group", what are some useful command lines and/or tools to get relevant results?

I'm using macOS High Sierra with zsh, and I've also installed GNU grep, ack, and pdfgrep via Homebrew. However, I haven't been able to find the file yet.

The filename is unknown, since all files were saved as PDF-Backup-0001, PDF-Backup-0002, etc.

I used the following commands so far with no luck:

    #grep -wirl "sequoia group" ./
    
    #pdfgrep -iHncRZ "sequoia group"
    
    #mdfind "sequoia group"

This command line was also suggested. I was not sure where to put the name, so I replaced /path with the drive's path and pattern with "sequoia", but it still did not find any matches:

    #find /path -iname '*.pdf' -exec pdfgrep pattern {} + 
    #find /Volumes/X Backup -iname '*.pdf' -exec pdfgrep "sequoia" {} +
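
One detail in the last command looks suspicious: the unquoted `/Volumes/X Backup` is split by the shell into two arguments, so find never sees the intended path. A minimal sketch with the path quoted, assuming the files carry a `.pdf` extension and contain a text layer (scanned pages without one would not match); the function names are hypothetical.

```shell
# List PDFs below a directory; quoting "$1" keeps a path with spaces intact.
list_pdfs() {
    find "$1" -type f -iname '*.pdf'
}

# Hypothetical wrapper: print the names of PDFs containing the phrase.
search_pdfs() {
    find "$1" -type f -iname '*.pdf' -exec pdfgrep -il "$2" {} +
}

# Example (not run here): search_pdfs "/Volumes/X Backup" "sequoia group"
```

Running `list_pdfs` first is a quick way to confirm that find reaches the files at all before involving pdfgrep.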

DiFrag (31 rep)

Jan 26, 2019, 04:03 PM • Last activity: Jan 26, 2019, 06:45 PM

3 votes

2 answers

1525 views

How can I get the page numbers only of a pattern in a pdf file, regardless if the pattern is multiline?

text-processing awk grep pdf pdfgrep

I find the page numbers of a multiline pattern in a PDF file, following https://unix.stackexchange.com/questions/457834/how-shall-i-grep-a-multi-line-pattern-in-a-pdf-file-and-in-a-text-file and https://unix.stackexchange.com/questions/457778/how-can-i-search-a-string-in-a-pdf-file-and-find-the-physical-page-number-of-ea/457780#457780

    $ pdfgrep -Pn '(?s)image\s+?not\s+?available'  main_text.pdf 
    49: image
       not
    available
    51: image
       not
    available
    53: image
       not
    available
    54: image
       not
    available
    55: image
       not
    available
    
I would like to extract the page number only, but because the pattern is multiline, I get
   
    $ pdfgrep -Pn '(?s)image\s+?not\s+?available'  main_text.pdf | awk -F":" '{print $1}'
    49
       not
    available
    51
       not
    available
    53
       not
    available
    54
       not
    available
    55
       not
    available

instead of

    49
    51
    53
    54
    55

I wonder how I can extract the page numbers only, regardless of whether the pattern is multiline. Thanks.
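
Only the first line of each match carries the `page:` prefix, so one option is to let awk print the first field just for lines that begin with digits and a colon. A small sketch, with `page_numbers` as a hypothetical helper name:

```shell
# Keep only lines that start with "<digits>:" and print the digits.
# Continuation lines of a multi-line match lack that prefix and are dropped.
page_numbers() {
    awk -F: '/^[0-9]+:/ { print $1 }'
}

# Example (not run here):
# pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | page_numbers
```

A continuation line that itself happens to start with digits and a colon would be misread as a page number, so this is a heuristic rather than a strict parse.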
                                

Tim (106422 rep)

Jul 22, 2018, 11:26 PM • Last activity: Jul 22, 2018, 11:43 PM

1 votes

0 answers

152 views

How shall I grep a multi-line pattern in a pdf file and in a text file?

grep pdf pdfgrep

In the output of less my.pdf, a string `image
   not
available` appears multiple times, for example:

    ... Lastly, what remains to
    ^L image
       not
    available
    ^L     Implementations and Systems      

I would like to grep the string in the PDF file, for example with pdfgrep.
But pdfgrep -ni "image not available" my.pdf doesn't find anything. What shall I do then?

Related question: since pdfgrep has a similar user interface to grep, how does grep handle a pattern that spans more than one line?

Thanks.
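
On the related question: grep matches one line at a time, so a phrase broken across lines is not found unless the input is reshaped first. One simple approach for plain text, sketched here with a hypothetical helper name, is to collapse all whitespace (including newlines and form feeds) into single spaces before searching:

```shell
# Count occurrences of a phrase on stdin after collapsing every run of
# whitespace (spaces, newlines, form feeds) into a single space.
count_phrase() {
    tr -s '[:space:]' ' ' | grep -oF -- "$1" | wc -l | tr -d ' '
}

# Example (not run here): pdftotext my.pdf - | count_phrase 'image not available'
```

This loses line and page positions, so it answers "does the phrase occur, and how often" rather than "where".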


                                

Tim (106422 rep)

Jul 22, 2018, 09:58 PM

Showing page 1 of 16 total questions