
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
0 answers
26 views
What is the correct usage of $1 in a pdfgrep script?
I'm trying to create a simple scripted command to search for specific words within a directory of PDF files. I presently use the command:
pdfgrep -Ri -C 0 '\' /media/files/pdf-all
to search for TERM, and it works fine. To avoid having to type the whole command, I created the following script named pdfx:
#!/bin/bash
term=$1
pdfgrep -Ri -C 0 '\' /media/files/pdf-all
so that I can simply type pdfx TERM (or whatever word I need to search for). However, it does not work :( What am I doing wrong?
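A likely cause, assuming the pattern was meant to contain $term: the shell never expands $1 or $term inside single quotes, so pdfgrep receives the literal text. The hypothetical function show_quoting demonstrates the difference, and pdfx is a double-quoted variant of the script written as a function (the rest of the original pattern did not survive on this page, so this sketch searches for the bare term).

```shell
#!/bin/bash
# Sketch: single quotes pass $term through literally; double quotes expand it.

show_quoting() {
  term=$1
  printf '%s\n' 'single: $term'   # printed exactly as typed
  printf '%s\n' "double: $term"   # $term replaced by the first argument
}

# Hypothetical double-quoted version of the pdfx script, as a function.
# Same options and directory as the command in the question.
pdfx() {
  term=$1
  pdfgrep -Ri -C 0 "$term" /media/files/pdf-all
}
```

For example, show_quoting TERM prints "single: $term" followed by "double: TERM".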
black-clover (383 rep)
Jul 18, 2025, 10:18 PM • Last activity: Jul 18, 2025, 11:45 PM
3 votes
3 answers
5814 views
Regex search in PDF reader
I am using zathura, as I enjoy its minimalist approach, but I would also switch to mupdf or anything else if this would solve my problem. I need to highlight every word (in PDF and EPUB documents) one by one from start to finish, because I can concentrate better on the text if there is some kind of motion in it. My approach would have been to perform a regex search that matches every word, but neither zathura nor mupdf supports regex in searches. Is there a way to do this? I could try to fork zathura, but to be honest I don't really want to spend that amount of time if there is another minimal GNU/Linux-compatible document viewer that does what I need. And if there is any way to use terminal tools like pdfgrep to highlight the results in zathura, that would also do the job.
luca (152 rep)
Mar 29, 2020, 03:38 PM • Last activity: Jun 6, 2025, 09:30 AM
1 vote
1 answer
79 views
pdfgrep doesn't work with Arabic language strings
I want to use pdfgrep, and it works. However, when I search for an Arabic text or string, it shows nothing, whereas it works properly when I search for an English string. Does anyone have a solution or even an alternative? This is the command I used:
pdfgrep -in 'احمد' name.pdf
VANMEN (11 rep)
Jul 14, 2022, 10:20 PM • Last activity: May 26, 2025, 11:11 AM
5 votes
2 answers
757 views
Are there PDF files that pdfgrep cannot search yet display with xpdf?
I am on a Chromebook running Debian with pdfgrep v2.1.2. I have a PDF file of the full Mueller Report that I occasionally want to search for particular references. Running pdfgrep on the file with any pattern *quickly* returns nothing. pdftotext cannot seem to handle it either and produces a very short file of junk. pdfinfo produces the following:
Title:           
Creator:         RICOH MP C6502
Producer:        RICOH MP C6502
CreationDate:    Wed Apr 17 15:23:21 2019 PDT
ModDate:         Wed Apr 17 15:59:41 2019 PDT
Custom Metadata: no
Metadata Stream: yes
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            AcroForm
JavaScript:      no
Pages:           448
Encrypted:       no
Page size:       792 x 612 pts (letter)
Page rot:        270
File size:       145509756 bytes
Optimized:       yes
PDF version:     1.6
Could pdfgrep not be compatible with the PDF version of the file?
David Masterson (51 rep)
Nov 3, 2024, 04:18 AM • Last activity: Nov 16, 2024, 12:59 AM
4 votes
2 answers
413 views
pdfgrep How to locate the pages that contain multiple strings and print the page numbers?
In a pdf file, there are some pages that contain both string1 and string2. I would like to locate those pages and print the page numbers.
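A minimal sketch: collect the page numbers that pdfgrep -n reports for each string separately and keep the pages common to all of them. It assumes that pdfgrep -n on a single file prefixes every match with "PAGE:"; pages_with_all and the PDFGREP override variable are hypothetical names introduced here.

```shell
#!/bin/bash
# pages_with_all PDF STRING...
# Prints, in ascending order, the page numbers on which every STRING occurs.
# PDFGREP is a hypothetical override hook (defaults to pdfgrep) used for testing.
pages_with_all() {
  pdf=$1
  shift
  n=$#
  # For each string, list the unique pages it occurs on; a page that shows up
  # n times in the combined list therefore contains all n strings.
  for s in "$@"; do
    "${PDFGREP:-pdfgrep}" -n -i -F -- "$s" "$pdf" |
      awk -F: 'NF { print $1 }' | sort -u
  done | sort -n | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
}
```

pages_with_all file.pdf string1 string2 prints one page number per line. Matching is case-insensitive (-i) and on fixed strings (-F); drop either flag in the function to change that. The same call with more strings covers the "hello" and "world" case as well.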
Glenn (43 rep)
Jul 4, 2024, 04:16 PM • Last activity: Jul 6, 2024, 08:16 AM
0 votes
1 answer
99 views
pdfgrep multiple files with different passwords
I am trying to grep strings in password-protected PDFs (credit card statements). There are multiple files with different passwords. The [manpage(?)](https://pdfgrep.org/pdfgrep.html) says --password=Value can be specified multiple times and each password will be tried against every PDF file to be grepped. But I find that only the last password gets used:
pdfgrep -P "[0-9] [JFMASOND][aepuco][nbrylgptv] [0-9].+[0-9,]+\.[0-9][0-9] *([cC][rR])?" --password=password1 --password=password2 *.pdf
Only password2 is used, and only those files are grepped. Obviously, it is the other way round if password1 is the last password given. A couple of questions:
1. How do I provide multiple passwords to pdfgrep?
2. Is there any other, simpler way of grepping (or getting a list of) credit card transactions from the monthly statements?
Not sure if it matters, but I'm trying this on Cygwin.
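A workaround sketch for the first question: call pdfgrep once per file and try each password in turn until one opens the file. It assumes pdfgrep follows grep's exit-status convention (0 for a match, 1 for no match, 2 for an error such as a wrong password); grep_with_passwords and PDFGREP are hypothetical names.

```shell
#!/bin/bash
# usage: grep_with_passwords PATTERN FILE PASSWORD...
# Tries each PASSWORD on FILE, stopping at the first one that does not make
# pdfgrep fail. PDFGREP is a hypothetical override hook (defaults to pdfgrep).
grep_with_passwords() {
  pattern=$1 file=$2
  shift 2
  for pw in "$@"; do
    rc=0
    # stderr is silenced so wrong-password errors do not clutter the output
    "${PDFGREP:-pdfgrep}" -H -P --password="$pw" -- "$pattern" "$file" 2>/dev/null || rc=$?
    # status 2 means pdfgrep could not search the file: try the next password
    if [ "$rc" -ne 2 ]; then
      return "$rc"
    fi
  done
  printf 'no password opened %s\n' "$file" >&2
  return 2
}
```

Usage: for f in *.pdf; do grep_with_passwords "$pattern" "$f" password1 password2; done. The -H flag keeps the file name on each output line so the matches can be told apart.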
tpb261 (135 rep)
Dec 25, 2023, 12:23 PM • Last activity: Dec 31, 2023, 01:30 AM
3 votes
1 answer
195 views
Is there a tool for searching keywords super fast in many pdfs files?
I have a bunch of technical books, and I have been using pdfgrep for a while, but it takes a substantial amount of time to search them all. Can somebody recommend a CLI tool for searching PDF files super fast? It should have an underlying database for caching purposes, similar to the locate command but for keywords in PDFs. Thank you all! :)
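Without a dedicated indexing tool, one do-it-yourself option is a text cache: extract each PDF's text once with pdftotext (from poppler-utils) and grep the plain-text copies afterwards, so the PDFs are not re-parsed on every search. A minimal sketch; build_cache, search_cache, the cache layout and the PDFTOTEXT override are hypothetical.

```shell
#!/bin/bash
# PDFTOTEXT is a hypothetical override hook (defaults to pdftotext).
# Assumes file names without newlines.

# build_cache SRC CACHE: (re)extract only PDFs newer than their cached .txt
build_cache() {
  src=$1 cache=$2
  find "$src" -type f -name '*.pdf' | while IFS= read -r pdf; do
    # mirror the directory layout of SRC under CACHE, adding a .txt suffix
    txt=$cache/${pdf#"$src"/}.txt
    if [ ! -e "$txt" ] || [ "$pdf" -nt "$txt" ]; then
      mkdir -p "$(dirname "$txt")"
      "${PDFTOTEXT:-pdftotext}" "$pdf" "$txt"
    fi
  done
}

# search_cache CACHE PATTERN: list cached text files that match, ignoring case
search_cache() {
  grep -ril -- "$2" "$1"
}
```

Run build_cache ~/books ~/.cache/pdftext once (and again after adding books), then search_cache ~/.cache/pdftext 'keyword'. Only new or changed PDFs are re-extracted; the trade-off is that page numbers are not reported.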
JammingThebBits (426 rep)
Aug 15, 2018, 11:24 AM • Last activity: Oct 25, 2022, 10:21 PM
1 vote
1 answer
62 views
Deep search of several pdf files with pdfgrep, ignoring counts less than
I am doing a "deep search" within several PDF files with pdfgrep, trying to find a word and get a count per document, like this:
# pdfgrep -ric PATTERN
./Example1.pdf:0
./Example2.pdf:10
Any idea how I can suppress the output for files with a given number of matches, such as 0 or fewer than some threshold?
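Since pdfgrep -c prints one file:count line per file, a small filter on the last field is enough. A sketch, with keep_min as a hypothetical helper:

```shell
#!/bin/bash
# keep_min MIN: read "file:count" lines (the output of pdfgrep -c) on stdin and
# print only the lines whose count is at least MIN.
# The count is taken from the last ':'-separated field, so colons in a path do no harm.
keep_min() {
  awk -F: -v min="$1" '$NF + 0 >= min + 0'
}
```

pdfgrep -ric PATTERN | keep_min 1 hides the files with zero matches; keep_min 5 would keep only files with five or more.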
Nils (113 rep)
May 27, 2022, 07:50 AM • Last activity: May 27, 2022, 08:07 AM
0 votes
1 answer
199 views
Is it possible to integrate pdfgrep into nemo search?
I often find myself looking for PDF documents. Luckily, I found pdfgrep, which does a great job at finding PDF documents by content. The following command lets me search for documents that have my search word on the first page:
pdfgrep -irl --page-range=1 2>/dev/null 'mysearchword'
Is it possible to integrate this command into the Nemo file manager search?
Charles David Mupende (3 rep)
Dec 8, 2021, 01:22 PM • Last activity: Dec 8, 2021, 02:54 PM
1 vote
1 answer
1694 views
How do I pdfgrep using a specific pattern (Syntax?)
I'm trying to use pdfgrep to search for each occurrence of a specific pattern (it MUST start with E OR S, followed by exactly 5 digits) and THEN execute a command afterward (which is likely to be a mv command). So far, I have the following command:
pdfgrep -e '[E-S]\d{5,}$' filename.pdf
But for the life of me, I am unable to find anything in that PDF. Searching for a specific term (pdfgrep "term" filename.pdf) does return the term in question, so I know pdfgrep is able to find it. I am guessing my issue is the syntax of the command or of the regex, but I cannot find where exactly...
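A few things in that pattern may explain the empty result: [E-S] is a range (E, F, G, ..., S) rather than "E or S"; \d is Perl syntax and not part of POSIX extended regular expressions, which are assumed here to be pdfgrep's default (its -P option enables Perl-compatible patterns); {5,} means five or more digits; and $ demands that the digits end the line. A sketch of a stricter pattern and of the follow-up move; ID_RE and move_if_id are hypothetical names.

```shell
#!/bin/bash
# E or S, then exactly five digits, not glued to other letters or digits.
# Written as a POSIX extended regex, assumed to be pdfgrep's default syntax.
ID_RE='(^|[^[:alnum:]])[ES][0-9]{5}([^[:digit:]]|$)'

# move_if_id PDF DEST: move PDF into directory DEST if it contains such an ID.
# pdfgrep -q prints nothing and only sets the exit status (0 = a match was found).
move_if_id() {
  if pdfgrep -q -e "$ID_RE" "$1"; then
    mv -- "$1" "$2"/
  fi
}
```

Usage: for f in *.pdf; do move_if_id "$f" matched; done, where matched is an existing directory of your choice.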
ATragicEnding (11 rep)
Feb 3, 2021, 07:32 PM • Last activity: Feb 4, 2021, 03:18 PM
2 votes
1 answer
859 views
Is there any ligature-aware alternative for "pdfgrep" in command line?
I always use pdfgrep to search inside multiple PDF files from the command line, but I have run into a problem with the ligature character "ﬁ" (see https://www.compart.com/en/unicode/U+FB01). "ﬁ" occurs in the word "fixed", so I could not find the term "fixed point operator" with pdfgrep -iR 'fixed point operator'. However, when I open the file with PDF readers such as Foxit Reader and Evince, "ﬁ" is split into "f" and "i" and is thus searchable. Is there a more reliable alternative to pdfgrep? Or is there an option in pdfgrep to expand the encoding?
The PDF file is http://direct.mit.edu/books/chapter-pdf/238450/9780262321037_can.pdf .
Ubuntu 20.04, amd64, kernel version Linux 5.6.0-1018-oem.
pdfgrep has an option --unac, but if I install pdfgrep with sudo apt-get install pdfgrep, using --unac reports "pdfgrep: UNAC support disabled at compile time!"
pdfgrep:
  Installed: 2.1.2-1build1
  Candidate: 2.1.2-1build1
  Version table:
 *** 2.1.2-1build1 500
        500 http://mirrors.huaweicloud.com/ubuntu focal/universe amd64 Packages
        100 /var/lib/dpkg/status
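If rebuilding pdfgrep with unac support is not an option, one workaround sketch is to normalize the ligatures yourself: extract the text with pdftotext, map the Latin ligature characters U+FB00 to U+FB06 back to plain letters with sed, and grep the result. unligature and search_pdf are hypothetical helpers; the input is assumed to be UTF-8, and page numbers are lost in this pipeline.

```shell
#!/bin/bash
# unligature: replace the Latin ligatures U+FB00..U+FB06 with plain letters
unligature() {
  sed -e 's/ﬀ/ff/g' -e 's/ﬁ/fi/g' -e 's/ﬂ/fl/g' -e 's/ﬃ/ffi/g' \
      -e 's/ﬄ/ffl/g' -e 's/ﬅ/st/g' -e 's/ﬆ/st/g'
}

# search_pdf PDF PATTERN: case-insensitive search in the normalized text
# ("-" makes pdftotext write the extracted text to stdout)
search_pdf() {
  pdftotext "$1" - | unligature | grep -i -- "$2"
}
```

Usage: search_pdf 9780262321037_can.pdf 'fixed point operator'.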
la la (21 rep)
Aug 29, 2020, 04:05 AM • Last activity: Dec 28, 2020, 11:56 PM
-4 votes
2 answers
171 views
Can we search in a pdf file for pages containing several words in no particular order?
I would like to search in a pdf file for all the pages, each containing several given words in no particular order. For example, I want to find all the pages which contain both "hello" and "world" in no particular order. I am not sure if pdfgrep can do it. I am trying to do something similar to how we can search for several words in a book shown in Google Books. Thanks.
Tim (106422 rep)
Apr 20, 2019, 02:15 AM • Last activity: Apr 20, 2019, 05:33 AM
0 votes
0 answers
391 views
Split pdf based on keyword
Is there a utility that would split a PDF file based on a keyword? I can only find tools that split by pages (e.g. QPDF). I can also see pdfgrep, but I don't know whether it has already been combined with some other utility. I can write the bash script, but how do I get the pages to split at from pdfgrep?
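pdfgrep -n reports the page of each match, so those pages can serve as split points for qpdf's --pages option. A sketch assuming a new part should start at every page containing the keyword; page_ranges, split_on_keyword and the part-N.pdf output names are hypothetical.

```shell
#!/bin/bash
# page_ranges START...: turn sorted, unique start pages into qpdf page ranges.
# Each START begins a new part; "z" is qpdf's notation for the last page.
page_ranges() {
  prev=1
  for s in "$@"; do
    if [ "$s" -gt "$prev" ]; then
      echo "$prev-$((s - 1))"
    fi
    prev=$s
  done
  echo "$prev-z"
}

# split_on_keyword PDF KEYWORD: write part-1.pdf, part-2.pdf, ... starting a
# new part on every page where pdfgrep finds KEYWORD.
split_on_keyword() {
  pdf=$1 keyword=$2
  starts=$(pdfgrep -n -F -- "$keyword" "$pdf" | awk -F: 'NF { print $1 }' | sort -nu)
  i=0
  # $starts is deliberately unquoted so each page number becomes one argument.
  for r in $(page_ranges $starts); do
    i=$((i + 1))
    qpdf "$pdf" --pages . "$r" -- "part-$i.pdf"
  done
}
```

Usage: split_on_keyword input.pdf KEYWORD. If the keyword appears on pages 5 and 9, the parts cover pages 1-4, 5-8 and 9 to the end.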
Tomas Greif (379 rep)
Feb 27, 2019, 01:16 AM
3 votes
2 answers
4665 views
Is there a way to search (grep/find) a specific word within multiple pdf files located on a specific drive?
I am trying to locate a client's PDF file that was saved on an external backup drive, which contains a little over 8000 PDF files and hundreds of folders. For example, if I want to search all PDF files on drive X: that contain my client's name "Sequoia Group", what are some useful command lines and/or tools to achieve relevant output results?
I'm using macOS High Sierra with zsh, and I've also installed GNU grep, ack, and pdfgrep via Homebrew. However, I haven't been able to find the file yet. The filename is unknown, since all files were saved as PDF-Backup-0001, PDF-Backup-0002, etc. I have used the following commands so far with no luck:
#grep -wirl "sequoia group" ./
#pdfgrep -iHncRZ "sequoia group"
#mdfind "sequoia group"
This command line was also suggested; however, I am not sure where to put the name, so I replaced /path with the drive's path and pattern with "sequoia", and it still did not find any matches:
#find /path -iname '*.pdf' -exec pdfgrep pattern {} +
#find /Volumes/X Backup -iname '*.pdf' -exec pdfgrep "sequoia" {} +
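One detail in the last command: /Volumes/X Backup is unquoted, so the shell passes /Volumes/X and Backup to find as two separate paths. A sketch with the path quoted; search_drive and the PDFGREP override are hypothetical names.

```shell
#!/bin/bash
# search_drive DIR PATTERN: list PDFs under DIR whose text matches PATTERN,
# ignoring case. Quoting "$1" keeps a path such as "/Volumes/X Backup" intact.
# PDFGREP is a hypothetical override hook (defaults to pdfgrep); find runs it
# directly, so it must name an executable, not a shell function.
search_drive() {
  find "$1" -type f -iname '*.pdf' -exec "${PDFGREP:-pdfgrep}" -il -- "$2" {} +
}
```

Usage: search_drive "/Volumes/X Backup" "sequoia group". If that still finds nothing, the PDFs may be scans without a text layer, which pdfgrep cannot search.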
DiFrag (31 rep)
Jan 26, 2019, 04:03 PM • Last activity: Jan 26, 2019, 06:45 PM
3 votes
2 answers
1525 views
How can I get the page numbers only of a pattern in a pdf file, regardless if the pattern is multiline?
I find the page numbers of a multiline pattern in a PDF file, following https://unix.stackexchange.com/questions/457834/how-shall-i-grep-a-multi-line-pattern-in-a-pdf-file-and-in-a-text-file and https://unix.stackexchange.com/questions/457778/how-can-i-search-a-string-in-a-pdf-file-and-find-the-physical-page-number-of-ea/457780#457780:
$ pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf
49: image
not available
51: image
not available
53: image
not available
54: image
not available
55: image
not available
I would like to extract the page numbers only, but because the pattern is multiline, I get
$ pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | awk -F":" '{print $1}'
49
not available
51
not available
53
not available
54
not available
55
not available
instead of
49
51
53
54
55
How can I extract only the page numbers, regardless of whether the pattern is multiline? Thanks.
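In the pdfgrep -n output above, only the first line of each match carries the "PAGE:" prefix; the continuation lines do not. A sketch that keeps only the prefixed lines; page_numbers_only is a hypothetical helper, and it assumes no continuation line itself starts with digits followed by a colon.

```shell
#!/bin/bash
# page_numbers_only: read pdfgrep -n output on stdin and print each page number
# once, in the order first seen. Lines without a leading "digits:" prefix
# (continuation lines of a multi-line match) are skipped.
page_numbers_only() {
  awk -F: '/^[0-9]+:/ && !seen[$1]++ { print $1 }'
}
```

Usage: pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | page_numbers_only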
Tim (106422 rep)
Jul 22, 2018, 11:26 PM • Last activity: Jul 22, 2018, 11:43 PM
1 vote
0 answers
152 views
How shall I grep a multi-line pattern in a pdf file and in a text file?
In the output of less my.pdf, the string "image not available" appears multiple times, for example:
... Lastly, what remains to ^L image not available ^L Implementations and Systems
I would like to grep the string in the PDF file, for example with pdfgrep. But pdfgrep -ni "image not available" my.pdf doesn't find anything. What should I do?
Related question: since pdfgrep has a user interface similar to grep's, how does grep handle a pattern that spans more than one line? Thanks.
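On the related question: grep matches line by line, so a plain pattern cannot span a newline. For the text case, one sketch is to flatten the text first; join_lines is a hypothetical helper that turns newlines and form feeds (pdftotext's page separator, shown as ^L by less) into single spaces.

```shell
#!/bin/bash
# join_lines: turn newlines and form feeds into spaces and squeeze repeated
# spaces, so the whole input becomes one line that grep can match across.
join_lines() {
  tr '\n\f' '  ' | tr -s ' '
}
```

Usage: pdftotext my.pdf - | join_lines | grep -o 'image not available' | wc -l counts the occurrences, but page and line numbers are lost; the pdfgrep -P '(?s)image\s+?not\s+?available' form used in the question above keeps them.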
Tim (106422 rep)
Jul 22, 2018, 09:58 PM