How can I get only the page numbers of a pattern in a PDF file, regardless of whether the pattern is multiline?
I can find the page numbers of a multiline pattern in a PDF file, following https://unix.stackexchange.com/questions/457834/how-shall-i-grep-a-multi-line-pattern-in-a-pdf-file-and-in-a-text-file and https://unix.stackexchange.com/questions/457778/how-can-i-search-a-string-in-a-pdf-file-and-find-the-physical-page-number-of-ea/457780#457780
$ pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf
49: image
not
available
51: image
not
available
53: image
not
available
54: image
not
available
55: image
not
available
I would like to extract only the page numbers, but because each match spans several lines, I get
$ pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | awk -F":" '{print $1}'
49
not
available
51
not
available
53
not
available
54
not
available
55
not
available
instead of
49
51
53
54
55
How can I extract only the page numbers, regardless of whether the pattern is multiline? Thanks.
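For illustration, one possible approach is to keep only the output lines that begin with a page-number prefix and drop the continuation lines of each match. This is a minimal sketch, not a definitive solution: the function name `page_numbers` is hypothetical, the sample text stands in for the real `pdfgrep` output shown above, and it assumes that no continuation line of a match itself starts with digits followed by a colon.

```shell
# Hypothetical helper: read pdfgrep -n style output on stdin and
# print the page number of every line that starts with "<digits>:".
# Assumption: continuation lines of a multiline match never start
# with digits followed by a colon.
page_numbers() {
    awk -F: '/^[0-9]+:/ { print $1 }'
}

# Sample input copied from the pdfgrep output in the question;
# it stands in for the real command so the sketch runs on its own.
sample='49: image
not
available
51: image
not
available'

result=$(printf '%s\n' "$sample" | page_numbers)
printf '%s\n' "$result"
```

With the real file, the same filter would be used as `pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | page_numbers`. If a page can contain more than one match, piping the result through `uniq` would collapse the repeated page numbers.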
Asked by Tim
(106420 rep)
Jul 22, 2018, 11:26 PM
Last activity: Jul 22, 2018, 11:43 PM