
How can I get the page numbers only of a pattern in a pdf file, regardless if the pattern is multiline?

3 votes
2 answers
1526 views
I found the page numbers of a multiline pattern in a PDF file by following https://unix.stackexchange.com/questions/457834/how-shall-i-grep-a-multi-line-pattern-in-a-pdf-file-and-in-a-text-file and https://unix.stackexchange.com/questions/457778/how-can-i-search-a-string-in-a-pdf-file-and-find-the-physical-page-number-of-ea/457780#457780

    $ pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf
    49: image
    not available
    51: image
    not available
    53: image
    not available
    54: image
    not available
    55: image
    not available

I would like to extract the page numbers only, but because the pattern is multiline, I get

    $ pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | awk -F":" '{print $1}'
    49
    not available
    51
    not available
    53
    not available
    54
    not available
    55
    not available

instead of

    49
    51
    53
    54
    55

How can I extract the page numbers only, regardless of whether the pattern is multiline? Thanks.
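One possible approach, as a minimal sketch rather than a definitive answer: keep only the lines that begin with a page-number prefix and discard the continuation lines of each match. The helper name `page_numbers` is hypothetical, and the `printf` input only simulates `pdfgrep -n` output; its exact line breaks are an assumption. The sketch also assumes that no continuation line itself starts with digits followed by a colon.

```shell
# Hypothetical helper: read pdfgrep -n style output on stdin and print
# only the leading page number of lines that start with "<digits>:".
# Continuation lines of a multiline match have no such prefix and are dropped.
page_numbers() {
    sed -n 's/^\([0-9][0-9]*\):.*/\1/p'
}

# Simulated pdfgrep output (assumed shape), so the sketch runs without a PDF.
printf '49: image\nnot available\n51: image\nnot available\n' | page_numbers
```

With a real file, the same filter would sit at the end of the pipeline from the question: `pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | page_numbers`.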
Asked by Tim (106420 rep)
Jul 22, 2018, 11:26 PM
Last activity: Jul 22, 2018, 11:43 PM