How can I get only the page numbers of a pattern in a PDF file, regardless of whether the pattern is multiline?

3 votes

2 answers

1526 views

I can find the page numbers where a multiline pattern occurs in a PDF file, following https://unix.stackexchange.com/questions/457834/how-shall-i-grep-a-multi-line-pattern-in-a-pdf-file-and-in-a-text-file and https://unix.stackexchange.com/questions/457778/how-can-i-search-a-string-in-a-pdf-file-and-find-the-physical-page-number-of-ea/457780#457780:

    $ pdfgrep -Pn '(?s)image\s+?not\s+?available'  main_text.pdf 
    49: image
       not
    available
    51: image
       not
    available
    53: image
       not
    available
    54: image
       not
    available
    55: image
       not
    available
    
I would like to extract only the page numbers, but because each match spans several lines, piping the output through `awk` also prints the continuation lines:
   
    $ pdfgrep -Pn '(?s)image\s+?not\s+?available'  main_text.pdf | awk -F":" '{print $1}'
    49
       not
    available
    51
       not
    available
    53
       not
    available
    54
       not
    available
    55
       not
    available

instead of

    49
    51
    53
    54
    55

How can I extract only the page numbers, regardless of whether the pattern is multiline? Thanks.
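For reference, one possible direction is to keep only the output lines that begin with a number followed by a colon, since those are the lines carrying the page prefix. The sketch below is a minimal illustration, not a definitive solution: `page_numbers` and `sample_output` are hypothetical helper names, and the sample text only imitates the `pdfgrep -n` output shown above. It assumes that no continuation line of a match itself starts with digits followed by a colon.

```shell
# Hypothetical helper: read pdfgrep -n style output on stdin and print
# the page number from every line that starts with "<digits>:".
# Assumption: continuation lines of a match never start with "<digits>:".
page_numbers() {
    awk -F: '/^[0-9]+:/ { print $1 }'
}

# Sample input imitating the multiline output from the question,
# so the sketch can be run without pdfgrep or the PDF file.
sample_output() {
    printf '%s\n' \
        '49: image' \
        '   not' \
        'available' \
        '51: image' \
        '   not' \
        'available'
}

sample_output | page_numbers
```

With the real file, the same filter would be placed after the original command, as in `pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | page_numbers`. If a page can contain several matches, appending `| uniq` would collapse repeated page numbers.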
                        

Asked by Tim (106420 rep)

Jul 22, 2018, 11:26 PM
Last activity: Jul 22, 2018, 11:43 PM
