Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
8
votes
1
answers
516
views
Is there any way to see the string that was matched in grep?
**I'm not talking about -o option.**
[Posix](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03) says:
> The search for a matching sequence starts at the beginning of a string and stops when the **first sequence matching the expression is found**, where "first" is defined to mean "begins earliest in the string". If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the **longest such sequence is matched**. For example, the BRE "bb*" matches the second to fourth characters of the string "abbbc", and the ERE "(wee|week)(knights|night)" matches all ten characters of the string "weeknights".
And I want to **verify** what is said in POSIX and this tutorial [regTutorialSite](https://www.regular-expressions.info/posix.html):
> A POSIX-compliant engine will still find the **leftmost match**. If you **apply** Set|SetValue to Set or SetValue **once**, it **will match Set**.
How does one "apply" it "once"?
When I run `grep -o`, the result is two strings, `Set` and `SetValue`, but not just the one leftmost match. That is, I read about one thing, but in practice I get something else. So, how can I see what string was matched by the regex?
(Perhaps the question was formulated incorrectly or could have been better)
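A quick check of the longest-match rule is possible with `grep -o` itself (assuming GNU grep's POSIX-style `-E` engine):

```shell
# With a POSIX (leftmost-longest) engine, when both alternatives can
# match at the same starting position, the longer one wins: the
# alternation Set|SetValue reports SetValue, not Set.
echo 'SetValue' | grep -oE 'Set|SetValue'
```

Note that `grep -o` then goes on to report every successive non-overlapping match in the line, which is why a line containing both `Set` and `SetValue` produces two output lines: the engine restarts after each reported match rather than stopping at the first one.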
Mark
(99 rep)
Aug 1, 2025, 12:59 PM
• Last activity: Aug 2, 2025, 07:19 AM
2
votes
4
answers
136
views
Grep (BRE) on surrounding delimiters w/o consuming the delimiter? Counting delimiter-separated strings between filename and extension
I have a dataset of images labeled/classified by characteristics, where an image can have more than one label. I want to count how many of each identifier I have. A toy dataset is created below, with different colors being the labels.
```bash
bballdave025@MY-MACHINE /home/bballdave025/toy
$ touch elephant_grey.jpg && touch zebra_white_black.jpg && \
  touch rubik-s_cube_-_1977-first-prod_by_ErnoRubik-_red_orange_yellow_white_blue_green.jpg && \
  touch Radio_Hotel.Washington_Heights.NYC-USA_green_yellow_blue_red_orange_reddish-purple_teal_grey-brown.jpg && \
  touch Big_Bird__yellow_orange_red.jpg
```
Let's make it more easily visible. The files in the initially labeled dataset are shown below. (The `| awk -F'/' '{print $NF}'` is just meant to take off the `./` or `path/to/where/the/jpegs/are/` that would otherwise be before the filename.)
```bash
$ find . -type f | awk -F'/' '{print $NF}'
Big_Bird__yellow_orange_red.jpg
elephant_grey.jpg
Radio_Hotel.Washington_Heights.NYC-USA_green_yellow_blue_red_orange_reddish-purple_teal_grey-brown.jpg
rubik-s_cube_-_1977-first-prod_by_ErnoRubik-_red_orange_yellow_white_blue_green.jpg
zebra_white_black.jpg
```
Those are the filenames for labeled versions of the images. The corresponding originals are below:
```bash
$ find ../toy_orig_bak/ -type f | awk -F'/' '{print $NF}'
Big_Bird_.jpg
elephant.jpg
Radio_Hotel.Washington_Heights.NYC-USA.jpg
rubik-s_cube_-_1977-first-prod_by_ErnoRubik-.jpg
zebra.jpg
```
This is to show that the color labels are inserted between the filename and the dot extension. They are separated from each other and from the original filename by a (delimiting) `_` character. (There are rules for the label names and for the filenames\[1\].) The only allowed color strings at this initial point are any of `{black, white, grey, red, orange, yellow, green, blue, reddish-purple, teal, grey-brown}`.
I further want to show that other labels may be added, as long as they're part of my controlled vocabulary, something which can be changed only by me. Imagine a file named `rainbow.jpg` gets put in with the original filenames (`touch ../toy_orig_bak/rainbow.jpg`, for those of you following along for reproducibility). I decide that I want to add `indigo` and `violet` to my controlled vocabulary list, so I can create the labeled filename,
```bash
$ touch rainbow_red_orange_yellow_green_blue_indigo_violet.jpg
```
Desired Output
Again, I want a count of each of the labels. For the dataset I've set up (including that last labeled picture of a rainbow), the correct output would be
```
1 black
3 blue
3 green
1 grey
1 grey-brown
1 indigo
4 orange
4 red
1 reddish-purple
1 teal
1 violet
2 white
4 yellow
```
(The counts were performed somewhat manually, due to my `grep` confusion.)
Attempts and a note on the details of the solution I want
Research below
My first thought (although I did worry about delimiter consumption) was to look at the surrounding delimiters: `_` before, and `_` or `.` after.
Here's my first `grep` attempt
```bash
find . -type f -iname "*.jpg" | \
  \
  grep -o "[_]\(black\|white\|grey\|red\|orange\|yellow\|green\|blue\|"\
"reddish-purple\|teal\|grey-brown\|indigo\|violet\)[_.]" | \
  \
  tr -d [_.] | sort | uniq -c
```
and its output
```
3 blue
1 green
1 grey
1 orange
3 red
1 teal
1 violet
1 white
3 yellow
```
That is not the same as before. Here's the comparison.
```
Before                 | Now
-----------------------|---------------------
1 black                |
3 blue                 | 3 blue
3 green                | 1 green
1 grey                 | 1 grey
1 grey-brown           |
1 indigo               |
4 orange               | 1 orange
4 red                  | 3 red
1 reddish-purple       |
1 teal                 | 1 teal
1 violet               | 1 violet
2 white                | 1 white
4 yellow               | 3 yellow
                       |
```
I know this is happening because the regex engine consumes the second delimiter (see note \[2\]).
Here is the crux of my main question: (I do want to solve my count problem, and I'll talk about some solutions I've researched and considered myself, but) the detail I want to know is about truly regular expressions and consuming the delimiter.
I want to get a count of each identifier string, and I'm wondering if I can do it with this approach and (POSIX) Basic Regular Expressions, BRE (see note \[2\] and this [reddit thread](https://www.reddit.com/r/askscience/comments/5rttyo/do_extended_regular_expressions_still_denote_the/) ([archived as a gist](https://gist.github.com/bballdave025/b2f7a190907146151696eed394079a64))), specifically with `grep`.
Answers using any of `sed`, `awk`, `IFS` with `read`, etc. are welcome, too. I'm sure someone has a way to solve this problem with `perl` (dermis and feline can be divorced by manifold methods), and I'd be glad to get that one, too.
Basically, I am absolutely okay with other solutions to the task of getting a count of each identifier string. However, if it's true that there's no way of stepping back the engine with a Basic Regular Expression engine (that's truly regular), I want to know. I've thought of zero-width matches, lookaheads, and look-behinds, but I don't know how these play out in POSIX Basic Regular Expressions or in mathematically/grammatically regular language parsers.
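For reference (not a BRE solution, and only where GNU `grep -P` is available): PCRE's zero-width assertions sidestep the delimiter consumption entirely. `\K` discards the leading `_` from the reported match, and `(?=[_.])` tests the trailing delimiter without consuming it, so the engine resumes right after the label:

```shell
# Hypothetical sketch with GNU grep -P (PCRE), not a BRE solution:
# \K drops the leading '_' from the reported match; (?=[_.]) is a
# zero-width check of the trailing delimiter, so it stays unconsumed
# and can serve as the leading '_' of the next label.
printf '%s\n' rainbow_red_orange_yellow_green_blue_indigo_violet.jpg |
  grep -oP '_\K(black|white|grey|red|orange|yellow|green|blue|reddish-purple|teal|grey-brown|indigo|violet)(?=[_.])'
```

This still shares the base-filename caveats discussed in the question, and `-P` is a GNU extension, not POSIX.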
One thing I realize I wasn't taking into account
The point of the rules (see note \[1\]) was to allow the regex to take advantage of the fact that we should be able to assure ourselves that we're only getting the part of the classified filename with labels, as we only allow one of a finite set of strings preceded by an underscore and followed by either an underscore or a dot, with the dot only happening before the file extension. (I guess we can't be absolutely certain, as the original, pre-labeled filename could have one of the labels immediately preceding the dot - something like `a_sunburn_that_is_bright_red.jpg`, but that's something for which I check and correct by adding a specific non-label string before the dot and extension.)
My regex, imagining that it could get past the delimiter being consumed, would still allow the following example problems:
- `the_new_red_car_-_1989_red_black_silver.jpg`
  - would return {red, red, silver} as is,
  - {red, red, black, silver} if working without consuming the 2nd `_`,
  - whereas {red, black, silver} is desired
- `parrot_at_blue_gold_banquet_-_a_black_tie_affair_yellow_red_green.jpg`
  - would return {blue, black, yellow, green} as is,
  - {blue, gold, black, yellow, green} if not consuming the 2nd `_`,
  - whereas {yellow, red, green} is desired
Extra points for answers and discussions that take that into account. ; )
Research and ideas
There are a few discussions on different StackExchange sites, like [this one](https://web.archive.org/web/20230925145242/https://stackoverflow.com/questions/63821591/how-to-split-a-string-by-underscore-and-extract-an-element-as-a-variable-in-bash) , [that one](https://web.archive.org/web/20250602171231/https://stackoverflow.com/questions/49784912/regex-of-underscore-delimited-string) , [another one](https://web.archive.org/web/20250602171046/https://unix.stackexchange.com/questions/267677/add-quotes-and-new-delimiter-around-space-delimited-words) , but I think the [Unix & Linux discussion here](https://unix.stackexchange.com/a/334551/291375) ([archived](https://web.archive.org/web/20230324152449/https://unix.stackexchange.com/questions/334549/how-do-i-extract-multiple-strings-which-are-comma-delimited-from-a-log-file)) is the best one. I think that one of the approaches in this answer from @terdon ♦ or in the answer with hashes – from @Sobrique – might be useful.
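Along the same lines as those split-on-delimiter answers, here is a hedged sketch of my own (not taken from the linked posts) that avoids the consumption problem entirely: strip the extension, split on `_`, and count only fields that are in the controlled vocabulary. Filenames are read on stdin to keep it self-contained; it shares the caveat that a vocabulary word inside the base filename gets counted too:

```shell
# Sketch: split each filename on '_' after dropping the extension,
# and tally only the fields that belong to the allowed label set.
printf '%s\n' elephant_grey.jpg zebra_white_black.jpg |
  awk '
    BEGIN {
      n = split("black white grey red orange yellow green blue reddish-purple teal grey-brown indigo violet", a, " ")
      for (i = 1; i <= n; i++) ok[a[i]] = 1
    }
    {
      sub(/\.[^.]*$/, "")            # drop the .jpg extension
      m = split($0, f, "_")
      for (i = 1; i <= m; i++) if (f[i] in ok) cnt[f[i]]++
    }
    END { for (c in cnt) print cnt[c], c }
  ' | sort -k2
```

In the question's setting, the input would come from `find . -type f -iname '*.jpg' | awk -F'/' '{print $NF}'` instead of the `printf`.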
I keep thinking that some version of `^.*\([_][]\)\+[.]jpg$` might be key to the situation, but I haven't been able to put together that solution today. If you know how it can help, you're welcome to give an answer using it; I'm going to wait for a fresh brain tomorrow morning.
Edit: @ilkkachu successfully used this idea.
Why am I doing this? I'm training a CNN to recognize different occurrences (not colors) in pictures of old and often handwritten books. I want to make sure the classes are balanced as I want. Also, I'll compare this with another method that doesn't look at the delimiter, to make sure I don't have any problems like a `_yllow` (instead of `_yellow`), or a `_whiteorange` (instead of `_white_orange`). Most of the labels are put on through a Java program I've put together, but I've given a little leeway for people to change the filenames themselves in case of multiple labels for one file. Having given that permission, I have the responsibility of verifying legal labeled filenames.
Notes
\[1\] The rules for the identifying/classifying labels are:
The identifiers can be any of a finite set of strings which can contain only characters in `[A-Za-z0-9-]`, but not underscores.
The bare filenames (without dot and extension) can consist of any ASCII characters except: 1) non-printable/control characters; 2) spaces or tabs; OR 3) any of `[!"#$%&/)(\]\[}{*?]`. See the next paragraph for the real 3). (Note that this means the bare filenames CAN have an underscore, `_`, or even several of them.)
Edit: I had my no-no list of characters as is now crossed (struck) out above when @ilkkachu gave the accepted answer. One option of that answer makes excellent use of the `@`, which was then not in the excluded character group, but which I actually don't allow in my filenames. There are other omissions in the original character group. As I actually want it, the above paragraph should be amended with the following.
3) any of `'[] ~@#$%^&|/)(}{[*?>\`
Edit: Now this compiles as a BRE. (This was the simplest and most-readable BRE I could come up with.)
That beautifully crazy character group means that any of `[`, `]`, the space character, `~`, `@`, `#`, `$`, `%`, `^`, `&`, `|`, `\`, `/`, `)`, `(`, `}`, `{`, `[`, `*`, `?`, `>`, and `` ` `` is not allowed – and neither is any tab (`\t`, ...), nor any non-printing/control characters. Some of these are already standard on the no-no list for filenames on different OSs, but I give _my_ complete set (when I'm in charge of creating the filenames).
\[2\] Here is what I mean by the delimiter being consumed. I'll do my best to illustrate an example with our (Basic) Reg(ular)Ex(pression),

```bash
"[_]\(black\|white\|grey\|red\|orange\|yellow\|green\|blue\|"\
"reddish-purple\|teal\|grey-brown\|indigo\|violet\)[_.]"
```
Here goes.
The missing of some of the color strings happens because the regex engine consumes the second delimiter.
For example, using `O` to denote part of a miss (non-match) and `X` to denote part of a hit (match), with `YYYYY` denoting a complete match for the whole regex pattern, we get the following behavior.
```
Engine goes along looking for '_'

 engine is here
       |
       v
rainbow_red_orange_yellow_green_blue_indigo_violet.jpg
OOOOOOO

Matches [_] with '_'

 engine is at
        |
        v
rainbow_red_orange_yellow_green_blue_indigo_violet.jpg
OOOOOOOX

Matches \(...\|red\|...\) with 'red'

 engine is at
           |
           v
rainbow_red_orange_yellow_green_blue_indigo_violet.jpg
OOOOOOOXXXX

Matches [_.] with '_'

 engine is at
            |
            v
rainbow_red_orange_yellow_green_blue_indigo_violet.jpg
OOOOOOOXXXXX

We have a whole match!

rainbow_red_orange_yellow_green_blue_indigo_violet.jpg
OOOOOOOYYYYY

Given the -o flag, the engine outputs

  '_red_'

The 'tr -d [_.]' takes off the surrounding underscores,
and our output line becomes

  'red'

The problem now is that the engine cannot go back to
find the '_' before 'orange', or at least it can't do
so using any process I know about from my admittedly
imperfect knowledge of Basic Regular Expressions. As far
as a REGULAR expression engine, using a REGULAR grammar and
a REGULAR language parser knows, the whole universe in which
it's searching now consists of

orange_yellow_green_blue_indigo_violet.jpg
```
(I don't know if this statement is correct from a mathematical/formal-language point of view, and I'd be interested to know.)
And the process continues as from the first, beginning with the engine going along looking for `_`:
```
orange_yellow_green_blue_indigo_violet.jpg
OOOOOOXXXXXXXX

Match!

orange_yellow_green_blue_indigo_violet.jpg
OOOOOOYYYYYYYY

Engine spits out '_yellow_' which is 'tr -d [_.]'-ed

Engine cannot go back, so its search universe is now

green_blue_indigo_violet.jpg

and we continue with

green_blue_indigo_violet.jpg
OOOOOXXXXXX

Match!

green_blue_indigo_violet.jpg
OOOOOYYYYYYOOOOOOYYYYYYYY

That last match being on the '.' from [_.]
```
More formally, I want to know if it can be done with a real regular expression, i.e. one which defines a regular language (and whose language is thus also a context-free language), cf. Wikipedia's Regex article (archived). I think this is the same as a POSIX regular expression, but I'm not sure.
Refs. [A] (archived), [B] (archived), [C] (archived).
Dang it, I know there's a missing ending parenthesis up there in the text, somewhere, because I noticeded it and went up to fix it. When I got up into the text, I couldn't remember the context of the parenthesis, so it's still there, just mocking me. Edit: I found it, and I bolded it! I'll probably take the bold formatting and this note down soon, but I'm sharing my happiness right now.
bballdave025
(418 rep)
Jun 3, 2025, 04:56 AM
• Last activity: Jul 25, 2025, 03:50 AM
-1
votes
3
answers
62
views
How to remove markdown links from headers with sed?
I'm trying to use sed to remove links like the one below and leave just the title:
```markdown
## [Some title](#some-title)
```
This is my command:
```bash
sed 's/^\(\#*\) *\[\([^\]]*\)\].*/\1 \2/'
```
I expect to have just the text without the link:
```markdown
## Some title
```
But it doesn't work. What am I doing wrong?
I'm using Linux with GNU sed.
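For what it's worth, the likely culprit is `[^\]]`: in a POSIX bracket expression a backslash is not an escape character, so the class ends at the first `]`, and `[^\]]*` parses as "one non-backslash character followed by any number of literal `]` characters", which can never match the title. A `]` is made literal by placing it first in the class instead:

```shell
# [^]]* means "any run of characters that are not ']'"; inside a
# bracket expression, ']' is literal only when it comes first.
echo '## [Some title](#some-title)' |
  sed 's/^\(#*\) *\[\([^]]*\)\].*/\1 \2/'
```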
jcubic
(10310 rep)
Jul 17, 2025, 11:35 PM
• Last activity: Jul 19, 2025, 06:22 AM
3
votes
2
answers
543
views
Grep command with the side effect of adding a trailing newline character in the last line of file
I've been doing some research on how to correctly read lines from a file whose last line may not have a trailing newline character. Have found the answer in [Read a line-oriented file which may not end with a newline](https://unix.stackexchange.com/questions/418060/read-a-line-oriented-file-which-may-not-end-with-a-newline) .
However, I have a second goal, which is to exclude comments at the beginning of lines, and I have found a [`grep`](http://man7.org/linux/man-pages/man1/grep.1.html) command that achieves it:

```bash
$ grep -v '^ *#' file
```

But I have noticed that this command has a (for me unexpected) side behavior: it adds a trailing newline character to the last line if one does not exist.
```bash
$ cat file
# This is a commentary
aaaaaa
# This is another commentary
bbbbbb
cccccc
$ od -c file
0000000 # T h i s i s a c o m m
0000020 e n t a r y \n a a a a a a \n #
0000040 T h i s i s a n o t h e r
0000060 c o m m e n t a r y \n b b b b b
0000100 b \n c c c c c c \n
0000111
$ truncate -s -1 file
$ od -c file
0000000 # T h i s i s a c o m m
0000020 e n t a r y \n a a a a a a \n #
0000040 T h i s i s a n o t h e r
0000060 c o m m e n t a r y \n b b b b b
0000100 b \n c c c c c c
0000110
$ od -c <(grep -v '^ *#' file)
0000000 a a a a a a \n b b b b b b \n c c
0000020 c c c c \n
0000025
```
Notice that besides removing the line beginning comments it also adds a trailing newline character in the last line.
How could that be?
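This is reproducible without the comment-stripping part: grep is line-oriented and writes each matched line back out as a complete line, terminating newline included (POSIX text utilities operate on newline-terminated lines). A minimal check:

```shell
# A 3-byte input with no trailing newline comes back from grep as a
# complete 4-byte line: grep re-adds the terminating newline.
printf 'abc' | grep 'abc' | od -c
```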
Paulo Tomé
(3832 rep)
Jan 17, 2020, 06:00 PM
• Last activity: Jul 18, 2025, 07:13 AM
2
votes
1
answers
2689
views
LFTP exclude file extensions
I am trying to mirror directories with lftp, but I don't want to download filetypes that are notoriously large, like .mp4 and .swf. But I am having trouble with the regex, and seemingly with the exclude-glob too. Both of them download all files.
What I tried:
```bash
/usr/local/bin/lftp -u user,pass -e 'mirror -x ^(\.mp4|\.swf)$ $src $dest' ftp.host
```

and

```bash
/usr/local/bin/lftp -u user,pass -e 'mirror -X swf $src $dest' ftp.host
```
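A note on the first attempt, as a hedged sketch (I can't test against a live FTP host here): `mirror -x` matches the regex against the file's path, and `^(\.mp4|\.swf)$` is anchored at both ends, so it only matches a name that is exactly `.mp4` or `.swf`. Anchoring just the extension at the end is probably what's wanted, and the pattern itself can be checked with any ERE tool first:

```shell
# The end-anchored ERE matches paths ending in .mp4 or .swf, which
# is the shape lftp's `mirror -x` expects; checked here with grep -E.
printf '%s\n' video.mp4 flash.swf notes.txt | grep -E '\.(mp4|swf)$'
```

With that, the mirror line would become something like `lftp -u user,pass -e 'mirror -x \.(mp4|swf)$ $src $dest' ftp.host` (untested). For `-X`, a glob such as `-X '*.swf'` is the usual form rather than a bare `swf`.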
Carter
(121 rep)
Jul 22, 2015, 07:37 PM
• Last activity: Jul 9, 2025, 10:08 PM
10
votes
4
answers
10705
views
What is wrong with using "\t" to grep for tab-separated values?
I have a .tsv file (values separated by tabs) with four values, so each line should have only three tabs, with some text around each tab, like this:

```
value	value2	value3	value4
```

But it looks like some lines are broken (there are more than three tabs). I need to find those lines.
---
I came up with the following grep pattern:

```bash
grep -v "^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+$"
```
My thinking:

- the first `^` matches the beginning
- `[^\t]+` matches one or more "no tab" characters
- `\t` matches a single tab character
- `$` matches the end

Then I just put it together in the right order, repeated the correct number of times. That should match the correct lines, so I inverted it with the `-v` option to get the wrong lines.

But with the `-v` option it matches every line in the file, including some random text I tried that doesn't have any tabs inside.
**What is my mistake please?**
EDIT: I am using Debian and bash.
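The short answer, hedged for GNU grep as found on Debian: BRE grep has no `\t` escape (the backslash just makes it a literal `t`), and `+` is also literal in BRE, so the pattern can never match a real tab-separated line. `grep -P` understands both, as a quick check shows:

```shell
# In BRE, \t is a literal 't' and + a literal '+'. With -P (PCRE)
# the same pattern works as intended: the well-formed line (three
# tabs) is filtered out by -v, and only the broken line survives.
printf 'v1\tv2\tv3\tv4\nbroken\tb\tc\td\te\n' |
  grep -vP '^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+$'
```

Without `-P`, an ERE with a real tab typed into the pattern (e.g. via `$'\t'` in bash) does the same job: `grep -vE $'^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+$'`.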
TGar
(297 rep)
Aug 16, 2022, 11:47 AM
• Last activity: Jun 27, 2025, 08:47 AM
6
votes
5
answers
2245
views
How to make grep for a regex that appear multiple times in a line
I want to grep a regex. The pattern I am searching for may appear multiple times in a line. If the pattern appeared many times, I want to separate each occurrence by a comma and print **the match only** not the full line in a new file. If it did not appear in a line I want to print **n.a.**
Example: I want to use this regex to find numbers in the pattern `[12.123.1.3]`:

```bash
grep -oh "\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\]" 'filename'
```
input file (`input.txt`):

```
blabla [11.335.2.33] xyuoeretrete [43.22.11.88] jfdfjkfbs [55.66.77.88]
blabla [66.223.44.33]
foo bar
blabla [1.2.33.3] xyuoeretrete bla[1.32.2.4]
```
intended result in a new file (`output.csv`):

```
11.335.2.33,43.22.11.88,55.66.77.88
66.223.44.33
n.a.
1.2.33.3,1.32.2.4
```
**Note: I use Ubuntu**
randomname
(161 rep)
Jun 24, 2022, 09:08 AM
• Last activity: Jun 24, 2025, 09:28 AM
38
votes
8
answers
79398
views
In a regular expression, which characters need escaping?
In general, which characters in a regular expression need escaping?
For example, the following is not syntactically correct:

```bash
echo '[]' | grep '[]'
grep: Unmatched [ or [^
```

This, however, *is* syntactically correct:

```bash
echo '[]' | grep '\[]'
[]
```
Is there any documentation on which characters should be escaped in a regular expression, and which should not?
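Part of the answer is that bracket expressions have their own rules, documented in the POSIX Base Definitions (RE Bracket Expression): a `]` is literal when it is the first character after `[` or `[^`, and a backslash has no special meaning inside the brackets. A quick check:

```shell
# ']' placed first in a bracket expression is a literal ']';
# the backslash plays no role inside brackets in POSIX regexes.
echo 'a]b' | grep -o '[]]'
```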
LanceBaynes
(41465 rep)
Sep 15, 2011, 06:25 PM
• Last activity: Jun 20, 2025, 02:41 PM
0
votes
0
answers
25
views
Borg files in `~/.local/share/` occur a “dir_open: [Errno 13] Permission denied” error
# General overview
With BorgBackup, I do a backup of my whole $HOME. However, some files (all located under `~/.local/share/`) raise a permissions error. Probably because they're ephemeral files belonging to root, I think.
Here is an example of these files:
```
/home/fauve/.local/share/tracker/data/tracker-store.ontology.journal: open: [Errno 13] Permission denied: 'tracker-store.ontology.journal'
E /home/fauve/.local/share/tracker/data/tracker-store.ontology.journal
/home/fauve/.local/share/tracker/data/tracker-store.journal: open: [Errno 13] Permission denied: 'tracker-store.journal'
E /home/fauve/.local/share/tracker/data/tracker-store.journal
/home/fauve/.local/share/tracker/data/.meta.isrunning: open: [Errno 13] Permission denied: '.meta.isrunning'
E /home/fauve/.local/share/tracker/data/.meta.isrunning
/home/fauve/.local/share/keyrings: dir_open: [Errno 13] Permission denied: 'keyrings'
E /home/fauve/.local/share/keyrings
/home/fauve/.local/share/gnome-shell: dir_open: [Errno 13] Permission denied: 'gnome-shell'
E /home/fauve/.local/share/gnome-shell
/home/fauve/.local/share/evolution: dir_open: [Errno 13] Permission denied: 'evolution'
E /home/fauve/.local/share/evolution
/home/fauve/.local/share/sounds: dir_open: [Errno 13] Permission denied: 'sounds'
```
# The problem
All the other files are saved fine, but these files raise a shell error. So it's a bit annoying, because I always get a “terminating with warning status, rc 1” at the end of the backup process. As I use Borg inside a script, if the script gets any error, it doesn't continue.
# What I did
## Exclude them explicitly
I first tried to explicitly exclude them with Borg's `--exclude` option. But it's not really useful, because it's not always the same files; it depends on several circumstances, as far as I can see. So I can't really match them all.
## Exclude the whole directory ~/.local/share/
As all the annoying files come from `~/.local/share/`, I thought about ignoring it entirely. But after [several](https://unix.stackexchange.com/questions/797136/local-share-is-set-for-what-and-can-i-ignore-it-at-backups/797194#797194) questions, it seems not really possible.
# The question
So is it possible, with Borg's options, to ignore all files or directories matching these two conditions:

1. Located inside `~/.local/share` (or a child of it);
2. The current user has no right to read them.
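I don't believe borg can test readability itself, but a hedged two-step sketch (GNU `find` assumed, untested against borg): build the exclude list from whatever is currently unreadable, then pass it with `--exclude-from` (a real borg option). Demonstrated on a scratch directory so it is reproducible:

```shell
# Demo in a scratch directory (run as a regular user): list entries
# that the current user cannot read; -prune keeps find from
# descending into, and erroring on, unreadable directories.
d=$(mktemp -d)
mkdir "$d/keyrings" && chmod 000 "$d/keyrings"
touch "$d/readable.txt"
unreadable=$(find "$d" \( ! -readable -o -type d ! -executable \) -prune -print)
printf '%s\n' "$unreadable"
chmod 700 "$d/keyrings"    # restore so the scratch dir can be removed
```

Against the real tree, that would be `find ~/.local/share \( ! -readable -o -type d ! -executable \) -prune -print > excludes.lst` followed by `borg create --exclude-from excludes.lst …`, regenerated before each run since the set of files changes.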
fauve
(1529 rep)
Jun 20, 2025, 01:33 PM
1
votes
2
answers
304
views
How to delete lines with sed if they contain a phrase?
I tried this

```bash
sed -i '/^already satisfied$/d' loggocd.txt
```

but lines like this one

```
Requirement already satisfied: cryptography in /home/go/.pyenv/versions/3.9.1/lib/python3.9/site-packages
```

are not deleted.

I am using Git Bash, but I guess that should not be a problem. Or maybe it is?
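In case it helps to see it side by side: the `^` and `$` anchors make the address match only lines consisting entirely of "already satisfied". Dropping the anchors matches the phrase anywhere in the line:

```shell
# /already satisfied/ (no anchors) deletes any line containing the
# phrase; the anchored version only deletes an exact-match line.
printf '%s\n' \
  'Requirement already satisfied: cryptography in /home/go/.pyenv/versions/3.9.1/lib/python3.9/site-packages' \
  'Collecting requests' |
  sed '/already satisfied/d'
```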
MJoao
(47 rep)
Jun 6, 2025, 10:35 AM
• Last activity: Jun 9, 2025, 01:51 AM
-2
votes
1
answers
68
views
How to extract an unknown number of file lines between two regex patterns?
For example, I have a file with many lines, and it has a part such as

```
...
pattern1
line1
line2
line3
pattern2
...
```

How can I extract `line[1-3]` via a one-liner command (`awk` or something like that)?
Aleksey
(57 rep)
Jun 3, 2025, 11:47 AM
• Last activity: Jun 8, 2025, 06:23 PM
3
votes
3
answers
5814
views
Regex search in PDF reader
I am using zathura, as I enjoy its minimalist approach, but I would also switch to mupdf or anything else if this would solve my problem.
I need to highlight every word (in PDF and epub documents) one by one from start to finish because I can concentrate better on the text if I have some kind of motion in it. My approach would have been to perform a regex search that matches every word, but neither zathura nor mupdf support regex in searches. Is there a way to do this?
I would try to fork zathura, but to be honest I don't really want to spend that amount of time if there is another minimal GNU/Linux-compatible document viewer that does what I need. And if there is any way to use terminal tools like `pdfgrep` for highlighting the results in zathura, that would also do the job.
luca
(152 rep)
Mar 29, 2020, 03:38 PM
• Last activity: Jun 6, 2025, 09:30 AM
0
votes
1
answers
632
views
regex pattern issue for digit validation in ksh
I was writing a ksh script to validate that a column is numerical. The regex pattern is defined in a config file, like `\d+.\d+`. But this is not working when I use the `\d` pattern; however, `[0-9]{1,9}` is working. Any insights into this?
* Here is the ksh version I am using:

```bash
$ ksh --version
version sh (AT&T Research) 93u+ 2012-08-01
```
* Code snippet for the pattern comparison. If I provide `$col_patt` as `\d+` it will not work, but `[0-9]{1,}` will:
```bash
val=$(awk -F "$sep" -v n="$col_pos" -v m="$col_patt" 'NR!=1 && $n !~ "^" m "$" {
    printf "%s:%s:%s\n", FILENAME, FNR, $n > "/dev/stderr"
    count++
  }
  END {print count+0}' "$cp_input" 2>> $script_path/errors_${file_name_patt}.log
)
```
* Here is the pattern used: `\d*\.\d+`
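The underlying issue, as far as I can tell: `awk` patterns are POSIX EREs, and POSIX ERE has no `\d`, which is a Perl/PCRE escape. The portable spellings are `[0-9]` or `[[:digit:]]`. A minimal check:

```shell
# POSIX ERE (what awk speaks) has no \d escape; [0-9] and
# [[:digit:]] are the portable digit classes.
echo '12.34' | awk '$0 ~ /^[0-9]+\.[0-9]+$/ {print "numeric"}'
```

So the patterns in the config file need to be written in ERE form (e.g. `[0-9]*\.[0-9]+` instead of `\d*\.\d+`) for the `$n !~ "^" m "$"` comparison to behave.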
daturm girl
(49 rep)
Jun 15, 2021, 02:06 PM
• Last activity: Jun 5, 2025, 05:16 PM
-1
votes
3
answers
159
views
grep capture from beginning until first 2 chars found
I have this list:

```bash
list="aa bb cc dd ee ff ab cd ef"
```
What I'm trying so far:

```bash
$ grep -o "^[^cd]*" <<<"$list"
aa bb
```
Expected output:

```bash
$ grep -o "^[^cd]*" <<<"$list"
aa bb cc dd ee ff ab
```
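A note on why grep stops early: `[^cd]` is a bracket expression meaning "any single character that is not `c` and not `d`", not "the string cd", so the match ends at the first `c` (in `cc`). To cut at the first occurrence of the string `cd`, deleting from it to the end of the line is one way (a sketch; it assumes `cd` appears as its own space-separated word, as in the list):

```shell
# Delete from the first " cd" to end of line; what is left is
# everything before that word.
list="aa bb cc dd ee ff ab cd ef"
printf '%s\n' "$list" | sed 's/ cd.*//'
```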
Zero
(39 rep)
May 25, 2025, 07:47 PM
• Last activity: May 31, 2025, 11:54 PM
0
votes
4
answers
4145
views
sed: regex input buffer length larger than INT_MAX
I have a big file on which I am doing various operations, and this error just came up. I tried googling it but didn't find any result for it:

```
sed: regex input buffer length larger than INT_MAX
```
My purpose is to quote every line, appending a comma, and subsequently enclose the entirety of the file in square brackets (as a single line).
For example, an input of

```
The quick brown fox
jumps over
the lazy dog.
```

should yield a result of

```
["The quick brown fox","jumps over","the lazy dog.",]
```
Assume that the input file doesn’t contain any quotes.
The code I run is this:

```bash
cat "${FILE}" | sed -e 's/.*/"&",/' | sponge "${FILE}"
truncate --size=-1 "${FILE}"
cat "${FILE}" | sed -z 's/.*/[&]/' | tr --delete '\n' | sponge "${FILE}"
```
sed version:

```bash
$ sed --version
sed (GNU sed) 4.5
```
Any thoughts?
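A hedged reading of the error: it comes from the `sed -z` step, which loads the whole file as a single string and hands it to the regex engine, whose input buffer is capped at INT_MAX bytes. The wrapping step doesn't need a regex at all, so the whole-file sed can be replaced by plain concatenation:

```shell
# Quote each line with sed (line-sized buffers only), join with tr,
# and add the brackets with printf/cat instead of a whole-file regex.
printf '%s\n' 'The quick brown fox' 'jumps over' 'the lazy dog.' |
  sed 's/.*/"&",/' | tr -d '\n' | { printf '['; cat; printf ']'; }
```

This also makes the `truncate --size=-1` step unnecessary, since `tr -d '\n'` removes all the newlines anyway.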
Chris
(141 rep)
Jul 4, 2018, 11:35 PM
• Last activity: May 28, 2025, 05:43 PM
50
votes
8
answers
51931
views
Non-greedy match with SED regex (emulate perl's .*?)
I want to use `sed` to replace anything in a string between the first `AB` and the *first* occurrence of `AC` (inclusive) with `XXX`.

For **example**, I have this string (this string is for a test only):

```
ssABteAstACABnnACss
```

and I would like output similar to this: `ssXXXABnnACss`.
----
I did this with `perl`:

```bash
$ echo 'ssABteAstACABnnACss' | perl -pe 's/AB.*?AC/XXX/'
ssXXXABnnACss
```
------
but I want to implement it with `sed`. The following (using the Perl-compatible regex) does not work:

```bash
$ echo 'ssABteAstACABnnACss' | sed -re 's/AB.*?AC/XXX/'
ssXXXss
```
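When the end delimiter is short, laziness can be emulated in POSIX regexes with a negated-class trick: forbid anything that could form an `AC` inside the repeated part. `([^A]|A[^C])*` can never cross an `AC`, so the overall (greedy, leftmost-longest) match is forced to stop at the first one. A sketch with GNU sed's `-E`:

```shell
# ([^A]|A[^C])* matches any text containing no "AC", so the first
# AC after AB terminates the match: a POSIX stand-in for .*?AC.
echo 'ssABteAstACABnnACss' | sed -E 's/AB([^A]|A[^C])*AC/XXX/'
```

(Corner case: a lone `A` immediately before the closing `AC`, as in `ABxAAC`, defeats this pattern; such workarounds grow quickly with the length of the delimiter, which is why `.*?` is the comfortable tool when PCRE is available.)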
Baba
(3479 rep)
Jul 22, 2016, 10:30 PM
• Last activity: May 27, 2025, 08:14 PM
1
votes
4
answers
2854
views
Matching both <space> and <tab> in a line with multiple coloumns in unix
There are 200-plus files in a folder, where some of the files contain the following pattern in their records: `ABCD ,EFGH, ,`. Without amending or replacing it, I just want to know the number of files with this format.
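A sketch of one way to read this (assuming the pattern is the literal `ABCD ,EFGH, ,` with a stray space before each comma, and plain files in one directory): `grep -l` prints each file containing a match once, and `wc -l` counts them.

```shell
# Demo in a scratch directory: two files carry the pattern, one
# does not; grep -l lists matching files, wc -l counts them.
d=$(mktemp -d)
printf 'ABCD ,EFGH, ,\n' > "$d/f1"
printf 'ABCD ,EFGH, ,\n' > "$d/f2"
printf 'ABCD,EFGH,OK\n'  > "$d/f3"
grep -l 'ABCD ,EFGH, ,' "$d"/* | wc -l
```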
gobinathk2390
(11 rep)
Aug 16, 2018, 07:40 AM
• Last activity: May 22, 2025, 09:03 AM
71
votes
7
answers
28225
views
Replacing Multiple blank lines with a single blank line in vim / sed
Question more or less says it all. I'm aware that `/^$/d` will remove all blank lines, but I can't see how to say 'replace two or more blank lines with a single blank line'.

Any ideas?
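For the sed side, one classic idiom (and `cat -s` does the same with a single flag): append the next line whenever the current one is blank, and delete the first of the pair while both are blank:

```shell
# /^$/N pulls the line after a blank into the pattern space;
# /\n$/D deletes the first of two blank lines and retries, so any
# run of blank lines collapses to a single one.
printf 'a\n\n\n\nb\n' | sed '/^$/N;/\n$/D'
```

In vim, one common counterpart is the substitution `:%s/\n\{3,}/\r\r/` (two or more blank lines means three or more consecutive newlines).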
Andrew Bolster
(1005 rep)
May 7, 2011, 08:19 PM
• Last activity: May 16, 2025, 08:23 AM
0
votes
1
answers
1026
views
bash + grep only the time stamp from output
We want to capture the time from the output, and not the time from the `sshpass` command. Expected output:

```
Sun Jul 14 12:47:49 UTC 2019
Sun Jul 14 12:47:49 UTC 2019
Sun Jul 14 12:47:49 UTC 2019
```
but from the command we get:

```bash
sshpass -p customer pssh -H "presto01 presto02 presto03" -l root -A -i "date" | grep '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]'
12:45:45 [SUCCESS] presto01
Sun Jul 14 12:45:45 UTC 2019
12:45:45 [SUCCESS] presto03
Sun Jul 14 12:45:45 UTC 2019
12:45:45 [SUCCESS] presto02
Sun Jul 14 12:45:45 UTC 2019
```
We can do

```bash
sshpass -p customer pssh -H "presto01 presto02 presto03" -l root -A -i "date" | grep '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' | grep -v "^\["
Sun Jul 14 12:50:24 UTC 2019
Sun Jul 14 12:50:24 UTC 2019
Sun Jul 14 12:50:24 UTC 2019
```

but this is an ugly way.
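A slightly less ugly single grep anchors on the shape of `date`'s output rather than just the time (a sketch; the sample lines below stand in for the pssh output):

```shell
# Wanted lines start with "Day Mon" (e.g. "Sun Jul"); the pssh
# status lines start with the bare time, so one anchored grep does.
printf '%s\n' '12:45:45 [SUCCESS] presto01' 'Sun Jul 14 12:45:45 UTC 2019' |
  grep '^[A-Z][a-z][a-z] [A-Z][a-z][a-z] '
```

i.e. `sshpass … pssh … -i "date" | grep '^[A-Z][a-z][a-z] [A-Z][a-z][a-z] '`; the pssh `-i` status prefixes never start with a capitalized day abbreviation, judging by the output shown above.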
yael
(13936 rep)
Jul 14, 2019, 12:50 PM
• Last activity: May 15, 2025, 10:57 PM
3
votes
3
answers
242
views
Sed - Help Matching with Multi-Line Match
I am fairly new to regexes and it appears I am missing something in my understanding. I have the following file, and I am trying to insert a value if it is missing.

The file looks like

```ini
[composefs]
enabled = yes
[sysroot]
readonly = true
```
I would like to insert the following only if it is absent

```ini
[etc]
transient = true
```
My sed search-and-replace looks like

```bash
sed -zi '/^[etc]\n^transient = true/!s/$/[etc]\ntransient = true/' file
```
When I run sed it adds the value as expected, but it keeps adding the value on subsequent runs, so I assume my matching is not working.
How can I do this correctly?
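Two separate things go wrong in that command, as far as I can tell: `[etc]` is a bracket expression (it matches a single `e`, `t`, or `c`, so the brackets need escaping), and with `-z` the `^` anchor only matches at the very start of the whole-file pattern space, not after each `\n`. A hedged sketch that sidesteps the multi-line sed entirely: test first with GNU `grep -Pz` (PCRE, NUL-delimited so the whole file is one searchable string), and append only when the section is absent:

```shell
# Escaped \[etc\] matches the literal header; the grep test keeps
# repeated runs from appending the section a second time.
f=$(mktemp)
printf '[composefs]\nenabled = yes\n[sysroot]\nreadonly = true\n' > "$f"
add_section() {
  grep -qPz '\[etc\]\ntransient = true' "$f" ||
    printf '%s\n' '[etc]' 'transient = true' >> "$f"
}
add_section
add_section   # second run: section already present, nothing added
```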
adam
(33 rep)
May 10, 2025, 12:07 PM
• Last activity: May 14, 2025, 12:12 PM
Showing page 1 of 20 total questions