Extract text starting at specific category header to next category header from a text file

3 votes

3 answers

1046 views

I have a TOML file in the following format (categories may have any name; the sequential numbering is just an example and is not guaranteed):

[CATEGORY_1]
A=1
B=2

[CATEGORY_2]
C=3
D=4

E=5

...

[CATEGORY_N]
Z=26

What I want is to retrieve the text inside a given category. So if I specify, say, [CATEGORY_1], I want the output to be:

A=1
B=2

I tried using grep for this, with the -z flag so that input records are delimited by null bytes instead of newlines (which lets a match span multiple lines), and with this regular expression:

(^\[.*])             # Match the category 
  ((.*\n*)+?         # Match the category content in a non-greedy way
    (?=\[|$))        # Lookahead to the start of other category or end of line

It didn't work unless I removed the ^ at the beginning of the expression. However, if I do that, it misinterprets loose pairs of brackets as a category. Is there a way to do this correctly? If not with grep, then with another tool, such as sed or awk?
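
For illustration, here is a minimal sketch of one line-based way to get the desired output with awk rather than a multi-line regex. It is not taken from the answers; the function name extract_category is hypothetical, and it assumes that a category header is a line consisting only of [NAME] (so brackets appearing elsewhere on a line are not treated as headers), that the name contains no backslashes, and that blank lines inside a category can be dropped.

```shell
# extract_category NAME FILE  (hypothetical helper, not from the original post)
# Prints the non-empty lines between the header "[NAME]" and the next header.
extract_category() {
    awk -v header="[$1]" '
        # A line that starts with "[" and ends with "]" is treated as a header:
        # remember whether it is the requested one, then skip the header itself.
        /^\[.*\]$/ { inside = ($0 == header); next }
        # Inside the requested category, print every non-blank line.
        inside && NF
    ' "$2"
}
```

Called as extract_category CATEGORY_1 file.toml on the sample above, this should print the A=1 and B=2 lines. Comparing the header as a plain string, instead of building a regex from the name, avoids having to escape special characters in category names.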

Asked by Educorreia (225 rep)

Jul 29, 2021, 10:51 AM
Last activity: Aug 25, 2021, 06:24 PM
