Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

-2 votes

1 answers

73 views

Parse txt file on basis of occurrence of a tag in Linux

I am trying to parse a txt file containing XML "messages" in Linux, something like this:

xyz     xyz     xyz     xyz   and so on

The code will read file, extract each section from

till

and put each section into a separate file. My code for this is as below

input_file="input_file.txt"

# Extracting Document parts
sed -n '//p' "$input_file" > temp_output.txt

# Splitting into Different Files
csplit -f output -b %d.txt -z temp_output.txt '//' '{*}'

# Cleaning up temporary files
rm temp_output.txt

However, this code is extracting several xml messages into one file, particularly the ones with no line break. Could someone suggest what can be rectified in the above code?

python6 (1 rep)

May 15, 2024, 01:29 PM • Last activity: May 15, 2024, 02:09 PM

3 votes

1 answers

8509 views

Split a text file into multiple files, beyond the {99} limit of csplit

osx csplit

I'd like to split the contents of a .txt file into multiple files, but I have two questions about limitations of csplit:

(1) Can anyone offer a way around csplit's maximum limit of '99' file splits?  I have a file with up to 384 splits based on a recurring blank line or character.  I'd like csplit to be able to accommodate this with {*}, but this exceeds csplit's intrinsic file generation capacity.

(2) Does anyone know of a way to pass the contents of a file to csplit (pipe to csplit), or can csplit only be used in its conventional way of calling a file in place?  i.e. csplit -f split_name file_to_split.txt /split/ {*} vs. [series of commands] | csplit -f split_name /split/ {*}

Thank you for any suggestions, or alternatives to accomplish a similar task.
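
One possible workaround, sketched with awk instead of csplit: awk reads standard input, so it can sit at the end of a pipeline, and its piece counter is an ordinary number with no two-digit cap. The sample data and the four-digit suffix are assumptions made for the demo; the `split_name` prefix is borrowed from the question.

```shell
# Work in a scratch directory so the sketch is self-contained.
work=$(mktemp -d) && cd "$work" || exit 1

# Hypothetical input: three sections separated by blank lines.
printf 'a1\na2\n\nb1\n\nc1\nc2\n' > file_to_split.txt

# Paragraph mode (RS set to the empty string) treats every
# blank-line-separated block as one record. The input arrives on stdin,
# so any series of commands could replace the cat.
cat file_to_split.txt | awk -v RS= '{
    out = sprintf("split_name%04d", NR - 1)   # 0000, 0001, ... no 99 limit
    print > out
    close(out)                                # keep the number of open files low
}'

ls split_name*
```

For a split on a recurring marker line rather than a blank line, the same idea should work with a counter that is incremented whenever the marker matches.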

kehmsen (59 rep)

Mar 25, 2016, 10:42 PM • Last activity: Feb 26, 2024, 07:10 PM

1 votes

2 answers

151 views

Split file into specific output filenames by pattern match

csplit

I have a file with this content:

    # new file
    text in file 1
    # new file
    text in file 2
    # new file
    text in file 3

The pattern here is "# new file".

Instead of saving each piece to xx00, xx01 and xx02, I want to save to specific files: "another file", "file new" and "last one".

The 3 files exist in the current directory, so I want to provide them as an array and overwrite them:

    csplit -z infile '/# new file/' "${array[*]}"

The array can be provided directly

    array=('another file' 'file new' 'last one')
    echo ${array[*]}
    another file file new last one

Or list current directory

    array=($(find . -type f))
    echo ${array[*]}
    ./another file ./file new ./last one

A modification of this script  could be the solution:

    awk -v file="1" -v occur="2" '
    {
      print > (file".txt")
    }
    /^\$\$\$\$$/{
      count++
      if(count%occur==0){
        if(file){
          close(file".txt")
          ++file
        }
      }
    }
    '  Input_file
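
As far as I know csplit cannot take a list of output names, so here is a minimal awk sketch of the idea instead. It assumes the target names are passed as extra command-line arguments after the input file and that there are at least as many names as sections; the variable names are made up for the example.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf '# new file\ntext in file 1\n# new file\ntext in file 2\n# new file\ntext in file 3\n' > infile

# Target names, in order. "$@" keeps names that contain spaces intact.
set -- 'another file' 'file new' 'last one'

awk '
BEGIN {
    # ARGV[1] is the input file; the remaining arguments are output names.
    # Deleting them from ARGV stops awk from reading them as input.
    for (i = 2; i < ARGC; i++) { name[i - 1] = ARGV[i]; delete ARGV[i] }
}
/^# new file/ {
    if (out != "") close(out)
    out = name[++n]        # next target name; empty if the names run out
}
out != "" { print > out }  # first write truncates, later writes append
' infile "$@"

cat 'file new'
```

Sections beyond the last supplied name are silently dropped in this sketch, which may or may not be what is wanted.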
                                

Smeterlink (295 rep)

Dec 12, 2023, 06:47 AM • Last activity: Dec 13, 2023, 11:14 AM

0 votes

1 answers

300 views

using csplit to split a file based on a regular expression to multiple files

csplit

I have a text file that has the contents of the example below, and I would like to split the file to multiple files.

[TXT]	/path/to/[TXT]
[BAT]	/path/to/[BAT]
[TXT]	/path/to/blah/[TXT]
[BAT]	/path/to/blah/[BAT]

So I have figured out I can use csplit to at least partially do what I wanted to achieve:

csplit -f 'paths-' -b '%04d.txt' 'path/to/filelist.txt' '/^\[(.*)]\t/' '{*}'

However, this splits to paths-0000.txt. I was hoping for something more like paths-txt.txt and paths-bat.txt. Is there any way I can get the regex match into the file name prefix? I did try things like -f 'paths-$1.txt' and -f 'paths-\1.txt', but neither of those did what I was hoping for.
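
A rough sketch of how the tag could end up in the file name, using awk rather than csplit. It assumes every line starts with a bracketed tag followed by a tab, as in the sample, and that lower-cased names such as paths-txt.txt are wanted.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf '[TXT]\t/path/to/[TXT]\n[BAT]\t/path/to/[BAT]\n[TXT]\t/path/to/blah/[TXT]\n[BAT]\t/path/to/blah/[BAT]\n' > filelist.txt

awk -F '\t' '
$1 ~ /^\[.*\]$/ {
    tag = substr($1, 2, length($1) - 2)        # drop the surrounding brackets
    print > ("paths-" tolower(tag) ".txt")     # one output file per distinct tag
}' filelist.txt

cat paths-txt.txt
```

Note that this groups all lines with the same tag into one file, which is what the hoped-for names suggest; it does not reproduce csplit's one-file-per-match behaviour.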

AeroMaxx (227 rep)

Jul 12, 2023, 11:21 PM • Last activity: Jul 13, 2023, 04:31 AM

4 votes

1 answers

4497 views

How do I use modern coreutils on Mac?

macos coreutils csplit

How do I get modern coreutils on Mac?

I ran into this problem using csplit:

foo.txt:

foo
1
foo
2
foo
3

$: csplit foo '^foo$' '{*}'
# error

Double-checking the man page (man csplit), csplit on Mac is the FreeBSD version and does not offer the '{*}' option. In fact, I must provide the exact number of splits ahead of time. This will either trigger a csplit re-implementation by me, or maybe I can get GNU coreutils on Mac. Is there a way?

Chris (1075 rep)

Dec 20, 2022, 03:32 PM • Last activity: Dec 20, 2022, 08:07 PM

3 votes

1 answers

1255 views

csplit regex with pipe (|)

text-processing regular-expression split csplit

I want to split a file by regular expression. The file format is as below:

    0|t| lorem ...
    some text 
    138|t| title 
    some text 

If I execute egrep "[0-9]+\|t\|" file | wc -l, it counts the occurrences correctly, but if I execute csplit filename /[0-9]+\|t\|/, it says no match was found and does not split the file.

There seems to be some issue with the pipe in the pattern, but I am not able to figure out a solution.
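
A small experiment that may explain the difference. As far as I know, csplit patterns are basic regular expressions, where "+" is an ordinary character and a bare "|" is literal too, while egrep uses extended ones. Plain grep also uses basic regular expressions, so it can stand in for csplit when testing a pattern. The file name `filename` is from the question; the rest is an assumption for the demo.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf '0|t| lorem ...\nsome text\n138|t| title\nsome text\n' > filename

# grep without -E interprets the pattern as a basic regular expression.
bre_plus=$(grep -c '[0-9]+|t|' filename || true)        # "+" must match literally here
bre_star=$(grep -c '[0-9][0-9]*|t|' filename || true)   # portable "one or more digits"
echo "with +: $bre_plus, with [0-9][0-9]*: $bre_star"

# The same pattern handed to csplit, quoted so the shell leaves it alone.
# Guarded because -z and {*} are GNU csplit features.
if csplit --version >/dev/null 2>&1; then
    csplit -s -z filename '/[0-9][0-9]*|t|/' '{*}'
fi
```

In the original command the pattern is also unquoted, so the shell removes the backslashes before csplit ever sees them; quoting the whole pattern avoids that.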
                                

Jigar Parekh (133 rep)

Mar 30, 2017, 05:58 AM • Last activity: Oct 15, 2022, 08:52 PM

1 votes

2 answers

215 views

Divide a fasta file with scaffolds into files of the same length respecting the scaffold ID and the sequence

awk grep split csplit

I am currently working with a large fasta file (3.7GB) that has scaffolds in it. Each scaffold has a unique identifier that starts with > on the first line and on the consecutive line it has the DNA sequence like this:

>9999992:0-108
AAAGAATTGTATTCCCTCCAGGTAGGGGGGATAGTTGAGGGGATACATAG
TGGGAAGGCTTTTCATGCGGAGGGACTAGAATGTGCTCCCGACTGACAAA
GCAGCTTG
>9999993:0-118
AGGGACTAGAAATGAGATTAAAAAGAGTAAAAGCACTGATACAAGTACAA
AAACAAATTGCTTCACCTCCAAAACCCCAGAAACTGCCCCACTTGGCTCC
CATTTAACCTACCTTCAA
>9999994:0-113
CCATCCTCATCCTTTCCTCCCCATATCTTCCTCTGACCCCAAAGCTCAGG
TTTCCTGTCTTGTTTCCCAGAATCTGTACCTCATGGTAGTTAAACCTTCC
CCTCTGGCAGCCA
>9999997:0-87
AACATCCCTGTGGCCTGAGAGACTGCCAGCCACAGCGGTGACAGTCCCTG
CGAGAGGCTGCTGCAAAAAGACTGGAGAGAAAGCAGA
>9999998:0-100
AAACATCAGCGCCAAGTCCCCGAAACCAGCAGGGTCACTGGGCGGCCGGC
CTGAAATACCCCAGCAGGCCAGCAGTGCCGGGTGCCTGGGGAGGTGTCCT
>9999999:0-94
AAGAAACTTTTCCCTTAACCAATGAAGAGTTTTATGTAAAGGAAATTTAG
TAATTTTTTAAAAAATGGTAATGACAGATTTAAGTAATTTAATT

I want to split the file into small files preferably of the same length to process it, but I need to respect the ID and the sequence together, and obtain something like this:

file1.fasta
>9999992:0-108
AAAGAATTGTATTCCCTCCAGGTAGGGGGGATAGTTGAGGGGATACATAG
TGGGAAGGCTTTTCATGCGGAGGGACTAGAATGTGCTCCCGACTGACAAA
GCAGCTTG
>9999993:0-118
AGGGACTAGAAATGAGATTAAAAAGAGTAAAAGCACTGATACAAGTACAA
AAACAAATTGCTTCACCTCCAAAACCCCAGAAACTGCCCCACTTGGCTCC
CATTTAACCTACCTTCAA

file2.fasta
>9999994:0-113
CCATCCTCATCCTTTCCTCCCCATATCTTCCTCTGACCCCAAAGCTCAGG
TTTCCTGTCTTGTTTCCCAGAATCTGTACCTCATGGTAGTTAAACCTTCC
CCTCTGGCAGCCA
>9999997:0-87
AACATCCCTGTGGCCTGAGAGACTGCCAGCCACAGCGGTGACAGTCCCTG
CGAGAGGCTGCTGCAAAAAGACTGGAGAGAAAGCAGA

file3.fasta
>9999998:0-100
AAACATCAGCGCCAAGTCCCCGAAACCAGCAGGGTCACTGGGCGGCCGGC
CTGAAATACCCCAGCAGGCCAGCAGTGCCGGGTGCCTGGGGAGGTGTCCT
>9999999:0-94
AAGAAACTTTTCCCTTAACCAATGAAGAGTTTTATGTAAAGGAAATTTAG
TAATTTTTTAAAAAATGGTAATGACAGATTTAAGTAATTTAATT

Please help me. I have tried to use csplit and grep but I get the wrong outputs.
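
A minimal sketch of one way to do this with awk, assuming "same length" means roughly the same amount of sequence per file. The `max` budget, the tiny sample records and the output names are made up for the demo; a new output file is only ever started at a ">" line, so an ID is never separated from its sequence.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
cat > scaffolds.fasta <<'EOF'
>s1
AAAAAAAAAA
AAAAA
>s2
CCCCCCCCCC
>s3
GGGGGGGGGG
GGGGGGGGGG
>s4
TTTTT
EOF

# max is a hypothetical budget of sequence characters per output file.
awk -v max=20 '
/^>/ {
    # Switch files only at a header line, and only once the budget is used up.
    if (n == 0 || size >= max) {
        if (n > 0) close(out)
        out = sprintf("file%d.fasta", ++n)
        size = 0
    }
}
{
    print > out
    if ($0 !~ /^>/) size += length($0)   # count sequence characters only
}' scaffolds.fasta

grep -c '^>' file*.fasta
```

The sketch assumes the input starts with a header line. Files can overshoot the budget by up to one scaffold, since a scaffold is never cut in the middle.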

Nadia Tamayo (13 rep)

Oct 13, 2022, 02:02 AM • Last activity: Oct 13, 2022, 10:45 AM

0 votes

1 answers

336 views

Help me understand a script that uses csplit and sed

sed coreutils csplit

I wanted a simple way to export notes from the reference manager, Zotero. I start by selecting multiple notes and dragging them into a blank text file. I also want to achieve "atomicity" of my notes, so I need to split the resulting text files, which contain the individual notes in sections separated by lines of dashes. I then want to use the heading I gave to each note to name the new files, i.e. rename each file with the first line of its section. I want to save these new files as markdown files.

The script I have put together is made up of suggestions for each of these functions by contributors on the web. I am trying to make sure that I understand the commands in the script correctly before sharing it with colleagues who have a similar use case to mine.

My understanding (from reading Gilles' answer to another question - see reference link below) of the need for quote marks around the "$f" in the "head" command does not seem to be correct. I tried the script without the quotes and got the same result. Are the double quotes not really needed because "$f" appears on the right-hand side of an assignment? Are they only there because it is easier to double quote by default than to remember when they are not needed? Any further explanation would be much appreciated.

An example of the input file would be the following in Notes_test.txt:

This is note 1

It has some notes

--------------------------------------------------

This is note 2

It has some more notes

The output from this should be two files:

This is note 1.md
This is note 2.md

This is the script I am using on the command line:

csplit Notes_test.txt -f_ -z -b'%03d.md' /--------------------------------------------------/1 {*} && sed -i '/./,$!d' *.md && for f in *.md
    do
    f1=$(head -n1 "$f")
    mv -n "$f" "$f1.md"
    done

This is my understanding of the commands so far:

-fPREFIX: Use PREFIX as the output file name prefix. In this case an underscore is specified: "_", which I see is just a placeholder.

-z: Suppress the generation of zero-length output files. I think this is necessary because otherwise csplit will produce an empty file at the end of each run through splitting the original files.

-bSUFFIX: Use SUFFIX as the output file name suffix. In this case: "md". %03d puts a 3-digit number as a placeholder for the file name. I added the zero before the 3 at the suggestion of FelixJN.

/--------------------------------------------------/1: specifies the delimiter for the split, with the split being made 2 lines below the line of "-"s (count starts from 0).

{*}: tells bash to run the split until the end of the file. As Felix points out, "{n}" is the number of splits to be executed. In this case "*" means do as many as possible.

&&: means execute the following command on the condition that the previous command has completed.

sed -i: directs sed to operate on files with a particular suffix.

'/./,$!d': means "remove blank lines at head of file". Thanks to Felix again for explaining that this is to specify the range on which sed works: A "." means any character, so it specifies the first character that occurs in the document. Since empty lines do not have any characters, we will need to apply the negative "!" after defining the range. The range is defined by the pattern /"start"/,/"end"/ to apply the command between the strings "start" and "end". $ refers to the last line, so the range is all the non-empty lines in the document. To apply the negative use "!" meaning "NOT", i.e. tell sed to select the opposite of the previous range. In this case all lines before the first line with any character. "d" then deletes these lines.

*.md: means "which has any name with suffix .md".

f1=$(head -n1 "$f"): means: define f1 as the first line ("head" means "first line") of the file. This is done by using the variable signifier "$" to define "f1", which will be a placeholder (in the next line of the script) for the new file names (minus suffix). "head" is a bash command that normally outputs the first 10 lines of each file: head [OPTION]... [FILE]... The option -n1 specifies to output one line only. Here, instead of specifying a particular FILE, "$f" specifies "all files." The quote marks around "$f" are needed so that whitespace is ignored (otherwise $f uses whitespace as field separator and further splits the files - see reference link below).

mv -n "$f" "$f1.md": means: rename each file as "f1.md". The bash command "mv" takes options and arguments: mv [OPTION]... [-T] SOURCE DEST, i.e. "Rename SOURCE to DEST." The -n option stands for --no-clobber, "do not overwrite an existing file." I think this is just in case there are files (notes) that have the same first line.

See:

https://www.tutorialspoint.com/unix_commands/csplit.htm

coreutils for Unix-like operating systems at https://www.gnu.org/software/coreutils/manual/coreutils.pdf

https://www.howtoforge.com/linux-csplit-command/ (Q2. How to split files using regular expressions?)

https://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters

https://unix.stackexchange.com/questions/68694/when-is-double-quoting-necessary
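
A small experiment on the quoting question. The outer $(...) on the right-hand side of an assignment does not need quotes, but "$f" inside it is an ordinary argument to head and is split on whitespace when left unquoted. The test without quotes presumably gave the same result because csplit's output names (such as _000.md) contain no whitespace. The file name with a space below is a made-up example.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf 'This is note 1\n\nIt has some notes\n' > 'my note.md'   # hypothetical name with a space

f='my note.md'
quoted=$(head -n1 "$f")               # head receives one argument: my note.md
unquoted=$(head -n1 $f 2>/dev/null)   # head receives "my" and "note.md"; neither exists

echo "quoted:   [$quoted]"
echo "unquoted: [$unquoted]"
```

This matters on a second run of the script, when *.md also matches files already renamed to names like "This is note 1.md".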

Christopher J Poor (3 rep)

Sep 6, 2021, 04:35 AM • Last activity: Sep 11, 2021, 09:43 PM

4 votes

2 answers

764 views

How to split a file into multiple files after N appearances of a pattern?

text-processing awk csplit

I have a file on Linux, containing the coordinates of thousands of molecules. Each molecule starts with a line containing always the same pattern:

    @MOLECULE

And then continues with other lines. 
I would like to split the file into multiple files, each containing a certain number of molecules.
What is the easiest way to do this?
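
One possible approach, sketched with awk: count the @MOLECULE lines and start a new output file every `per` molecules. The value of `per`, the sample coordinates and the chunk_NNN.txt names are assumptions for the demo.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf '@MOLECULE\na 1\n@MOLECULE\nb 1\nb 2\n@MOLECULE\nc 1\n@MOLECULE\nd 1\n@MOLECULE\ne 1\n' > molecules.txt

# per = number of molecules per output file (hypothetical value).
awk -v per=2 '
/^@MOLECULE/ {
    if (count % per == 0) {            # time to start a new output file
        if (out != "") close(out)
        out = sprintf("chunk_%03d.txt", ++n)
    }
    count++
}
out != "" { print > out }' molecules.txt

grep -c '^@MOLECULE' chunk_*.txt
```

With five molecules and per=2 the last file simply holds the one remaining molecule.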

ginopino (380 rep)

May 21, 2021, 09:04 AM • Last activity: May 22, 2021, 02:16 PM

0 votes

1 answers

1107 views

How to make csplit start outputting files with filenames starting from 001?

csplit

I use csplit to divide a complex file named file.docked.pdb into small files.

csplit -k -s -n 3 -f file.docked. file.docked.pdb '/^ENDMDL/+1' '{'7'}'

man csplit explains the options used in this command:

NAME
       csplit - split a file into sections determined by context lines


       -k, --keep-files
              do not remove output files on errors

      -s, --quiet, --silent
              do not print counts of output file sizes
      -n, --digits=DIGITS
              use specified number of digits instead of 2

       -f, --prefix=PREFIX
              use PREFIX instead of 'xx'

   Each PATTERN may be:
      

       /REGEXP/[OFFSET]
              copy up to but not including a matching line

       {*}    repeat the previous pattern as many times as possible

My problem is that the output files are named starting from file.docked.000 and counting upward. How can I make the numbering start from file.docked.001? If the tooling does not support this at all, please give a workaround.
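
I am not aware of a csplit option for the starting number, so here is a workaround sketch that renames the pieces afterwards. It assumes the pieces are numbered contiguously from 000 with three digits, as `-n 3 -f file.docked.` produces; the three stand-in files replace real csplit output for the demo.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
prefix=file.docked.

# Stand-ins for csplit output: file.docked.000 .. file.docked.002
for i in 0 1 2; do
    printf 'model %s\n' "$i" > "$(printf '%s%03d' "$prefix" "$i")"
done

# Count the pieces, then rename from the highest number down so that
# no file is overwritten before it has been moved.
set -- "$prefix"[0-9][0-9][0-9]
i=$(( $# - 1 ))
while [ "$i" -ge 0 ]; do
    mv -- "$(printf '%s%03d' "$prefix" "$i")" "$(printf '%s%03d' "$prefix" "$(( i + 1 ))")"
    i=$(( i - 1 ))
done

ls
```

Renaming in ascending order would clobber 001 with 000 on the first step, hence the countdown.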

Praveen Kumar-M (622 rep)

May 31, 2020, 03:55 PM • Last activity: Jun 1, 2020, 02:08 AM

2 votes

3 answers

2036 views

csplit multiple files into multiple files

bash for csplit

Folks,

I'm a bit stumped on this one. I'm trying to write a bash script that will use csplit to take multiple input files and split them according to the same pattern. (For context: I have multiple TeX files with questions in them, separated by the \question command. I want to extract each question into its own file.)

The code I have so far:

    #!/bin/bash
    # This script uses csplit to run through an input TeX file (or list of TeX files) to separate out all the questions into their own files.
    # This line is for the user to input the name of the file they need questions split from.
    
    read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
    
    read -ep "Type the directory where you would like to save the split files: " save
    
    read -ep "What unit do these questions belong to?" unit
    
    # This is a check for the user to confirm the file list, and proceed if true:
    
    echo "The file(s) being split is/are $files. Please confirm that you wish to split this file, or cancel."
    select ynf in "Yes" "No"; do
    	case $ynf in 
    		No ) exit;;
    		Yes ) echo "The split files will be saved to $save. Please confirm that you wish to save the files here."
    			select ynd in "Yes" "No"; do
    			case $ynd in
    				Yes )
    #					This line will create a loop to conduct the script over all the files in the list.
    					for i in ${files[@]}
    					do
    #					Mass re-naming is formatted to give "question###.tex" to enable processing a large number of questions quickly.
    #					csplit is the utility used here; run "man csplit" to learn more of its functionality.
    #					the structure is "csplit [name of file] [output options] [search filter] [separator(s)].
    #					this script calls csplit, will accept the name of the file in the argument, searches the files for calls of "question", splits the file everywhere it finds a line with "question", and renames it according to the scheme [prefix]#[suffix] (the %03d in the suffix-format is what increments the numbering automatically).
    #					the '\\question' allows searching for \question, which eliminates the split for \end{questions}; eliminating the \begin{questions} split has not yet been understood.
    						csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
    					done; exit;;
    				No ) exit;;
    			esac
    		done
    	esac
    done
    
    return

I can confirm it does do the loop as I intended for the input files I have. However, the behavior I'm noticing is that it'll split the first file into "q1.tex q2.tex q3.tex" as expected, and when it moves on to the next file in the list, it'll split the questions and overwrite the old files, and the third file it will overwrite the second file's splits, etc. What I would like to happen is that, say, if File1 has 3 questions, it will output:

    q1.tex
    q2.tex
    q3.tex

And then if File2 has 4 questions, it will then continue incrementing to:

    q4.tex
    q5.tex
    q6.tex
    q7.tex

Is there a way for csplit to detect the numbering that has already been done in this loop, and increment appropriately?

Thanks for any help you folks can offer!
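
A sketch of one way to keep the numbering going across input files: process all files in a single awk run, so that one counter serves every file. The `out/unit1q` prefix stands in for the $save and $unit values from the script, and the two small TeX files are invented for the demo.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
mkdir out
printf '\\begin{questions}\n\\question One\n\\question Two\n\\end{questions}\n' > File1.tex
printf '\\begin{questions}\n\\question Three\n\\end{questions}\n' > File2.tex

awk -v prefix='out/unit1q' '
FNR == 1 {                      # new input file: drop lines before its first \question
    if (out != "") close(out)
    out = ""
}
/\\question/ {
    if (out != "") close(out)
    out = sprintf("%s%03d.tex", prefix, ++n)   # n keeps counting across input files
}
out != "" { print > out }' File1.tex File2.tex

ls out
```

As with the csplit version, the \end{questions} line still lands in the last question of each input file.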
                                

Wayne (35 rep)

Jan 3, 2020, 01:35 PM • Last activity: Jan 5, 2020, 07:06 PM

0 votes

1 answers

895 views

Split file into n files using csplit (or similar tool)

split csplit

I have a huge file with the following pattern:

    ABC
    line 1
    line 2
    line 3
    ABC
    line 1
    line 2
    ABC
    line1
    ABC
    line 1
    line 3

Using csplit tool I'm able to split the file above according to /ABC/ pattern into 4 subfiles:

    csplit -z input.txt /ABC/ {*}

I wonder how to manually specify the number of desired output files.
                                

Andrej (353 rep)

Dec 17, 2019, 06:41 AM • Last activity: Dec 17, 2019, 10:08 AM

2 votes

1 answers

550 views

Bash - extract an indented code block into new file

text-processing awk csplit

I have a bunch of [LilyPond](http://www.lilypond.org) files in the following format:
 
    \score {
      \new StaffGroup = "" \with {
        instrumentName = \markup { \bold \huge \larger "1." }
      }
      >
      \layout {}
      \midi {}
    }

How would one extract the \relative c {...} block into a new file, so it would look like this:


    \relative c {
      \clef bass
      \key c \major
      \time 3/4

      \tuplet 3/2 4 {
        c8(\downbow\f b c e g e)
      } c'4                                         | %01
      \tuplet 3/2 4 {c,8( b c e f a) } c4           | %02
      \tuplet 3/2 4 { g,8( d' f g f d) } b'4        | %03
    }

A fix of the indentation is not necessarily needed in this case. Would that be an awk or csplit task? What would it look like? 
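
This looks more like an awk task to me, since the end of the block has to be found by counting braces. A minimal sketch, assuming braces never appear inside comments or strings; the sample file below is invented for the demo and is only a guess at the real nesting.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
cat > piece.ly <<'EOF'
\score {
  \new Staff {
    \relative c {
      \clef bass
      \tuplet 3/2 4 { c8( b c e g e) } c'4 | %01
    }
  }
  \layout {}
}
EOF

awk '
!inblock && /\\relative c [{]/ { inblock = 1; depth = 0 }
inblock {
    print
    line = $0
    depth += gsub(/[{]/, "", line)   # gsub returns how many "{" it removed
    depth -= gsub(/[}]/, "", line)   # ... and how many "}"
    if (depth == 0) exit             # the matching closing brace was reached
}' piece.ly > relative.ly

cat relative.ly
```

The original indentation is kept as-is, which the question says is acceptable.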

                                

nath (6094 rep)

Dec 1, 2019, 08:24 PM • Last activity: Dec 2, 2019, 01:13 AM

4 votes

4 answers

892 views

text processing rows to columns for a block

text-processing awk solaris csplit

I have a file containing lists on Solaris:

    List A
    hi
    hello
    hw r u

    List B
    Hi
    Yes

    List C
    Hello

I need to transpose the lists as shown below:
    
    List A    List B    List C
    hi        Hi        Hello
    hello     Yes
    hw r u
    
How can I do this on Solaris?
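
A sketch of the transposition with awk in paragraph mode, assuming the lists are separated by blank lines and that tab-separated output is acceptable. On Solaris this presumably needs nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf 'List A\nhi\nhello\nhw r u\n\nList B\nHi\nYes\n\nList C\nHello\n' > lists.txt

awk 'BEGIN { RS = ""; FS = "\n" }     # one record per list, one field per line
{
    for (r = 1; r <= NF; r++) cell[NR, r] = $r
    if (NF > rows) rows = NF          # remember the longest list
}
END {
    for (r = 1; r <= rows; r++)
        for (c = 1; c <= NR; c++)
            printf "%s%s", cell[c, r], (c < NR ? "\t" : "\n")
}' lists.txt > transposed.txt

cat transposed.txt
```

Shorter lists leave empty cells, so the rows keep the same number of tab-separated columns.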
                                

John (51 rep)

Sep 7, 2017, 10:44 AM • Last activity: Apr 9, 2019, 10:12 AM

0 votes

1 answers

208 views

Splitting a file based on values next to matching pattern

awk csplit

I have a file input.txt which includes ~50,000 rows and ~100 columns. I want to split it according to the entry that follows a matching pattern. Field separators are both space and tab.

input.txt

    #information  
    #dateofcreation  
    #file type
    AA	BB	CC DD EE FF GG HH II 
    AA	bb	ac aD FF GG hg ad 
    DA	ga	Dt pp Ee	FF gg pm	TT
    DA	bR	AT GT Gg	FF GG Hb	Yh
    NM	gt	Jh GT FF	hb TH KM MM

In the input file there is a matching field FF in all the lines, and the entry following it is the same in some lines. I want to have three output files from this input file:

GG.txt

    AA	BB	CC DD EE FF GG HH II
    AA	bb	ac aD FF GG hg ad
    DA	bR	AT GT Gg	FF GG Hb Yh

gg.txt

    DA	ga	Dt pp Ee	FF gg pm	TT

hb.txt

    NM	gt	Jh GT FF	hb TH KM MM

Thanks.
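
A minimal awk sketch of the idea: find the first FF field on each line and use the field after it as the output file name. It assumes lines starting with "#" are to be skipped; awk's default field splitting already treats runs of spaces and tabs alike.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf '#information\n#file type\nAA\tBB\tCC DD EE FF GG HH II\nAA\tbb\tac aD FF GG hg ad\nDA\tga\tDt pp Ee\tFF gg pm\tTT\nDA\tbR\tAT GT Gg\tFF GG Hb\tYh\nNM\tgt\tJh GT FF\thb TH KM MM\n' > input.txt

awk '
/^#/ { next }                        # skip the header comments
{
    for (i = 1; i < NF; i++)
        if ($i == "FF") {            # first FF field on the line
            print > ($(i + 1) ".txt")
            break
        }
}' input.txt

ls
```

On a case-insensitive file system GG.txt and gg.txt would be the same file, and with very many distinct values the number of simultaneously open files could become a limit.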

user3377241 (103 rep)

Oct 18, 2018, 08:42 PM • Last activity: Oct 18, 2018, 09:38 PM

1 votes

1 answers

1391 views

alternative to csplit - splitting after the pattern

csplit

I want to split a file after a delimiter, not before the delimiter, which is what csplit does. I can't find anything anywhere! (Also, why would there be a tool that specifically splits before a pattern, but not one that splits after it?)

File:  
a  
b  
c  
d

split at c

output:
file1:  
a  
b  
c  

file 2  
d
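
A sketch of splitting after the matching line with awk: write the line first, then switch to the next output file. The file1/file2 names follow the example above; the input file name is made up. csplit patterns also accept a line offset such as /c/+1, which may achieve the same thing without leaving csplit.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
printf 'a\nb\nc\nd\n' > letters.txt

awk '
BEGIN { n = 1 }
{
    out = "file" n
    print > out
}
/^c$/ { close(out); n++ }   # the delimiter is already written; later lines go to the next file
' letters.txt

cat file1
```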
                                

LizzAlice (113 rep)

Apr 26, 2018, 01:28 PM • Last activity: Apr 26, 2018, 01:48 PM

5 votes

2 answers

1186 views

csplit not recognizing provided regexp

osx csplit

I'm working on this big file (**DATA.DAT**, ~900MB) which contains several other files. It's from a PS2 game.

Sound samples (which are in **.AIFF** format), precisely what I'm after, make up most of its size.

After searching the web for PS2 **.DAT** extractors I found out that they're basically developer dependent, and since this game/tool is rather obscure and I couldn't find much about it online, I thought about automating the process myself.

Inspecting the file on a hex editor I came across some **.AIFF** headers, cloned the chunks to new **.AIFF** files and without any further work, they were playable.

Having spent a while getting the rust out of my VERY limited bash knowledge and having read similar questions here, I came up with this expression:

    gcsplit -f "sample-" -b "%04d.aif" DATA.DAT /FORM/ '{*}'

(I'm on OSX using coreutils, hence the g- prefix on csplit)

Given that **.AIFF** files start with the string "FORM" and given that basically all samples in the file are next to each other (spaced apart by disregardable amounts of data that won't generate unwanted end noise on the samples), I thought that the regexp

    /FORM/

would suffice to split the files up.

However, every split file is being output with junk data that sits in between sound samples before the **.AIFF** header, rendering it unplayable. 

Screenshots of the hex data of a split sound sample below:

This actual sample begins roughly around the 1500 bytes mark:

What's making this expression split the files with an offset?
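
A possible explanation, with a sketch: csplit works on whole lines, so in binary data it cuts at the start of the "line" that contains FORM, that is, right after the previous newline byte, not at FORM itself. Carving by byte offset avoids this. The sketch assumes a grep that supports -a, -b and -o (GNU grep does) and a head that supports -c; the tiny stand-in for DATA.DAT is invented for the demo.

```shell
work=$(mktemp -d) && cd "$work" || exit 1

# Stand-in for DATA.DAT: junk, then two "FORM" chunks, with newline
# bytes in arbitrary places just as in real binary data.
printf 'junk\nmoreFORMaaaa\nbbFORMcc' > DATA.DAT

# Byte offset of every "FORM"; LC_ALL=C and -a make grep treat the data as raw bytes.
offsets=$(LC_ALL=C grep -a -b -o 'FORM' DATA.DAT | cut -d: -f1)
size=$(( $(wc -c < DATA.DAT) ))

# Carve from each offset up to the next one (or to the end of the file).
set -- $offsets "$size"
n=0
while [ "$#" -gt 1 ]; do
    tail -c +"$(( $1 + 1 ))" DATA.DAT | head -c "$(( $2 - $1 ))" > "$(printf 'sample-%04d.aif' "$n")"
    n=$(( n + 1 ))
    shift
done

ls -l sample-*.aif
```

The four bytes "FORM" can also occur by chance inside sample data, so a real extractor would probably want to check the chunk length stored after the header as well.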

João (53 rep)

Nov 26, 2017, 04:02 AM • Last activity: Nov 26, 2017, 08:15 PM

2 votes

2 answers

831 views

How to split a file based on context?

text-processing osx csplit

I have some files that contain the results of the lldpneighbors command from all our servers. I would like to split these files into individual files for each server in order to make it easier to import this data into our inventory system.

**Sample Input**

    === Output from 00000000-0000-0000-0000-000000000000 (SERVERNAME1):
    Interface 'ixgbe0' has 1 LLDP Neighbors: 
    Neighbor 1:
    	Chassis ID: 		MAC Address - 00 01 02 03 04 05 
    	Port ID: 		Interface Name - TenGigabitEthernet 0/6
    	Time To Live: 		120 seconds
    	System Name: 		name-of-switch-01
    	End Of LLDPDU: 	
    Interface 'igb0' has 1 LLDP Neighbors: 
    Neighbor 1:
    	Chassis ID: 		MAC Address - 00 01 02 03 04 05 
    	Port ID: 		Interface Name - TenGigabitEthernet 0/23
    	Time To Live: 		120 seconds
    	System Name: 		name-of-switch-02
    	End Of LLDPDU: 	
    === Output from 00000000-0000-0000-0000-000000000000 (SERVERNAME2):
    Interface 'ixgbe0' has 1 LLDP Neighbors: 
    Neighbor 1:
    	Chassis ID: 		MAC Address - 00 01 02 03 04 05 
    	Port ID: 		Interface Name - TenGigabitEthernet 0/2
    	Time To Live: 		120 seconds
    	System Name: 		name-of-switch-01
    	End Of LLDPDU: 	
    Interface 'igb0' has 1 LLDP Neighbors: 
    Neighbor 1:
    	Chassis ID: 		MAC Address - 00 01 02 03 04 05 
    	Port ID: 		Interface Name - TenGigabitEthernet 0/19
    	Time To Live: 		120 seconds
    	System Name: 		name-of-switch-02
    	End Of LLDPDU: 

This is roughly what all the results look like, with some variation (they are not all the same length; some are several lines longer because of more interfaces). The delimiting string I would like to match on is:

    === Output from [UUID] ([HOSTNAME]):

Ideally I would like each file to be named after the hostname (this would just be a convenience and is not necessary), so the above results would be split into files like:

**SERVERNAME1**

    === Output from 00000000-0000-0000-0000-000000000000 (SERVERNAME1):
    Interface 'ixgbe0' has 1 LLDP Neighbors: 
    Neighbor 1:
        Chassis ID:         MAC Address - 00 01 02 03 04 05 
        Port ID:        Interface Name - TenGigabitEthernet 0/6
        Time To Live:       120 seconds
        System Name:        name-of-switch-01
        End Of LLDPDU:  
    Interface 'igb0' has 1 LLDP Neighbors: 
    Neighbor 1:
        Chassis ID:         MAC Address - 00 01 02 03 04 05 
        Port ID:        Interface Name - TenGigabitEthernet 0/23
        Time To Live:       120 seconds
        System Name:        name-of-switch-02
        End Of LLDPDU: 

**SERVERNAME2**

    === Output from 00000000-0000-0000-0000-000000000000 (SERVERNAME2):
    Interface 'ixgbe0' has 1 LLDP Neighbors: 
    Neighbor 1:
        Chassis ID:         MAC Address - 00 01 02 03 04 05 
        Port ID:        Interface Name - TenGigabitEthernet 0/2
        Time To Live:       120 seconds
        System Name:        name-of-switch-01
        End Of LLDPDU:  
    Interface 'igb0' has 1 LLDP Neighbors: 
    Neighbor 1:
        Chassis ID:         MAC Address - 00 01 02 03 04 05 
        Port ID:        Interface Name - TenGigabitEthernet 0/19
        Time To Live:       120 seconds
        System Name:        name-of-switch-02
        End Of LLDPDU: 

I'm trying to use csplit to accomplish this but I'm not able to match the regex for some reason.   The commands I've tried:

    $ csplit jbutryn_us-west-a_neighbors %===.*:% '{20}'
    csplit: ===.*:: no match

    $ csplit jbutryn_us-west-a_neighbors /===.*:/ '{20}'
    552
    552
    552
    csplit: ===.*:: no match

    $ csplit jbutryn_us-west-a_neighbors '/===.*:/' '{20}'
    552
    552
    552
    csplit: ===.*:: no match

    $ csplit -ks -f test jbutryn_us-west-a_neighbors '/===.*:/' '{20}'
    csplit: ===.*:: no match

Any suggestions?
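
A possible reading of the errors above: `'{20}'` asks csplit to repeat the pattern 20 more times, so it reports "no match" as soon as the file has fewer delimiter lines than that, and without `-k` it removes the pieces it already wrote. The sketch below assumes GNU csplit (for `'{*}'` and `-z`) and that each header sits at the start of a line with nothing after the closing `):`. The file name `neighbors.txt` and the shortened sample data are made up for illustration. The awk part names each piece after the hostname in parentheses.

```shell
# Work in a scratch directory with a small, made-up sample of the input.
cd "$(mktemp -d)" || exit 1
cat > neighbors.txt <<'EOF'
=== Output from 00000000-0000-0000-0000-000000000000 (SERVERNAME1):
Interface 'ixgbe0' has 1 LLDP Neighbors:
Neighbor 1:
=== Output from 00000000-0000-0000-0000-000000000000 (SERVERNAME2):
Interface 'igb0' has 1 LLDP Neighbors:
Neighbor 1:
EOF

# csplit variant (GNU): '{*}' repeats the pattern as often as it matches,
# and -z drops the empty piece that precedes the first header.
csplit -ksz -f test neighbors.txt '/^=== Output from /' '{*}'

# awk variant: start a new output file at every header line and
# name it after the text between the last "(" and the trailing "):".
awk '
/^=== Output from .*\(.*\):$/ {
    if (out != "") close(out)
    name = $0
    sub(/^.*\(/, "", name)   # drop everything up to the last "("
    sub(/\):$/, "", name)    # drop the trailing "):"
    out = name
}
out != "" { print > out }    # lines before the first header are skipped
' neighbors.txt
```

If the real headers carry trailing whitespace or the hostnames contain characters that are unsafe in file names, the patterns would need adjusting.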
                                

jesse_b (41447 rep)

Aug 24, 2017, 03:44 PM • Last activity: Aug 24, 2017, 04:41 PM

1 votes

2 answers

87 views

Select the contents respective to some specific content from a file and move it to an output file

shell-script files oracle csplit

I have a file `tnsnames.ora` with the following contents:

    NEWDB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1550))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = newdb)
        )
      )
    
    LISTENER_DG11G =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1550))
      )
    
    LISTENER_SABDB =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1550))
      )
    
    STEST =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = STEST)
        )
      )
    
    RBSDB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = RBSDB)
         )
        )

In the file above, `NEWDB`, `LISTENER_DG11G`, `LISTENER_SABDB`, `STEST` and `RBSDB` are the database names, and the corresponding service names are given in the `SERVICE_NAME =` entries.

From this file I am trying to extract each database name together with its service name and write them to a file (or an .xls) on Linux.

The output file should look like this:

    NEWDB   newdb
    STEST   STEST
    RBSDB   RBSDB

Databases that have no service name should not appear in the output file.

I tried using csplit to move the first set of lines to a file `X`, then took the first line and the `SERVICE_NAME` value using `cat X | grep -i "SERVICE_NAME" | cut -d "=" -f2 | rev | cut -d ")" -f2 | rev | awk "NF"`, wrote the result to a file, and repeated the same steps for the rest of the database names.

But this seems overly complicated. Any other ideas on how this could be done would be appreciated.
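
One way this might be done in a single pass, without csplit, is to let awk remember the most recent entry name and print it when a `SERVICE_NAME` line appears. This is a sketch under assumptions: entry names start in the first column of the real file (the indentation shown above comes from the code formatting), `SERVICE_NAME` is written in upper case, and each entry has at most one service name. The output file name `services.txt` and the shortened sample are hypothetical.

```shell
# Work in a scratch directory with a shortened copy of the sample file.
cd "$(mktemp -d)" || exit 1
cat > tnsnames.ora <<'EOF'
NEWDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = newdb)
    )
  )

LISTENER_DG11G =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1521))
  )

STEST =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linuxerp.de.mph.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STEST)
    )
  )
EOF

awk '
# An unindented "NAME =" line starts a new entry; remember its name.
/^[A-Za-z0-9_.]+[ \t]*=/ {
    name = $1
    sub(/=.*/, "", name)     # handles "NAME=" written without a space
}
# A SERVICE_NAME line belongs to the entry seen most recently.
/SERVICE_NAME[ \t]*=/ {
    svc = $0
    sub(/.*SERVICE_NAME[ \t]*=[ \t]*/, "", svc)   # keep what follows "="
    sub(/[ \t]*\).*/, "", svc)                    # cut at the closing ")"
    print name "\t" svc
}
' tnsnames.ora > services.txt

cat services.txt
```

Entries without a `SERVICE_NAME` line, such as the listener definitions, never reach the `print` and so are left out of the output. The tab-separated result can be opened directly in a spreadsheet.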

                                

sabarish jackson (628 rep)

Mar 21, 2017, 04:00 AM • Last activity: Mar 21, 2017, 12:27 PM

7 votes

3 answers

2751 views

Exclude delimiter with csplit

csplit

Is it possible to remove the delimiter with csplit? Example:

    $ cat in
    abc
    ---
    def
    ---
    ghi
    $ csplit -q in /-/ '{*}'
    $ ls x*
    xx00  xx01  xx02
    $ head xx*
    ==> xx00 <==
    abc
    
    ==> xx01 <==
    ---
    def
    
    ==> xx02 <==
    ---
    ghi
    $ sed -i '/-/d' xx*
    $ head xx*
    ==> xx00 <==
    abc
    
    ==> xx01 <==
    def
    
    ==> xx02 <==
    ghi

While it can be done in two steps as above, can it be done in one step?

If it cannot be done with csplit, is there a one-step way that is shorter than the two invocations (csplit + sed) above? I have no preference as to the tool, as long as the result is reasonably readable.
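
A one-step sketch, offered under assumptions rather than as the definitive answer: awk can write each piece itself and simply skip the delimiter lines. The output names mimic csplit's default `xx00`, `xx01`, and so on, and the pattern `/-/` is the same loose one used in the example (any line containing a hyphen). Newer GNU csplit versions also have a `--suppress-matched` option, shown as a comment since it may not be available everywhere.

```shell
# Work in a scratch directory with the sample file from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' abc --- def --- ghi > in

# Write lines to xxNN; a delimiter line closes the current piece,
# starts the next one, and is itself dropped by "next".
awk '
BEGIN { n = 0; out = sprintf("xx%02d", n) }
/-/   { close(out); out = sprintf("xx%02d", ++n); next }
      { print > out }
' in

# GNU csplit alternative (assumes a version that supports the option):
#   csplit -q --suppress-matched in /-/ '{*}'

head xx*
```

Unlike csplit, this awk version does not create a file for a piece that has no lines, although the counter still advances, so consecutive delimiters leave gaps in the numbering.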
                                

levant pied (231 rep)

May 5, 2016, 07:33 PM • Last activity: May 11, 2016, 01:31 PM

Showing page 1 of 20 total questions