csplit not recognizing provided regexp

I'm working with a large file (**DATA.DAT**, ~900MB) that bundles several other files. It's from a PS2 game.

Sound samples (in **.AIFF** format), which are precisely what I'm after, make up most of its size.

After searching the web for PS2 **.DAT** extractors, I found out that they're basically developer-dependent. Since this game/tool is rather obscure and I couldn't find much about it online, I thought about automating the process myself.

Inspecting the file in a hex editor, I came across some **.AIFF** headers and copied the chunks into new **.AIFF** files. Without any further work, they were playable.
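The manual check described above could probably be scripted. The following is only a sketch on a tiny stand-in file (the file contents and variable names are made up for illustration, and it assumes a grep that supports `-a`, `-b` and `-o`, as GNU grep does): it lists the byte offsets of every "FORM" string and carves out the data between the first two.

```shell
# Stand-in for DATA.DAT: junk, a fake "FORM" chunk, more junk, another chunk.
demo=$(mktemp)
printf 'xxxxFORMaaaaaaaayyFORMbbbb' > "$demo"

# -a: treat binary data as text, -b: print byte offsets,
# -o: report each match itself rather than the whole line.
# LC_ALL=C stops grep from interpreting bytes as multibyte characters.
offsets=$(LC_ALL=C grep -a -b -o 'FORM' "$demo" | cut -d: -f1)
echo "$offsets"

# Carve from the first header up to the byte before the second one.
set -- $offsets
start=$1
end=$2
tail -c +"$((start + 1))" "$demo" | head -c "$((end - start))" > "$demo.chunk"
```

Working with byte offsets like this does not depend on where newline bytes happen to fall in the binary data.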

Having spent a while getting the rust out of my VERY limited bash knowledge and having read similar questions here, I came up with this expression:

    gcsplit -f "sample-" -b "%04d.aif" DATA.DAT /FORM/ '{*}'

(I'm on OSX using coreutils, hence the g- prefix on csplit)

Given that **.AIFF** files start with the string "FORM", and given that basically all samples in the file sit next to each other (separated by negligible amounts of data that won't add unwanted noise at the end of the samples), I thought that the regexp

    /FORM/

would suffice to split the files up.

However, every split file is output with the junk data that sits between sound samples placed before the **.AIFF** header, rendering it unplayable.

In the hex data of one split sound sample (screenshots not included here), the actual sample begins at roughly the 1500-byte mark.

What's making this expression split the files with an offset?
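One thing worth checking, offered as a likely explanation rather than a confirmed diagnosis for this particular file: csplit is line-oriented, so a pattern splits at the start of the line that contains the match, not at the match itself. In binary data a "line" is whatever lies between two newline bytes, so any junk between the previous newline and "FORM" ends up at the top of the piece. The sketch below reproduces that on a made-up two-line file (file names and contents are purely illustrative):

```shell
# Stand-in data: junk and "FORM" share a line (no newline byte between them).
work=$(mktemp -d)
printf 'junkjunkFORMaaaa\nbbbbFORMcccc\n' > "$work/demo.dat"

# Use gcsplit if present (macOS with coreutils), otherwise plain csplit.
csplit_bin=$(command -v gcsplit || command -v csplit)

# Same pattern as in the question, run on the stand-in file.
( cd "$work" && "$csplit_bin" -s -f part- demo.dat /FORM/ '{*}' )

# Show the first four bytes of each piece.
for f in "$work"/part-*; do
    printf '%s: %s\n' "${f##*/}" "$(head -c 4 "$f")"
done
```

The pieces that contain "FORM" begin with the bytes that precede it on the same line, which matches the offset seen in the split samples.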

Asked by João (53 rep)

Nov 26, 2017, 04:02 AM
Last activity: Nov 26, 2017, 08:15 PM
