csplit multiple files into multiple files

2 votes
3 answers
2036 views
Folks, I'm a bit stumped on this one. I'm trying to write a bash script that will use csplit to take multiple input files and split them according to the same pattern. (For context: I have multiple TeX files with questions in them, separated by the \question command. I want to extract each question into its own file.)

The code I have so far:

    #!/bin/bash
    # This script uses csplit to run through an input TeX file (or list of TeX files) to separate out all the questions into their own files.

    # This line is for the user to input the name of the file they need questions split from.
    read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
    read -ep "Type the directory where you would like to save the split files: " save
    read -ep "What unit do these questions belong to?" unit

    # This is a check for the user to confirm the file list, and proceed if true:
    echo "The file(s) being split is/are $files. Please confirm that you wish to split this file, or cancel."
    select ynf in "Yes" "No"; do
        case $ynf in
            No ) exit;;
            Yes ) echo "The split files will be saved to $save. Please confirm that you wish to save the files here."
                select ynd in "Yes" "No"; do
                    case $ynd in
                        Yes )
                            # This line will create a loop to conduct the script over all the files in the list.
                            for i in ${files[@]}
                            do
                                # Mass re-naming is formatted to give "question###.tex" to enable processing a large number of questions quickly.
                                # csplit is the utility used here; run "man csplit" to learn more of its functionality.
                                # The structure is "csplit [name of file] [output options] [search filter] [separator(s)]".
                                # This script calls csplit, accepts the name of the file as the argument, searches the file for lines containing "question", splits the file at every such line, and names the pieces according to the scheme [prefix]#[suffix] (the %03d in the suffix-format is what increments the numbering automatically).
                                # The '\\question' allows searching for \question, which eliminates the split for \end{questions}; eliminating the \begin{questions} split has not yet been understood.
                                csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
                            done; exit;;
                        No ) exit;;
                    esac
                done
        esac
    done
    return

I can confirm that it loops over my input files as intended. However, the behavior I'm noticing is that it splits the first file into "q1.tex q2.tex q3.tex" as expected, but when it moves on to the next file in the list, it splits the questions and overwrites the old files. The third file's splits then overwrite the second file's, and so on.

What I would like to happen is that if, say, File1 has 3 questions, it will output:

    q1.tex
    q2.tex
    q3.tex

And then if File2 has 4 questions, it will continue incrementing to:

    q4.tex
    q5.tex
    q6.tex
    q7.tex

Is there a way for csplit to detect the numbering that has already been done in this loop, and increment appropriately?

Thanks for any help you folks can offer!
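For illustration, one possible way to get a single running sequence is to concatenate the inputs and call csplit once, reading from standard input via `-`, instead of once per file. The sketch below assumes GNU csplit (the long options in the script above are GNU extensions). The scratch directory, the `file1.tex`/`file2.tex` names, their contents, and the `unit1` value are hypothetical stand-ins for the real inputs.

```shell
#!/bin/bash
# Sketch only: continuous numbering by running csplit ONCE over all inputs.
# Assumes GNU csplit; all paths and file contents below are made up for the demo.
set -eu

work=$(mktemp -d)      # hypothetical scratch directory
save="$work/out"       # stands in for the $save directory from the script
unit="unit1"           # stands in for $unit
mkdir -p "$save"

# Two hypothetical inputs holding 2 and 3 questions respectively.
printf '%s\n' '\question A1' 'text' '\question A2' 'text' > "$work/file1.tex"
printf '%s\n' '\question B1' '\question B2' 'text' '\question B3' > "$work/file2.tex"

# Keep the file list in an array so names containing spaces survive quoting.
files=("$work/file1.tex" "$work/file2.tex")

# "-" makes csplit read standard input, so it sees one long stream and
# numbers every piece in a single sequence.
# --elide-empty-files drops the empty piece created when the stream
# begins with a \question line.
cat "${files[@]}" | csplit --quiet --elide-empty-files \
    --prefix="$save/${unit}q" --suffix-format='%03d.tex' \
    - '/\\question/' '{*}'

ls "$save"
```

One caveat with this approach: anything that precedes the first \question in a later file (a preamble, \begin{questions}) ends up appended to the last question of the file before it, so real TeX inputs would likely need that material stripped first.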
Asked by Wayne (35 rep)
Jan 3, 2020, 01:35 PM
Last activity: Jan 5, 2020, 07:06 PM