
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
3 answers
335 views
How do I extract some pages of a PDF into another PDF file?
I have a PDF file with multiple pages, and I want to write a command which extracts some of these pages into a new, separate PDF file; the pages of interest are not necessarily a contiguous range. How do I do that? Notes:
* It has to be a command I can put in a (shell) script, not a GUI application I can interact with to achieve this effect.
* The pages to be extracted do not necessarily form a contiguous range.
* If you want a concrete example, let's assume I want to extract pages 1 through 4 and page 6 of input file foo.pdf, with the result placed in bar.pdf.
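A minimal sketch of one common scriptable approach, assuming either qpdf or pdftk is installed (file names as in the example):
```bash
# With qpdf: page selections may be non-contiguous, e.g. pages 1-4 and 6
qpdf --empty --pages foo.pdf 1-4,6 -- bar.pdf

# Equivalent with pdftk: ranges are space-separated after "cat"
pdftk foo.pdf cat 1-4 6 output bar.pdf
```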
einpoklum (10753 rep)
May 26, 2025, 11:10 AM • Last activity: May 27, 2025, 03:01 AM
0 votes
1 answer
41 views
Splitting PDF while keeping the index in the new file
I have got a PDF file with many tomes in it. Because it contains a lot (>5,000) of pages I want to split it. I have used pdftk like this:
pdftk input.pdf cat 487-2987 output second_tome.pdf
It works, but somehow pdftk doesn't put an index in the output file. Because the content has many chapters, I would like it to keep the index, so I could quickly skip to a chapter in my PDF viewer. I tried gs, but it behaves similarly to pdftk: it doesn't write an index, and it works very slowly. I tried qpdf, which **does** keep the index, but it puts in the *entire* index of the input PDF, which results in the output file having entries for all the old content. Also, if (as in the example above) I want to separate a range of pages, the "first" page in the output PDF will not start from 1. Is there any way to do a split with an index?
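One possible workaround, sketched with pdftk's metadata round-trip (dump_data_utf8 / update_info_utf8); the bookmark filtering is left as a manual or scripted edit, and the page-number offset (486) follows from the example range:
```bash
# Extract the page range as before
pdftk input.pdf cat 487-2987 output second_tome.pdf

# Dump the bookmarks/metadata of the original file
pdftk input.pdf dump_data_utf8 output bookmarks.txt

# Keep only Bookmark* blocks whose BookmarkPageNumber falls in 487-2987,
# subtract 486 from each page number, and save as bookmarks_tome2.txt
# (manual or scripted step).

# Write the adjusted bookmarks into the extracted file
pdftk second_tome.pdf update_info_utf8 bookmarks_tome2.txt output second_tome_indexed.pdf
```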
Felix.leg (103 rep)
May 24, 2025, 10:35 AM • Last activity: May 24, 2025, 12:54 PM
7 votes
3 answers
3214 views
Tar piped to split piped to scp
So I'm trying to transfer a bunch of files via SCP. Some of these are too large to be stored on the recipient (Android phone, 4GB file size limit). The sender is almost out of space, so I can't create intermediate files locally. I'd like to tar up the bunch and stream it through split so that I can get smaller segments that'll be accepted by the phone, i.e. the local command:
tar -cvf - ~/batch/ | split --bytes=1024m - batch.tar.seg
But I'm not sure how I'd pipe that into scp to get it to the phone. According to the comment on [this post](http://gnuru.org/article/1522/copying-with-scp-stdin) , it's possible, but first of all I don't quite get what he's saying, and second of all I'm not sure how to accomplish this as there'll be multiple files output from split. Any ideas?
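A sketch of one way to avoid local intermediates, assuming the sender has GNU coreutils split (8.13+ for --filter) and the phone is reachable over ssh (the "phone" host name and destination path are placeholders):
```bash
# split runs the --filter command once per segment and pipes the segment into it;
# $FILE holds the segment name (batch.tar.segaa, batch.tar.segab, ...).
# One ssh connection is opened per segment.
tar -cvf - ~/batch/ |
  split --bytes=1024m --filter='ssh phone "cat > /sdcard/backup/$FILE"' - batch.tar.seg
```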
FlamingKitties (5029 rep)
Feb 8, 2013, 07:06 AM • Last activity: May 9, 2025, 10:53 AM
-1 votes
2 answers
62 views
Split string with 0-2 / (or determine there's none) (Bash)
Update: Up to 2 "/" in the string. The string structure is one of:
Character set name/LF
Character set name/CRLF
Character set name/CRLF/(unknown purpose, likely a number)
Character set name
Examples: "UTF-8/CRLF", "UCS-2/CRLF/21". That is, there may be only the character set name (unknown beforehand) without any "/" separator. The character set name may contain "-" and "_" (no need to split on those).
Need to assign to:
VAR1 = character set name
VAR2 = the CRLF or LF part between the 1st "/" and the 2nd "/" (or an empty string if there's no "/")
VAR3 = the remainder after the 2nd "/"
Some kind of true/false (0/1) for VAR2 is OK as well (it will be processed with if/else later in the script). I tried cut -d/ -f, but cut -d/ -f 2 returns the character set name even **if there's no "/"**, so it doesn't work for me. For a **Bash** script a **faster** solution is preferred, as it will be run many times. I do need to call the function via /bin/bash -c because it's called in find -exec. Code (mostly based on Choroba's answer):
#!/bin/bash
shopt -s extglob
function convert_single_text_file_to_utf8(){
    CUR_FILE_ENCODING_WITH_CRLF=$1
    echo "CUR_FILE_ENCODING_WITH_CRLF=${CUR_FILE_ENCODING_WITH_CRLF}"
    CUR_FILE_ENCODING_ONLY=${CUR_FILE_ENCODING_WITH_CRLF%%/*} # Remove everything starting from the first slash.
    LINE_FEED=${CUR_FILE_ENCODING_WITH_CRLF##$CUR_FILE_ENCODING_ONLY?(/)} # Remove the charset, followed by a slash if any.
    echo "CUR_FILE_ENCODING_ONLY=${CUR_FILE_ENCODING_ONLY} LINE_FEED=${LINE_FEED}"
}
export -f convert_single_text_file_to_utf8
for ENCODING in ASCII UTF-8/CRLF ISO-8859-2/LF EBCDIC-CA-FR; do
    echo "ENCODING=$ENCODING"
    export ENCODING
    /bin/bash -c 'shopt -s extglob; convert_single_text_file_to_utf8 "$ENCODING"'
done
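A possible simplification, sketched below (the parse_encoding helper and its echo output are only for illustration): let read itself split on "/", which leaves absent parts as empty strings and needs no subshell:
```bash
#!/bin/bash
# Split "charset[/linefeed[/rest]]" into three variables; missing parts become "".
parse_encoding() {
    IFS='/' read -r VAR1 VAR2 VAR3 <<< "$1"
    echo "charset=$VAR1 linefeed=$VAR2 rest=$VAR3"
}

parse_encoding "ASCII"          # charset=ASCII linefeed= rest=
parse_encoding "UTF-8/CRLF"     # charset=UTF-8 linefeed=CRLF rest=
parse_encoding "UCS-2/CRLF/21"  # charset=UCS-2 linefeed=CRLF rest=21
```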
strider (43 rep)
Feb 27, 2025, 05:11 PM • Last activity: Feb 27, 2025, 11:28 PM
7 votes
4 answers
10401 views
Split PDF into documents with several pages each
There are several resources on the web explaining how one can split a PDF into many files with one page per file. But how can you split them into chunks of, say, five pages each? I have looked into the standard tools such as pdftk but could not find an option that does what I want.
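A rough sketch of one way to do it with pdftk (input.pdf and the chunk naming are placeholders):
```bash
#!/bin/bash
# Number of pages, as reported by pdftk's metadata dump
pages=$(pdftk input.pdf dump_data | awk '/NumberOfPages/ {print $2}')

# Cut out 5-page chunks: 1-5, 6-10, ...
for start in $(seq 1 5 "$pages"); do
    end=$(( start + 4 ))
    (( end > pages )) && end=$pages
    pdftk input.pdf cat "${start}-${end}" output "chunk_${start}-${end}.pdf"
done
```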
Raphael (2095 rep)
Mar 5, 2013, 02:25 PM • Last activity: Jan 24, 2025, 12:39 PM
10 votes
5 answers
35007 views
Bash: split multi line input into array
I've got a file with strings and base64 encoded data over multiple lines, that are separated by a comma. Example:
1,meV9ivU4PqEKNpo5Q2u2U0h9owUn4Y8CF83TTjUNWTRQs7dEgVxnsMgf4lvg9kvxcIaM3yB4Ssim
z46M/C7YlovNUmrjOByhV1SCb/bGyv1yL7SYFnw1GHbYjdH0b6UZ7nQzJHU6VmwMo0V77vFNy6nx
rmJZ4KqW9EcjdV1plQmsVXSiZVi61+fNOHCMDmVtJ4q097geWxf4bT0/k/yRyRwi5Zr8BC64htVS
AdwOSo4PIk7xDLOzLywAYOCDQvD/zuErf1L0e8nHGz2LKdApHdEWB7Y2yM3iZyXuQ4sMx0+oX66+
FxwUulvHj+EpXtLJx5rmV7AUjr/GsNw/1aYAGPCfz0S+//Ic5pXX5rY1fZ96oFGw4a9vRiAmxe/w
ZOza6LtwuF+WUHjbIeWTUKKQGgFIM81dwVHHY7xdRnQhK5J0Zf3Xz0GzzZj5/2YFbI8q7lVkJ3ZQ
7Oqt0qdfk3aj+BQhOxmn1F55yACPBZoPUw6K8ExTHHGVGdCEiIDTu5qKHcUwK0hGAZA9Mun5KTO0
gPs9JxF8FJjkQBF7rEa6TP3pH5OwdkATH2uf+Zcmp1t6NbBymXVlsLzWZookVsaT1DNXf1I1H8Xz
8dnfh6Yl63jSr2PAhDrcOqJNM8Z9/XhBGxtlD1ela3nq6N1ErR1Gv1YZKNeNcL7O2Z3Vl2oyyDw=,U2FsdGVkX1/c8rTTO41zVT7gB+KL+n7KoNCgM3vfchOyuvBngdXDGjXTvXTK0jz6
Now, I'd like to split the content into an array, so that each multi-line string is an array element. I tried to use IFS, but that only reads the first line:
filecontent=$(cat myfile)
IFS=',' read -a myarray <<< "$filecontent"
Result:
$myarray = 1
$myarray = meV9ivU4PqEKNpo5Q2u2U0h9owUn4Y8CF83TTjUNWTRQs7dEgVxnsMgf4lvg9kvxcIaM3yB4Ssim
Expected:
$myarray = 1
$myarray = meV9ivU4PqEKNpo5Q2u2U0h9owUn4Y8CF83TTjUNWTRQs7dEgVxnsMgf4lvg9kvxcIaM3yB4Ssim z46M/C7YlovNUmrjOByhV1SCb/bGyv1yL7SYFnw1GHbYjdH0b6UZ7nQzJHU6VmwMo0V77vFNy6nx rmJZ4KqW9EcjdV1plQmsVXSiZVi61+fNOHCMDmVtJ4q097geWxf4bT0/k/yRyRwi5Zr8BC64htVS AdwOSo4PIk7xDLOzLywAYOCDQvD/zuErf1L0e8nHGz2LKdApHdEWB7Y2yM3iZyXuQ4sMx0+oX66+ FxwUulvHj+EpXtLJx5rmV7AUjr/GsNw/1aYAGPCfz0S+//Ic5pXX5rY1fZ96oFGw4a9vRiAmxe/w ZOza6LtwuF+WUHjbIeWTUKKQGgFIM81dwVHHY7xdRnQhK5J0Zf3Xz0GzzZj5/2YFbI8q7lVkJ3ZQ 7Oqt0qdfk3aj+BQhOxmn1F55yACPBZoPUw6K8ExTHHGVGdCEiIDTu5qKHcUwK0hGAZA9Mun5KTO0 gPs9JxF8FJjkQBF7rEa6TP3pH5OwdkATH2uf+Zcmp1t6NbBymXVlsLzWZookVsaT1DNXf1I1H8Xz 8dnfh6Yl63jSr2PAhDrcOqJNM8Z9/XhBGxtlD1ela3nq6N1ErR1Gv1YZKNeNcL7O2Z3Vl2oyyDw=
$myarray = U2FsdGVkX1/c8rTTO41zVT7gB+KL+n7KoNCgM3vfchOyuvBngdXDGjXTvXTK0jz6
Could someone help me out here?
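One possible direction, sketched below: keep IFS=',' but make read consume the whole file instead of a single line by giving it an empty (NUL) line delimiter:
```bash
#!/bin/bash
# -d '' tells read to stop only at a NUL byte, so it consumes the whole file;
# commas split the fields while embedded newlines are preserved.
# (read exits non-zero when it hits EOF, but the array is still populated.)
IFS=',' read -r -d '' -a myarray < myfile

printf 'element:\n%s\n---\n' "${myarray[@]}"
```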
Yoda (101 rep)
Feb 3, 2016, 08:47 PM • Last activity: Jan 15, 2025, 08:14 PM
0 votes
0 answers
81 views
Rocky 9 split screen settings
Rocky 9 has this nice feature that allows windows to snap into split screen layouts, although by default it's dividing the screen into 3 parts. Also the borders are a bit thick. Is there a setting panel to customize these settings?
ytrox (1 rep)
Sep 26, 2024, 07:40 PM
49 votes
5 answers
110383 views
Split a file into two
I have a big file and need to split it into two files. Say the first 1000 lines should be selected and put into another file, and those lines deleted from the first file. I tried using split but it creates multiple chunks.
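A small sketch of one way to do this with head plus GNU sed's in-place edit (file names are placeholders):
```bash
# Copy the first 1000 lines out, then delete them from the original file
head -n 1000 bigfile > first1000.txt
sed -i '1,1000d' bigfile
```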
Aravind (1679 rep)
Oct 21, 2014, 03:50 PM • Last activity: Sep 13, 2024, 04:13 PM
11 votes
3 answers
7363 views
What is the state of the art of splitting a binary file by size?
### Some background you can happily skip
Twenty years ago or so, when navigating the web cost a lot, when I was a Windows-only user, when CDs/DVDs were a large storage means, and when sharing video files with a friend or relative would sometimes require splitting the file over multiple CDs/DVDs, copying them onto the other person's computer and then rejoining the pieces, I used to use [HJSplit](https://hjsplit.it.softonic.com/) . Worked like a charm.
### The motivation
Fast-forward 20 years: I recently found myself in need of such a utility on Linux, due to a slow/unreliable connection not allowing me to easily scp stuff across physically very distant Linux systems. The solution that came to mind was to split the file and transfer the pieces, then rejoin them. That's how I found that HJSplit is Windows-only and that [lxsplit](https://lxsplit.sourceforge.net/) exists and works like a charm as well, so all is good.
### My question
But [lxsplit](https://lxsplit.sourceforge.net/) has been abandoned since 2008, so maybe some other (better?) solution has come up in these 15 years. What is the state of the art in this field, i.e. splitting and rejoining big binary files, on Linux?¹
### Additional motivation
I also thought that, conceptually speaking, splitting a file and rejoining it is a very simple task, so I wondered whether I could write my own program for doing so. I tried, and got something working in a few hours, but it's at least ~5 times slower than lxsplit. Before diving into profiling and benchmarking, I wanted to know whether there are other similar programs that have even better performance than lxsplit.
---
(¹) I'm not interested in alternative workflows for accomplishing the original task of transferring a big file between two systems. Yes, today you'd probably upload it to Dropbox/Onedrive/GoogleDrive/whatever from one system and download it from the other.
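For reference, the coreutils baseline such tools are usually compared against (piece size and file names are arbitrary):
```bash
# Split into 1 GiB pieces named video.mkv.part_aa, _ab, ...
split -b 1G video.mkv video.mkv.part_

# On the other side, rejoin and (optionally) verify
cat video.mkv.part_* > video.mkv
sha256sum video.mkv
```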
Enlico (2258 rep)
Jul 16, 2023, 06:46 AM • Last activity: Sep 9, 2024, 03:42 PM
1 vote
2 answers
911 views
Which version of split supports flag -p?
This command does not work in GNU Coreutils split, in the split of CERN Linux 5 (Red Hat), or in BSD split (Apple Yosemite 10.10.3):
split -p'\0' input.txt
where input.txt is masi\0hello\0world. Some comments about the versions follow:
- I run split -p'\0' input.txt with BSD split but I get nothing as output, in OS X Yosemite 10.10.3, GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14).
- I run echo 'masi\0hello' | split -p'\\0' with split 5.97 (GNU, 2012) in CERN Linux 5 (Red Hat). Output: split: unrecognized option --p\\0'.
- There is no option -p in GNU Coreutils split.
I have forgotten where I successfully used the option -p with split. Which version of split *does* support the flag -p?
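As a side note, the pattern-based -p option belongs to BSD split; a sketch of the closest GNU coreutils analogue is csplit, which starts a new piece at each line matching a regex (PATTERN and the prefix are placeholders):
```bash
# GNU csplit: start a new output file at every line matching PATTERN
csplit --quiet --prefix=piece_ input.txt '/PATTERN/' '{*}'
```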
Léo Léopold Hertz 준영 (7138 rep)
Jun 24, 2015, 11:02 AM • Last activity: Aug 2, 2024, 07:15 PM
0 votes
2 answers
1179 views
Split file into 4 chunks using macOS version of split package
In GNU/Linux, in order to split a file into 4 equal chunks we can do something like:
split temp -n 4 PREFIX_
But it seems the macOS (BSD) version of the split utility doesn't have the -n option. What would be the equivalent of the GNU split command on macOS?
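A sketch of one workaround: compute a quarter of the byte count yourself and feed it to -b, which BSD split does support (the file name and prefix are as in the example):
```bash
# Round the chunk size up so the 4th chunk picks up any remainder
size=$(wc -c < temp)
split -b $(( (size + 3) / 4 )) temp PREFIX_
```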
Drew (253 rep)
May 10, 2018, 10:00 PM • Last activity: Jul 1, 2024, 04:37 PM
46 votes
4 answers
40247 views
How to split an image vertically using the command line?
Say I have a large 800x5000 image; how would I split that into 5 separate images with dimensions 800x1000 using the command line?
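A sketch with ImageMagick, assuming it is installed (file names are placeholders):
```bash
# -crop with only a tile geometry (no offset) tiles the image into 800x1000 pieces:
# output_0.png ... output_4.png, top to bottom
convert input.png -crop 800x1000 +repage output_%d.png
```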
shley (1121 rep)
Nov 23, 2014, 01:40 PM • Last activity: May 23, 2024, 02:59 PM
0 votes
3 answers
99 views
Read file till special char, copy that section into another file, and continue till eof
I am trying to read a file in Linux, and as soon as a "&" character is encountered, I write the output to another file, send that file to another folder, and then continue reading the original file till the next "&", and so on. Input XML file:
&




&
My code snippet -
while IFS= read -r line; do
    if [[ "$line" == "$delimeter" ]]; then
        echo "$line" | sed "s/$delimeter.*//" >> "$output_file"
        cp "$output_file" "$TARGET_FOLDER"
        break
    else
        echo "$line" >> "$output_file"
    fi
done < "$input_file"   # input file name assumed; the original redirect was lost
I want the section up to the first "&" to be put in the output file, which is copied to TARGET_FOLDER. Then the next section (up to the following "&") is copied, and so on. Thank you for your help!
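A sketch of an alternative that avoids the line-by-line loop: treat "&" as the record separator in awk and write each section directly into the target folder (the section_NNN naming and the TARGET_FOLDER variable are assumptions):
```bash
# Each "&"-delimited section becomes TARGET_FOLDER/section_001, section_002, ...
awk -v RS='&' -v dir="$TARGET_FOLDER" '
    NF { out = sprintf("%s/section_%03d", dir, ++n); printf "%s", $0 > out; close(out) }
' input.xml
```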
python6 (1 rep)
Oct 11, 2023, 04:34 AM • Last activity: May 20, 2024, 11:16 AM
0 votes
1 answer
96 views
Archiving stdout to multiple tapes
I have large files which are generated on the fly to stdout, one every 24 hours. I would like to archive these files progressively on tapes, ideally in a single archive which potentially spans multiple tapes. Tar is very good at managing the tapes, as it has built-in functionality to append to an archive and to load the next tape. But it is very poor at accepting data from stdin. No matter what I do, it ends up writing a special file (link or named pipe) to the archive, instead of its content. Here are the example commands I have been trying. The first day, generate a new archive:
ln -s /dev/stdin day1 # or use the --transform option of tar
data_generator | tar -c -h -M -f /dev/nst0 -H posix -F 'mtx -f /dev/sch0 next' day1
The next day, I would like to just change -c to -A and save the new stream into a new file appended to the tar archive, loading a new tape when it becomes necessary.
data_generator | tar -A -h -M -f /dev/nst0 -H posix -F 'mtx -f /dev/sch0 next' day2
As I said, all I find in the archive is a named pipe (with -h) or a symlink (without -h). Some ideas that I have tried and are not good:
1. Using split instead of tar is not viable, because it is too basic. It can only split into pre-defined sizes (not good if I do not start from the beginning of the tape), and it cannot concatenate the different days into an unpackable archive. Tar does not need to know the size of the data nor the tape; it will just switch to a new tape when it gets a write error.
2. I've read the manuals of cpio, star and dar. I do not get the impression that they cope with pipes better than tar.
Thank you for any hints. Edit: I'm starting to think that it is impossible with tar, because it needs to know the size of the file before starting to write. In fact, for an archive that can be expanded by appending, things get very tricky if you have to write the size before the content.
lorenzo (1 rep)
Mar 1, 2024, 09:13 PM • Last activity: Mar 5, 2024, 04:55 PM
9 votes
7 answers
5826 views
How to efficiently split up a large text file without splitting multiline records?
I have a big text file (~50Gb when gz'ed). The file contains ``4*N`` lines or ``N`` records; that is, every record consists of 4 lines. I would like to split this file into 4 smaller files, each sized roughly 25% of the input file. How can I split up the file at record boundaries? A naive approach would be ``zcat file | wc -l`` to get the line count, divide that number by 4 and then use ``split -l <count> file``. However, this goes over the file twice and the line count is extremely slow (36mins). Is there a better way? This comes close but is not what I am looking for. The accepted answer also does a line count. **EDIT:** The file contains sequencing data in fastq format. Two records look like this (anonymized):
@NxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxGCGA+ATAGAGAG
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxTTTATGTTTTTAATTAATTCTGTTTCCTCAGATTGATGATGAAGTTxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
AAAAA#FFFFFFFFFFFFAFFFFF#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
**EDIT2:** ``zcat file > /dev/null`` takes 31mins. **EDIT3:** Only the first line starts with `@`. None of the others will ever. See here. Records need to stay in order. It's not ok to add anything to the resulting file.
Rolf (932 rep)
Jun 16, 2015, 07:55 AM • Last activity: Nov 27, 2023, 07:55 AM
0 votes
1 answer
93 views
Slow down a `split`
I have a really large archive consisting of really small files, concatenated into a single text file, with a "" delimiter. For smaller archives, I would split the archive using "" as a pattern, and then work on the resulting files. However, in this archive there are on the order of magnitude of a hundred million such files -- clearly, too much for putting them all into a single directory. I have created folders aa, ab, etc. to try to move them into directories as they are created. However, I ran into issues. Things I've tried:
1) There is no option for split to execute a command on each resulting file. So I have to do it by hand.
2) Moving the files into the aa directory using `find . -name "xaa*" -exec mv {} aa \+` does not work because `{}` is not at the end of the line.
3) The -t flag, for reversing the source and the destination, is not available in my version of Unix.
4) I had to pipe the output of find into xargs for it to work out. However, this is too slow -- files are being created way faster than they are moved away.
5) I suspect that xargs is processing fewer files at a time than using a \+ after find -exec. I tried adding a `-R 6000` flag, for running 6000 entries at a time; however, I don't think it made a difference.
6) I decreased the priority of split to the lowest possible. No change in the amount of CPU it consumed, so probably no effect either.
7) I open up to seven command prompts for running the mv commands (last four letters per command prompt) -- however, this is still not nearly enough. I would open more, but once the system gets to seven, the response is so slow that I have to stop the split. For example, the source archive got copied to a USB all while waiting for an ls -l | tail command to return something.
So what I've been doing is stopping the split at that point, waiting for the mv commands to catch up, and then restarting the split. At that point I would use find -exec rm {} \+ to delete the files I already have; this is a bit faster, so when it gets to the files I don't have, there are fewer files around. So the first such iteration lasted ~3 million files, the next one ~2 million, the next ~1.5. I am sure there should be a better way, though. Any ideas for what else to try?
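One direction that removes the separate mv step entirely, sketched with awk: do the split and the bucketing in a single pass, writing each piece straight into its target directory. PATTERN stands in for the real delimiter (which did not survive in the post), and the bucket size and naming are arbitrary:
```bash
# Start a new output file at each line matching PATTERN; ~100000 files per directory
awk -v pat='PATTERN' '
    $0 ~ pat {
        if (out) close(out)
        n++
        dir = sprintf("d%04d", int(n / 100000))
        if (!(dir in seen)) { system("mkdir -p " dir); seen[dir] = 1 }
        out = sprintf("%s/part_%09d", dir, n)
    }
    out { print > out }
' archive.txt
```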
Alex (1220 rep)
Nov 10, 2023, 02:31 PM • Last activity: Nov 10, 2023, 03:17 PM
1 vote
4 answers
1493 views
streaming split/merge command?
Does a streaming version of split exist somewhere in Linux? I'm trying to back up a large amount of data via SSH, but SSH's single-threaded encryption is limiting the transfer. This machine doesn't have hardware AES support so I'm using ChaCha encryption, but the CPU is still not keeping up with the network. So I thought I could solve this by splitting the data stream in two, sending each half over a separate SSH connection, and then merging the streams together at the destination. That way the encryption load can be shared over multiple CPU cores. This looked like a general enough idea that it should already exist, but I can't find it.
*edit*: for some numbers, I'm backing up data from an old computer, a few hundred GB over a gigabit wired network. I'm copying an image from a partition, as that is faster than doing individual file access on the spinning rust drive, so technically it is random-access data, but it is too large to treat it as such. I tried compressing it, but that doesn't help a lot. The data isn't very compressible. So what I'm looking for is a split (and a corresponding merge) that will split a stream of binary data into multiple streams (probably splitting by fixed chunks).
JanKanis (1421 rep)
Mar 28, 2021, 04:29 PM • Last activity: Oct 12, 2023, 12:28 PM
1 vote
2 answers
227 views
Does splitting a file into more files necessarily mean that some/all of the overall content will not be where it was?
I guess that given a file of a certain size, not all of its bytes will be contiguous on disk (or will they? Just from the existence of the phrase "defragmenting a disk" I assume they will not). But at least from the application point of view, they are. I mean, I can use head -c [-]n+tail -c [-]n to extract a portion of a file, thinking of it as a contiguous sequence of bytes. So say that a file is 10 bytes long and contains all equal bytes, e.g.
$ cat someFile
AAAAAAAAAA
is it possible to split it into two files, say someFile.part1 and someFile.part2, such that
$ cat someFile.part1
AAAAA
$ cat someFile.part2
AAAAA
and that actually no byte has been moved anywhere, in the sense that those 10 bytes are still precisely where they were before? After all, the name someFile must one way or another map to the physical position (on the disk, or in some kind of virtual memory that the OS (or the kernel?) makes me deal with) where the content actually starts. In a way, I imagine someFile not being much different from a pointer and a length, say 0xabc/10, whereas the target files would be 0xabc/5 and 0xac1/5. Maybe I'm being too much influenced by my C++ experience and practically nonexistent file system experience :D --- I'm not interested in doing it per se. I'm curious about understanding how programs like [lxsplit](https://lxsplit.sourceforge.net/) work, and where their strengths are. Mostly out of curiosity, but maybe, why not, to play at writing one myself.
Enlico (2258 rep)
Apr 4, 2023, 06:16 PM • Last activity: Aug 6, 2023, 10:01 AM
62 votes
3 answers
126738 views
Split a file by line and have control over the resulting files' extension
There is a standard command for file splitting - split. For example, if I want to split a words file into several chunks of 10000 lines, I can use:
split -dl 10000 words wrd
It would generate several files of the form wrd.01, wrd.02 and so on. But I want to have a specific extension for those files - for example, I want to get wrd.01.txt, wrd.02.txt files. Is there a way to do it?
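With a reasonably recent GNU coreutils split (8.16 or later), a sketch:
```bash
# Numeric suffixes plus a fixed extension: wrd.00.txt, wrd.01.txt, ...
# (numbering starts at 00; the trailing dot in the prefix gives the "wrd." form)
split -dl 10000 --additional-suffix=.txt words wrd.
```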
Rogach (6533 rep)
Feb 25, 2012, 04:47 AM • Last activity: Jul 26, 2023, 12:45 PM
3 votes
3 answers
2048 views
How to create split tar archive in multiple stages to save space?
I have a very large folder that I am trying to create a tar archive of. The issue is I don't have enough free space to store the entire archive, so I want to create, say, 100-200GB chunks of the archive at a time and transfer those individually to cloud storage. I need to be able to control when new chunks are created so my HDD doesn't fill up, but all of the commands I've found to create split tarballs always create it all at once, in the same directory. The closest solution I found was from [this question](https://unix.stackexchange.com/questions/197464/how-to-create-multi-tar-archives-for-a-huge-folder) but all the responses base the archives on the number of files, not size, which is important for my use case as my file sizes are unevenly distributed.
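A sketch of one way to avoid ever holding a full chunk on disk, assuming GNU tar and split plus an rclone remote (the remote name, folder path and chunk size are placeholders): stream the archive through split and let --filter upload each chunk as it is produced:
```bash
# Each 100G chunk is piped straight into rclone; $FILE is the chunk name
# (archive.tar.part_aa, _ab, ...). Nothing is written locally.
tar -cf - /path/to/big_folder |
  split -b 100G --filter='rclone rcat remote:backup/$FILE' - archive.tar.part_

# To restore: fetch the parts in order and concatenate them back into tar, e.g.
# rclone cat remote:backup/archive.tar.part_aa ... | tar -xf -
```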
Josh Harrison (53 rep)
Jan 7, 2021, 10:48 PM • Last activity: Jul 24, 2023, 03:05 PM