Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
22
votes
5
answers
44095
views
Compress a large number of large files fast
I have about 200 GB of log data generated daily, distributed among about 150 different log files.
I have a script that moves the files to a temporary location and does a tar-bz2 on the temporary directory.
I get good results as 200 GB logs are compressed to about 12-15 GB.
The problem is that it takes forever to compress the files. The cron job runs at 2:30 AM daily and continues to run till 5:00-6:00 PM.
Is there a way to improve the speed of the compression and complete the job faster? Any ideas?
Don't worry about other processes; the location where the compression happens is on a NAS, and I can mount the NAS on a dedicated VM and run the compression script from there.
Here is the output of top for reference:
top - 15:53:50 up 1093 days, 6:36, 1 user, load average: 1.00, 1.05, 1.07
Tasks: 101 total, 3 running, 98 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.1%us, 0.7%sy, 0.0%ni, 74.1%id, 0.0%wa, 0.0%hi, 0.1%si, 0.1%st
Mem: 8388608k total, 8334844k used, 53764k free, 9800k buffers
Swap: 12550136k total, 488k used, 12549648k free, 4936168k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7086 appmon 18 0 13256 7880 440 R 96.7 0.1 791:16.83 bzip2
7085 appmon 18 0 19452 1148 856 S 0.0 0.0 1:45.41 tar cjvf /nwk_storelogs/compressed_logs/compressed_logs_2016_30_04.tar.bz2 /nwk_storelogs/temp/ASPEN-GC-32459:nkp-aspn-1014.log /nwk_stor
30756 appmon 15 0 85952 1944 1000 S 0.0 0.0 0:00.00 sshd: appmon@pts/0
30757 appmon 15 0 64884 1816 1032 S 0.0 0.0 0:00.01 -tcsh
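The top output shows a single bzip2 process pinned near 100% of one core while the machine sits about 74% idle, so the job is single-threaded. A hedged sketch of one remedy (pbzip2 and the thread count are assumptions; the paths reuse the question's):

```
# Compress on all cores instead of one; pbzip2 emits ordinary .bz2 output.
tar -C /nwk_storelogs/temp -cf - . | pbzip2 -p8 -c > /nwk_storelogs/compressed_logs/compressed_logs.tar.bz2

# Equivalent, using GNU tar's --use-compress-program (-I) option:
tar -I pbzip2 -cf /nwk_storelogs/compressed_logs/compressed_logs.tar.bz2 -C /nwk_storelogs/temp .
```

Trading some ratio for speed with a faster codec (pigz for gzip, or zstd) is the other common lever.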
anu
(362 rep)
May 4, 2016, 11:00 PM
• Last activity: Jun 17, 2025, 09:12 AM
1
votes
1
answers
102
views
Parallel processing of single huge .bz2 or .gz file
I would like to use GNU Parallel to process a huge .gz or .bz2 file.
I know I can do:
bzcat huge.bz2 | parallel --pipe ...
But it would be nice if there was a way, similar to `--pipe-part`, that could read multiple parts of the file in parallel. One option is to decompress the file:
bzcat huge.bz2 > huge
parallel --pipe-part -a huge ...
but huge.bz2 is huge, and I would much rather decompress it multiple times than store it uncompressed.
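For reference, a crude way to approximate that behaviour today (a sketch; the slice count is an assumption, slices cut mid-line so real use would need to realign on record boundaries, and `wc -l` stands in for the actual processing). It pays the decompression cost once per slice, as the question anticipates:

```
size=$(bzcat huge.bz2 | wc -c)   # one full pass to learn the decompressed size
n=4                              # number of parallel slices (assumption)
per=$(( (size + n - 1) / n ))
for i in $(seq 0 $((n - 1))); do
    # each job re-decompresses the whole stream and keeps only its byte range
    bzcat huge.bz2 | tail -c +$(( i * per + 1 )) | head -c "$per" | wc -l &
done
wait
```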
Ole Tange
(37348 rep)
Mar 28, 2025, 11:58 AM
• Last activity: Mar 29, 2025, 10:33 AM
247
votes
4
answers
99373
views
Why are tar archive formats switching to xz compression to replace bzip2 and what about gzip?
More and more tar archives use the xz format, based on LZMA2, for compression instead of the traditional bzip2 (bz2) compression. In fact, *kernel.org* made a late "*Good-bye bzip2*" announcement on 27th Dec. 2013, indicating kernel sources would from this point on be released in both tar.gz and tar.xz format - and on the main page of the website, what's directly offered is the tar.xz.
Are there any specific reasons explaining why this is happening, and what is the relevance of gzip in this context?
user44370
Jan 6, 2014, 06:39 PM
• Last activity: Jan 21, 2025, 02:31 PM
0
votes
1
answers
65
views
Is it possible to compress a tar ball with gzip/bzip2/xz after tar ball file has been created?
If we create a tar ball file by giving the following command
tar -cvf Docs.tar $HOME/Documents/*
then, after the tar ball has been created, is it possible to use gzip or bzip2 or xz or some other compression utility to compress the tar file?
I know that we can pass the option `--bzip2`, `--xz`, or `--gzip` along with `-cvf` while creating the tar, but what if that was not done, and the compression is to be applied only after the tar has been created? Is it possible? If yes, then how?
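It is possible: tar's `--gzip`/`--bzip2`/`--xz` options are only front-ends for the standalone compressors, so they can be run on the finished archive:

```
gzip  Docs.tar    # produces Docs.tar.gz
# or
bzip2 Docs.tar    # produces Docs.tar.bz2
# or
xz    Docs.tar    # produces Docs.tar.xz
```

Each command replaces Docs.tar with the compressed file; add -k to keep the uncompressed original as well.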
KDM
(116 rep)
Oct 2, 2024, 01:50 PM
• Last activity: Oct 2, 2024, 02:15 PM
0
votes
0
answers
140
views
vmlinuz to vmlinux ERROR
$ file vmlinuz
vmlinuz: Linux kernel x86 boot executable bzImage, version 4.14.244 (root@d0ea4514eda5) #1 SMP Thu Aug 31 01:23:02 PDT 2023, RO-rootFS, swap_dev 0x3, Normal VGA
I tried to use extract_vmlinux and vmlinux-to-elf to extract vmlinux from vmlinuz, but they report the following errors, respectively:
$ vmlinux-to-elf vmlinuz vmlinux
Traceback (most recent call last):
File "/usr/local/bin/vmlinux-to-elf", line 63, in
ElfSymbolizer(
File "/usr/local/lib/python3.8/dist-packages/vmlinux_to_elf/elf_symbolizer.py", line 44, in __init__
kallsyms_finder = KallsymsFinder(file_contents, bit_size)
File "/usr/local/lib/python3.8/dist-packages/vmlinux_to_elf/kallsyms_finder.py", line 177, in __init__
self.find_linux_kernel_version()
File "/usr/local/lib/python3.8/dist-packages/vmlinux_to_elf/kallsyms_finder.py", line 225, in find_linux_kernel_version
raise ValueError('No version string found in this kernel')
ValueError: No version string found in this kernel
$ ./extract_vmlinux vmlinuz > vmlinux
extract_vmlinux: Cannot find vmlinux.
Then I tried manual extraction:
$ od -A d -t x1 vmlinuz | grep 'fd 37 7a 58 5a 00'
3254032 fd 37 7a 58 5a 00 44 65 73 74 69 6e 61 74 69 6f
$ dd if=vmlinuz of=vmlinuz_unxz bs=1 skip=3254032
116928+0 records in
116928+0 records out
116928 bytes (117 kB, 114 KiB) copied, 1.10249 s, 106 kB/s
$ xz -d vmlinuz_unxz
xz: vmlinuz_unxz: Compressed data is corrupt
What went wrong? Any suggestions to extract vmlinux?
Thank you!
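One hedged avenue (binwalk is an assumption, not something the question tried): the grep hit above is immediately followed by ASCII text ("Destinatio..."), so it may be a false positive rather than a real xz stream header. A signature scanner that validates what it finds can help:

```
binwalk vmlinuz      # list embedded compression signatures with their offsets
binwalk -e vmlinuz   # carve candidates into _vmlinuz.extracted/ for inspection
```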
pipik
(1 rep)
Aug 16, 2024, 02:14 AM
• Last activity: Aug 16, 2024, 02:31 AM
1
votes
1
answers
1932
views
How to extract all uncorrupted file from bzip2 compression?
I am trying to uncompress a bzip2 file (~55 GB) with the command
tar -jxvf file.tar.bz2
However, I found that the decompression process gets stuck at a certain file and, after waiting a long duration, gives the error message shown below without decompressing the other files.
bzip2: Compressed file ends unexpectedly;
perhaps it is corrupted? *Possible* reason follows.
bzip2: Inappropriate ioctl for device
Input file = (stdin), output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The last file where decompression gets stuck happens to be a tar file. Is it possible to skip this tar file and continue extracting the other files, given that I'm not interested in it?
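A sketch following the hint in the error text itself (bzip2recover ships with the bzip2 package; --ignore-zeros is GNU tar):

```
# split the damaged archive into independent per-block .bz2 pieces
bzip2recover file.tar.bz2            # writes rec00001file.tar.bz2, ...
# decompress the surviving pieces as one stream and let tar read past gaps
cat rec*file.tar.bz2 | bzip2 -dc | tar -xv --ignore-zeros
```

Files whose data fell inside the damaged blocks will still be lost or truncated, but everything in intact blocks should extract.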
Raghvender
(11 rep)
Aug 15, 2022, 02:38 PM
• Last activity: Aug 15, 2022, 03:47 PM
1
votes
0
answers
725
views
Compressing small files with gzip makes its size smaller than with bzip2, why?
I have a question about a little thing that caught my eye regarding compression with gzip and bzip2.
If I understood correctly, bzip2 requires more processing power but compresses files smaller and more efficiently than gzip.
When I tried to compress a 6 MB file with bzip2, its size got smaller than with gzip, as I expected.
But when I tried to compress a file with a size of 5 bytes with bzip2, its size got larger than with gzip.
bzip2 -k 5_bytes_file
du -b 5_bytes_file.bz2
result: 42 bytes
gzip -k 5_bytes_file
du -b 5_bytes_file.gz
result: 38 bytes
Why is this happening? Am I doing something wrong?
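Nothing is wrong: both formats pay a fixed header/trailer cost that dominates tiny inputs, and bzip2's framing is larger (its block-sorting also needs a reasonable amount of data before it wins). A quick way to see the fixed overhead (file names here are illustrative):

```
printf 'hello' > tiny            # 5-byte input
bzip2 -k tiny && gzip -k tiny    # -k keeps the original
stat -c '%n %s' tiny tiny.bz2 tiny.gz   # compare on-disk sizes
```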
Karuch
(11 rep)
Aug 9, 2022, 09:27 AM
• Last activity: Aug 9, 2022, 09:27 AM
2
votes
1
answers
3155
views
php build error: please reinstall BZip2 distribution
I tried to build PHP v8.0.0 from its source, but after running `./configure` it says:
...
checking for BZip2... not found
configure: error: please reinstall BZip2 distribution
But I have `bzip2` installed already. How do I fix that?
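configure is looking for bzip2's development headers (bzlib.h), which most distributions package separately from the bzip2 binary. A likely fix (package names vary by distro):

```
sudo apt-get install libbz2-dev    # Debian/Ubuntu
sudo dnf install bzip2-devel       # Fedora/RHEL
```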
Ar Rakin
(189 rep)
Jul 17, 2021, 05:28 AM
• Last activity: Mar 5, 2022, 04:56 PM
0
votes
2
answers
397
views
using bzip gzip zip in bash
#!/bin/bash
# check if the user supplied a file name
if [ $# -gt 0 ]; then
    # check if the file exists in the current directory
    if [ -f "$1" ]; then
        # check if the file is readable in the current directory
        if [ -r "$1" ]; then
            echo "File:$1"
            echo "$(wc -c <"$1")"
            # Note the following two lines
            comp=$(bzip2 -k $1)
            echo "$(wc -c <"$comp")"
        else
            echo "$1 is unreadable"
            exit
        fi
    else
        echo "$1 does not exist"
        exit
    fi
fi
Currently my problem is that I can compress the `$1` file into a `$1.bz2` file with bzip2, but when I try to capture the size of the compressed file, my code says no such file exists.
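A minimal fix, sketched: `bzip2 -k` writes its result to `$1.bz2` and prints nothing to stdout, so `comp=$(bzip2 -k $1)` captures an empty string. Build the output name instead of capturing output:

```
bzip2 -k -- "$1"          # creates $1.bz2, keeps $1
comp="$1.bz2"
echo "$(wc -c <"$comp")"
```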
alan
(1 rep)
Sep 17, 2021, 12:06 AM
• Last activity: Sep 21, 2021, 07:11 AM
0
votes
0
answers
249
views
Compression vs. redundancy: do they cancel each other out?
Does it make sense to compress a tarball (or any kind of file, really), e.g., using `gzip` or `bzip2`, while at the same time creating redundancy files for it, e.g., a `par2` file?
The context is that I am reasoning about how to best backup my personal files.
My #1 priority is to avoid data loss due to bitrot; hence, the par2 files.
Compression would be nice but is not my main concern.
The reason I doubt the combined application of a compression algorithm and an erasure-code algorithm is that the former works by eliminating redundancy (thereby creating smaller files), while the latter works by adding redundancy (thereby adding the capability to perform data recovery operations).
Don't they cancel each other out?
Is this a reasonable assumption or am I missing something here?
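They don't cancel out: compression removes the *uncontrolled* redundancy inside the data, and par2 then adds back a small amount of *structured* redundancy designed for recovery, so the two compose. A sketch of the combined pipeline (file names are illustrative; par2cmdline syntax):

```
tar -czf backup.tar.gz ~/Documents
par2 create -r10 backup.tar.gz    # add ~10% recovery data
par2 verify backup.tar.gz.par2    # later: detect bitrot
par2 repair backup.tar.gz.par2    # and repair it if found
```

If anything, compressed data makes the par2 layer more valuable, since a single flipped bit can corrupt everything after it in the stream.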
pygumby
(111 rep)
May 23, 2021, 10:37 PM
• Last activity: May 23, 2021, 11:53 PM
1
votes
2
answers
1979
views
Check tar file for errors
Is there any way to see if there is a problem with the `.tar.bz2` file? As you can see, I can get a list of files, but neither `xjvf` nor `xzvf` works in this case.
$ tar tf pytorch.20210702.tar.bz2 | head -n 5
pytorch/
pytorch/BUILD.bazel
pytorch/requirements-flake8.txt
pytorch/NOTICE
pytorch/WORKSPACE
$ tar xjvf pytorch.20210702.tar.bz2
bzip2: (stdin) is not a bzip2 file.
tar: Child returned status 2
tar: Error is not recoverable: exiting now
$ tar xzvf pytorch.20210702.tar.bz2
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
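Since GNU tar auto-detects compression when reading, `tar tf` succeeding while both explicit decompressors fail suggests the file is not actually compressed, despite its name. A first diagnostic step (standard tools only):

```
file pytorch.20210702.tar.bz2      # report what the data really is
# if it turns out to be a plain, uncompressed tar:
tar -tvf pytorch.20210702.tar.bz2 > /dev/null && echo "tar readable to the end"
```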
mahmood
(1271 rep)
Mar 17, 2021, 09:30 AM
• Last activity: Mar 17, 2021, 10:00 AM
14
votes
3
answers
37329
views
bunzip2 to a different directory
Say I have a file `foo.tbz2` in a directory. I want to extract the `tar` file from the archive, but to a different directory. It seems like `bunzip2` will only extract the archive to the same directory as the archive.
This works, but I'm wondering if there is a better way:
cd /another/directory
bunzip2 -k /original/directory/foo.tbz2
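Two common alternatives (both standard options):

```
# decompress to stdout and choose the destination explicitly
bunzip2 -c /original/directory/foo.tbz2 > /another/directory/foo.tar
# or skip the intermediate tar file entirely and extract in one step
tar -xjf /original/directory/foo.tbz2 -C /another/directory
```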
longneck
(430 rep)
Aug 15, 2012, 08:11 PM
• Last activity: May 28, 2020, 09:37 AM
13
votes
2
answers
20136
views
How to check/test .tar.bz archives?
I've been using tar with its "--use-compress-prog=pbzip2" option to archive my files and compress them with pbzip2, producing an "*.tar.bz" archive.
Afterwards I checked the resulting file with pbzip2's "-t" switch, and it passed the test. However, to my great surprise, I got "file incomplete" or other integrity errors when trying to extract the archive!
Is it because there might be something wrong with the tar file, but not when it was compressed by pbzip2? If so, is there a way to check the tar file itself? If not, what other problem might this be? Also, are there ways to recover data from tar files with errors?
I am afraid I might have already lost some important data through this process...
The point is, I would like to know a method to test the integrity of my archives after they are created.
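A sketch of a post-creation check that exercises both layers (the archive name is illustrative): verify the bzip2 stream, walk the tar structure to its end, and optionally compare the contents against the source tree:

```
pbzip2 -t archive.tar.bz                                   # compression layer
pbzip2 -dc archive.tar.bz | tar -t > /dev/null && echo OK  # tar structure
tar --diff --use-compress-prog=pbzip2 -f archive.tar.bz    # contents vs. disk
```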
hpy
(4597 rep)
Apr 19, 2012, 02:19 PM
• Last activity: May 12, 2020, 02:54 AM
2
votes
1
answers
9260
views
tar (child): : Cannot open: Is a directory
I know that's a pretty dumb question, but I didn't find this precise question on the internet.
I tried to `tar -cvjf` all the contents of a directory (`/*`) and directly redirect that to a file (`> file`), but the error message in the title occurs. I am compressing both files and directories here.
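The likely cause (inferred from the error text): with `f` in the option cluster, tar takes the next argument as the archive name, so the first directory matched by the glob is being treated as the output file; no shell redirect is needed. A sketch of the corrected invocation:

```
# wrong: the first expansion of /* becomes the "archive" argument
#   tar -cvjf /* > file
# right: name the archive immediately after the options
tar -cvjf file.tar.bz2 /*
```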
wxi
(189 rep)
Mar 24, 2020, 12:08 PM
• Last activity: Mar 24, 2020, 12:42 PM
8
votes
1
answers
2167
views
Is there a compression tool with an arbitrarily large dictionary?
I am looking for a compression tool with an arbitrarily large dictionary (and "block size"). Let me explain by way of examples.
First let us create 32 MB of random data and then concatenate it to itself to make a file of twice the length, 64 MB.
head -c32M /dev/urandom > test32.bin
cat test32.bin test32.bin > test64.bin
Of course test32.bin is not compressible because it is random, but the first half of test64.bin is the same as the second half, so it should be compressible by roughly 50%.
First let's try some standard tools. test64.bin is of size exactly 67108864.
- gzip -9. Compressed size 67119133.
- bzip2 -9. Compressed size 67409123. (A really big overhead!)
- xz -7. Compressed size 67112252.
- xz -8. Compressed size 33561724.
- zstd --ultra -22. Compressed size 33558039.
We learn from this that gzip and bzip2 can never compress this file. However, with a big enough dictionary, xz and zstd can compress the file, and in that case zstd does the best job.
However, now try:
head -c150M /dev/urandom > test150.bin
cat test150.bin test150.bin > test300.bin
test300.bin is of size exactly 314572800. Let's try the best compression algorithms again at their highest settings.
- xz -9. Compressed size 314588440
- zstd --ultra -22. Compressed size 314580017
In this case neither tool can compress the file.
> Is there a tool that has an arbitrarily large dictionary size so it
> can compress a file such as test300.bin?
---
Thanks to the comment and answer, it turns out both zstd and xz can do it. You need zstd version 1.4.x, however.
- zstd --long=28. Compressed size 157306814
- xz -9 --lzma2=dict=150MiB. Compressed size 157317764.
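For reference, the two resolutions as complete commands (assumptions: zstd ≥ 1.4 for long-range matching, and xz's `preset=9,dict=...` syntax to enlarge only the dictionary while keeping level-9 settings):

```
zstd --long=28 test300.bin                       # -> test300.bin.zst
zstd -d --long=28 test300.bin.zst                # the large window must be allowed again on decompression
xz -k --lzma2=preset=9,dict=150MiB test300.bin   # -> test300.bin.xz
```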
Simd
(371 rep)
Jan 18, 2020, 09:50 PM
• Last activity: Jan 19, 2020, 06:11 PM
7
votes
2
answers
6719
views
bzip2: Check file's decompressed size without actually decompressing it
I have a big `bzip2`-compressed file and I need to check its decompressed size without actually decompressing it (similar to `gzip -l file.gz` or `xz -l file.xz`). How can this be done using `bzip2`?
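Unlike gzip and xz, the bzip2 format stores no uncompressed-size field, so some decompression work is unavoidable; streaming into `wc` at least avoids writing the output to disk:

```
bzcat file.bz2 | wc -c    # decompressed size in bytes, nothing written out
```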
manifestor
(2563 rep)
Oct 12, 2019, 11:22 AM
• Last activity: Oct 12, 2019, 04:34 PM
2
votes
3
answers
990
views
Difference between .bz2 and .tar.bz2 files
I am supposed to find out whether a file is a .bz2 or a .tar.bz2 (without using the file's extension) and decompress it accordingly. I used the `file` command, but it gives the same result for both .bz2 and .tar.bz2. Please suggest a way to distinguish .bz2 from .tar.bz2 files.
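A sketch: `file` only reports the outer bzip2 layer, so peek at the decompressed stream instead, reading just enough for `file` to classify it:

```
if bzcat archive.bz2 2>/dev/null | head -c 512 | file - | grep -q 'tar archive'
then
    echo "bzip2-compressed tar (.tar.bz2)"
else
    echo "plain bzip2 (.bz2)"
fi
```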
charan priyatham
(83 rep)
Sep 15, 2019, 09:17 PM
• Last activity: Sep 16, 2019, 09:07 AM
0
votes
2
answers
236
views
Copying files, verifying and then zipping with a shell script
I am looking to create a script (a Linux shell or Python script) that can do the following things for me, including verifying the files copied from a folder.
I have two folders:
FolderA has 300 .xls files. This folder is missing some files which are currently in FolderB.
FolderB has 500 .xls files.
I want to copy a select 100 files from FolderB to FolderA. Then I want the script to verify that all the files now residing in FolderA (400 after copying the 100 files from B) also exist in FolderB.
Then I want the script to compress each of these files separately into its own bzip2 file. Basically, there will be 400 bzip2 files (one per Excel file) in the end when the process is completed.
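A rough shell sketch under assumed names (FolderA, FolderB, and a list file to_copy.txt naming the selected files are all assumptions):

```
#!/bin/bash
# copy the selected files, never overwriting what FolderA already has
xargs -a to_copy.txt -I{} cp -n "FolderB/{}" FolderA/

# verify: every file now in FolderA must match its counterpart in FolderB
for f in FolderA/*.xls; do
    cmp -s "$f" "FolderB/$(basename "$f")" || echo "MISMATCH OR MISSING: $f"
done

# compress each file into its own .bz2, keeping the original (-k)
bzip2 -k FolderA/*.xls
```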
mywayz
(61 rep)
Jul 18, 2019, 08:22 PM
• Last activity: Aug 6, 2019, 01:31 PM
8
votes
1
answers
1017
views
Can files compressed with bzip2 be relied upon to be deterministic (reproducible)?
I am trying to determine if there are any potential issues using `bzip2` to compress files that need to be 100% reproducible. Specifically: can metadata (name/inode, lastmod date, etc.) or anything else cause identical file contents to **produce a different checksum** on the resulting `.bz2` archive?
As an example, gzip is not deterministic by default unless `-n` is used.
My crude tests so far suggest that bzip2 does indeed consistently produce identical files given identical input data (regardless of metadata, platform, filesystem, etc), but it would be nice to have more than anecdotal evidence.
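A small experiment supporting this: the bzip2 stream format, unlike gzip's, has no fields for a file name or timestamp, so there is nothing metadata-shaped to leak in:

```
printf 'same content\n' > a.txt
printf 'same content\n' > b.txt
touch -d '2001-01-01' b.txt     # different name and mtime, same bytes
bzip2 -k a.txt b.txt
sha256sum a.txt.bz2 b.txt.bz2   # identical digests expected
```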
Jonathan Cross
(258 rep)
Jul 22, 2019, 12:14 PM
• Last activity: Jul 22, 2019, 01:52 PM
2
votes
2
answers
6330
views
BZIP2 multiple files without losing original files
I want to bzip2 about 1000 files. However, I am tasked not to remove the old files, leaving both the original and its bz2 file in the same folder. What is the quickest way to do this?
Just to rephrase my question: suppose I have file1.txt, file2.txt, file3.txt, ..., file1000.txt; I would need their bz2 versions in the same folder without removing the originals.
How to achieve this?
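The -k (keep) flag does exactly this, and bzip2 accepts many files per invocation; a sketch, with an optional parallel variant (GNU xargs assumed):

```
bzip2 -k file*.txt    # leaves file1.txt and file1.txt.bz2 side by side
# parallel variant, one bzip2 per core:
printf '%s\0' file*.txt | xargs -0 -P "$(nproc)" -n 1 bzip2 -k
```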
mywayz
(61 rep)
Jun 28, 2019, 12:55 AM
• Last activity: Jun 29, 2019, 07:47 AM
Showing page 1 of 20