**Context**
I want to (try to) unpack an MSI, Zip (or any archive), or EXE to inspect the contents.
I also want to recursively try to unpack all extracted files.
Using 7zip, I have found I can unpack MSI, Zip, EXE and it just fails if it can't treat the file like an archive. This is good enough for me.
However, running 7z over every file is too slow. I want to quickly skip files which don't look like they're compressed.
I have read this: https://unix.stackexchange.com/questions/63918/determine-whether-a-particular-file-is-compressed
* Note: in contrast to the comment there, I actually would like to unpack anything that **could** be an archive (like an ODF document or EPUB)
This is what I have so far. It is correct, but inefficient (my script `COMPRESSED=/home/wineuser/bin/compressed` always returns true):
    #!/bin/bash
    set -euo pipefail

    COMPRESSED=/home/wineuser/bin/compressed

    # Initial archive and output directory
    initial_archive=$1
    output_dir=$2

    # Function to extract an archive
    extract_archive() {
        local archive=$1
        local output_dir=$2
        # Keep the counter local so a recursive call cannot reset it
        local idx=0

        # Extract the archive to the output directory
        7z x "$archive" -o"$output_dir" -y || return

        # Find all files in the output directory that $COMPRESSED accepts
        find "$output_dir" -type f -exec "$COMPRESSED" {} \; -print | while IFS= read -r line; do
            idx=$((idx+1))
            extract_archive "$line" "$output_dir"/"$idx" || true
        done
    }

    # Extract the initial archive recursively
    extract_archive "$initial_archive" "$output_dir"
**Requirements**
I want this to work in general, so I'm thinking of something like:
    if file "$f" | grep -q compressed; then
        # try to uncompress with 7zip
        7z x "$f" -o"$output_dir" -y
    fi
BUT `file` doesn't have the word "compressed" anywhere for MSI. Neither does `file -k`.
Example output for MSI:
Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.2, MSI Installer, Code page: 1252, Title: Installation Database, Subject: DB Browser for SQLite, Author: DB Browser for SQLite Team, Keywords: Installer, Comments: This installer database contains the logic and data required to install DB Browser for SQLite., Template: x64;1033, Revision Number: {E6668432-CAB0-416D-9422-E24C9E71DB68}, Create Time/Date: Sun May 2 16:30:00 2021, Last Saved Time/Date: Sun May 2 16:30:00 2021, Number of Pages: 405, Number of Words: 2, Name of Creating Application: Windows Installer XML Toolset (3.11.1.2318), Security: 2
Example output for self-extracting exe:
PE32 executable (GUI) Intel 80386, for MS Windows, Nullsoft Installer self-extracting archive
I could maintain a word list, but feel this is rather brittle.
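For illustration, the word-list approach might look like the sketch below. The function name `description_looks_like_archive` is hypothetical, and the keywords are taken only from the two example outputs above plus "compressed"; it is not a complete list. Formats such as ODF or EPUB may well be described without any of these words, which is exactly the brittleness I'm worried about.

```shell
#!/bin/bash
# Hypothetical helper: decide from a `file -b` description whether 7z is worth trying.
# The keyword list is an assumption based on the example outputs in this question.
description_looks_like_archive() {
    case $1 in
        *compressed*|*archive*|*Installer*|*"Composite Document File"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumes `file` is installed; -b prints the description without the file name):
#   if description_looks_like_archive "$(file -b "$f")"; then
#       7z x "$f" -o"$output_dir" -y
#   fi
```

Every new format that 7z can open would need another keyword here.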
At this stage, I would be happy with a simple entropy check on the first N bytes, i.e., anything with high entropy is treated as "compressed".
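A minimal sketch of such an entropy check, using only `head`, `od`, and `awk`. The function name `looks_compressed`, the 4096-byte sample size, and the 7.0 bits-per-byte threshold are all assumptions chosen for illustration, not tuned values. One caveat: container formats such as PE executables or MSI files start with structured headers, so sampling only the first bytes may report low entropy even when the payload is compressed.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the first N bytes of a file look high-entropy.
# Usage: looks_compressed FILE [N_BYTES] [THRESHOLD_BITS_PER_BYTE]
looks_compressed() {
    local file=$1 nbytes=${2:-4096} threshold=${3:-7.0}
    # od prints each sampled byte as an unsigned decimal; awk builds a histogram
    head -c "$nbytes" < "$file" | od -An -v -tu1 | awk -v t="$threshold" '
        { for (i = 1; i <= NF; i++) { count[$i]++; total++ } }
        END {
            if (total == 0) exit 1              # empty file: nothing to unpack
            h = 0
            for (b in count) {
                p = count[b] / total
                h -= p * log(p) / log(2)        # Shannon entropy in bits per byte
            }
            exit (h >= t ? 0 : 1)
        }'
}
```

Saved as the `compressed` script with a final line `looks_compressed "$1"`, this would slot into the `find ... -exec "$COMPRESSED" {} \; -print` call, since `find` only prints files for which the command exits with status 0.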
I'm wondering whether there is a more straightforward solution.
btw, I'm aware of this https://superuser.com/questions/307678/how-do-i-extract-files-from-an-msi-package - but I need to do this on Linux, not Windows, and file names mean nothing for my use case, so 7zip works as well as msiexec for me (and works for other archives too).
Asked by d.j.yotta
(101 rep)
Jul 31, 2024, 12:48 AM
Last activity: Aug 1, 2024, 09:33 PM