
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

22 votes
5 answers
44096 views
Compress a large number of large files fast
I have about 200 GB of log data generated daily, distributed among about 150 different log files. I have a script that moves the files to a temporary location and does a tar-bz2 on the temporary directory. I get good results as 200 GB of logs are compressed to about 12-15 GB. The problem is that it takes forever to compress the files. The cron job runs at 2:30 AM daily and continues to run till 5:00-6:00 PM. Is there a way to improve the speed of the compression and complete the job faster? Any ideas? Don't worry about other processes and all; the location where the compression happens is on a NAS, and I can mount the NAS on a dedicated VM and run the compression script from there. Here is the output of top for reference:
top - 15:53:50 up 1093 days, 6:36, 1 user, load average: 1.00, 1.05, 1.07
Tasks: 101 total, 3 running, 98 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.1%us, 0.7%sy, 0.0%ni, 74.1%id, 0.0%wa, 0.0%hi, 0.1%si, 0.1%st
Mem: 8388608k total, 8334844k used, 53764k free, 9800k buffers
Swap: 12550136k total, 488k used, 12549648k free, 4936168k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7086 appmon 18 0 13256 7880 440 R 96.7 0.1 791:16.83 bzip2
7085 appmon 18 0 19452 1148 856 S 0.0 0.0 1:45.41 tar cjvf /nwk_storelogs/compressed_logs/compressed_logs_2016_30_04.tar.bz2 /nwk_storelogs/temp/ASPEN-GC-32459:nkp-aspn-1014.log /nwk_stor
30756 appmon 15 0 85952 1944 1000 S 0.0 0.0 0:00.00 sshd: appmon@pts/0
30757 appmon 15 0 64884 1816 1032 S 0.0 0.0 0:00.01 -tcsh
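For scale, one commonly suggested direction (not part of the question) is to replace the single-threaded bzip2 that top shows pinning one core with a parallel compressor such as pigz or pbzip2. A minimal sketch, reusing the paths visible in the top output (the output filename is illustrative):

```bash
# Sketch: stream the staged logs through pigz, which uses all CPU cores.
# pbzip2 could be substituted to keep the .bz2 format, at some speed cost.
tar -cf - /nwk_storelogs/temp \
  | pigz -p "$(nproc)" \
  > "/nwk_storelogs/compressed_logs/compressed_logs_$(date +%F).tar.gz"
```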
anu (362 rep)
May 4, 2016, 11:00 PM • Last activity: Jun 17, 2025, 09:12 AM
3 votes
1 answers
1285 views
How to pipe output of command to two separate commands and store outputs
I have a really long command that runs over a huge file and I am forced to run it twice which doubles the time it takes to run. This is what I am doing at the moment:
x=$(command | sort -u)
y=$(command | sort -n)
I was wondering whether there is any way to redirect the output of command to both sort -u and sort -n and store the output of each into separate variables or files, like I did above with x and y. I tried to use tee to do the following, but no luck:
command | tee >(sort -n > x.txt) >(sort -u > y.txt)
I tried to redirect output to text files but it just printed it to standard output instead. Any tips or ideas?
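A minimal sketch of the usual repair for that tee pipeline (not from the question; it assumes bash for the process substitutions): the copy tee forwards to its own stdout is what ends up on the terminal, so discard it and read the files back afterwards.

```bash
# Sketch: fan the output out to both sorts, silence tee's stdout copy,
# then load the results into variables.
command | tee >(sort -n > x.txt) >(sort -u > y.txt) > /dev/null
# The process substitutions may still be flushing when the pipeline returns;
# depending on the bash version a brief wait may be needed before reading.
x=$(cat x.txt)
y=$(cat y.txt)
```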
markovv.sim (35 rep)
Oct 27, 2020, 07:44 PM • Last activity: Feb 25, 2025, 06:09 PM
0 votes
1 answers
77 views
Could use docker for app isolation?
I am studying the use of Docker in a big-scale project that is actually deployed in production. I have never used Docker before, but from what I have read, it consists of a new layer called a "container engine" that gives you the opportunity to deploy many applications that are independent of each other and use the resources of the host. In the case that I am working on, the machines where our app is deployed can have different OSes and architectures (Windows, Linux, ARM, Debian, etc.), but they don't have any VM running, just the OS and the applications that we deploy. These machines can have 4-5 applications running on the same system, each of them with different dependencies. We have already had some problems with that: for example with file descriptors, where one app was taking over the log writing of another app, generating erroneous logs and crashing. These apps communicate with other parts of the machines via TCP/IP sockets and use gRPC, QPID and SFTP to communicate with other elements of the environment (external servers, own libraries, etc.). **I don't know if the use of these protocols would complicate the adoption of Docker in our system.** Talking with my workmates, they told me that it is not worth it as it would not bring any optimisation or benefit, but I don't think so. I've been reading that by using containers we get OS independence, making the app work on different OSes using a Docker image, plus library independence and therefore isolation between apps.
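A minimal sketch of the isolation being described, with two hypothetical apps packaged as images app-a and app-b (image names, ports and log paths are illustrative): each container carries its own dependencies and log directory while sharing the host kernel, and plain TCP/gRPC traffic only needs its ports published.

```bash
# Sketch: run two independently packaged apps side by side on one host.
docker run -d --name app-a -v /var/log/app-a:/var/log/app -p 5001:5001 app-a:latest
docker run -d --name app-b -v /var/log/app-b:/var/log/app -p 5002:5002 app-b:latest
```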
ShadowFurtive (13 rep)
Jul 4, 2024, 09:00 AM • Last activity: Jul 4, 2024, 09:28 AM
30 votes
4 answers
81406 views
How to compile without optimizations -O0 using CMake
I am using Scientific Linux (SL). I am trying to compile a project that uses a bunch of C++ (.cpp) files. In the directory user/project/Build, I enter make to compile and link all the .cpp files. I then have to go to user/run/ and type ./run.sh values.txt. To debug with GDB, I have to go to user/run and type gdb ../project/Build/bin/Project, and to run, I enter run -Project INPUT/inputfile.txt. However, when I try to print out the value of a variable using p variablename, I get the message s1 = <value optimized out>. I have done some research online, and it seems I need to compile without optimizations using -O0 to resolve this. But where do I enter that? In the CMakeLists? If so, which CMakeLists? The one in project/Build or project/src/project?
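For context, the conventional CMake knobs are the build type and the per-configuration flags, set either on the configure command line or in the top-level CMakeLists.txt. A sketch, assuming an out-of-source build whose source root sits one directory above Build:

```bash
# Sketch: reconfigure the existing build directory for debugging.
# The Debug build type defaults to -g; the extra flag makes -O0 explicit.
cd user/project/Build
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS_DEBUG="-O0 -g" ..
make
```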
user4352158 (471 rep)
Feb 28, 2015, 10:08 PM • Last activity: Jun 24, 2024, 07:12 AM
0 votes
0 answers
248 views
PostgreSQL: indexes and partitions
I have a PostgreSQL database and I noticed a weird behaviour while working with indexes and partitions. The engine version is 10.21. Now, I have a table with this structure:
guid varchar(50) PK
guid_a varchar(50)
data text
part_key varchar(2)
There are other columns but they are irrelevant. The query I have to run on this table looks like this:
select * from mytable where guid_a = 'jxxxxx-xxxxxxx' and data like '%7263628%';
Let me explain a few things: the column guid_a contains a code that identifies a person in the format 'jxxxx-xxxxxxx', where the 'x' are numbers. The first two digits go from 00 to 99, so, for example:
j01xxx-xxxxxx
j02xxx-xxxxxx
...
j99xxx-xxxxxx
I created an index on this column and then I also created an index using the trgm module on the data column. Launching the query, I get a giant improvement in performance. Everything's good until now. I also decided to use partitions (the table has **6.4 million records**) and I created 99 partitions (by list) on the column part_key, which contains only the first two digits of the guid_a value. I obtained 99 partitions with an average of 65 thousand rows each. Each partition has the same indexes I talked about before. This improved the performance again. Obviously the query has another condition on part_key, so that the engine knows which partition it should query. Now the weird stuff. I removed the trgm index on the table without the partitions and, surprise surprise: it's faster. Even faster than the partitioned table. Even after removing the trgm indexes on the partitioned table. What I noticed in the explain is that the query on the non-partitioned table forces the engine to go for an index scan only (shouldn't it then also make another scan for the second condition on the data column?). On the partitioned table, on the other hand, it goes for a bitmap index scan, then it does a heap scan and then an append. This apparently costs more than indexing all 6.4 million rows. I made different tests with different values but got the same results. **Performance** (on average):
11 ms on the partitioned table
9 ms on the non-partitioned table with one index only, on guid_a
20 ms on the non-partitioned table with two indexes, the second on the data column using trgm
What's going on here?
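For reference, a sketch of the kind of setup being described, using the table and column names from the question but an assumed database name, wrapped in psql since the exact DDL isn't shown:

```bash
# Sketch: the two indexes described for the non-partitioned table,
# plus an EXPLAIN to compare the plans the question mentions.
psql -d mydb -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"
psql -d mydb -c "CREATE INDEX ON mytable (guid_a);"
psql -d mydb -c "CREATE INDEX ON mytable USING gin (data gin_trgm_ops);"
psql -d mydb -c "EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM mytable WHERE guid_a = 'jxxxxx-xxxxxxx' AND data LIKE '%7263628%';"
```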
Federico Loro (1 rep)
Jan 20, 2023, 07:03 PM
1 votes
1 answers
191 views
Speed up grep usage inside bash script
I am currently working on a bash script that is supposed to process large log files from one of my programs. When I first started, the script took around 15 seconds to complete, which wasn't bad, but I wanted to improve it. I implemented a queue with mkfifo and reduced the parse time to 6 seconds. I wanted to ask whether there is any way to improve the parsing speed of the script further. The current version of the script:
#!/usr/bin/env bash
# $1 is server log file
# $2 is client logs file directory

declare -A orders_array

fifo=$HOME/.fifoDate-$$
mkfifo $fifo
# Queue for time conversion
exec 5> >(exec stdbuf -o0 date -f - +%s%3N >$fifo)
exec 6 >(exec stdbuf -o0 grep -oP '[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*' >$fifo)
exec 8&5 "${line:1:26}"
    read -t 1 -u6 converted_time
    orders_array[$order_id]=$converted_time
done &7 "$line"
    read -t 1 -u8 id
    echo >&5 "${line:1:26}"
    read -t 1 -u6 converted_time
    time_diff="$(($converted_time - orders_array[$id]))"
    echo "$id -> $time_diff ms"
done  GatewayCommon::States::Executed]
[2022-12-07 07:36:18.209567] [MarketOrderTransitionsa4ec2abf-059f-4452-b503-ae58da2ce1ff] [info] [log_action] [(lambda at ../subprojects/market_session/private_include/MarketSession/MarketOrderTransitions.hpp:57:25) for event: MarketMessages::OrderExecuted]
[2022-12-07 07:36:18.209574] [MarketOrderTransitionsa4ec2abf-059f-4452-b503-ae58da2ce1ff] [info] [log_process_event] [boost::sml::v1_1_0::back::on_entry]
The id is in square brackets after MarketOrderTransitions (a4ec2abf-059f-4452-b503-ae58da2ce1ff). Client log:
[2022-12-07 07:38:47.545433] [twap_algohawk] [info] [] [Event received (OrderExecuted): {"MessageType":"MarketMessages::OrderExecuted","averagePrice":"49.900000","counterPartyIds":{"activeId":"dIh5wYd/S4ChqMQSKMxEgQ**","executionId":"2295","inactiveId":"","orderId":"3dOKjIoURqm8JjWERtInkw**"},"cumulativeQuantity":"1200.000000","executedPrice":"49.900000","executedQuantity":"1200.000000","executionStatus":"Executed","instrument":[["Symbol","5"],["Isin","5"],["SecurityIDSource","4"],["Mic","MARS"]],"lastFillMarket":"MARS","leavesQuantity":"0.000000","marketSendTime":"07:38:31.972000000","orderId":"a4ec2abf-059f-4452-b503-ae58da2ce1ff","orderPrice":"49.900000","orderQuantity":"1200.000000","propagationData":[],"reportId":"Qx2k73f7QqCqcT0LTEJIXQ**","side":"Buy","sideDetails":"Unknown","transactionTime":"00:00:00.000000000"}]
The id in the client log is inside the orderId tag (there are 2 of them and I use the second one). The wanted output is:
98ddcfca-d838-4e49-8f10-b9f780a27470 -> 854 ms
5a266ca4-67c6-4482-9068-788a3520b2f3 -> 18 ms
2e8d28de-eac0-4776-85ab-c75d9719b7c6 -> 58950 ms
409034eb-4e55-4e39-901a-eba770d497c0 -> 56172 ms
5b1dc7e8-fae0-43d2-86ea-d3df4dbe810b -> 52505 ms
5249ac24-39d2-40f5-8adf-dcf0410aebb5 -> 17446 ms
bef18cb3-8cef-4d8a-b244-47fed82f21ea -> 1691 ms
7c53c950-23fd-497e-a011-c07363d5fe02 -> 18194 ms
In particular, I am only concerned about the "order executed" messages in the log files.
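As a small aside on the conversion step the script leans on (a sketch using a timestamp from the sample logs above): GNU date can turn a bracketed timestamp into epoch milliseconds directly, and `date -f -` is the batch mode that the fifo/stdbuf plumbing feeds line by line.

```bash
# Sketch: convert one log timestamp to epoch milliseconds with GNU date.
date -d "2022-12-07 07:36:18.209567" +%s%3N
```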
Dzamba (11 rep)
Dec 12, 2022, 09:30 AM • Last activity: Dec 13, 2022, 03:32 PM
0 votes
2 answers
186 views
How to resize images on sequential functions?
I am trying to:
- change the format of my images,
- then resize their height and width by 40%,
- then optimize their quality to at most 35% of the source and the total size of the image to 35% of the original.
Here is my code:
find . -name '*.png' -exec mogrify -format jpg {} + && find . -name '*.{jpeg,jpg}' -exec convert -resize 40% _resized.jpg {} + && find ./*.{jpeg,jpg} -exec jpegoptim -m 35% --size=35% {} \;
The resizing (line 2) seems to fail. When I look at the image properties I get the same image dimensions. I expect that the new image:
- is resized
- has the original name + the word "resized" at the end of the name
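A minimal sketch of one way to get the resize-plus-suffix behaviour (not the asker's command; it assumes ImageMagick's convert): convert wants the input file and an explicit output name, which the bare -exec form above does not give it.

```bash
# Sketch: resize each JPEG to 40% and write a sibling file with a _resized suffix.
find . -type f \( -name '*.jpg' -o -name '*.jpeg' \) -exec sh -c '
  for f; do
    convert "$f" -resize 40% "${f%.*}_resized.jpg"
  done
' sh {} +
```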
Diagathe Josué (543 rep)
Apr 22, 2020, 10:11 PM • Last activity: Nov 12, 2022, 02:40 PM
0 votes
1 answers
353 views
Parse huge amounts of files efficiently
I have a folder that holds hundreds of thousands of files called hp-temps.txt. (There are also tons of subfolders.) The content of these files looks like this, for example:
Sensor   Location              Temp       Threshold
------   --------              ----       ---------
#1        PROCESSOR_ZONE       15C/59F    62C/143F 
#2        CPU#1                10C/50F    73C/163F 
#3        I/O_ZONE             25C/77F    68C/154F 
#4        CPU#2                32C/89F    73C/163F 
#5        POWER_SUPPLY_BAY     9C/48F     55C/131F
I need to parse through all the files and find the highest entry for the temperature in the #1 line. I have a working script, but it takes a very long time, and I was wondering if there is any way to improve it. Since I'm rather new to shell scripting, I imagine this code of mine is really inefficient:
#!/bin/bash
highestTemp=0
temps=$(find "$1" -name hp-temps.txt -exec cat {} + | grep 'PROCESSOR' | cut -c 32-33)
for t in $temps
do
  if [ $t -gt $highestTemp ]; then
    highestTemp=$t
  fi
done
**EDIT:** There has already been a very efficient answer, but I forgot to mention that I don't only need the biggest value. I would like to be able to loop through all the files, since I'd like to output the directory of the file and the temperature whenever a higher value is detected. So the output could look like this, for example:
New MAX: 22 in /path/to/file/hp-temps.txt
New MAX: 24 in /another/path/hp-temps.txt
New MAX: 29 in /some/more/path/hp-temps.txt
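A minimal sketch of how that running maximum could be tracked in a single pass (not the asker's script; it assumes the field layout shown in the sample file and a find/awk toolchain):

```bash
#!/bin/bash
# Sketch: scan every hp-temps.txt once, track the running maximum of the
# PROCESSOR_ZONE temperature and report the file each time it increases.
find "$1" -name hp-temps.txt -exec awk '
  /PROCESSOR_ZONE/ {
    t = $3                 # e.g. "15C/59F"
    sub(/C\/.*/, "", t)    # keep only the Celsius number
    if (t + 0 > max) { max = t + 0; print "New MAX: " max " in " FILENAME }
  }
' {} +
```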
Lumnezia (111 rep)
Sep 18, 2022, 01:32 PM • Last activity: Sep 19, 2022, 05:49 AM
2 votes
2 answers
1354 views
Set usb flash drive as non rotational drive
I'm trying to optimize the I/O schedulers and to use a proper (different) scheduler for rotational and for non-rotational drives. When I run:
cat /sys/block/sd*/queue/rotational
I get:
1 <-- for sda
1 <-- for sdb
although sdb is the USB flash drive and it shouldn't be rotational.
$ udevadm info -a -n /dev/sda | grep queue
ATTRS{queue_depth}=="31"
ATTRS{queue_ramp_up_period}=="120000"
ATTRS{queue_type}=="simple"
$ udevadm info -a -n /dev/sdb | grep queue
ATTRS{queue_depth}=="1"
ATTRS{queue_type}=="none"
so there is no such attribute as:
ATTR{queue/rotational}=="0" or ...=="1"
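For what it's worth, a sketch of the two usual ways to flip that flag (an assumption about the setup, not taken from the question): the sysfs attribute is writable at runtime, and a udev rule can reapply it when the stick is plugged in.

```bash
# Sketch: mark sdb as non-rotational for the current boot only.
echo 0 | sudo tee /sys/block/sdb/queue/rotational

# Sketch of a persistent variant, e.g. /etc/udev/rules.d/60-usb-nonrot.rules
# (the match on removable media is illustrative):
#   ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{removable}=="1", ATTR{queue/rotational}="0"
```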
user252842
Apr 21, 2018, 12:37 PM • Last activity: Jul 17, 2022, 03:48 PM
1 votes
1 answers
3859 views
Build the Linux kernel without gcc optimization
I followed one of the many tutorials found in Google results to build and debug the Linux kernel with gcc and kgdb/gdb, and I ended up discovering that it was all a waste of time, since I can't compile the kernel without gcc optimization, with neither -O0 nor -Og. There's no config option for removing optimization. And last but not least, Linus said years ago that he is against debugging. That said, kgdb must exist for some reason. I was wondering if there is a way to get rid of variables/arguments being "**optimized out**" and let the debugger step through the code sequentially instead of jumping around.
Joe Smith (11 rep)
Aug 27, 2020, 01:05 PM • Last activity: Jun 26, 2022, 02:02 PM
5 votes
3 answers
17700 views
Allocating Swap Space with KVM
Consider the following scenario: a host with 2 GiB runs a few guests using KVM.  Each guest does usually not need much memory; they are given 256 MiB each and run services that mostly twiddle their thumbs.  However, occasionally the guests need more memory.  Right now, each guest has little RAM but its own swap space.  I noticed that a small portion of swap is used.  I never had problems with that configuration, but just out of curiosity: What is the optimal swap allocation strategy? 1. Assign each guest its own swap space from their respective disks, and assign the guests only little memory from the host.  (This is what I am doing now.) 2. Assign the host a larger amount of swap space and none to the guests, and assign more memory to the guests. Would memory ballooning help to improve memory performance?
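On the ballooning point, a small sketch (hypothetical domain name, assuming the guests are managed through libvirt/virsh): the balloon is the mechanism that would let option 2 hand a guest extra memory only while it needs it.

```bash
# Sketch: inspect and adjust a guest's memory balloon at runtime.
virsh dommemstat guest1            # current balloon/actual figures for the guest
virsh setmem guest1 512M --live    # shrink or grow the balloon while it runs
```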
countermode (7764 rep)
Jun 17, 2014, 01:01 AM • Last activity: May 17, 2022, 02:13 AM
0 votes
1 answers
832 views
CLI tool that compress the given image, whatever file type the image is (png, jpg, gif, webp, svg)?
I know that there are many tools to optimize an image:
- pngquant
- optipng
- jpegoptim
- gifsicle
- exiftool
- etc.
but they are all specific to a certain file type. Is there a single command-line tool that, whatever image type it is passed, applies the right compression? Something similar to what https://compressor.io does, but as a CLI. With "optimize" I mean reducing the size of the overall file while keeping it visually nearly identical (thanks @Philippos).
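In the absence of such a tool, a minimal sketch of a wrapper that dispatches on the file extension to the per-format tools already listed (an illustration, not an existing utility; the quality settings are arbitrary):

```bash
#!/bin/sh
# Sketch: route each file to a format-specific optimizer based on its extension.
for f in "$@"; do
  case "$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')" in
    png)      optipng "$f" ;;
    jpg|jpeg) jpegoptim --max=85 "$f" ;;
    gif)      gifsicle -O3 --batch "$f" ;;
    *)        echo "no optimizer configured for: $f" >&2 ;;
  esac
done
```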
nulll (235 rep)
May 2, 2022, 10:58 AM • Last activity: May 2, 2022, 03:44 PM
1 votes
0 answers
309 views
Pipewire using 2x processes when idling
I recently had my BT headphones stutter and cut out and realized that during CPU intensive processing, pipewire is dropping frames. My overall goal, then, is to streamline processing to make that situation happen rarely or never. With that in mind, I noticed today that pipewire has quite a bit of processing going on, even while no audio is occurring: htop showing pipewire processes Combined, these processes are taking over 10% of one CPU. My question is twofold: 1. Is it normal to have 2 of each process running for pipewire (pipewire, pipewire-pulse, and pipewire-gnome-session)? If not, how can I reduce this to 1 each? 2. Why are these processes even taking any CPU when there is no system audio streaming anywhere? Is there a way to reduce CPU usage while idling?
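As a starting point for both sub-questions, a small sketch (assuming PipeWire runs as systemd user services, which is the common packaging): htop shows threads as separate rows by default, so apparent duplicates may be one daemon's threads rather than two instances.

```bash
# Sketch: see how many PipeWire user services actually exist vs. threads.
systemctl --user list-units 'pipewire*' 'wireplumber*'
ps -eLf | grep -E '[p]ipewire|[w]ireplumber'   # -L lists threads per process
pw-top                                         # live node/stream load, if installed
```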
Duane J (133 rep)
Dec 8, 2021, 07:28 PM
0 votes
1 answers
32 views
Timing desktop environment initialization
Is there a way to time desktop environment initialization and perhaps identify delaying candidates?
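One approach worth sketching (it assumes a systemd-based distro where the desktop session is started through systemd user units, as recent GNOME does; it is not universal):

```bash
# Sketch: overall boot timing plus per-unit timing for the user session.
systemd-analyze                        # firmware/loader/kernel/userspace totals
systemd-analyze blame                  # slowest system units
systemd-analyze --user blame           # slowest user-session units (DE components)
systemd-analyze --user critical-chain  # what the session's longest chain waits on
```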
Sterling Butters (117 rep)
Aug 25, 2020, 08:55 PM • Last activity: Nov 2, 2021, 10:29 AM
1 votes
0 answers
430 views
Can't compile linux kernel with -Og/-O0 option for debugging purpoces
I have custom hardware running embedded Linux (OpenWrt) like a charm. The CPU is an IMX6ULL (ARMv7), so it is supported by J-Link for debugging over the JTAG interface. Starting the GDB server and stepping through the Linux kernel shows a lot of optimized out messages, because the kernel is compiled with the KBUILD_CFLAGS += -O2 -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION) flags. So I am trying to compile it with -O0, which gives me the following output:
$ make -j64 V=s all
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CC      scripts/mod/empty.o
  ....
  AR      built-in.o
  LD      vmlinux.o
  MODPOST vmlinux.o
  WARNING: modpost: Found 4 section mismatch(es).
  To see full details build your kernel with:
  'make CONFIG_DEBUG_SECTION_MISMATCH=y'
  arm-openwrt-linux-muslgnueabi-ld: arch/arm/kernel/setup.o: in function `setup_arch':
/opt/eclipse/imx6ull-openwrt/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/linux-imx6ull_cortexa7/linux-4.14.199/arch/arm/kernel/setup.c:1134: undefined reference to `psci_smp_ops'
  arm-openwrt-linux-muslgnueabi-ld: /opt/eclipse/imx6ull-openwrt/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/linux-imx6ull_cortexa7/linux-4.14.199/arch/arm/kernel/setup.c:1134: undefined reference to `psci_smp_ops'
  arm-openwrt-linux-muslgnueabi-ld: kernel/panic.o: in function `__xchg':
/opt/eclipse/imx6ull-openwrt/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/linux-imx6ull_cortexa7/linux-4.14.199/./arch/arm/include/asm/cmpxchg.h:110: undefined reference to `__bad_xchg'
  arm-openwrt-linux-muslgnueabi-ld: kernel/exit.o: in function `__xchg':
Checked for "WARNING: modpost: Found x section mismatch(es)." here. It seems that the resulting binary takes more space than configured by some setting. The vmlinux size built with the -O2 option is 39 MB. Using -O1 gives me a 37 MB image, so I hope there is enough space in my DDR3 RAM (128 MB) to fit an even bigger image compiled with the -O0 configuration. So I am wondering about a way to provide more space for the sections. Could someone please point me to the place where I can do that? I have limited knowledge of the Linux kernel, so I was unable to find any linker script used for this.
user3583807 (111 rep)
Oct 22, 2021, 12:06 PM
1 votes
2 answers
682 views
How to Build Latest HandBrake on linux with FDO (PGO) + LTO?
Passing CFLAGS and CXXFLAGS to a HandBrake build for the latest version (v1.3.3 at the time of this writing) will work until you add -flto, which will **FAIL** the whole build. How can HandBrake be built with the LTO option -flto and, as a stretch goal, with FDO as well (feedback-directed optimisation, aka FDO, aka PGO)? Most of the codecs within HandBrake contain hand-coded assembly, so many assert that the compiler optimisation gains would not amount to much. I would like to test and challenge that assertion!
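For orientation only, the generic GCC flag flow for LTO plus FDO is sketched below; whether HandBrake's build system propagates these flags into all of its bundled codec sub-builds is exactly the open question, so treat this as an assumption rather than a recipe.

```bash
# Sketch: two-pass FDO + LTO with plain GCC (not HandBrake's configure wiring).
export CFLAGS="-O3 -flto -fprofile-generate"
export CXXFLAGS="$CFLAGS"
export LDFLAGS="-flto -fprofile-generate"
# ... build, then run a representative transcode to produce .gcda profiles ...
export CFLAGS="-O3 -flto -fprofile-use -fprofile-correction"
export CXXFLAGS="$CFLAGS"
export LDFLAGS="-flto -fprofile-use"
# ... clean and rebuild with the profiles applied ...
```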
DanglingPointer (262 rep)
Jun 23, 2021, 12:56 AM • Last activity: Aug 2, 2021, 12:52 AM
0 votes
0 answers
198 views
Why integer division is faster than bitwise shift in shell?
I'm comparing performance of bash and dash (default sh in Xubuntu 18.04). - I expect sh to be faster than bash - I expect bitwise shift to be faster than division operator. However, I'm getting inconsistencies:
λ hyperfine --export-markdown a.md -w 3 ./*
Benchmark #1: ./calc-div.bash
  Time (mean ± σ):      2.550 s ±  0.033 s    [User: 2.482 s, System: 0.068 s]
  Range (min … max):    2.497 s …  2.595 s    10 runs

Benchmark #2: ./calc-div.sh
  Time (mean ± σ):      2.063 s ±  0.016 s    [User: 2.063 s, System: 0.000 s]
  Range (min … max):    2.043 s …  2.100 s    10 runs

Benchmark #3: ./calc-shift.bash
  Time (mean ± σ):      3.312 s ±  0.034 s    [User: 3.255 s, System: 0.057 s]
  Range (min … max):    3.274 s …  3.385 s    10 runs

Benchmark #4: ./calc-shift.sh
  Time (mean ± σ):      2.087 s ±  0.046 s    [User: 2.086 s, System: 0.001 s]
  Range (min … max):    2.058 s …  2.211 s    10 runs

Summary
  './calc-div.sh' ran
    1.01 ± 0.02 times faster than './calc-shift.sh'
    1.24 ± 0.02 times faster than './calc-div.bash'
    1.61 ± 0.02 times faster than './calc-shift.bash'
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| ./calc-div.bash | 2.550 ± 0.033 | 2.497 | 2.595 | 1.24 ± 0.02 |
| ./calc-div.sh | 2.063 ± 0.016 | 2.043 | 2.100 | 1.00 |
| ./calc-shift.bash | 3.312 ± 0.034 | 3.274 | 3.385 | 1.61 ± 0.02 |
| ./calc-shift.sh | 2.087 ± 0.046 | 2.058 | 2.211 | 1.01 ± 0.02 |

Here are the scripts I tested: calc-div.bash
#!/usr/bin/env bash

for i in {1..1000000}; do
    _=$(( i / 1024 ))
done
calc-div.sh
#!/usr/bin/env sh

i=1
while [ $i -le 1000000 ]; do
    _=$(( i / 1024 ))
    i=$(( i + 1 ))
done
calc-shift.bash
#!/usr/bin/env bash

for i in {1..1000000}; do
    _=$(( i >> 10 ))
done
calc-shift.sh
#!/usr/bin/env sh

i=1
while [ $i -le 1000000 ]; do
    _=$(( i >> 10 ))
    i=$(( i + 1 ))
done
This difference is more visible for 5000000:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| ./calc-div.bash | 13.333 ± 0.202 | 12.870 | 13.584 | 1.23 ± 0.02 |
| ./calc-div.sh | 10.830 ± 0.119 | 10.750 | 11.150 | 1.00 |
| ./calc-shift.bash | 17.361 ± 0.357 | 16.995 | 18.283 | 1.60 ± 0.04 |
| ./calc-shift.sh | 11.226 ± 0.351 | 10.834 | 11.958 | 1.04 ± 0.03 |
Summary
  './calc-div.sh' ran
    1.04 ± 0.03 times faster than './calc-shift.sh'
    1.23 ± 0.02 times faster than './calc-div.bash'
    1.60 ± 0.04 times faster than './calc-shift.bash'
As you can see, for both bash and dash, the division operator is faster than the equivalent bitwise shift to the right.
Zeta.Investigator (1190 rep)
Jun 30, 2021, 02:58 PM • Last activity: Jun 30, 2021, 03:13 PM
1 votes
0 answers
627 views
Is jq internal sort slower than GNU sort?
While filtering through [this json file](https://iptv-org.github.io/iptv/channels.json) I did a [benchmark](https://github.com/sharkdp/hyperfine) and found out utilizing jq's internal sort and unique method is actually **25% slower** than sort --unique!

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| jq "[.[].category] \| sort \| unique" channels.json | 172.0 ± 2.6 | 167.8 | 176.8 | 1.25 ± 0.06 |
| jq "[.[].category \| select((. != null) and (. != \"XXX\"))] \| sort \| unique" channels.json | 151.9 ± 4.1 | 146.5 | 163.9 | 1.11 ± 0.06 |
| jq ".[].category" channels.json \| sort -u | 137.2 ± 6.6 | 131.8 | 156.6 | 1.00 |
Summary
  'jq ".[].category" channels.json | sort -u' ran
    1.11 ± 0.06 times faster than 'jq "[.[].category | select((. != null) and (. != \"XXX\"))] | sort | unique" channels.json'
    1.25 ± 0.06 times faster than 'jq "[.[].category] | sort | unique" channels.json'
test command:
hyperfine --warmup 3 \
    'jq "[.[].category] | sort | unique" channels.json'  \
    'jq "[.[].category | select((. != null) and (. != \"XXX\"))] | sort | unique" channels.json' \
    'jq ".[].category" channels.json | sort -u'
If we only test sort (without uniqueness), again jq is **9% slower** than sort:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| jq "[.[].category] \| sort" channels.json | 133.9 ± 1.6 | 131.1 | 138.2 | 1.09 ± 0.02 |
| jq ".[].category" channels.json \| sort | 123.0 ± 1.3 | 120.5 | 125.7 | 1.00 |
Summary
  'jq ".[].category" channels.json | sort' ran
    1.09 ± 0.02 times faster than 'jq "[.[].category] | sort" channels.json'
versions:
jq-1.5-1-a5b5cbe
sort (GNU coreutils) 8.28
I expected using jq's internal functions to result in faster processing than piping into an external app, which itself has to be spawned. Am I using jq poorly? **update** I just repeated this experiment on a host with flash storage, an Arm CPU and these versions:
jq-1.6
sort (GNU coreutils) 8.32
result:
Benchmark #1: jq "[.[].category] | sort" channels.json
  Time (mean ± σ):     587.8 ms ±   3.9 ms    [User: 539.5 ms, System: 44.2 ms]
  Range (min … max):   582.8 ms … 594.2 ms    10 runs
 
Benchmark #2: jq ".[].category" channels.json | sort
  Time (mean ± σ):     606.0 ms ±   8.6 ms    [User: 569.5 ms, System: 49.0 ms]
  Range (min … max):   589.6 ms … 616.2 ms    10 runs
 
Summary
  'jq "[.[].category] | sort" channels.json' ran
    1.03 ± 0.02 times faster than 'jq ".[].category" channels.json | sort'
Now jq sort runs 3% faster than GNU sort :D
Zeta.Investigator (1190 rep)
Jun 26, 2021, 08:39 AM • Last activity: Jun 26, 2021, 08:28 PM
1 votes
0 answers
56 views
How can I frequently update a file without harming my disk?
I've got an i3 install on a laptop with an SSD. Currently I have it configured to save the WM layout on many different events. The tool that I'm using to do this is built on Python, and I'm just running it with an ampersand through the i3 config. However, I'm concerned that this will hurt the longevity of my disk. I understand a bit about the "virtual filesystem", but I'm not really sure how that applies here. Should I be concerned about my disk? And if so, how can I change my setup to avoid that, while still being able to frequently update this table (which is stored as multiple JSON files)? Thanks!
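One common mitigation, sketched as an assumption about the setup rather than something from the question: keep the frequently rewritten JSON files on tmpfs (RAM-backed, so the SSD is untouched) and flush them to disk on a slow schedule or at logout.

```bash
# Sketch: write layout JSON to tmpfs, sync it to the SSD only occasionally.
mkdir -p /dev/shm/i3-layouts                                  # RAM-backed scratch directory
# ... point the layout-saving tool at /dev/shm/i3-layouts ...
rsync -a /dev/shm/i3-layouts/ "$HOME/.config/i3-layouts/"     # run from a timer or logout hook
```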
wknr (11 rep)
May 17, 2021, 02:52 PM • Last activity: May 18, 2021, 11:12 AM
-1 votes
2 answers
46 views
Optimize time when creating archive
Currently I am using the following command to create an archive with files older than 7 days:
find /var/tunningLog/ -type f -mtime +7 -print0 | tar -czf "/var/tunningLog/$(date '+%Y-%m-%d').tar.gz" --null -T - && echo "OK" || echo "NOK"
But it is taking too long (currently /var/tunningLog/ holds 49G). Is there any way to speed up the process or to improve the command? Thx
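If the bottleneck is the single-threaded gzip inside tar (a guess, not established in the question), a sketch of a parallel variant that keeps the same find/--null/-T - file selection but compresses with pigz:

```bash
# Sketch: same file selection, compression spread across all cores via pigz.
find /var/tunningLog/ -type f -mtime +7 -print0 \
  | tar --null -T - -cf - \
  | pigz -p "$(nproc)" > "/var/tunningLog/$(date '+%Y-%m-%d').tar.gz" \
  && echo "OK" || echo "NOK"
```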
dejanualex (369 rep)
Jan 14, 2021, 02:56 PM • Last activity: Jan 28, 2021, 02:54 PM