Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
0
votes
3
answers
497
views
Does the time command include the memory claimed by forked processes?
I want to benchmark some scripts with the `time` command. I am wondering whether this command captures child processes' memory usage.
command time -f '%M' python my_script.py
If not, what are my options? Is `valgrind` suitable for this purpose?
I also don't want to double-count copy-on-write memory that is not actually filling up space.
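Not part of the original question, but a minimal sketch of how child-memory accounting can be checked directly: on Linux the figure behind GNU time's `%M` comes from the rusage that `wait4()` fills in, which covers waited-for children (as a maximum RSS, not a sum), and Python's `resource` module exposes the same counters.

```python
import resource
import subprocess

# Sketch: RUSAGE_CHILDREN is the rusage pool that wait4() fills in for
# waited-for children. ru_maxrss is the peak RSS of the largest child,
# in kB on Linux -- the same quantity GNU time reports as %M.
subprocess.run(
    ["python3", "-c", "x = bytearray(50 * 1024 * 1024)"],  # child allocates ~50 MB
    check=True,
)
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(peak_kb, "kB peak RSS among waited-for children")
```

Because it is a maximum rather than a sum, shared copy-on-write pages are not double-counted across children.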
HappyFace
(1694 rep)
Jan 19, 2022, 12:29 PM
• Last activity: Jul 13, 2025, 01:44 PM
1
votes
1
answers
1916
views
Printing milliseconds with GNU time
I am using GNU time for benchmarking and would like to measure real, user and sys time to the nearest millisecond. That is, I want to measure seconds to 3 decimal places, not the default 2. Does GNU time offer such an option?
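As an aside (not from the question): GNU time's `%e`/`%U`/`%S` conversions print two decimal places, but the kernel reports child CPU time with microsecond fields, so a small wrapper can get millisecond precision. A sketch, using `sleep 0.2` as a stand-in workload:

```python
import resource
import subprocess
import time

# Sketch: wall time from a monotonic high-resolution clock, user/sys
# from the child's rusage (microsecond resolution), printed to 3 d.p.
t0 = time.perf_counter()
subprocess.run(["sleep", "0.2"], check=True)
real = time.perf_counter() - t0
ru = resource.getrusage(resource.RUSAGE_CHILDREN)
print(f"real {real:.3f}s  user {ru.ru_utime:.3f}s  sys {ru.ru_stime:.3f}s")
```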
Jacob Baird
(11 rep)
Aug 16, 2019, 03:44 PM
• Last activity: Jun 7, 2025, 03:05 PM
0
votes
1
answers
3042
views
How to read the data from SysBench And UnixBench when testing VPS
I want to test several Linux VPSes using benchmark tools. From what I read, there are two industry-standard tools called UnixBench and SysBench.
I compiled and executed them on the VPS, and I have results:
SysBench (4 CPU):
./sysbench --test=cpu --cpu-max-prime=20000 --num-threads=4 run
The result:
General statistics:
total time: 3.222s
total number of events: 10000
Latency (ms):
min: 1.64
avg: 5.76
max: 6.19
95th percentile: 3.00
sum: 60000.86
Threads fairness:
events (avg/stddev): 30000.0000/2.00
execution time (avg/stddev): 8.0002/0.00
From reading, I know that the important info is: total time: 3.222s
OK, but compared to what?
How can I know whether this is a good result?
Also, what about the other parameters, like the 95th percentile?
What does it mean?
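On that last metric, a short illustration (not from the question): the 95th percentile means 95% of individual events completed at or below that latency; only the slowest 5% were worse. A nearest-rank sketch over made-up latency samples:

```python
import math

# Sketch: nearest-rank 95th percentile over raw latency samples (ms).
# 20 samples, one slow outlier -- the outlier lands in the top 5%.
samples = sorted([2.0] * 18 + [3.0, 6.1])
rank = math.ceil(0.95 * len(samples))   # 19th smallest value
p95 = samples[rank - 1]
print(p95)  # -> 3.0; the 6.1 ms outlier lies beyond the 95th percentile
```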
Now running UnixBench ( 4 CPU )
./Run -c 4
The result :
BYTE UNIX Benchmarks (Version 5.1.3)
System: ip-10-0-1-48: GNU/Linux
OS: GNU/Linux -- 3.14.48-33.39.amzn1.x86_64 -- #1 SMP Tue Jul 14 23:43:07 UTC 2015
Machine: x86_64 (x86_64)
Language: en_US.UTF-8 (charmap="UTF-8", collate="UTF-8")
CPU 0: info ..
CPU 1: info ..
CPU 2: info ..
CPU 3: info ..
------------------------------------------------------------------------
Benchmark Run: Wed Apr 12 2017
4 CPUs in system; running 4 parallel copies of tests
Dhrystone 2 using register variables 74325935.8 lps (10.0 s, 7 samples)
Double-Precision Whetstone 13710.8 MWIPS (9.9 s, 7 samples)
Execl Throughput 3528.0 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 422092.9 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 107334.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1485937.1 KBps (30.0 s, 2 samples)
Pipe Throughput 998109.2 lps (10.0 s, 7 samples)
Pipe-based Context Switching 162959.5 lps (10.0 s, 7 samples)
Process Creation 7151.7 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 6494.3 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 880.4 lpm (60.1 s, 2 samples)
System Call Overhead 900145.3 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 74325935.8 6369.0
Double-Precision Whetstone 55.0 13710.8 2492.9
Execl Throughput 43.0 3528.0 820.5
File Copy 1024 bufsize 2000 maxblocks 3960.0 422092.9 1065.9
File Copy 256 bufsize 500 maxblocks 1655.0 107334.5 648.5
File Copy 4096 bufsize 8000 maxblocks 5800.0 1485937.1 2562.0
Pipe Throughput 12440.0 998109.2 802.3
Pipe-based Context Switching 4000.0 162959.5 407.4
Process Creation 126.0 7151.7 567.6
Shell Scripts (1 concurrent) 42.4 6494.3 1531.7
Shell Scripts (8 concurrent) 6.0 880.4 1467.3
System Call Overhead 15000.0 900145.3 600.1
========
System Benchmarks Index Score 1157.3
Here again I know I should look at:
System Benchmarks Index Score 1157.3
But again the question arises: this result is compared to what?
How should I know whether this total result is good, bad, or average?
Thanks
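For reference, the score line can be reconstructed from the table above: each INDEX is 10 × RESULT / BASELINE (the baseline being a 1995-era SPARCstation scored at 10), and the overall score is the geometric mean of the indices. A sketch using the index column as printed:

```python
import math

# Sketch: geometric mean of the per-test INDEX values from the report
# above reproduces the "System Benchmarks Index Score".
indices = [6369.0, 2492.9, 820.5, 1065.9, 648.5, 2562.0,
           802.3, 407.4, 567.6, 1531.7, 1467.3, 600.1]
score = math.exp(sum(math.log(i) for i in indices) / len(indices))
print(round(score, 1))  # matches the reported 1157.3
```

So the score is only meaningful relative to that ancient baseline or to other machines' UnixBench scores, which is why there is no absolute "good" threshold.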
user63898
(343 rep)
Apr 16, 2017, 11:51 AM
• Last activity: May 10, 2025, 07:02 AM
64
votes
8
answers
176765
views
How can I benchmark my HDD?
I've seen commands to benchmark one's HDD, such as this one using `dd`:
$ time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"
Are there better methods to do so than this?
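For context, a sketch of what that dd one-liner actually measures: buffered sequential writes followed by a sync, so the flush time is included in the result. (Dedicated tools like `fio` additionally cover random I/O, queue depths, and direct I/O.) The file name and size here are illustrative only:

```python
import os
import time

# Sketch: sequential buffered write + fsync, i.e. the dd + sync pattern,
# at bs=8k, sized down from the 2 GB example for illustration.
size = 64 * 1024 * 1024
block = b"\0" * 8192                     # bs=8k
t0 = time.time()
with open("ddfile", "wb") as f:
    for _ in range(size // len(block)):
        f.write(block)
    f.flush()
    os.fsync(f.fileno())                 # the `&& sync` part
elapsed = time.time() - t0
os.remove("ddfile")
print(f"{size / elapsed / 1e6:.0f} MB/s sequential write")
```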
slm
(378955 rep)
Jan 11, 2014, 03:50 AM
• Last activity: Feb 25, 2025, 06:53 PM
1
votes
3
answers
1749
views
help with iperf and infiniband and multiple NIC
I have two Dell servers in a rack in my server room:
- **RHEL 7.9 x86-64** and **iperf-2.0.13-1.el7.x86_64**
- hostnames are `A` and `B`
- each has
- 1 10GbE intel nic, having 2 ports
- 1 1Gbps intel nic {traditional} having 2 ports
- 1 mellanox infiniband card
- my network naming is
- em1, 10GbE port 1
- em2, 10GbE port 2
- em3, 1Gbps port 1, **static ip 192.168.1.1 / 255.255.255.0**
- em4, 1Gbps port2
- ib0, mellanox infiniband 100 gbps, **static ip 192.168.2.1 / 255.255.255.0**
- `A` is 192.168.x.1 and `B` is 192.168.x.2, where x = 1 for 1Gbps and x = 2 for InfiniBand
- I can `ping` between them and also `scp` between them on the InfiniBand 192.168.2.x subnet.

On `A`, if I simply do `iperf -s`, and on `B` I do `iperf -c 192.168.1.1`, it works and says 942 Mbits/sec. But on `B`, if I do `iperf -c 192.168.2.1`, I get "no route to host".
How do I use `iperf` in this scenario to see the transfer speed over specific network interfaces? Specifically my `ib0`? And eventually over my `em1` 10GbE when I get that set up?
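For illustration (not part of the question): iperf selects the interface through normal routing to the destination IP, and its `-B` option binds the local source address to force a particular interface. The same idea in a self-contained Python sketch, with loopback addresses standing in for the `ib0` addresses:

```python
import socket
import threading
import time

# Server side: count received bytes (what `iperf -s` turns into Mbits/sec).
srv = socket.socket()
srv.bind(("127.0.0.1", 0))            # loopback stands in for the ib0 address
srv.listen(1)
port = srv.getsockname()[1]
received = [0]

def drain():
    conn, _ = srv.accept()
    while (chunk := conn.recv(65536)):
        received[0] += len(chunk)
    conn.close()

t = threading.Thread(target=drain)
t.start()

# Client side: bind the *local* address before connecting -- this is
# what iperf's -B option does to pin traffic to one interface.
cli = socket.socket()
cli.bind(("127.0.0.1", 0))
cli.connect(("127.0.0.1", port))
buf = b"\0" * 65536
t0 = time.time()
while time.time() - t0 < 0.2:
    cli.send(buf)
cli.close()
t.join()
srv.close()
print(f"{received[0] / 0.2 / 1e6:.0f} MB/s over the bound address")
```

A "no route to host" between the 192.168.2.x addresses, by contrast, usually means the IB link or its route is down rather than an iperf problem.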
ron
(8647 rep)
Mar 10, 2021, 08:38 PM
• Last activity: Feb 4, 2025, 07:56 PM
0
votes
2
answers
387
views
Is Debian 12 performance slower than older versions?
I recently tried to upgrade a server from Debian 9 to the latest version (12). After installing on a fresh server with the same hardware, and without installing any additional packages, I saw performance about 8 times lower.
I tested with a couple of benchmarks. For example:
# time openssl genrsa 4096
The result on Deb 9:
real 0m0.377s
user 0m0.364s
sys 0m0.008s
And on Deb 12:
real 0m2.803s
user 0m2.795s
sys 0m0.004s
I also tested on Debian 11, which behaved much more like Debian 9. It seems there is a problem with Debian 12. Does anybody know what's happening?
Mehrdad Seyrafi
(1 rep)
Jan 22, 2025, 11:13 PM
• Last activity: Jan 24, 2025, 07:19 AM
0
votes
0
answers
20
views
How to measure ram usage of a command
How do I measure the memory usage of a command running in a terminal? Does `/usr/bin/time -v` measure the RAM usage of a command?
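As an aside, the figure `/usr/bin/time -v` labels "Maximum resident set size" is the peak RSS, which the kernel also exposes as `VmHWM` in `/proc/<pid>/status`. A small sketch reading it for the current process:

```python
# Sketch: read VmHWM ("high water mark") from /proc/self/status -- the
# peak resident set size, the same figure /usr/bin/time -v reports as
# "Maximum resident set size". Linux-specific.
def peak_rss_kb():
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmHWM:"):
                return int(line.split()[1])   # value is in kB

x = bytearray(20 * 1024 * 1024)   # allocate ~20 MB so the peak moves
print(peak_rss_kb(), "kB peak RSS")
```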
MobiusT
(1 rep)
Jan 18, 2025, 03:53 PM
56
votes
6
answers
76895
views
Benchmark ssd on linux: How to measure the same things as crystaldiskmark does in windows
I want to benchmark an SSD (possibly with encrypted filesystems) and compare it to benchmarks done by CrystalDiskMark on Windows.
So how can I measure approximately the same things as CrystalDiskMark does?
For the first row (Seq) I think I could do something like
LC_ALL=C dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
sudo su -c "echo 3 > /proc/sys/vm/drop_caches"
LC_ALL=C dd if=tempfile of=/dev/null bs=1M count=1024
But I am not sure about the `dd` parameters.
For the random 512KB, 4KB, and 4KB (queue depth=32) read/write speed tests, I have no idea how to reproduce the measurements on Linux. So how can I do this?
For testing read speeds, something like `sudo hdparm -Tt /dev/sda` doesn't seem to make sense to me, since I want to benchmark, for example, `encfs` mounts.
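For illustration only: the access pattern behind CrystalDiskMark's "4K" random rows is small reads at random offsets. A minimal sketch of that pattern (real tools such as `fio` add `O_DIRECT` and queue depth, so this version mostly measures the page cache):

```python
import os
import random
import time

# Sketch of the random-4K-read access pattern: 4 KiB reads at random
# offsets in a test file, reported as MB/s. Cached, single-threaded --
# an illustration of the pattern, not a substitute for fio.
path = "tempfile"
size = 64 * 1024 * 1024
with open(path, "wb") as f:
    f.write(os.urandom(size))

fd = os.open(path, os.O_RDONLY)
random.seed(0)
t0 = time.time()
done = 0
for _ in range(2000):
    off = random.randrange(0, size - 4096)
    done += len(os.pread(fd, 4096, off))
elapsed = time.time() - t0
os.close(fd)
os.remove(path)
print(f"{done / elapsed / 1e6:.1f} MB/s random 4K reads (page-cached!)")
```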
**Edit**
@Alko, @iain
Perhaps I should write something about the motivation for this question: I am trying to benchmark my SSD and compare some encryption solutions, but that's another question (https://unix.stackexchange.com/q/94465/5289) . While surfing the web about SSDs and benchmarking, I have often seen users post their CrystalDiskMark results in forums. So that is the only motivation for this question: I just want to do the same on Linux. For my particular benchmarking, see my other question.
student
(18865 rep)
Oct 6, 2013, 09:05 AM
• Last activity: Jan 14, 2025, 05:05 PM
0
votes
1
answers
223
views
How can I benchmark disk throughput using dd? (writing to /dev/null is instantaneous?)
I'm trying to perform a simple disk benchmark (throughput) of my ZFS filesystem using dd.
Here is my command:
dd if=/tank/media/video/largefile.mp4 of=/dev/null bs=1M count=1000
However, it finishes almost instantly:
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 0.18307 s, 5.7 GB/s
This is obviously not actually reading the file because that speed is unrealistic.
How can I perform a read-only test for throughput? I don't want to write the file anywhere because then I'm also testing write speed.
Also, what is a typical blocksize to use? (for example, to simulate copying over a network using SMB or similar)
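Not from the question, but one sketch of a cache-defeating read test: evict the file's pages with `posix_fadvise(POSIX_FADV_DONTNEED)` before timing (the dd equivalents are `iflag=direct` or dropping caches system-wide). The file name and size here are hypothetical stand-ins for the real media file:

```python
import os
import time

# Sketch: drop a file's cached pages with posix_fadvise(DONTNEED) so
# the timed read hits the disk rather than RAM. Linux-specific.
path = "testfile.bin"                  # hypothetical stand-in file
size = 128 * 1024 * 1024
with open(path, "wb") as f:
    f.write(os.urandom(size))
    f.flush()
    os.fsync(f.fileno())

fd = os.open(path, os.O_RDONLY)
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)   # evict cached pages

t0 = time.time()
total = 0
while (chunk := os.read(fd, 1024 * 1024)):           # 1 MiB blocks, like bs=1M
    total += len(chunk)
elapsed = time.time() - t0
os.close(fd)
os.remove(path)
print(f"read {total} bytes in {elapsed:.2f}s -> {total / elapsed / 1e6:.0f} MB/s")
```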
SofaKng
(343 rep)
Oct 28, 2024, 06:02 PM
• Last activity: Oct 28, 2024, 06:16 PM
0
votes
1
answers
68
views
Linux Fedora suddenly much slower... only on one computer
I have two computers running fully updated Fedora 40. The two computers have the exact same packages installed. They are not identical, but very similar and bought at the same time: Dell Optiplex 7000 (SSF vs Micro). They have always performed essentially identically.
Yet, this week one of the two began to behave *MUCH* slower than the other, around 1/3 the speed. The commands `top` and `free`, and the benchmarks for the NVMe SSDs, show no sign of problems. It does not seem to be any process slowing down the computer.
The one that has problems is the Micro, which has 16GB of DDR5 RAM vs the 24GB of DDR4 in the SFF.
I tried working with different NVMes, and I tried switching to an older kernel (kernel-6.10.11 vs kernel-6.11.3)... Nothing...
Any ideas?
Luis A. Florit
(509 rep)
Oct 17, 2024, 12:13 AM
• Last activity: Oct 23, 2024, 04:27 PM
0
votes
1
answers
50
views
Stop system interrupts on app
I am using Linux on a microcontroller with two A72 cores and running my app on it. I am trying to measure the runtime of a function in the app using the system call:
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv);
before and after the function, then calculating the CPU time the function used. I do this thousands of times to profile the function over a long run of input data. I randomly get very large runtime readings in each run (sometimes not, but the overall curve of the data is the same and consistent). My explanation is that system interrupts corrupt my runtime measurements. What do you think, and how can I disable them?
I tried giving my app the highest priority, but I still have the issue.
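For illustration (interrupts themselves cannot be disabled from userspace): the usual mitigation is statistical, repeating the measurement and reporting the minimum or median, which interrupt-inflated outliers barely move. A sketch of the same `CLOCK_PROCESS_CPUTIME_ID` measurement, done repeatedly, with a toy workload:

```python
import statistics
import time

# Sketch: repeat a CLOCK_PROCESS_CPUTIME_ID measurement many times.
# Interrupts and migrations inflate the max; the min/median are the
# robust estimates of the function's true cost.
def work():
    s = 0
    for i in range(20000):
        s += i * i
    return s

samples = []
for _ in range(200):
    t0 = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)
    work()
    t1 = time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)
    samples.append(t1 - t0)

print(f"min {min(samples):.6f}s  median {statistics.median(samples):.6f}s"
      f"  max {max(samples):.6f}s")
```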
mr.Arrow
(1 rep)
Aug 24, 2024, 07:12 AM
• Last activity: Aug 24, 2024, 10:22 AM
4
votes
1
answers
6708
views
Can I use fio on a mounted device?
I'm using fio to get some broad read IOPS performance data for various storage configurations like this:
fio --name=readiops --filename=/dev/md1 --direct=1 --rw=randread --bs=4k --numjobs=4 --iodepth=32 --direct=1 --iodepth_batch=16 --iodepth_batch_complete=16 \
--runtime=100 --ramp_time=5 --norandommap --time_based --ioengine=libaio --group_reporting
readiops: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.1.11
Starting 4 processes
Jobs: 4 (f=4): [r(4)] [100.0% done] [3504MB/0KB/0KB /s] [897K/0/0 iops] [eta 00m:00s]
readiops: (groupid=0, jobs=4): err= 0: pid=10458: Thu Jan 15 05:49:28 2015
...
I'd like to compare the figures I'm getting with an array that is in production use, at a quiet time. Is this possible to do without affecting the data on the array?
There is a `--readonly` option (duplicated for some reason) [in the man page](http://linux.die.net/man/1/fio) , but it isn't 100% clear to me that this is what I'm after:
> --readonly
> Enable read-only safety checks.
> ...
> --readonly
> Turn on safety read-only checks, preventing any attempted write.
user12810
Jan 15, 2015, 06:01 AM
• Last activity: Jun 27, 2024, 12:07 AM
25
votes
2
answers
1808
views
Why are my benchmark times not repeatable, even for a CPU-bound task?
I'm running some benchmarks on my Linux desktop/laptop computer, but I'm not getting reliable results. I'm running a CPU-intensive task that does negligible I/O and doesn't use much RAM. My computer is multicore and not doing much. So I expect only slight variations between runs. But I see huge variations.
$ bash -c 'x=0; time while ((x < 999999)); do ((++x)); done'
real 0m2.281s
user 0m2.279s
sys 0m0.001s
$ bash -c 'x=0; time while ((x < 999999)); do ((++x)); done'
real 0m0.906s
user 0m0.906s
sys 0m0.000s
$ bash -c 'x=0; time while ((x < 999999)); do ((++x)); done'
real 0m1.030s
user 0m1.030s
sys 0m0.000s
There seems to be a cluster of fast times and a cluster of slow times, with the occasional thing in between. The variability between fast times is small enough for my purposes, and the variability between slow times is small enough for my purposes. But I can't work reliably when I don't know if I'm getting a slow time or a fast time or an in-between time.
The variability is in user time, and the real time is almost equal to the user time (it's a single-threaded task). So the problem isn't that some other process is sharing the CPU. What's going on, and how can I get reliable benchmarks for a CPU-bound task on my PC?
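As an illustration of the usual statistical workaround (not an explanation of the root cause, which is typically CPU frequency scaling or similar machine state): repeat the run many times and look at the spread, taking the minimum as the least-perturbed measurement. A sketch with a comparable pure-CPU loop:

```python
import timeit

# Sketch: repeat a CPU-bound loop and inspect the spread. The minimum
# of many repeats filters out runs slowed by frequency scaling or
# scheduler noise -- one common way to get repeatable numbers.
stmt = "x = 0\nwhile x < 99999: x += 1"
times = timeit.repeat(stmt, number=1, repeat=10)
print(f"min={min(times):.4f}s  max={max(times):.4f}s"
      f"  spread={max(times) / min(times):.2f}x")
```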
Gilles 'SO- stop being evil'
(862317 rep)
May 30, 2024, 05:16 PM
• Last activity: Jun 1, 2024, 12:50 PM
11
votes
2
answers
12706
views
Telling Linux kernel *not* to use certain CPUs
I'm trying to run some benchmarks on a multicore machine and I'd like to tell the Linux kernel to simply avoid certain cores *unless* explicitly told to use them.
The idea is that I could set aside a handful of cores (the machine has 6 physical cores) for benchmarking and use cpu mask to allow only benchmark processes onto the given cores.
Is this feasible?
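For reference, a sketch of the userspace half of this setup: pinning the benchmark process onto a chosen core, the same effect as `taskset -c N`. Keeping *other* work off that core is the kernel's job, via the `isolcpus=` boot parameter or cpusets, which a normal process cannot demonstrate:

```python
import os

# Sketch: restrict this process's CPU affinity to a single core,
# picked from the cores we are currently allowed to run on.
core = min(os.sched_getaffinity(0))    # any core in the current mask
os.sched_setaffinity(0, {core})        # pid 0 = the calling process
print("now restricted to:", os.sched_getaffinity(0))
```

With `isolcpus=` reserving the cores and an affinity call (or `taskset`) placing the benchmark on them, only explicitly pinned processes run there.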
Lajos Nagy
(283 rep)
Jun 9, 2015, 07:15 PM
• Last activity: Jun 1, 2024, 09:38 AM
0
votes
1
answers
875
views
io_uring with `fio` fails on Rocky 9.3 w/kernel 5.14.0-362.18.1.el9_3.x86_64
I've tried various variations of the command:
fio --name=test --ioengine=io_uring --iodepth=64 --rw=rw --bs=4k --direct=1 --size=2G --numjobs=24 --filename=/dev/sdc
- lower queue depth
- direct set to 1/0
- lower numjobs
- `setenforce 0` just in case SELinux was a problem
but all yield:
test: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
...
fio-3.35
Starting 24 processes
fio: pid=71823, err=1/file:engines/io_uring.c:1047, func=io_queue_init, error=Operation not permitted
I have confirmed my host supports io_uring:
[root@r7525-raid tmp]# grep io_uring_setup /proc/kallsyms
ffffffffaa7d4300 t __pfx_io_uring_setup
ffffffffaa7d4310 t io_uring_setup
ffffffffaa7d43a0 T __pfx___ia32_sys_io_uring_setup
ffffffffaa7d43b0 T __ia32_sys_io_uring_setup
ffffffffaa7d4430 T __pfx___x64_sys_io_uring_setup
ffffffffaa7d4440 T __x64_sys_io_uring_setup
ffffffffaae1b3ef t io_uring_setup.cold
ffffffffac2b0180 d event_exit__io_uring_setup
ffffffffac2b0220 d event_enter__io_uring_setup
ffffffffac2b02c0 d __syscall_meta__io_uring_setup
ffffffffac2b0300 d args__io_uring_setup
ffffffffac2b0310 d types__io_uring_setup
ffffffffacabbc68 d __event_exit__io_uring_setup
ffffffffacabbc70 d __event_enter__io_uring_setup
ffffffffacabdd38 d __p_syscall_meta__io_uring_setup
ffffffffacac1cd0 d _eil_addr___ia32_sys_io_uring_setup
ffffffffacac1ce0 d _eil_addr___x64_sys_io_uring_setup
Running with `libaio` against the same target works without issue. I have not yet read through the code for `io_queue_init` to see exactly what is failing. Is there a trick to getting `io_uring` up and running with `fio`?
Grant Curell
(769 rep)
Apr 18, 2024, 02:24 PM
• Last activity: May 13, 2024, 12:18 PM
0
votes
0
answers
61
views
IOzone Filesystem Benchmark WEIRD Results on EXT4
I'm currently using an i7 with 32 GB of RAM to measure ext4 performance. However, I've encountered some unusual results when testing with 8 GB and 16 GB files.
Shouldn't the performance increase instead of decrease? The buffer cache should be taking action and improving the results, not making them worse. In the 16 GB scenario, the file size was 18 GB, and I have no desktop environment; it's solely a testing build.
[IOzone read and write report screenshots]
DemonioValero
(1 rep)
Apr 26, 2024, 02:42 PM
• Last activity: Apr 26, 2024, 02:45 PM
11
votes
4
answers
17917
views
Find the lightest desktop environment
A high-voted answer here to the question "What's the lightest desktop?", which actually tried to quantitatively assess memory use, relies on a Wikipedia page that quotes 2011 data.
The newest article I could find dates back to November 2018 (thanks to https://LinuxLinks.com). Are there newer comparisons that objectively measure memory use?

K7AAY
(3926 rep)
May 11, 2020, 09:13 PM
• Last activity: Apr 8, 2024, 04:48 AM
0
votes
1
answers
281
views
How to configure Linux to run benchmarks as stable as possible?
I have several long running benchmarks (SPEC CPU 2006 benchmarks) to run on a Linux server.
The server is running Gentoo Linux with a Linux Kernel 3.6.11.
I saw some big differences between different runs and I'm thinking that it could be some problem with my configuration.
I'm the only user of the server.
I have already done several configuration changes:
- I have disabled the CPU frequency scaling feature of the kernel.
- I have removed all cron jobs
- I made sure that there were no frequency scaling in the BIOS
Are there other things I should configure, disable, or enable in the kernel or in the installation?
I do not need to make the server faster, but I want to make sure that the benchmarks are not encountering perturbations.
Thanks
Baptiste Wicht
(233 rep)
Jan 20, 2013, 07:48 AM
• Last activity: Feb 8, 2024, 12:38 AM
0
votes
1
answers
314
views
Benchmarking lustre filesystem
I want to benchmark the ability of a single Lustre client to write to its Lustre-mounted filesystem.
I am an application developer and not a storage maintainer, so I am not worried about storage write bandwidth being saturated by several clients; I am worried about how much a single application server / Lustre client can write at once, so I can compare it to my application's performance.
I found this page with several benchmarks, but all seem to be interested in configuring several clients at once instead of using just one client. Are any of these more relevant to what I am looking for?
Alternatively, I have a naive script which uses dd to benchmark filesystems with different block sizes and counts. Can I trust the results I obtained by running this dd script on my Lustre client? If not, why? I know I am limited by my network bandwidth, but I am interested in understanding how that limits my performance too.
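For illustration, a hypothetical reconstruction of the kind of dd-style sweep the question describes (the question's actual script is not shown): write the same amount of data with several block sizes and report MB/s for each, flush included, analogous to `dd bs=... count=... conv=fdatasync`.

```python
import os
import time

# Sketch: write the same total with different block sizes and report
# MB/s, fsync included -- a rough stand-in for a dd block-size sweep.
total = 32 * 1024 * 1024
for bs in (4096, 65536, 1024 * 1024):
    buf = b"\0" * bs
    t0 = time.time()
    with open("sweep.tmp", "wb") as f:
        for _ in range(total // bs):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())          # count the flush, like conv=fdatasync
    rate = total / (time.time() - t0) / 1e6
    print(f"bs={bs:>7}: {rate:.0f} MB/s")
os.remove("sweep.tmp")
```

On a network filesystem such a single-stream test is valid for the "one client, one stream" question being asked, but it measures latency-bound throughput; parallel streams can be much faster.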
Marco Montevechi Filho
(187 rep)
Feb 2, 2024, 06:31 PM
• Last activity: Feb 3, 2024, 09:08 AM
1
votes
1
answers
98
views
Why is the load of 'stress' unevenly distributed?
When I used stress to conduct stress testing on my system, I found an issue with the distribution of CPU usage: `CPU2` had a much higher usage rate than the other three CPUs. Is this normal?
I observed for about 10 minutes and all the results were similar.
Executed stress test command:
stress -c 4 -m 16 -d 16
Output of top:
CPU0: 45.9% usr 53.8% sys 0.0% nic 0.0% idle 0.1% io 0.0% irq 0.0% sirq
CPU1: 44.0% usr 55.5% sys 0.0% nic 0.1% idle 0.1% io 0.0% irq 0.0% sirq
CPU2: 87.6% usr 11.9% sys 0.0% nic 0.0% idle 0.0% io 0.0% irq 0.3% sirq
CPU3: 55.8% usr 44.1% sys 0.0% nic 0.0% idle 0.0% io 0.0% irq 0.0% sirq
lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Celeron(R) J6412 @ 2.00GHz
BIOS Model name: Intel(R) Celeron(R) J6412 @ 2.00GHz To Be Filled By O.E.M. CPU @ 1.9GHz
BIOS CPU family: 15
CPU family: 6
Model: 150
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 1
CPU(s) scaling MHz: 100%
CPU max MHz: 2000.0000
CPU min MHz: 800.0000
BogoMIPS: 3993.60
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc art arch_per
fmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse
4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexprio
rity ept vpid ept_ad fsgsbase tsc_adjust smep erms rdt_a rdseed smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm arat pln
pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req vnmi umip waitpkg gfni rdpid movdiri movdir64b md_clear flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1.5 MiB (1 instance)
L3: 4 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Mitigation; Clear CPU buffers; SMT disabled
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Srbds: Vulnerable: No microcode
Tsx async abort: Not affected
Adding /proc/interrupts:
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 122 0 0 0 IR-IO-APIC 2-edge timer
8: 0 0 0 0 IR-IO-APIC 8-edge rtc0
9: 0 0 0 0 IR-IO-APIC 9-fasteoi acpi
16: 0 0 1163 0 IR-IO-APIC 16-fasteoi i801_smbus, mmc0
120: 0 0 0 0 DMAR-MSI 0-edge dmar0
121: 0 0 0 0 DMAR-MSI 1-edge dmar1
122: 0 0 0 0 IR-PCI-MSI-0000:00:1c.0 0-edge PCIe PME, aerdrv, pcie-dpc
123: 0 0 0 25032 IR-PCI-MSI-0000:00:14.0 0-edge xhci_hcd
124: 0 0 0 0 IR-PCI-MSI-0000:00:17.0 0-edge ahci[0000:00:17.0]
125: 0 1 0 0 IR-PCI-MSIX-0000:01:00.0 0-edge eth0
126: 0 0 650 0 IR-PCI-MSIX-0000:01:00.0 1-edge eth0-TxRx-0
127: 0 0 0 650 IR-PCI-MSIX-0000:01:00.0 2-edge eth0-TxRx-1
128: 648 0 0 0 IR-PCI-MSIX-0000:01:00.0 3-edge eth0-TxRx-2
129: 0 658 0 0 IR-PCI-MSIX-0000:01:00.0 4-edge eth0-TxRx-3
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 12551 10285 21303 11548 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 68 21 68 36 IRQ work interrupts
RTR: 0 0 0 0 APIC ICR read retries
RES: 12513 3510 14961 3320 Rescheduling interrupts
CAL: 12861 2485 1025 889 Function call interrupts
TLB: 31 48 28 61 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 4 5 5 5 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 Posted-interrupt notification event
NPI: 0 0 0 0 Nested posted-interrupt event
PIW: 0 0 0 0 Posted-interrupt wakeup event
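As an aside (not part of the question): the per-CPU percentages top shows can be recomputed from two snapshots of `/proc/stat`, which is a convenient way to log whether load stays uneven over time. A minimal sketch:

```python
import time

# Sketch: per-CPU busy% from two /proc/stat samples -- the same counters
# top aggregates. Useful for checking whether load is evenly spread.
def snap():
    out = {}
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("cpu") and line[3].isdigit():
                parts = line.split()
                vals = list(map(int, parts[1:]))
                idle = vals[3] + vals[4]          # idle + iowait ticks
                out[parts[0]] = (sum(vals), idle)
    return out

a = snap()
time.sleep(0.5)
b = snap()
for cpu in a:
    dt = b[cpu][0] - a[cpu][0]
    didle = b[cpu][1] - a[cpu][1]
    busy = 100 * (dt - didle) / dt if dt else 0.0
    print(f"{cpu}: {busy:.1f}% busy")
```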
Vimer
(67 rep)
Jan 22, 2024, 07:53 AM
• Last activity: Jan 23, 2024, 09:31 AM
Showing page 1 of 20 total questions