Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
1 answers
367 views
how to generate a flamegraph when using jeprof
I am using this command to generate a jemalloc SVG output file:
jeprof --svg texhub-server --base=texhub.out.1.0.i0.heap texhub.out.1.* > output.svg
I read the documentation here (https://github.com/GreptimeTeam/greptimedb/blob/develop/src/common/mem-prof/README.md), which says this will generate a flamegraph, but when I open output.svg it is not a flamegraph. Am I missing something? I have tried opening the file with a browser and with Inkscape. This is the jeprof version info:
root@texhub-server-service-5c8fcf49d5-qrgh8:/app# jeprof --version
jeprof (part of jemalloc 5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756) based on pprof (part of gperftools 2.0)
Copyright 1998-2007 Google Inc.
This is BSD licensed software; see the source for copying conditions and license information. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Dolphin (791 rep)
Nov 12, 2024, 01:26 PM • Last activity: Feb 10, 2025, 11:07 AM
743 votes
22 answers
1069181 views
How to get execution time of a script effectively?
I would like to display the completion time of a script. What I currently do is:
#!/bin/bash
date  ## echo the date at start
# the script contents
date  ## echo the date at end
This just shows the start and end times of the script. Would it be possible to display fine-grained output such as processor time, I/O time, etc.?
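One possible approach, sketched here under the assumption that the script runs under bash, is to wrap the work in a small helper that reports wall-clock seconds from the `SECONDS` variable and CPU time from the `times` builtin. The name `run_timed` is hypothetical and used only for illustration. Neither mechanism reports I/O time.

```shell
#!/bin/bash
# Sketch only: run_timed is a hypothetical helper, not part of any tool.
# It reports wall-clock time (whole seconds) and the CPU time that bash
# has accumulated for itself and for its waited-for children.
run_timed() {
    local start=$SECONDS      # bash counts seconds since the shell started
    "$@"                      # run the command being measured
    local status=$?           # remember the command's exit status
    echo "elapsed: $((SECONDS - start))s" >&2
    times >&2                 # line 1: shell user/sys, line 2: children user/sys
    return "$status"
}

run_timed sleep 1
```

The `times` figures are cumulative for the whole shell session, so they are most useful when compared before and after a block of work.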
mtk (28478 rep)
Oct 19, 2012, 01:26 PM • Last activity: Dec 19, 2024, 03:14 PM
4 votes
0 answers
403 views
What are profiling tools available for bash/shell script
I have a project which is written entirely in shell scripts. Some of the modules in my code take a lot of time, and I need to reduce that time without changing their functionality. What profiling tools are available for shell scripts?
Ragini Dahihande (203 rep)
Feb 16, 2021, 11:43 AM • Last activity: Aug 25, 2024, 07:25 AM
21 votes
7 answers
17305 views
How can I profile a shell script?
I have several programs that I'm executing in a shell script: ./myprogram1 ./myprogram2 ... I know that I can profile each individual program by editing the source code, but I wanted to know if there was a way I could measure the total time executed by profiling the script itself. Is there a timer program that I can use for this purpose? If so, how precise is its measurement?
Paul (9823 rep)
May 30, 2012, 02:00 AM • Last activity: Aug 25, 2024, 07:23 AM
9 votes
1 answers
1785 views
Does mtrace() still work in modern distros?
tldr: Does mtrace still work or am I just doing it wrong? I was attempting to use mtrace and have been unable to get it to write data to a file. I followed the instructions in man 3 mtrace: t_mtrace.c:
#include <mcheck.h>
#include <stdlib.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    mtrace();

    for (int j = 0; j < 2; j++)
        malloc(100);            /* Never freed--a memory leak */

    calloc(16, 16);             /* Never freed--a memory leak */
    exit(EXIT_SUCCESS);
}
Then running this in bash:
gcc -g t_mtrace.c -O0 -o t_mtrace
export MALLOC_TRACE=/tmp/t
./t_mtrace
mtrace ./t_mtrace $MALLOC_TRACE
but the file /tmp/t (or any other file I attempt to use) is not created. When I create an empty file with that name it remains zero length. I've tried using relative paths in the MALLOC_TRACE. I tried adding setenv("MALLOC_TRACE", "/tmp/t", 1); inside the program before the mtrace() call. I've tried adding muntrace() before the program terminates. I've tried these tactics on Ubuntu 22.04 and Fedora 39, and I get the same result: the trace file is empty. The ctime and mtime on the file (if I create it in advance) are unchanged when I run the program. I've verified the permissions of the file and its parent directory are read/writable. strace isn't showing that the file in question is stated, much less opened. This occurs using Glibc 2.35 on Ubuntu and 2.38 on Fedora. This isn't a question on how to profile or check for memory leaks. I realize I can do this with valgrind or half a dozen other programs, this is mostly a curiosity and me wanting to know if this is a bug that might need to be patched or whether the man page needs updating (or whether I'm just misapprehending something and the only problem is sitting in my chair).
TopherIsSwell (265 rep)
May 2, 2024, 06:55 AM • Last activity: May 8, 2024, 03:57 AM
0 votes
1 answers
216 views
Which tools for micro-benchmarking?
I'm not sure which tool to use for micro-benchmarking a C program. I would like to measure both:
- Memory usage, RSS (Resident Set Size)
- CPU cycles
I did use perf record -g and perf script piped into an awk script. This worked for finding the memory usage, but the CPU cycles weren't accurate because perf record gets the CPU cycles by sampling. perf stat is accurate but obviously doesn't give per-function stats. The perf_event library seems to be terribly documented and a lot of work for simple benchmarking. I have briefly looked at:
- SystemTap
- DTrace
- LTTng
- gperftools
- likwid
- PAPI
These all seem like decent, well-documented tools. Which would you recommend looking into the most? Or do you have any other suggestions? Thank you for your time.
jinTgreater (1 rep)
Jan 19, 2024, 06:23 PM • Last activity: Jan 19, 2024, 11:20 PM
0 votes
1 answers
200 views
How to measure mmap I/O latency?
I have an application which appears to be slowing/blocking at the same time there's a lot of disk I/O going on, so I suspect it's I/O operations within the application which are blocking. I can't imagine what else the problem might be, but I would like to confirm it. The problem is that the application largely uses mmap'd files for I/O, and thus the operations don't show up with strace. I know that a blocking I/O operation on mmap'd memory takes the form of a page fault. But is there a way to measure the amount of time thread execution was suspended due to page faults?
phemmer (73711 rep)
Oct 5, 2023, 05:00 PM • Last activity: Oct 5, 2023, 06:00 PM
56 votes
5 answers
63386 views
Determining Specific File Responsible for High I/O
This is a simple problem but the first time I've ever had to actually fix it: finding which specific files/inodes are the targets of the most I/O. I'd like to be able to get a general system overview, but if I have to give a PID or TID I'm alright with that. I'd like to go without having to do a strace on the program that pops up in iotop. Preferably, using a tool in the same vein as iotop but one that itemizes by file. I can use lsof to see which files mailman has open but it doesn't indicate which file is receiving I/O or how much. I've seen elsewhere where it was suggested to use auditd but I'd prefer to not do that since it would put the information into our audit files, which we use for other purposes and this seems like an issue I ought to be able to research in this way. The specific problem I have right now is with LVM snapshots filling too rapidly. I've since resolved the problem but would like to have been able to fix it this way rather than just doing an ls on all the open file descriptors in /proc//fd to see which one was growing fastest.
Bratchley (17244 rep)
Aug 15, 2013, 02:19 PM • Last activity: Sep 4, 2023, 07:30 AM
0 votes
1 answers
625 views
thread profiling and monitoring
How can I get lock time values for the individual threads of a process on Linux? I was reading /proc/pid/stat, but I am unable to determine which of its values represent lock time.
I192100 Mayra Ahmad (31 rep)
Apr 1, 2021, 03:32 PM • Last activity: Oct 5, 2022, 11:56 AM
3 votes
4 answers
2594 views
How to profile wall-clock time?
In my program, the real-time duration is sometimes as much as 3 times the CPU time. This is a single-threaded application that does a lot of memory allocation and NFS-based reads and writes. So my suspicion is that either memory swapping or NFS read/write is slowing things down. For example, the following is the output of /usr/bin/time a.out:
2165.32user 64.93system 6036.33elapsed
Is there any profiling tool for real time? I know and have used multiple tools for CPU time profiling, but I am not sure if there is anything that can point out NFS, memory swapping, or any other wall-clock slowdowns. My program is written in C++. **EDIT**: /usr/bin/time gives me a summary at the end, and I am not looking for that. I am looking for a way to correlate the real-time consumption with specific program blocks of my application. I want a profiler like collect/gprof that can tell me things like:
- the area where most context switches are happening due to waits.
- specific functions where NFS access is happening.
Since my system is dedicated, I am not worried about other processes that might impact these profiles.
amisax (3083 rep)
Sep 8, 2020, 04:13 PM • Last activity: Jan 25, 2022, 07:55 AM
1 votes
0 answers
245 views
How do I report the max RSS of a cgroup?
I would like to monitor the peak RSS a cgroup has used since it was created. By "peak RSS" I mean: the sum of all processes' RSS at the point in time where that sum is the greatest. I believe memory.max_usage_in_bytes reports RSS+CACHE, and AFAIK there isn't a "max_cache_in_bytes" metric that I can use to derive a "max_rss_in_bytes". Basically: I would love the cgroup equivalent of time -f %M. Does anyone know of a way?
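Absent a built-in metric, one rough workaround is to poll the cgroup's memory.stat file and keep the largest `rss` value seen. The sketch below assumes a cgroup v1 memory controller, whose memory.stat contains a line of the form `rss <bytes>`. The helper names `read_rss` and `track_peak_rss` are hypothetical. Because it samples, it can miss spikes shorter than the polling interval, so the result is only a lower bound on the true peak.

```shell
#!/bin/bash
# Sketch only: read_rss and track_peak_rss are hypothetical helper names.
# Assumes a cgroup v1 memory.stat file containing a line "rss <bytes>".

# read_rss STATFILE: print the rss value in bytes.
read_rss() {
    awk '$1 == "rss" { print $2 }' "$1"
}

# track_peak_rss STATFILE SAMPLES INTERVAL: poll STATFILE SAMPLES times,
# sleeping INTERVAL seconds between reads, and print the largest rss seen.
track_peak_rss() {
    local stat=$1 samples=$2 interval=$3 peak=0 cur i
    for ((i = 0; i < samples; i++)); do
        cur=$(read_rss "$stat")
        cur=${cur:-0}                      # treat a missing line as 0
        (( cur > peak )) && peak=$cur
        sleep "$interval"
    done
    echo "$peak"
}
```

A call might look like `track_peak_rss /sys/fs/cgroup/memory/mygroup/memory.stat 600 0.1`, where the path and group name are placeholders for the cgroup being monitored.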
Lawrence Wagerfield (133 rep)
Oct 8, 2021, 08:54 PM • Last activity: Oct 8, 2021, 09:58 PM
2 votes
1 answers
310 views
How can I profile virtual memory accesses made in user mode and kernel mode?
I would like to generate a log of all virtual memory accesses performed in user mode and kernel mode as a result of running some program. Besides collecting memory access locations, I also want to capture other state information (e.g., instruction pointer, thread identifier). I anticipate that I won't be able to collect all of my desired statistics with any tool out of the box. I intend to do this profiling off-line, so I'm not concerned about the performance impact. In fact, depending on what is available, it would be helpful to know which tools can record all memory accesses and which can only sample. I was originally going to augment Valgrind's Lackey tool until I realized that it only records user mode memory accesses. Looking into what other tools I might use, I'm at a loss as to how I can quickly determine which tool is capable of capturing the information I want. Here are some resources I've found that have gotten me started:
- [Brendan Gregg's _Choosing a Linux Tracer_](http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-tracer.html)
- [Julia Evans' _Linux tracing systems & how they fit together_](https://jvns.ca/blog/2017/07/05/linux-tracing-systems/)
bryantcurto (21 rep)
Jul 12, 2021, 03:45 PM • Last activity: Aug 17, 2021, 07:57 PM
13 votes
1 answers
11462 views
Security implications of changing “perf_event_paranoid”
I'd like to use the perf utility to gather measurements for my program. It runs on a shared cluster machine with Debian 9 where /proc/sys/kernel/perf_event_paranoid is set to 3 by default, which prevents me from gathering measurements. Before changing it, I'd like to know what the implications are. Is it just a security matter, in that it would allow users to profile things run by other users and therefore gain insights? We do not care about this, as it is an inner circle of users anyway. Or is it perhaps a performance matter that will impact everyone else as well?
Martin Ueding (2812 rep)
May 15, 2019, 12:12 PM • Last activity: Jul 8, 2021, 11:39 AM
2 votes
1 answers
360 views
How do I profile a real world application?
I run Debian 10, and for the past two weeks or so the PDF reader Atril (a fork of Evince) has taken 25 seconds to start. Previously it started almost instantly. Now I'm trying to find out what causes the delay. I have downloaded the source package and built and installed it with profiling enabled:
cd "$HOME/.local/src"
apt source atril
cd atril-1.20.3
./autogen.sh
./configure CFLAGS=-pg LDFLAGS=-pg --prefix="$HOME/.local" --disable-caja
make V=1
make install
However, when I launch "$HOME/.local/bin/atril" no file named gmon.out is created. With verbose mode V=1 in the make command I can see that the option -pg is added to the compilation and linking commands. Any clues? What's missing? There are several tutorials on the internet showing how to profile simple statically linked example programs, but how do we profile "real world" applications? **Edit:** It turned out that gmon.out was created in my home directory. However, when I run Atril through gprof the resulting output doesn't say much because the application is multi-threaded.
August Karlstrom (1986 rep)
Jun 12, 2021, 09:59 AM • Last activity: Jun 12, 2021, 06:59 PM
3 votes
1 answers
1835 views
Strace shows that the time spent in syscalls is much longer than the total time elapsed. Why?
I am running an AI inference program based on Tensorflow-gpu. By running /usr/bin/strace -c -f /usr/bin/time ./program, I got the following output:
367.91user 1032.14system 26:43.41elapsed 87%CPU (0avgtext+0avgdata 4158812maxresident)k 
------ ----------- ----------- --------- --------- ----------------
100.00 38559.038652               5385548    247845 total
It shows that my program spent 34105 seconds in futex alone, which is **20 times longer** than the elapsed time of 26:43.41. I assumed that strace was recording the total system call time on all of my cores, so I re-experimented with only a single core enabled (using taskset), but the problem persisted. **Edit:** I did use taskset with the --all-tasks option:
/usr/bin/taskset --all-tasks --cpu-list 0 /usr/bin/strace  -c -f /usr/bin/time ./program
Azuresonance (73 rep)
Mar 9, 2021, 03:20 AM • Last activity: Mar 9, 2021, 07:30 PM
1 votes
0 answers
452 views
How can I measure time spent in child processes?
I have a command which invokes another command (synchronously) a couple of times. Is there a way to get the total time spent in the subcommand? In other words, is there a command similar to *time* but which can time child processes as well? Edit 2021-02-01: I don't have the source code of the command so I can't do the timing from within the command.
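One way this could be approached without the parent's source code, assuming the parent finds the subcommand through `PATH`, is to put a timing wrapper ahead of the real binary. The sketch below is only an illustration: `make_timing_shim` is a hypothetical helper that writes a wrapper script which logs the elapsed seconds of each invocation using bash's `time` keyword. It does not handle paths containing quotes or other shell metacharacters, and it will not catch a subcommand invoked by absolute path.

```shell
#!/bin/bash
# Sketch only: make_timing_shim is a hypothetical helper name.
# make_timing_shim NAME LOGFILE SHIMDIR
# Writes SHIMDIR/NAME, a wrapper that runs the real NAME and appends the
# elapsed wall-clock seconds of each run to LOGFILE (one line per run).
make_timing_shim() {
    local name=$1 log=$2 dir=$3 real
    real=$(type -P "$name") || return 1   # absolute path of the real binary
    mkdir -p "$dir"
    cat > "$dir/$name" <<EOF
#!/bin/bash
TIMEFORMAT=%R
exec 3>&2
# The command's own stderr goes to fd 3 (the original stderr);
# only the output of "time" lands in the log file.
{ time "$real" "\$@" 2>&3; } 2>>"$log"
EOF
    chmod +x "$dir/$name"
}
```

After `make_timing_shim subcmd /tmp/subcmd.log /tmp/shims`, running the parent as `PATH=/tmp/shims:$PATH parentcmd` and then `awk '{ s += $1 } END { print s }' /tmp/subcmd.log` would give the total, where `subcmd`, `parentcmd`, and the paths are placeholders.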
August Karlstrom (1986 rep)
Jan 31, 2021, 05:37 PM • Last activity: Feb 1, 2021, 03:23 PM
1 votes
0 answers
563 views
How to take stack samples using `perf` based on wall-clock time
I am trying to use Linux's perf_events framework to investigate an issue with an application on one of our servers. Based on my reading about the perf tool, collecting stacks is relatively straightforward. I want to know if it's possible to simply use the wall-time as a kind of event to collect stacks every N seconds. The current command I've been using is:
perf record -e cycles -T -o /samples.data -F 1 --call-graph dwarf -T -p
From my current understanding, this command will sample every 1 second (-F 1) and grab the stacks (-g) from the process (-p ) until the perf command is terminated with a signal. But based on the data I collect, it seems like there is more than one set of samples per second? So, I think I am misinterpreting or misunderstanding something about the way perf collects samples. Also, what if I wanted to record stacks every 2, 5, or 10 seconds to reduce the amount of data collected? Is there a way to achieve that with perf?
zac (111 rep)
Jun 24, 2020, 09:49 PM
0 votes
0 answers
902 views
High-frequency performance counter sampling using perf record/report
I want to retrieve performance counter counts at a high frequency (i.e. 100-200Hz) using the perf tool (similar in functionality to https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr#the-timeline-mode but at a higher frequency). Is there a way to do this? If so, what flags do I need to use when recording with perf record and reporting results with perf report? So far, I've tried the following to retrieve the r6d70 performance counter at 5ms intervals during sleep 5 execution: sudo perf record -F200 -e r6d70 -a sleep 5. However, when I use perf report to view the outputted data, I see the following, which isn't really what I want:
Samples: 109  of event 'r6d70', Event count (approx.): 68432
Overhead  Command     Shared Object      Symbol
  33.77%  swapper     [kernel.kallsyms]  [k] update_blocked_averages
  10.30%  node        [kernel.kallsyms]  [k] update_blocked_averages
   9.07%  containerd  [kernel.kallsyms]  [k] update_load_avg
   8.98%  containerd  [kernel.kallsyms]  [k] __switch_to
   8.56%  node        node               [.] Builtins_LdaNamedPropertyHandler
   5.90%  swapper     [kernel.kallsyms]  [k] __sched_text_start
   5.88%  swapper     [kernel.kallsyms]  [k] cpufreq_this_cpu_can_update
   5.81%  nautilus    [kernel.kallsyms]  [k] update_blocked_averages
   4.56%  node        node               [.] v8::platform::tracing::TracingController
   3.82%  swapper     [kernel.kallsyms]  [k] arch_irq_work_raise
   3.20%  containerd  [kernel.kallsyms]  [k] select_task_rq_fair
   0.03%  swapper     [kernel.kallsyms]  [k] acpi_idle_do_entry
Justin Borromeo (121 rep)
Mar 8, 2020, 07:55 AM
1 votes
2 answers
982 views
profiling a linux command to get metrics
I am trying to unzip a huge .gz file. I would like to know if there is a way to profile this command to get the CPU utilization while the command is executing. I am looking for something like this:
gunzip file.gz | profileTheCommand
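As a rough illustration, bash's `time` keyword can report an overall CPU percentage through the `TIMEFORMAT` variable, where `%P` is computed as (user + system) / real. The helper name `cpu_report` below is hypothetical. This gives a single summary after the command finishes, not a live reading during execution.

```shell
#!/bin/bash
# Sketch only: cpu_report is a hypothetical helper name.
# Runs a command and prints real, user, and system time plus the overall
# CPU percentage to stderr once the command has finished.
cpu_report() {
    local TIMEFORMAT='real=%Rs user=%Us sys=%Ss cpu=%P%%'
    time "$@"
}

# Example usage (file.gz is the file from the question):
#   cpu_report gunzip file.gz
```

Unlike the pipe form in the question, the command to be measured is passed as arguments, since a process on the right side of a pipe cannot observe the resource usage of the one on the left.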
wandermonk (111 rep)
Mar 3, 2020, 04:40 AM • Last activity: Mar 5, 2020, 01:21 AM
7 votes
1 answers
1548 views
How to get the GPU execution time of a script in the shell?
I would like to display the completion time of a script, in terms of GPU time (not CPU). For the CPU, I can [simply](https://unix.stackexchange.com/q/52313/16704) [use](https://unix.stackexchange.com/q/52313/16704) [time](https://stackoverflow.com/q/556405/395857):
francky@gimmek80s:~$ time ls -l
total 8
drwxrwxr-x 3 francky francky 4096 Dec 16 22:19 codes
drwxrwxr-x 2 francky francky 4096 Jun 20 00:06 CUDA_practice
drwxrwxr-x 3 francky francky 4096 Dec 16 22:44 data
real 0m0.001s
user 0m0.000s
sys 0m0.000s
What is the equivalent for GPUs? (I use the CUDA toolkit to perform some computations on my Nvidia GPUs.)
Franck Dernoncourt (5533 rep)
Dec 17, 2015, 03:26 PM • Last activity: Mar 4, 2020, 02:36 PM
Showing page 1 of 20 total questions