Sample Header Ad - 728x90

perf fails to work on really large process

5 votes
1 answer
9336 views
I use perf quite often to profile my applications. Recently I got a few new machine with ~750G RAM each. I am trying to profile a process that uses about 400G of ram on one of them. This works fine on all the new machines but one. They were all installed the same way, running ubuntu. I tried both perf as installed through apt-get and compiled my own from git . When running ./perf top -p 14182 I some time manage to get results (after a long wait, and very rarely) but most other times I am getting: The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:pp). /bin/dmesg may provide additional information. No CONFIG_PERF_EVENTS=y kernel support configured? For the same pid(!). I get similar problems when trying to record, for instance: time ./perf record -F 111 -a -g -p 14182 -- sleep 3 Warning: PID/TID switch overriding SYSTEMsleep: Terminated Command exited with non-zero status 255 2.66user 91.58system 1:36.68elapsed 97%CPU (0avgtext+0avgdata 4896maxresident)k 0inputs+5248outputs (0major+4847minor)pagefaults 0swaps This command takes way longer than 3 seconds, I cannot ctrl+c it, and killing it requires -9. It emits some data into perf.data but when I try to view it, i get file has no samples. Everything works just fine on other (smaller) processes on this machine, and as I mentioned earlier this works great on the rest of the machines, for processes weighting about 200G. When I run the previous command on the other machines, it also uses much less cpu (33% rather than 97%). I am not sure where else to look, and google has not been very helpful for me. Any ideas or directions? **Edit** Seeing how system was high, I ran this with strace -c, this is what I get: Another machines with a process weighting ~200G: _ strace -c perf record -F 111 -a -g -p 27879 -- sleep 3 Warning: PID/TID switch overriding SYSTEM[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.684 MB perf.data (~29876 samples) ] % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 91.43 2.548664 109 23481 read 2.47 0.068712 11 6457 15 stat 2.42 0.067392 9 7303 write 0.68 0.018960 12 1646 getdents 0.68 0.018841 22 847 1 mmap This machine with a process weighting ~400G: _ strace -c perf record -F 111 -a -g -p 14182 -- sleep 3 /tmpvaknins9@snlp-brevis-3 Warning: PID/TID switch overriding SYSTEMsleep: Terminated sleep: Terminated % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 99.70 131.243199 472 278178 read 0.16 0.207757 8 25895 write 0.04 0.048962 8 6398 5 stat 0.02 0.031247 14 2217 mmap 0.02 0.026385 9 3091 close It seems lots of time is being spent on reading /proc/tid/maps|status.. Perhaps my process is simply too large? Any clues how I could still get it to record this process? Thanks!
Asked by Shlomi (151 rep)
Aug 10, 2016, 12:13 AM
Last activity: Mar 13, 2017, 11:56 PM