Clarifications on perf_events data collection

_Originally posted [here](https://stackoverflow.com/questions/79391172/clarifications-on-perf-events-data-collection) on Stack Overflow_

I have never used the `perf` command before, but I now need it, so I have been reading the (really useful) [PerfWiki](https://perfwiki.github.io/main/).

The section devoted to [Event-based sampling overview](https://perfwiki.github.io/main/tutorial/#sampling-with-perf-record) contains a number of statements that are not completely clear to me.
Since they are essential to understanding precisely how data collection is carried out, I am asking for your help here.

In the following I will quote paragraphs from that section and explain my doubts immediately after each one.

> Perf_events is based on event-based sampling. The period is expressed as the number of occurrences of an event, not the number of timer ticks. A sample is recorded when the sampling counter overflows, i.e., wraps from 2^64 back to 0. No PMU implements 64-bit hardware counters, but perf_events emulates such counters in software.

Q1. Consider a CPU working at a fixed frequency (no frequency scaling): what is the precise definition of a timer tick? Is it true that the _number_ of ticks in a second equals the _value_ of the frequency (e.g. 1_000_000_000 ticks/second for a CPU working at 1 GHz)?

Q2. perf does not use timer ticks. Instead, it counts the number of times an event occurs and only "stops" the CPU to gather the relevant data once every `period` occurrences; e.g. if period=1 each occurrence of an event is registered, if period=2 only half of the occurrences are registered, and so on. Is that right?
When period > 1, does perf automatically scale the final values and report the data as if all the events had been registered?

Q3. The above section says that "A sample is recorded when the sampling counter overflows, i.e., wraps from 2^64 back to 0" which seems to contradict the measurement being taken once every period occurrences of an event... what am I missing?
More generally, why does perf wait for a counter to overflow before gathering the information?
Also, what happens when more than one event is being monitored?
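
To make Q2 and Q3 concrete, here is a minimal Python sketch of one way the two statements could fit together. It assumes the counter is pre-loaded with "minus period" (in N-bit arithmetic) so that it wraps to 0 after exactly `period` events. The function name `sample_positions` and the 8-bit width in the example are purely illustrative and are not taken from perf or the kernel source.

```python
def sample_positions(total_events, period, counter_bits=48):
    """Return the 1-based event indices at which a sample would be taken.

    Toy model: an N-bit counter is pre-loaded with (2^N - period),
    incremented once per event, and a sample is taken when it wraps to 0.
    The default width of 48 bits is an illustrative assumption.
    """
    modulus = 1 << counter_bits
    if not 0 < period < modulus:
        raise ValueError("period must fit in the counter")
    counter = modulus - period          # "-period" in N-bit arithmetic
    samples = []
    for event_index in range(1, total_events + 1):
        counter = (counter + 1) % modulus
        if counter == 0:                # overflow: this is where a sample is recorded
            samples.append(event_index)
            counter = modulus - period  # re-arm for the next period
    return samples


period = 3
samples = sample_positions(total_events=10, period=period, counter_bits=8)
print(samples)                 # [3, 6, 9]
print(len(samples) * period)   # 9, an estimate of the 10 events that occurred
```

Under this model, "once every `period` occurrences" and "when the counter overflows" describe the same moment, and multiplying the number of samples by the period gives an approximate total event count. Whether perf does exactly this is part of what I am asking.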

> The way perf_events emulates 64-bit counter is limited to expressing sampling periods using the number of bits in the actual hardware counters. If this is smaller than 64, the kernel **silently** truncates the period in this case. Therefore, it is best if the period is always smaller than 2^31 if running on 32-bit systems.

Q4. I cannot truly understand the meaning of this paragraph (maybe I am missing some underlying knowledge). If the actual hardware counter has `N` [...]

> On counter overflow, the kernel records information, i.e., a sample, about the execution of the program. What gets recorded depends on the type of measurement. This is all specified by the user and the tool. But the key information that is common in all samples is the instruction pointer, i.e. where was the program when it was interrupted.

Ok, I got this.

> Interrupt-based sampling introduces skids on modern processors. That means that the instruction pointer stored in each sample designates the place where the program was interrupted to process the PMU interrupt, not the place where the counter actually overflows, i.e., where it was at the end of the sampling period. In some case, the distance between those two points may be several dozen instructions or more if there were taken branches. When the program cannot make forward progress, those two locations are indeed identical. **For this reason, care must be taken when interpreting profiles.**

Q5. I am aware I know way too little about how a CPU actually works to understand this section, but could you confirm that this paragraph is warning about the following potential chain of events (pun not intended):

 1. A sample must be taken
 2. The current value of the instruction pointer register is read
 3. A few more instructions are executed by the CPU
 4. The sample is taken, and the gathered data is saved together with the instruction pointer read at step (2)

If this is correct, could you give me a brief explanation (or point me to some external resource) about what may cause step (3)?
More importantly, how can I monitor how many times skids have occurred? Is there anything on my side that I can do to mitigate this?
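
Read literally, the quoted paragraph seems to describe a slightly different ordering: the counter overflows first, some more instructions execute, and the instruction pointer is only captured when the interrupt is actually handled. A toy Python model of that reading follows; `recorded_ip`, the addresses, and the `skid` values are all hypothetical and only serve to illustrate the idea.

```python
def recorded_ip(instruction_addresses, overflow_index, skid):
    """Return the address a sample would carry in this toy model.

    overflow_index: index of the instruction at which the counter overflowed.
    skid: number of further instructions executed before the interrupt lands
          (hypothetical; real hardware does not report this directly).
    """
    last_index = len(instruction_addresses) - 1
    interrupted_index = min(overflow_index + skid, last_index)
    return instruction_addresses[interrupted_index]


program = [0x1000, 0x1004, 0x1008, 0x100C, 0x1010]
print(hex(recorded_ip(program, overflow_index=1, skid=0)))  # 0x1004, no skid
print(hex(recorded_ip(program, overflow_index=1, skid=2)))  # 0x100c, two instructions late
```

In this model the sample is attributed to an instruction that comes after the one that caused the overflow, which is how I understand the warning about interpreting profiles.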

> By default, perf record uses the cycles event as the sampling event.
>
> [...]
>
> The perf_events interface allows two modes to express the sampling period:
>
>    * the number of occurrences of the event (period)
>    * the average rate of samples/sec (frequency)
>
> The perf tool defaults to the average rate. It is set to 1000Hz, or 1000 samples/sec. That means that the kernel is dynamically adjusting the sampling period to achieve the target average rate.

Q6. Does perf use the cycles event as a reference to compute the sampling period even if it is not among the set of events being monitored?
What happens when multiple events are monitored? Does each event have its own period, or is there one event that counts for all?
When a frequency is used to determine when samples must be taken, the event used as reference for sampling should be irrelevant, right?
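
To show what I imagine "dynamically adjusting the sampling period" to mean, here is a simplified Python sketch. It is a toy model under my own assumptions: the function `next_period` is hypothetical, and the real kernel logic may smooth or bound the adjustment differently.

```python
NSEC_PER_SEC = 1_000_000_000


def next_period(events_in_interval, interval_ns, target_hz):
    """Period that would have yielded target_hz samples/sec over the last interval.

    Toy model: period = (events per second) / (samples per second),
    computed with integer arithmetic and never allowed to drop below 1.
    """
    return max(1, events_in_interval * NSEC_PER_SEC // (interval_ns * target_hz))


# 3,000,000 events observed in 1 ms (about 3e9 events/sec), target 1000 samples/sec
print(next_period(3_000_000, 1_000_000, 1000))  # 3000000
# The same target with a slower event rate gives a smaller period
print(next_period(500_000, 1_000_000, 1000))    # 500000
```

If something like this is what happens, the period still depends on how fast the chosen event occurs, which is why I am unsure whether the reference event really is irrelevant in frequency mode.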

                        
Asked by Sirion (101 rep)
Jan 29, 2025, 08:23 AM