On 2023-01-29, Paul Floyd wrote:

My recommendations for this are:

1/ PMU/PMC (performance monitoring unit/counter) event counting tools (perf record on Linux, pmcstat on FreeBSD, Oracle Studio collect on Solaris; I don't know about macOS). These can record events such as cache misses together with the associated call stacks. You can then view the results with tools such as HotSpot and perfgrind/kcachegrind (I have used HotSpot but not perfgrind).
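
For the Linux case, a minimal sketch of such a recording run could look like the following. The helper name perf_record_cmd and the target ./myprog are hypothetical; -e (event), -g (call graphs) and -o (output file) are standard perf record options. The script only prints the command line, it does not run it.

```python
def perf_record_cmd(target, event="cache-misses", output="perf.data"):
    """Build a `perf record` command line that samples `event` with call stacks.

    `target` is the command to profile, given as a list of arguments.
    """
    # -e selects the event, -g records call graphs, -o names the output file.
    return ["perf", "record", "-e", event, "-g", "-o", output, "--", *target]


# ./myprog is a placeholder for the program being profiled.
print(" ".join(perf_record_cmd(["./myprog"])))
```

The resulting perf.data file can then be inspected with perf report or opened in a GUI viewer.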

The big advantage of this is that the PMCs are part of the hardware, so the
overhead of doing this is minor. The only slight limitation is that the number
of counters is limited.
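
On Linux, when more events are requested than there are hardware counters, perf time-multiplexes the events and scales each raw count by the fraction of time the event was actually on a counter. A rough sketch of that scaling, assuming the time_enabled and time_running values that perf_event_open can report; the function name and the numbers are purely illustrative.

```python
def scaled_count(raw_count, time_enabled, time_running):
    """Estimate the full-run count for an event that only held a hardware
    counter for part of the time it was enabled (both times in the same unit)."""
    if time_running == 0:
        return None  # the event never got a counter, so there is no estimate
    return raw_count * time_enabled / time_running


# Hypothetical numbers: the event held a counter for half of the run.
print(scaled_count(1_000_000, time_enabled=2.0, time_running=1.0))
```

The scaled value is an estimate, so short or bursty phases can be misrepresented when many events share few counters.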

Another disadvantage: the hardware does not know which accesses
belong to the target code versus which accesses belong to
the code of valgrind itself.

Even if the hardware could separate accesses on that basis, it does not know
about stack frames.  Allocating a stack frame shortly after CALL, and
discarding it shortly before RETURN, can be significant reasons for
cache misses, either immediately or in the near future.

Then there are system calls, which might significantly alter cache contents.
Sometimes the resulting cache misses should be included (they most certainly
do affect wall clock time), but in other cases you may wish that the
operating system were ignored.
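
With Linux perf, one way to leave the operating system out is an event modifier: ":u" counts only while running in user space and ":k" only in the kernel. A small sketch follows; the helper perf_event and the target ./myprog are hypothetical. Note the limit of this approach: it excludes misses that occur while the kernel is executing, but not the later user-space misses caused by the kernel having evicted cache lines.

```python
def perf_event(event, scope="all"):
    """Return a perf event name restricted to a privilege level.

    scope is "all" (no modifier), "user" (":u") or "kernel" (":k").
    """
    suffix = {"all": "", "user": ":u", "kernel": ":k"}[scope]
    return event + suffix


# Print one perf stat command line per scope; ./myprog is a placeholder.
for scope in ("all", "user", "kernel"):
    print("perf stat -e", perf_event("cache-misses", scope), "-- ./myprog")
```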

If the target program uses threads, then using memory for inter-thread
communication (semaphore, mutex, pipeline, etc.) becomes another factor.



_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
