Re: [Xen-devel] Operf and Opreport outputs

2017-07-06 Thread dhara buch
Hello,

I tried the perf-hello.c file and found the results as you explained. It is
very helpful to me. Now, for PERF_COUNT_HW_BRANCH_MISSES, I am getting
varying values every time unlike PERF_COUNT_HW_INSTRUCTIONS.

To get count for only defined process, I am trying to set perf_event_attr
structure variables sample_type as PERF_SAMPLE_BRANCH_STACK and
branch_sample_type as PERF_SAMPLE_BRANCH_USER. By doing so, I am getting
error no. 5 by perf_event_open. What may be the problem?

Dhara Buch

On Mon, Jun 19, 2017 at 5:51 PM, Michael Petlan  wrote:

> On Mon, 19 Jun 2017, dhara buch wrote:
> > Hello,
> > Thanks a lot for the detailed explanation. I could understand the
> working of operf and opreport.
> > Unlike operf, ocount counts each occurrence of the monitored event. In
> such a case, why ocount also gives varying values of CPU_CLK_UNHALTED even
> when the monitored code doesn't get
> > changed. Is it because of background processes? If so, then can we get
> event count for only monitored command?
>
> Hi Dhara,
>
> it is kind of because of the background processes, but not in the way that
> ocount
> would count also their CPU consumption. If you don't use '-s' option,
> ocount will
> count only what your command does.
>
> However, it counts either kernelspace, userspace or both (default if you
> are root).
> This can be configured by the event's flags:
>
> Both:
> -e CPU_CLK_UNHALTED:0:1:1
>
> Only kernelspace (used by the command, i.e. syscalls, etc):
> -e CPU_CLK_UNHALTED:0:1:0
>
> Only userspace (i.e. syscalls are excluded):
> -e CPU_CLK_UNHALTED:0:0:1
>
> (the format is (ocount): event:umask:kernelspace:userspace,
> for operf it is: event:umask:sample_rate:kernelspace:userspace)
>
> The numbers are slightly different each run because the process does not
> live in
> vacuum. It is usually dynamically linked, it calls libraries, kernel
> switches
> contexts, etc. All these things cause the differences. Also, the CPU
> optimizes
> the execution with floating success, depending on what else runs there
> with it.
> If the process is interrupted many times, the CPU consumption will be
> higher.
>
> I have a sample code which I linked dynamically and statically. While the
> dyn.
> linked executable always takes around 670,260 cycles, the statically
> linked one
> takes around 380,202 and seems to be more steady. Since it is very simple
> code,
> you can see that the linker-related stuff took more than 40% of the CPU
> cycles).
>
> Executing a binary is so complex problem that there is no surprise that
> those
> measurements give different values every time.
>
> 
>
> TIP:
>
> You can use either papi library or the built-in kernel support for counting
> just a particular part of code, e.g. a function. I have attached one
> example
> which uses perfevents kernel background. I have two functions for factorial
> counting, one is slightly less effective than the other. I count cycles and
> instructions for both. When executing repeatedly, you can see, that the
> inst.
> count is the same all the time (at least on my laptop), but the cycles
> count
> differs even for just the stuff in the functions which are very simple.
>
> cc -o perf-hello perf-hello.c myperf.c
> ./perf-hello
>
> Ineffective method:
> PERF_COUNT_HW_CPU_CYCLES = 32192
> PERF_COUNT_HW_INSTRUCTIONS = 20681
>
> More effective method:
> PERF_COUNT_HW_CPU_CYCLES = 15931
> PERF_COUNT_HW_INSTRUCTIONS = 19103
>
> 
>
> Cheers,
> Michael
>
>
> >
> > Thank you,
> >
> > On Fri, Jun 16, 2017 at 4:52 PM, Michael Petlan 
> wrote:
> >   On Fri, 16 Jun 2017, dhara buch wrote:
> >   > Hello,
> >   >
> >   > Many days ago you had helped me a lot in using Oprofile in
> Virtual Machine. That time, I also had a query regarding
> >   > getting information from 'opreport --symbols --debuginfo'
> command.
> >
> >   Hello,
> >
> >   yeah, I remember...
> >
> >   >
> >   > Actually, I am profiling a file1.c file with the following
> commands...
> >   >
> >   > 1. gcc -g file1.c -o file1
> >   >
> >   > 2. operf ./file1 --events=CPU_CLK_UNHALTED
> >   >
> >   > 3. opreport --symbols --debuginfo
> >   >
> >   > the output contains various lines where one line look follows:
> >   >
> >   > samples  %linenr info image
> name   symbol name
> >   >
> >   > 2   12.5000  rsaimp.c:16
> rsaimp   MM
> >   > 212.5000  rsaimp.c:34
> rsaimp   main
> >   >
> >   > When I execute commands 2 and 3 multiple times, each time I gwt
> the output for symbole name MM (which is UDF in the
> >   > program) but I don't get output for main every time.
> >   >
> >   This is kind of random, see below.
> >
> >   > Additionally,'' the sample % also vary every time I file operf
> command. I do not understand the working of 

Re: [Xen-devel] Operf and Opreport outputs

2017-06-19 Thread dhara buch
Hello,

Thanks a lot for the detailed explanation. I could understand the working
of operf and opreport.

Unlike operf, ocount counts each occurrence of the monitored event. In such
a case, why ocount also gives varying values of CPU_CLK_UNHALTED even when
the monitored code doesn't get changed. Is it because of background
processes? If so, then can we get event count for only monitored command?

Thank you,

On Fri, Jun 16, 2017 at 4:52 PM, Michael Petlan  wrote:

> On Fri, 16 Jun 2017, dhara buch wrote:
> > Hello,
> >
> > Many days ago you had helped me a lot in using Oprofile in Virtual
> Machine. That time, I also had a query regarding
> > getting information from 'opreport --symbols --debuginfo' command.
>
> Hello,
>
> yeah, I remember...
>
> >
> > Actually, I am profiling a file1.c file with the following commands...
> >
> > 1. gcc -g file1.c -o file1
> >
> > 2. operf ./file1 --events=CPU_CLK_UNHALTED
> >
> > 3. opreport --symbols --debuginfo
> >
> > the output contains various lines where one line look follows:
> >
> > samples  %linenr info image name
> symbol name
> >
> > 2   12.5000  rsaimp.c:16 rsaimp   MM
> > 212.5000  rsaimp.c:34 rsaimp
> main
> >
> > When I execute commands 2 and 3 multiple times, each time I gwt the
> output for symbole name MM (which is UDF in the
> > program) but I don't get output for main every time.
> >
> This is kind of random, see below.
>
> > Additionally,'' the sample % also vary every time I file operf command.
> I do not understand the working of operf that
> > how it gives different outputs for the same program.
> >
>
> How _counting_ works:
>
> Each time an event happens in CPU (in our case -- CPU_CLK_UNHALTED -- one
> CPU tick), its counter is incremented by one. This way, you can get the
> whole number of how many times the event happened:
>
> $ ocount ls
> Events were actively counted for 1863267 nanoseconds.
> Event counts (actual) for /bin/ls:
>   EventCount% time counted
>   CPU_CLK_UNHALTED 2,672,309100.00
>
> This means that there were 2.672 million CPU ticks necessary to execute
> `ls`.
>
> The tool for counting is *ocount*.
>
>
> How _profiling_ works:
>
> For profiling, events are counted the same way as when counting, but only
> once per N event occurrences a "sample" is taken. A sample is a piece of
> information about where in the code it happened, what was the context and
> potentially call-stack, etc. This sample is saved into the oprofile_data
> dir structure. When you finish profiling, opreport generates some overview
> from the samples recorded.
>
> Of course, taking sample causes some overhead. Thus, you cannot take it
> each time the event counter increments. For this purpose, there is "sample
> rate" value which divides the event counter and takes a sample only once
> per "sample rate" occurrences. Thus, it is pretty random, which event
> occurrence is picked for a sample.
>
> Of course, your main() consumes CPU ticks and the counter is incremented,
> but sometimes, one of these incrementations produces sample, sometimes not,
> thus the function is missing.
>
> You have 2 samples on each of the line in the report. That is really a low
> number, so it is very probable that sometimes the samples hit main() and
> sometimes not.
>
> $ operf ls
> Profiling done.
>
> $ opreport
> Using /root/oprofile_data/samples/ for samples directory.
> CPU: Core 2, speed 2267 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (Unhalted core cycles) count 10
> CPU_CLK_UNHALT...|
>   samples|  %|
> --
>  -->   37 100.000 ls
>CPU_CLK_UNHALT...|
>  samples|  %|
>  --
>   27 72.9730 kallsyms
>5 13.5135 libc-2.19.so
>2  5.4054 ld-2.19.so
>1  2.7027 ls
>1  2.7027 ext4
>1  2.7027 libattr.so.1.1.0
>
> You see that out of (estimated) 2.5-3.5 mil of event occurrences (which is
> the scale `ls` usually needs on my PC), I got only 37 samples.
>
> The sample rate for each event are in `ophelp` as "min count".
>
> Why did I get 37 samples? My default setting of default event is:
>
> $ ophelp -d
> CPU_CLK_UNHALTED:10:0:1:1
> ---^^
>
> This means that there is 1 sample per 10 event occurrences.
>
> You can tweak this value a bit, but note that oprofile won't let you set
> the value to be too low, due to the overhead reasons.
>
> Let's try 1 (sampling 10 times more often):
>
> $ operf -e CPU_CLK_UNHALTED:1 ls
> WARNING: Lost samples detected! See /root/oprofile_data/samples/operf.log
> for details.
> Lowering the sampling rate may reduce or eliminate lost samples.
> See the '--events' option description in the operf man page for help.
> Profiling