Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-06-30 Thread Alexey Budankov
Hi Peter, On 21.06.2017 18:39, Alexey Budankov wrote: > > Hi, > > On 15.06.2017 20:42, Alexey Budankov wrote: >> On 29.05.2017 14:45, Alexey Budankov wrote: >>> On 29.05.2017 14:23, Peter Zijlstra wrote: On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote: > On 29.05.2017

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-06-21 Thread Alexey Budankov
Hi, On 15.06.2017 20:42, Alexey Budankov wrote: On 29.05.2017 14:45, Alexey Budankov wrote: On 29.05.2017 14:23, Peter Zijlstra wrote: On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote: On 29.05.2017 13:43, Peter Zijlstra wrote: Why can't the tree do both? Well, indeed,

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-06-15 Thread Alexey Budankov
On 29.05.2017 14:45, Alexey Budankov wrote: On 29.05.2017 14:23, Peter Zijlstra wrote: On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote: On 29.05.2017 13:43, Peter Zijlstra wrote: Why can't the tree do both? Well, indeed, the tree provides such capability too. However

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-06-14 Thread Alexey Budankov
On 31.05.2017 3:04, Arun Kalyanasundaram wrote: Hi Alexey, I am interested in validating this fix. Can you please share some of your testcases or let me know if you use any standard OpenMP benchmarks? - Arun Hi Arun, I am profiling STREAM benchmark running in 272 OpenMP threads. The

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-06-14 Thread Alexey Budankov
On 01.06.2017 0:33, David Carrillo-Cisneros wrote: On Sat, May 27, 2017 at 4:19 AM, Alexey Budankov wrote: Motivation: The issue manifests like 4x slowdown when profiling single thread STREAM benchmark on Intel Xeon Phi running RHEL7.2 (Intel MPSS distribution). Perf profiling is done in

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-31 Thread David Carrillo-Cisneros
On Sat, May 27, 2017 at 4:19 AM, Alexey Budankov wrote: > Motivation: > > The issue manifests like 4x slowdown when profiling single thread STREAM > benchmark on Intel Xeon Phi running RHEL7.2 (Intel MPSS distribution). > Perf profiling is done in per-process mode and involves about 30 core >

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-30 Thread Arun Kalyanasundaram
Hi Alexey, I am interested in validating this fix. Can you please share some of your testcases or let me know if you use any standard OpenMP benchmarks? - Arun

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Alexey Budankov
On 29.05.2017 14:23, Peter Zijlstra wrote: On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote: On 29.05.2017 13:43, Peter Zijlstra wrote: Why can't the tree do both? Well, indeed, the tree provides such capability too. However switching to the full tree iteration in cases

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Peter Zijlstra
On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote: > On 29.05.2017 13:43, Peter Zijlstra wrote: > > Why can't the tree do both? > > > > Well, indeed, the tree provides such capability too. However switching to > the full tree iteration in cases where we now go through _groups

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Alexey Budankov
On 29.05.2017 13:43, Peter Zijlstra wrote: On Mon, May 29, 2017 at 12:15:14PM +0300, Alexey Budankov wrote: On 29.05.2017 10:46, Peter Zijlstra wrote: On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: @@ -742,7 +772,17 @@ struct perf_event_context { struct list_head

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Alexey Budankov
On 29.05.2017 13:33, Peter Zijlstra wrote: On Mon, May 29, 2017 at 12:24:53PM +0300, Alexey Budankov wrote: On 29.05.2017 10:45, Peter Zijlstra wrote: On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: Solution: cpu indexed trees for perf_event_context::pinned_groups and

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Peter Zijlstra
On Mon, May 29, 2017 at 12:15:14PM +0300, Alexey Budankov wrote: > On 29.05.2017 10:46, Peter Zijlstra wrote: > > On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: > > > @@ -742,7 +772,17 @@ struct perf_event_context { > > > > > > struct list_head

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Peter Zijlstra
On Mon, May 29, 2017 at 12:24:53PM +0300, Alexey Budankov wrote: > On 29.05.2017 10:45, Peter Zijlstra wrote: > > On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: > > > Solution: > > > > > > cpu indexed trees for perf_event_context::pinned_groups and > > >

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Alexey Budankov
On 29.05.2017 10:45, Peter Zijlstra wrote: On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: Solution: cpu indexed trees for perf_event_context::pinned_groups and perf_event_context::flexible_groups lists are introduced. Every tree node keeps a list of groups allocated for the

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Alexey Budankov
On 29.05.2017 10:46, Peter Zijlstra wrote: On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: @@ -571,6 +587,27 @@ struct perf_event { * either sufficies for read. */ struct list_head group_entry; + /* + * Node on the pinned or

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Peter Zijlstra
On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: > @@ -571,6 +587,27 @@ struct perf_event { > * either sufficies for read. > */ > struct list_head group_entry; > + /* > + * Node on the pinned or flexible tree located at the event context;

Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-29 Thread Peter Zijlstra
On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote: > Solution: > > cpu indexed trees for perf_event_context::pinned_groups and > perf_event_context::flexible_groups lists are introduced. Every tree node > keeps a list of groups allocated for the same cpu. A tree references only >

[PATCH v2]: perf/core: addressing 4x slowdown during per-process, profiling of STREAM benchmark on Intel Xeon Phi

2017-05-27 Thread Alexey Budankov
Motivation: The issue manifests like 4x slowdown when profiling single thread STREAM benchmark on Intel Xeon Phi running RHEL7.2 (Intel MPSS distribution). Perf profiling is done in per-process mode and involves about 30 core events. In case the benchmark is OpenMP based and runs under profiling
