Hi Peter,
On 21.06.2017 18:39, Alexey Budankov wrote:
>
> Hi,
>
> On 15.06.2017 20:42, Alexey Budankov wrote:
>> On 29.05.2017 14:45, Alexey Budankov wrote:
>>> On 29.05.2017 14:23, Peter Zijlstra wrote:
On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote:
> On 29.05.2017
Hi,
On 15.06.2017 20:42, Alexey Budankov wrote:
On 29.05.2017 14:45, Alexey Budankov wrote:
On 29.05.2017 14:23, Peter Zijlstra wrote:
On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote:
On 29.05.2017 13:43, Peter Zijlstra wrote:
Why can't the tree do both?
Well, indeed,
On 29.05.2017 14:45, Alexey Budankov wrote:
On 29.05.2017 14:23, Peter Zijlstra wrote:
On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote:
On 29.05.2017 13:43, Peter Zijlstra wrote:
Why can't the tree do both?
Well, indeed, the tree provides such capability too. However
On 31.05.2017 3:04, Arun Kalyanasundaram wrote:
Hi Alexey,
I am interested in validating this fix. Can you please share some of
your testcases or let me know if you use any standard OpenMP
benchmarks?
- Arun
Hi Arun,
I am profiling the STREAM benchmark running with 272 OpenMP threads. The
On 01.06.2017 0:33, David Carrillo-Cisneros wrote:
On Sat, May 27, 2017 at 4:19 AM, Alexey Budankov
wrote:
Motivation:
The issue manifests as a 4x slowdown when profiling a single-thread STREAM
benchmark on Intel Xeon Phi running RHEL7.2 (Intel MPSS distribution).
Perf profiling is done in
On Sat, May 27, 2017 at 4:19 AM, Alexey Budankov
wrote:
> Motivation:
>
> The issue manifests as a 4x slowdown when profiling a single-thread STREAM
> benchmark on Intel Xeon Phi running RHEL7.2 (Intel MPSS distribution).
> Perf profiling is done in per-process mode and involves about 30 core
>
On 29.05.2017 14:23, Peter Zijlstra wrote:
On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote:
On 29.05.2017 13:43, Peter Zijlstra wrote:
Why can't the tree do both?
Well, indeed, the tree provides such capability too. However switching to
the full tree iteration in cases
On Mon, May 29, 2017 at 01:56:05PM +0300, Alexey Budankov wrote:
> On 29.05.2017 13:43, Peter Zijlstra wrote:
> > Why can't the tree do both?
> >
>
> Well, indeed, the tree provides such capability too. However switching to
> the full tree iteration in cases where we now go through _groups
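The cost trade-off being discussed above can be sketched in plain C. This is an illustrative model, not the kernel code: `group`, `sched_in_all` and `sched_in_percpu` are hypothetical names, and the visit counters only show how many groups each path has to touch when scheduling in on one cpu.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event group tagged with the cpu it was opened for
 * (-1 means "any cpu", as with perf_event_open's cpu argument). */
struct group {
	int cpu;
	int scheduled;
	struct group *next;
};

/* Old path: walk the whole flexible_groups list, filtering by cpu. */
static int sched_in_all(struct group *head, int cpu)
{
	int visited = 0;
	for (struct group *g = head; g; g = g->next) {
		visited++;			/* every group is touched... */
		if (g->cpu == cpu || g->cpu == -1)
			g->scheduled = 1;	/* ...but only a few match */
	}
	return visited;
}

/* New path: with groups pre-sorted into per-cpu sublists, only the
 * sublist for this cpu plus the cpu == -1 sublist are walked. */
static int sched_in_percpu(struct group *this_cpu, struct group *any_cpu)
{
	int visited = 0;
	for (struct group *g = this_cpu; g; g = g->next) {
		visited++;
		g->scheduled = 1;
	}
	for (struct group *g = any_cpu; g; g = g->next) {
		visited++;
		g->scheduled = 1;
	}
	return visited;
}
```

With many cpus and many per-cpu events, the old path's visit count grows with the total number of groups, while the new path's stays proportional to the groups relevant on that cpu, which is the speedup claimed for the multiplexing case; the open question quoted above is what happens when a full iteration over every group is still needed.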
On 29.05.2017 13:43, Peter Zijlstra wrote:
On Mon, May 29, 2017 at 12:15:14PM +0300, Alexey Budankov wrote:
On 29.05.2017 10:46, Peter Zijlstra wrote:
On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
@@ -742,7 +772,17 @@ struct perf_event_context {
struct list_head
On 29.05.2017 13:33, Peter Zijlstra wrote:
On Mon, May 29, 2017 at 12:24:53PM +0300, Alexey Budankov wrote:
On 29.05.2017 10:45, Peter Zijlstra wrote:
On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
Solution:
cpu indexed trees for perf_event_context::pinned_groups and
On Mon, May 29, 2017 at 12:15:14PM +0300, Alexey Budankov wrote:
> On 29.05.2017 10:46, Peter Zijlstra wrote:
> > On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
> > > @@ -742,7 +772,17 @@ struct perf_event_context {
> > >
> > > struct list_head
On Mon, May 29, 2017 at 12:24:53PM +0300, Alexey Budankov wrote:
> On 29.05.2017 10:45, Peter Zijlstra wrote:
> > On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
> > > Solution:
> > >
> > > cpu indexed trees for perf_event_context::pinned_groups and
> > >
On 29.05.2017 10:45, Peter Zijlstra wrote:
On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
Solution:
cpu indexed trees for perf_event_context::pinned_groups and
perf_event_context::flexible_groups lists are introduced. Every tree node
keeps a list of groups allocated for the
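The structure described above can be sketched as follows. This is a simplified stand-in, not the patch itself: a plain binary search tree replaces the kernel's rb-tree for brevity, and `cpu_node`, `tree_insert` and `tree_find` are hypothetical names. The point is only the shape of the data: tree nodes keyed by cpu, each carrying the list of groups allocated for that cpu.

```c
#include <assert.h>
#include <stdlib.h>

struct group {
	int id;
	struct group *next;	/* list of groups on the same cpu */
};

struct cpu_node {
	int cpu;			/* tree key */
	struct group *groups;		/* groups allocated for this cpu */
	struct cpu_node *left, *right;
};

/* Insert a group under its cpu's node, creating the node if needed. */
static struct cpu_node *tree_insert(struct cpu_node *root, int cpu,
				    struct group *g)
{
	if (!root) {
		root = calloc(1, sizeof(*root));
		root->cpu = cpu;
	}
	if (cpu < root->cpu)
		root->left = tree_insert(root->left, cpu, g);
	else if (cpu > root->cpu)
		root->right = tree_insert(root->right, cpu, g);
	else {
		g->next = root->groups;	/* prepend to this cpu's list */
		root->groups = g;
	}
	return root;
}

/* Look up the group list for one cpu in O(log n) tree steps. */
static struct group *tree_find(struct cpu_node *root, int cpu)
{
	while (root && root->cpu != cpu)
		root = cpu < root->cpu ? root->left : root->right;
	return root ? root->groups : NULL;
}
```

Context switches on cpu N then resolve N (and the any-cpu key) to a node and walk only that node's list, rather than filtering the entire pinned or flexible list.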
On 29.05.2017 10:46, Peter Zijlstra wrote:
On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
@@ -571,6 +587,27 @@ struct perf_event {
 *	either sufficies for read.
 */
	struct list_head		group_entry;
+	/*
+	 * Node on the pinned or
On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
> @@ -571,6 +587,27 @@ struct perf_event {
>  *	either sufficies for read.
>  */
> 	struct list_head		group_entry;
> +	/*
> +	 * Node on the pinned or flexible tree located at the event context;
On Sat, May 27, 2017 at 02:19:51PM +0300, Alexey Budankov wrote:
> Solution:
>
> cpu indexed trees for perf_event_context::pinned_groups and
> perf_event_context::flexible_groups lists are introduced. Every tree node
> keeps a list of groups allocated for the same cpu. A tree references only
>
Motivation:
The issue manifests as a 4x slowdown when profiling a single-thread STREAM
benchmark on Intel Xeon Phi running RHEL7.2 (Intel MPSS distribution).
Perf profiling is done in per-process mode and involves about 30 core
events. In case the benchmark is OpenMP based and runs under profiling