On Sat, Oct 26, 2013 at 12:36:52PM +0200, Ingo Molnar wrote:
>
> * Don Zickus wrote:
>
> > On Thu, Oct 24, 2013 at 12:52:06PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote:
> > > > I'll also make sure to test we actually hit the fault path
> >
* Don Zickus wrote:
> On Thu, Oct 24, 2013 at 12:52:06PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote:
> > > I'll also make sure to test we actually hit the fault path
> > > by concurrently running something like:
> > >
> > > while :; do echo 1 > /proc/sys/vm/drop_caches ; done
On Fri, Oct 25, 2013 at 12:33:03PM -0400, Don Zickus wrote:
> Hi Peter,
>
> I finally had a chance to run this on my machine. From my testing, it
> looks good. Better performance numbers. I think my longest latency went
> from 300K cycles down to 150K cycles and very few of those (most are unde
On Thu, Oct 24, 2013 at 12:52:06PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote:
> > I'll also make sure to test we actually hit the fault path
> > by concurrently running something like:
> >
> > while :; do echo 1 > /proc/sys/vm/drop_caches ; done
>
On Thu, Oct 24, 2013 at 09:47:06AM -0400, Don Zickus wrote:
> > Don, can you give this stuff a spin on your system?
>
> I'll try to grab the machine I was testing with and see what this patch
> does. Thanks! I assume this can go on top of the other patch that was
> committed to -tip last week?
On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote:
> I'll also make sure to test we actually hit the fault path
> by concurrently running something like:
>
> while :; do echo 1 > /proc/sys/vm/drop_caches ; done
>
> while doing perf top or so..
So the below appears to work; I've run:
On Wed, Oct 23, 2013 at 08:09:53AM +0100, Linus Torvalds wrote:
> On Tue, Oct 22, 2013 at 10:12 PM, Peter Zijlstra wrote:
> >>
> >> Careful! There is one magic piece of state that you need to
> >> save-and-restore if you do this, namely %cr2. Taking a page fault
> >> always writes to %cr2, and we
* Peter Zijlstra wrote:
> On Thu, Oct 17, 2013 at 03:27:48PM -0700, Linus Torvalds wrote:
> > On Thu, Oct 17, 2013 at 3:01 PM, Peter Zijlstra
> > wrote:
> > >
> > > Oh wait,.. now that Steven fixed being able to take faults from NMI
> > > context; we could actually try copy_from_user_inatomic(
On Tue, Oct 22, 2013 at 10:12 PM, Peter Zijlstra wrote:
>>
>> Careful! There is one magic piece of state that you need to
>> save-and-restore if you do this, namely %cr2. Taking a page fault
>> always writes to %cr2, and we must *not* corrupt it in the NMI
>> handler.
>
> It looks like this is alr
On Thu, Oct 17, 2013 at 03:27:48PM -0700, Linus Torvalds wrote:
> On Thu, Oct 17, 2013 at 3:01 PM, Peter Zijlstra wrote:
> >
> > Oh wait,.. now that Steven fixed being able to take faults from NMI
> > context; we could actually try copy_from_user_inatomic(). Being able to
> > directly access users
On Thu, Oct 17, 2013 at 3:01 PM, Peter Zijlstra wrote:
>
> Oh wait,.. now that Steven fixed being able to take faults from NMI
> context; we could actually try copy_from_user_inatomic(). Being able to
> directly access userspace would make the whole deal a lot easier again.
Careful! There is one
On Thu, Oct 17, 2013 at 11:26:23AM -0700, Linus Torvalds wrote:
> On Thu, Oct 17, 2013 at 9:30 AM, Peter Zijlstra wrote:
> >
> > So avoid having to call copy_from_user_nmi() for every instruction.
> > Since we already limit the max basic block size, we can easily
> > pre-allocate a piece of memory
On Thu, Oct 17, 2013 at 11:08:16PM +0200, Peter Zijlstra wrote:
> I did a patch that avoids the page count mucking about, Don didn't see
> any significant improvements from it.
On top of which there's another patch -- which could as easily be done
without it -- that adds some state to the copy_from_
On Thu, Oct 17, 2013 at 9:30 AM, Peter Zijlstra wrote:
>
> So avoid having to call copy_from_user_nmi() for every instruction.
> Since we already limit the max basic block size, we can easily
> pre-allocate a piece of memory to copy the entire thing into in one
> go.
copy_from_user_nmi() itself i
On Thu, Oct 17, 2013 at 12:04:39PM -0400, Don Zickus wrote:
> I take that back, copy_from_user_nmi_iter is not super fast; I just had
> a bug in how I accumulated total time. So somehow this approach is slower
> than yesterday's.
Humm interesting..
Slightly weird, because that instruction deco
On Thu, Oct 17, 2013 at 12:00:34PM -0400, Don Zickus wrote:
> On Thu, Oct 17, 2013 at 11:41:45AM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 17, 2013 at 01:07:12AM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 16, 2013 at 11:03:19PM +0200, Peter Zijlstra wrote:
> > > > Anyway; if you want to have
On Thu, Oct 17, 2013 at 11:41:45AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 17, 2013 at 01:07:12AM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 16, 2013 at 11:03:19PM +0200, Peter Zijlstra wrote:
> > > Anyway; if you want to have a go at this, feel free.
> >
> > OK, couldn't help myself; complet
On Thu, Oct 17, 2013 at 05:09:44PM +0200, Peter Zijlstra wrote:
> The patches you find in:
>
> http://programming.kicks-ass.net/sekrit/patches.tar.bz2
# sha256sum patches.tar.bz2
28e26d4a20004eee231a4c0c6067508a322241046b400a226af1cceed8854bfb
patches.tar.bz2
On Thu, Oct 17, 2013 at 11:03:58AM -0400, Don Zickus wrote:
> On Thu, Oct 17, 2013 at 04:51:31PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 17, 2013 at 10:49:13AM -0400, Don Zickus wrote:
> > > For some reason this patch is page faulting at an invalid address inside
> > > __intel_pmu_pebs_event()
On Thu, Oct 17, 2013 at 04:51:31PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 17, 2013 at 10:49:13AM -0400, Don Zickus wrote:
> > For some reason this patch is page faulting at an invalid address inside
> > __intel_pmu_pebs_event().
>
> Ah yes, I lost a refresh, but read on; I've sent a gazillion
On Thu, Oct 17, 2013 at 10:49:13AM -0400, Don Zickus wrote:
> For some reason this patch is page faulting at an invalid address inside
> __intel_pmu_pebs_event().
Ah yes, I lost a refresh, but read on; I've sent a gazillion new emails
since ;-)
I think it was something like: s/this_cpu_ptr/this_c
On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote:
> A prettier patch below. The main difference is on-demand allocation of
> the scratch buffer.
>
> ---
> Subject: perf, x86: Optimize intel_pmu_pebs_fixup_ip()
> From: Peter Zijlstra
> Date: Tue, 15 Oct 2013 12:14:04 +0200
>
> On Mo
On Wed, Oct 16, 2013 at 03:31:25PM +0200, Peter Zijlstra wrote:
> Pick a smaller box? I seem to be able to reproduce on my wsm-ep, which
> boots inside a minute :-)
OK, so what I'm actually seeing on my WSM is that sched/clock.c is
'broken' for the purpose we're using it for.
What triggered it is
On Wed, Oct 16, 2013 at 03:31:25PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 16, 2013 at 08:46:49AM -0400, Don Zickus wrote:
> > On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote:
> > > A prettier patch below. The main difference is on-demand allocation of
> > > the scratch buffer.
>
On Thu, Oct 17, 2013 at 01:07:12AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 16, 2013 at 11:03:19PM +0200, Peter Zijlstra wrote:
> > Anyway; if you want to have a go at this, feel free.
>
> OK, couldn't help myself; completely untested patch below.
>
> I think the full once copy is best for the
On Wed, Oct 16, 2013 at 11:03:19PM +0200, Peter Zijlstra wrote:
> Anyway; if you want to have a go at this, feel free.
OK, couldn't help myself; completely untested patch below.
I think the full once copy is best for the decode as even with the below
interface you'd end up doing a lot of duplicat
On Wed, Oct 16, 2013 at 01:52:27PM -0700, Andi Kleen wrote:
> > So avoid having to call copy_from_user_nmi() for every instruction.
> > Since we already limit the max basic block size, we can easily
> > pre-allocate a piece of memory to copy the entire thing into in one
> > go.
>
> It would be bet
> So avoid having to call copy_from_user_nmi() for every instruction.
> Since we already limit the max basic block size, we can easily
> pre-allocate a piece of memory to copy the entire thing into in one
> go.
It would be better/more generic if you split copy_from_user_nmi() into
init() copy() en
On Wed, Oct 16, 2013 at 08:46:49AM -0400, Don Zickus wrote:
> On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote:
> > A prettier patch below. The main difference is on-demand allocation of
> > the scratch buffer.
>
> I'll see if I can sanity test this in the next couple hours.
>
> Fur
On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote:
> A prettier patch below. The main difference is on-demand allocation of
> the scratch buffer.
I'll see if I can sanity test this in the next couple hours.
Further testing yesterday showed that intel_pmu_drain_pebs_nhm still
has long
A prettier patch below. The main difference is on-demand allocation of
the scratch buffer.
---
Subject: perf, x86: Optimize intel_pmu_pebs_fixup_ip()
From: Peter Zijlstra
Date: Tue, 15 Oct 2013 12:14:04 +0200
On Mon, Oct 14, 2013 at 04:35:49PM -0400, Don Zickus wrote:
> While there are a few pla