Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-28 Thread Don Zickus
On Sat, Oct 26, 2013 at 12:36:52PM +0200, Ingo Molnar wrote: > > * Don Zickus wrote: > > > On Thu, Oct 24, 2013 at 12:52:06PM +0200, Peter Zijlstra wrote: > > > On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote: > > > > I'll also make sure to test we actually hit the fault path > >

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-26 Thread Ingo Molnar
* Don Zickus wrote: > On Thu, Oct 24, 2013 at 12:52:06PM +0200, Peter Zijlstra wrote: > > On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote: > > > I'll also make sure to test we actually hit the fault path > > > by concurrently running something like: > > > > > > while :; do echo 1 >

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-25 Thread Peter Zijlstra
On Fri, Oct 25, 2013 at 12:33:03PM -0400, Don Zickus wrote: > Hi Peter, > > I finally had a chance to run this on my machine. From my testing, it > looks good. Better performance numbers. I think my longest latency went > from 300K cycles down to 150K cycles and very few of those (most are unde

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-25 Thread Don Zickus
On Thu, Oct 24, 2013 at 12:52:06PM +0200, Peter Zijlstra wrote: > On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote: > > I'll also make sure to test we actually hit the fault path > > by concurrently running something like: > > > > while :; do echo 1 > /proc/sys/vm/drop_caches ; done >

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-24 Thread Peter Zijlstra
On Thu, Oct 24, 2013 at 09:47:06AM -0400, Don Zickus wrote: > > Don, can you give this stuff a spin on your system? > > I'll try to grab the machine I was testing with and see what this patch > does. Thanks! I assume this can go on top of the other patch that was > committed to -tip last week?

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-24 Thread Don Zickus
On Thu, Oct 24, 2013 at 12:52:06PM +0200, Peter Zijlstra wrote: > On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote: > > I'll also make sure to test we actually hit the fault path > > by concurrently running something like: > > > > while :; do echo 1 > /proc/sys/vm/drop_caches ; done >

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-24 Thread Peter Zijlstra
On Wed, Oct 23, 2013 at 10:48:38PM +0200, Peter Zijlstra wrote: > I'll also make sure to test we actually hit the fault path > by concurrently running something like: > > while :; do echo 1 > /proc/sys/vm/drop_caches ; done > > while doing perf top or so.. So the below appears to work; I've run:

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-23 Thread Peter Zijlstra
On Wed, Oct 23, 2013 at 08:09:53AM +0100, Linus Torvalds wrote: > On Tue, Oct 22, 2013 at 10:12 PM, Peter Zijlstra wrote: > >> > >> Careful! There is one magic piece of state that you need to > >> save-and-restore if you do this, namely %cr2. Taking a page fault > >> always writes to %cr2, and we

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-23 Thread Ingo Molnar
* Peter Zijlstra wrote: > On Thu, Oct 17, 2013 at 03:27:48PM -0700, Linus Torvalds wrote: > > On Thu, Oct 17, 2013 at 3:01 PM, Peter Zijlstra > > wrote: > > > > > > Oh wait,.. now that Steven fixed being able to take faults from NMI > > > context; we could actually try copy_from_user_inatomic(

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-23 Thread Linus Torvalds
On Tue, Oct 22, 2013 at 10:12 PM, Peter Zijlstra wrote: >> >> Careful! There is one magic piece of state that you need to >> save-and-restore if you do this, namely %cr2. Taking a page fault >> always writes to %cr2, and we must *not* corrupt it in the NMI >> handler. > > It looks like this is alr

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-22 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 03:27:48PM -0700, Linus Torvalds wrote: > On Thu, Oct 17, 2013 at 3:01 PM, Peter Zijlstra wrote: > > > > Oh wait,.. now that Steven fixed being able to take faults from NMI > > context; we could actually try copy_from_user_inatomic(). Being able to > > directly access users

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Linus Torvalds
On Thu, Oct 17, 2013 at 3:01 PM, Peter Zijlstra wrote: > > Oh wait,.. now that Steven fixed being able to take faults from NMI > context; we could actually try copy_from_user_inatomic(). Being able to > directly access userspace would make the whole deal a lot easier again. Careful! There is one

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 11:26:23AM -0700, Linus Torvalds wrote: > On Thu, Oct 17, 2013 at 9:30 AM, Peter Zijlstra wrote: > > > > So avoid having to call copy_from_user_nmi() for every instruction. > > Since we already limit the max basic block size, we can easily > > pre-allocate a piece of memory

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 11:08:16PM +0200, Peter Zijlstra wrote: > I did a patch that avoids the page count mucking about, Don didn't see > any significant improvements from it. On top of which there's another patch -- which could as easily be done without it, that adds some state to the copy_from_

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 11:26:23AM -0700, Linus Torvalds wrote: > On Thu, Oct 17, 2013 at 9:30 AM, Peter Zijlstra wrote: > > > > So avoid having to call copy_from_user_nmi() for every instruction. > > Since we already limit the max basic block size, we can easily > > pre-allocate a piece of memory

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Linus Torvalds
On Thu, Oct 17, 2013 at 9:30 AM, Peter Zijlstra wrote: > > So avoid having to call copy_from_user_nmi() for every instruction. > Since we already limit the max basic block size, we can easily > pre-allocate a piece of memory to copy the entire thing into in one > go. copy_from_user_nmi() itself i

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 12:04:39PM -0400, Don Zickus wrote: > I take that back; the copy_from_user_nmi_iter is not super fast, I just had > a bug in how I accumulate total time. So somehow this approach is slower > than yesterday's. Humm interesting.. Slightly weird, because that instruction deco

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Don Zickus
On Thu, Oct 17, 2013 at 12:00:34PM -0400, Don Zickus wrote: > On Thu, Oct 17, 2013 at 11:41:45AM +0200, Peter Zijlstra wrote: > > On Thu, Oct 17, 2013 at 01:07:12AM +0200, Peter Zijlstra wrote: > > > On Wed, Oct 16, 2013 at 11:03:19PM +0200, Peter Zijlstra wrote: > > > > Anyway; if you want to have

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Don Zickus
On Thu, Oct 17, 2013 at 11:41:45AM +0200, Peter Zijlstra wrote: > On Thu, Oct 17, 2013 at 01:07:12AM +0200, Peter Zijlstra wrote: > > On Wed, Oct 16, 2013 at 11:03:19PM +0200, Peter Zijlstra wrote: > > > Anyway; if you want to have a go at this, feel free. > > > > OK, couldn't help myself; complet

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 05:09:44PM +0200, Peter Zijlstra wrote: > The patches you find in: > > http://programming.kicks-ass.net/sekrit/patches.tar.bz2 # sha256sum patches.tar.bz2 28e26d4a20004eee231a4c0c6067508a322241046b400a226af1cceed8854bfb patches.tar.bz2

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 11:03:58AM -0400, Don Zickus wrote: > On Thu, Oct 17, 2013 at 04:51:31PM +0200, Peter Zijlstra wrote: > > On Thu, Oct 17, 2013 at 10:49:13AM -0400, Don Zickus wrote: > > > For some reason this patch is page faulting at an invalid address inside > > > __intel_pmu_pebs_event()

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Don Zickus
On Thu, Oct 17, 2013 at 04:51:31PM +0200, Peter Zijlstra wrote: > On Thu, Oct 17, 2013 at 10:49:13AM -0400, Don Zickus wrote: > > For some reason this patch is page faulting at an invalid address inside > > __intel_pmu_pebs_event(). > > Ah yes, I lost a refresh, but read on; I've sent a gazillion

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 10:49:13AM -0400, Don Zickus wrote: > For some reason this patch is page faulting at an invalid address inside > __intel_pmu_pebs_event(). Ah yes, I lost a refresh, but read on; I've sent a gazillion new emails since ;-) I think it was something like: s/this_cpu_ptr/this_c

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Don Zickus
On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote: > A prettier patch below. The main difference is on-demand allocation of > the scratch buffer. > > --- > Subject: perf, x86: Optimize intel_pmu_pebs_fixup_ip() > From: Peter Zijlstra > Date: Tue, 15 Oct 2013 12:14:04 +0200 > > On Mo

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Wed, Oct 16, 2013 at 03:31:25PM +0200, Peter Zijlstra wrote: > Pick a smaller box? I seem to be able to reproduce on my wsm-ep, which > boots inside a minute :-) OK, so what I'm actually seeing on my WSM is that sched/clock.c is 'broken' for the purpose we're using it for. What triggered it is

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Wed, Oct 16, 2013 at 03:31:25PM +0200, Peter Zijlstra wrote: > On Wed, Oct 16, 2013 at 08:46:49AM -0400, Don Zickus wrote: > > On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote: > > > A prettier patch below. The main difference is on-demand allocation of > > > the scratch buffer. >

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-17 Thread Peter Zijlstra
On Thu, Oct 17, 2013 at 01:07:12AM +0200, Peter Zijlstra wrote: > On Wed, Oct 16, 2013 at 11:03:19PM +0200, Peter Zijlstra wrote: > > Anyway; if you want to have a go at this, feel free. > > OK, couldn't help myself; completely untested patch below. > > I think the full once copy is best for the

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-16 Thread Peter Zijlstra
On Wed, Oct 16, 2013 at 11:03:19PM +0200, Peter Zijlstra wrote: > Anyway; if you want to have a go at this, feel free. OK, couldn't help myself; completely untested patch below. I think the full once copy is best for the decode as even with the below interface you'd end up doing a lot of duplicat

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-16 Thread Peter Zijlstra
On Wed, Oct 16, 2013 at 01:52:27PM -0700, Andi Kleen wrote: > > So avoid having to call copy_from_user_nmi() for every instruction. > > Since we already limit the max basic block size, we can easily > > pre-allocate a piece of memory to copy the entire thing into in one > > go. > > It would be bet

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-16 Thread Andi Kleen
> So avoid having to call copy_from_user_nmi() for every instruction. > Since we already limit the max basic block size, we can easily > pre-allocate a piece of memory to copy the entire thing into in one > go. It would be better/more generic if you split copy_from_user_nmi() into init() copy() en
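The idea quoted in the patch description (one bulk copy into a pre-allocated buffer instead of one copy per instruction) can be illustrated with a small user-space sketch. This is not the kernel patch: `fake_user_copy()`, `MAX_BLOCK`, the toy length-prefixed "instruction" encoding, and both walker functions are hypothetical stand-ins chosen only to make the difference in copy counts visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCK 512  /* hypothetical cap on basic-block size (assumption) */
#define MAX_INSN   15  /* longest possible x86 instruction, in bytes */

static size_t copy_calls;  /* number of simulated "user copies" performed */

/* Hypothetical stand-in for an expensive user-space copy; NOT the kernel API. */
static size_t fake_user_copy(void *dst, const void *src, size_t n)
{
    copy_calls++;
    memcpy(dst, src, n);
    return n;
}

/* Toy decoder (assumption): the first byte of each instruction is its length. */
static size_t insn_len(const unsigned char *p)
{
    return p[0];
}

/* Per-instruction approach: one copy for every instruction decoded. */
static size_t walk_per_insn(const unsigned char *user, size_t size)
{
    unsigned char buf[MAX_INSN];
    size_t off = 0, count = 0;

    while (off < size) {
        size_t chunk = size - off < sizeof(buf) ? size - off : sizeof(buf);
        size_t len;

        fake_user_copy(buf, user + off, chunk);
        len = insn_len(buf);
        if (len == 0 || len > chunk)
            break;  /* malformed input: stop walking */
        off += len;
        count++;
    }
    return count;
}

/* Bulk approach: copy the whole block once, then decode from the scratch buffer. */
static size_t walk_bulk(const unsigned char *user, size_t size)
{
    static unsigned char scratch[MAX_BLOCK];  /* pre-allocated scratch space */
    size_t off = 0, count = 0;

    if (size > MAX_BLOCK)
        return 0;  /* block too large for the scratch buffer: give up */

    fake_user_copy(scratch, user, size);
    while (off < size) {
        size_t len = insn_len(scratch + off);

        if (len == 0 || len > size - off)
            break;  /* malformed input: stop walking */
        off += len;
        count++;
    }
    return count;
}
```

Both walkers decode the same instructions from the same input; the per-instruction walker calls the copy routine once per instruction, while the bulk walker calls it once per block. The size check in `walk_bulk()` is what the bounded block size makes possible.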

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-16 Thread Don Zickus
On Wed, Oct 16, 2013 at 03:31:25PM +0200, Peter Zijlstra wrote: > On Wed, Oct 16, 2013 at 08:46:49AM -0400, Don Zickus wrote: > > On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote: > > > A prettier patch below. The main difference is on-demand allocation of > > > the scratch buffer. >

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-16 Thread Peter Zijlstra
On Wed, Oct 16, 2013 at 08:46:49AM -0400, Don Zickus wrote: > On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote: > > A prettier patch below. The main difference is on-demand allocation of > > the scratch buffer. > > I'll see if I can sanity test this in the next couple hours. > > Fur

Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-16 Thread Don Zickus
On Wed, Oct 16, 2013 at 12:57:55PM +0200, Peter Zijlstra wrote: > A prettier patch below. The main difference is on-demand allocation of > the scratch buffer. I'll see if I can sanity test this in the next couple hours. Further testing yesterday showed that intel_pmu_drain_pebs_nhm still has long

[PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()

2013-10-16 Thread Peter Zijlstra
A prettier patch below. The main difference is on-demand allocation of the scratch buffer. --- Subject: perf, x86: Optimize intel_pmu_pebs_fixup_ip() From: Peter Zijlstra Date: Tue, 15 Oct 2013 12:14:04 +0200 On Mon, Oct 14, 2013 at 04:35:49PM -0400, Don Zickus wrote: > While there are a few pla