Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-16 Thread Steven Rostedt
On Fri, 16 Jan 2015 05:40:59 -0800 Eric Dumazet wrote:
> I made same observation about 3 years ago, on old cpus.
>
Thank you for letting me know. I was thinking I was going insane! (yeah yeah, there's lots of people who will still say that I've already gone insane, but at least I know my memor

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-16 Thread Eric Dumazet
On Thu, 2015-01-15 at 23:07 -0500, Steven Rostedt wrote:
> On Thu, 15 Jan 2015 21:57:58 -0600 (CST)
> Christoph Lameter wrote:
>
> > > I get:
> > >
> > > mov    %gs:0x18(%rax),%rdx
> > >
> > > Looks to me that %gs is used.
> >
> > %gs is used as a segment prefix. That does not add sign

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Steven Rostedt
On Thu, 15 Jan 2015 21:57:58 -0600 (CST) Christoph Lameter wrote:
> > I get:
> >
> > mov    %gs:0x18(%rax),%rdx
> >
> > Looks to me that %gs is used.
>
> %gs is used as a segment prefix. That does not add significant cycles.
> Retrieving the content of %gs and loading it into another

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Steven Rostedt
On Thu, 15 Jan 2015 22:51:30 -0500 Steven Rostedt wrote:
>
> I haven't done benchmarks in a while, so perhaps accessing the %gs
> segment isn't as expensive as I saw it before. I'll have to profile
> function tracing on my i7 and see where things are slow again.
I just ran it on my i7, and yeah

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Christoph Lameter
> I get:
>
> mov    %gs:0x18(%rax),%rdx
>
> Looks to me that %gs is used.
%gs is used as a segment prefix. That does not add significant cycles.
Retrieving the content of %gs and loading it into another register would
be expensive in terms of cpu cycles.

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Steven Rostedt
On Thu, 15 Jan 2015 21:27:14 -0600 (CST) Christoph Lameter wrote:
>
> The %gs register is not used since the address of the per cpu area is
> available as one of the first fields in the per cpu areas.
Have you disassembled your code?
Looking at put_cpu_partial() from 3.19-rc3 where it does:
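The two access patterns being argued about can be pictured with a small user-space model: a per-cpu variable read through a segment-relative load, versus materializing an ordinary pointer by first loading the area's own address from its first field (the arrangement Christoph describes). This is only a sketch under stated assumptions: the struct layout, the model_ names, and the use of a plain global pointer to stand in for the %gs base are all hypothetical, and the model shows the address arithmetic, not the cycle cost.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy, single-threaded model of x86-64 per-cpu addressing. All names are
 * hypothetical. On real hardware the %gs base is applied by the addressing
 * mode itself; here gs_base is an ordinary global, so the model shows the
 * address arithmetic, not the cycle cost.
 */
#define MODEL_NR_CPUS 4

struct percpu_area {
    struct percpu_area *self; /* first field: this area's own address */
    long counter;             /* an example per-cpu variable */
};

static struct percpu_area areas[MODEL_NR_CPUS];
static struct percpu_area *gs_base; /* stands in for the %gs segment base */

/* "Boot": every area records its own address in its first field. */
static void model_init(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        areas[cpu].self = &areas[cpu];
}

/* Running on another cpu means a different segment base. */
static void model_run_on_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    gs_base = &areas[cpu];
}

/* Like "mov %gs:OFFSET,%reg": a single base-relative load. */
static long model_this_cpu_read_counter(void)
{
    return gs_base->counter;
}

/*
 * Like raw_cpu_ptr(): fetch the area's own address from its first field
 * (itself a base-relative load), then add the variable's offset to get an
 * ordinary pointer usable without the segment prefix.
 */
static long *model_raw_cpu_ptr_counter(void)
{
    char *base = (char *)gs_base->self;
    return (long *)(base + offsetof(struct percpu_area, counter));
}
```

Both paths end at the same memory; the disagreement in the thread is only about which instruction sequence the compiler emits for each and what that costs.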

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Christoph Lameter
On Thu, 15 Jan 2015, Andrew Morton wrote:
> > I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free
> > in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)
>
> I'm surprised. preempt_disable/enable are pretty fast. I wonder why
> this makes a measurable difference. Perhaps preempt_enable(
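Andrew's question is cut off, but one plausible contributor to the cost is that preempt_enable() under CONFIG_PREEMPT is more than a decrement: it must also check whether a reschedule was requested while preemption was off, and call into the scheduler if so. A simplified, sequential sketch of that logic follows; all model_ names are hypothetical, and real kernels additionally insert compiler barriers and keep the counter per cpu or per task.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified, sequential model of generic preempt_disable()/preempt_enable()
 * under CONFIG_PREEMPT. Names are hypothetical, not kernel APIs.
 */
static int model_count;            /* preemption nesting depth */
static bool model_need_resched;    /* set when a reschedule is requested */
static int model_schedule_calls;   /* how often the model "scheduled" */

static void model_preempt_schedule(void)
{
    model_schedule_calls++;
    model_need_resched = false;
}

static void model_preempt_disable(void)
{
    model_count++;
}

static void model_preempt_enable(void)
{
    assert(model_count > 0);
    /* Not just a decrement: a reschedule requested while preemption was
     * off has to be honored by the outermost enable. */
    if (--model_count == 0 && model_need_resched)
        model_preempt_schedule();
}
```

Nested sections only pay the decrement and test; the call happens at the outermost enable, and only when a reschedule is pending.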

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Christoph Lameter
On Thu, 15 Jan 2015, Steven Rostedt wrote:
> profiling function tracing I discovered that accessing preempt_count
> was actually quite expensive, even just to read. But it may not be as
> bad since Peter Zijlstra converted preempt_count to a per_cpu variable.
> Although, IIRC, the perf profiling s
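On x86, the per_cpu preempt_count that Steven mentions also folds in an inverted PREEMPT_NEED_RESCHED bit (set means "no reschedule needed"), so the word only reaches zero when the nesting depth is zero and a reschedule is pending; preempt_enable() then becomes one %gs-relative decrement plus a zero test. The sketch below models that idea sequentially, with an ordinary global in place of the per-cpu variable; the model_ names and the constant are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled after x86's PREEMPT_NEED_RESCHED; the name here is hypothetical. */
#define MODEL_NEED_RESCHED_BIT 0x80000000u

/*
 * Inverted sense: the bit is SET when no reschedule is needed, so the whole
 * word is 0 only if nesting depth == 0 AND a reschedule is pending.
 */
static uint32_t model_pcount = MODEL_NEED_RESCHED_BIT;
static int model_sched_calls;

static void model_set_need_resched(void)   { model_pcount &= ~MODEL_NEED_RESCHED_BIT; }
static void model_clear_need_resched(void) { model_pcount |= MODEL_NEED_RESCHED_BIT; }
static uint32_t model_preempt_count(void)  { return model_pcount & ~MODEL_NEED_RESCHED_BIT; }

static void model_schedule(void)
{
    model_sched_calls++;
    model_clear_need_resched();
}

/* Roughly "incl %gs:__preempt_count" on real x86. */
static void model_preempt_disable(void)
{
    model_pcount++;
}

/* Roughly "decl %gs:__preempt_count; je ...": one decrement, one test. */
static void model_preempt_enable(void)
{
    assert(model_preempt_count() > 0);
    if (--model_pcount == 0)
        model_schedule();
}
```

Folding the flag in removes the separate need-resched load and branch from the generic version, at the cost of the slightly unusual inverted encoding.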

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Steven Rostedt
On Thu, 15 Jan 2015 17:16:34 -0800 Andrew Morton wrote:
> > I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free
> > in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)
>
> I'm surprised. preempt_disable/enable are pretty fast. I wonder why
> this makes a measurable difference. Perhaps

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Andrew Morton
On Thu, 15 Jan 2015 16:40:32 +0900 Joonsoo Kim wrote:
> We had to insert a preempt enable/disable in the fastpath a while ago
> in order to guarantee that tid and kmem_cache_cpu are retrieved on the
> same cpu. It is the problem only for CONFIG_PREEMPT in which scheduler
> can move the process to

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Jesper Dangaard Brouer
On Thu, 15 Jan 2015 16:40:32 +0900 Joonsoo Kim wrote:
[...]
>
> I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free
> in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)
>
> Below is the result of Christoph's slab_test reported by
> Jesper Dangaard Brouer.
>
[...]
Acked-by: Jesper Dang
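For reference, the quoted timings (14.821 ns before, 14.049 ns after, assuming both are per-iteration times of the same loop) correspond to a saving of about 0.77 ns per iteration, which matches the "roughly 5%" figure:

```latex
\frac{14.821\,\mathrm{ns} - 14.049\,\mathrm{ns}}{14.821\,\mathrm{ns}}
  = \frac{0.772}{14.821} \approx 0.0521 \approx 5.2\,\%
```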

[PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-14 Thread Joonsoo Kim
We had to insert a preempt enable/disable in the fastpath a while ago in order to guarantee that tid and kmem_cache_cpu are retrieved on the same cpu. It is the problem only for CONFIG_PREEMPT in which scheduler can move the process to other cpu during retrieving data. Now, I reach the solution to
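The approach in the description, reading tid and the kmem_cache_cpu pointer without disabling preemption and retrying if they did not come from the same cpu, can be sketched in user space. The sketch relies on each cpu's tid values being distinct from every other cpu's (the low bits encode the cpu, in the spirit of SLUB's TID_STEP scheme), so equal tids imply the same cpu. Everything named model_, the scripted cpu sequence used to inject a migration, and the array layout are hypothetical; this is an illustration of the idea, not the patch's code.

```c
#include <assert.h>

/* Toy model of the tid/cpu consistency check; all names are hypothetical. */
#define MODEL_NR_CPUS  4
#define MODEL_TID_STEP 4 /* power of two >= MODEL_NR_CPUS */

struct model_cache_cpu {
    unsigned long tid; /* low bits: owning cpu; high bits: event counter */
};

static struct model_cache_cpu cache_cpu[MODEL_NR_CPUS];

/* Scripted "current cpu", so a migration can be injected between reads. */
static const int *cpu_script;
static int script_pos;

static int model_current_cpu(void)
{
    int cpu = cpu_script[script_pos++];
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return cpu;
}

static unsigned long model_init_tid(int cpu)         { return (unsigned long)cpu; }
static unsigned long model_next_tid(unsigned long t) { return t + MODEL_TID_STEP; }
static int model_tid_to_cpu(unsigned long t)         { return (int)(t % MODEL_TID_STEP); }

static void model_init(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        cache_cpu[cpu].tid = model_init_tid(cpu);
}

/*
 * Read tid, then the per-cpu pointer, with "preemption" allowed in between.
 * If the task migrated, tid belongs to the old cpu and cannot equal the new
 * cpu's tid, so retry. Returns the number of attempts.
 */
static int model_get_cpu_slab(unsigned long *tid_out, struct model_cache_cpu **c_out)
{
    int attempts = 0;
    unsigned long tid;
    struct model_cache_cpu *c;

    do {
        attempts++;
        tid = cache_cpu[model_current_cpu()].tid; /* ~ this_cpu_read(tid) */
        c = &cache_cpu[model_current_cpu()];      /* ~ raw_cpu_ptr(cpu_slab) */
    } while (tid != c->tid);

    *tid_out = tid;
    *c_out = c;
    return attempts;
}
```

The loop only ensures that tid and c are read consistently; in SLUB the later cmpxchg on the (freelist, tid) pair is what still detects any change that happens after this point.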