Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-20 Thread Christoph Lameter
On Tue, 18 Sep 2007, Siddha, Suresh B wrote: > For now, we are trying to do slab Vs slub comparisons for the mainline > kernels. > Let's see how that goes. > > Meanwhile, any chance that you can point us at relevant recent patches/fixes > that are in -mm and perhaps that can be applied to

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-18 Thread Siddha, Suresh B
On Fri, Sep 14, 2007 at 12:51:34PM -0700, Christoph Lameter wrote: > On Fri, 14 Sep 2007, Siddha, Suresh B wrote: > > We are trying to get the latest data with 2.6.23-rc4-mm1 with and without > > slub. Is this good enough? > > Good enough. If you are concerned about the page allocator pass through

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-14 Thread Christoph Lameter
On Fri, 14 Sep 2007, Siddha, Suresh B wrote: > Numbers I posted in the previous e-mail are the only story we have so far. It would be interesting to know more about how the allocator is used there. > Sorry, these systems are huge and limited. We are raising the priority > with the performance

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-14 Thread Siddha, Suresh B
Christoph, On Thu, Sep 13, 2007 at 11:03:53AM -0700, Christoph Lameter wrote: > On Wed, 12 Sep 2007, Siddha, Suresh B wrote: > > > Christoph, Not sure if you are referring to me or not here. But our > > tests (at least with the database workloads) approx 1.5 months or so back > > showed that on

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-13 Thread Christoph Lameter
On Wed, 12 Sep 2007, Siddha, Suresh B wrote: > Christoph, Not sure if you are referring to me or not here. But our > tests (at least with the database workloads) approx 1.5 months or so back > showed that on ia64 slub was on par with slab and on x86_64, slub was 9% down. > And after changing the

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-13 Thread Siddha, Suresh B
On Tue, Sep 11, 2007 at 01:19:30PM -0700, Christoph Lameter wrote: > On Tue, 11 Sep 2007, Nick Piggin wrote: > > > The impression I got at vm meeting was that SLUB was good to go :( > > It's not? I have had Intel test this thoroughly and they assured me that it > is up to SLAB. Christoph, Not

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 06:19, Christoph Lameter wrote: > On Tue, 11 Sep 2007, Nick Piggin wrote: > > The impression I got at vm meeting was that SLUB was good to go :( > > It's not? I have had Intel test this thoroughly and they assured me that it > is up to SLAB. This particular case is an

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-11 Thread Christoph Lameter
On Tue, 11 Sep 2007, Nick Piggin wrote: > The impression I got at vm meeting was that SLUB was good to go :( It's not? I have had Intel test this thoroughly and they assured me that it is up to SLAB. This particular case is a synthetic test for a PAGE_SIZE alloc and SLUB was not optimized for

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-11 Thread Nick Piggin
On Tuesday 11 September 2007 05:07, Christoph Lameter wrote: > On Mon, 10 Sep 2007, Nick Piggin wrote: > > OK, so after isolating the scheduler, then SLUB should be as fast as SLAB > > at the same allocation size. That's basically what we need to do before > > we can replace SLAB with it, I think?

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-10 Thread Christoph Lameter
On Mon, 10 Sep 2007, Nick Piggin wrote: > OK, so after isolating the scheduler, then SLUB should be as fast as SLAB > at the same allocation size. That's basically what we need to do before we > can replace SLAB with it, I think? The regression is due to the limited number of objects in the per
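As a rough illustration of why a limited number of objects in the per-cpu slab hurts, the toy model below counts how often a burst of back-to-back allocations would exhaust the per-cpu slab and drop into the slow path. It is a sketch under loudly simplified assumptions (the per-cpu slab starts empty, every refill yields exactly `objects_per_slab` free objects, and nothing is freed during the burst); `slow_path_refills` is a hypothetical helper, not SLUB code, and it says nothing about the cost of each refill.

```python
def slow_path_refills(burst: int, objects_per_slab: int) -> int:
    """Toy model (hypothetical helper, not SLUB code): how many times a burst
    of `burst` back-to-back allocations finds the per-cpu slab exhausted and
    must take the slow path to obtain another slab.

    Assumes the per-cpu slab starts empty, each refill supplies a slab with
    exactly `objects_per_slab` free objects, and nothing is freed mid-burst.
    """
    if burst < 0 or objects_per_slab < 1:
        raise ValueError("need burst >= 0 and objects_per_slab >= 1")
    # Integer ceiling division: one refill per (possibly partial) slab consumed.
    return (burst + objects_per_slab - 1) // objects_per_slab


if __name__ == "__main__":
    # e.g. kmalloc-4096 slabs of order 0, 1 and 3 hold 1, 2 and 8 objects
    for objs in (1, 2, 8):
        print(f"{objs} objects/slab -> {slow_path_refills(64, objs)} refills per 64 allocs")
```

Under this model, quadrupling the objects per slab cuts slow-path entries by roughly a factor of four; real behaviour also depends on frees, remote frees and locking, which the sketch ignores.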

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-10 Thread Nick Piggin
On Monday 10 September 2007 10:56, Zhang, Yanmin wrote: > On Sat, 2007-09-08 at 18:08 +1000, Nick Piggin wrote: > > On Wednesday 05 September 2007 17:07, Christoph Lameter wrote: > > > On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > > > slub_max_order=3 slub_min_objects=8 > > > > > > > > I tried

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-09 Thread Zhang, Yanmin
On Sat, 2007-09-08 at 18:08 +1000, Nick Piggin wrote: > On Wednesday 05 September 2007 17:07, Christoph Lameter wrote: > > On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > > slub_max_order=3 slub_min_objects=8 > > > > > > I tried this approach. The testing result showed 2.6.23-rc4 is about > > >

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-07 Thread Nick Piggin
On Wednesday 05 September 2007 17:07, Christoph Lameter wrote: > On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > slub_max_order=3 slub_min_objects=8 > > > > I tried this approach. The testing result showed 2.6.23-rc4 is about > > 2.5% better than 2.6.22. It really resolves the issue. > > Note also

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-05 Thread Zhang, Yanmin
On Wed, 2007-09-05 at 03:45 -0700, Christoph Lameter wrote: > On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > > > However, the approach treats the slabs in the same policy. Could we > > > > implement a per-slab specific approach like direct b)? > > > > > > I am not sure what you mean by same

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-05 Thread Christoph Lameter
On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > However, the approach treats the slabs in the same policy. Could we > > > implement a per-slab specific approach like direct b)? > > > > I am not sure what you mean by same policy. Same configuration for all > > slabs? > Yes. Ok. I could add the

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-05 Thread Zhang, Yanmin
On Tue, 2007-09-04 at 23:58 -0700, Christoph Lameter wrote: > On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > On Tue, 2007-09-04 at 20:59 -0700, Christoph Lameter wrote: > > > On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > > > > > 8) kmalloc-4096 order is 1 which means one slab consists of 2

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-05 Thread Christoph Lameter
On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > slub_max_order=3 slub_min_objects=8 > I tried this approach. The testing result showed 2.6.23-rc4 is about > 2.5% better than 2.6.22. It really resolves the issue. Note also that the configuration you tried is the way SLUB is configured in Andrew's
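A simplified sketch of how the two boot parameters `slub_max_order=3 slub_min_objects=8` interact: take the smallest page order whose slab holds at least `slub_min_objects` objects, but never exceed `slub_max_order`. This assumes 4 KiB pages and no per-object metadata or padding, and it ignores the leftover/waste heuristics the real SLUB order calculation also weighs, so `choose_order` and `objects_per_slab` are hypothetical helpers rather than the kernel's algorithm.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def objects_per_slab(size: int, order: int) -> int:
    """Objects of `size` bytes that fit in a slab of 2**order pages
    (assumes no per-object metadata or alignment padding)."""
    return (PAGE_SIZE << order) // size


def choose_order(size: int, max_order: int, min_objects: int) -> int:
    """Hypothetical, simplified order choice: the smallest order in
    [0, max_order] whose slab holds at least `min_objects` objects.
    If no order qualifies, fall back to `max_order` (the cap always wins).
    Unlike the real SLUB code, this ignores leftover/waste heuristics."""
    for order in range(max_order + 1):
        if objects_per_slab(size, order) >= min_objects:
            return order
    return max_order


if __name__ == "__main__":
    # kmalloc-4096 with slub_max_order=3 slub_min_objects=8
    order = choose_order(4096, max_order=3, min_objects=8)
    print(order, objects_per_slab(4096, order))  # 3 8
    # Same cache with the order capped at 1: only 2 objects per slab
    order = choose_order(4096, max_order=1, min_objects=8)
    print(order, objects_per_slab(4096, order))  # 1 2
```

In this model the cap dominates: raising `slub_min_objects` alone cannot give kmalloc-4096 more than 2 objects per slab while the maximum order stays at 1.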

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-05 Thread Christoph Lameter
On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > On Tue, 2007-09-04 at 20:59 -0700, Christoph Lameter wrote: > > On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > > > 8) kmalloc-4096 order is 1 which means one slab consists of 2 objects. So > > > a > > > > You can change that by booting with

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-04 Thread Zhang, Yanmin
On Tue, 2007-09-04 at 20:59 -0700, Christoph Lameter wrote: > On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > > > 8) kmalloc-4096 order is 1 which means one slab consists of 2 objects. So a > > You can change that by booting with slub_max_order=0. Then we can also use > the per cpu queues to get

Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-04 Thread Christoph Lameter
On Wed, 5 Sep 2007, Zhang, Yanmin wrote: > 8) kmalloc-4096 order is 1 which means one slab consists of 2 objects. So a You can change that by booting with slub_max_order=0. Then we can also use the per cpu queues to get these order 0 objects which may speed up the allocations because we do not
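The "order 1 means 2 objects" arithmetic, assuming 4 KiB pages and no per-object debug metadata or padding in the kmalloc-4096 cache:

$$
n_{\mathrm{obj}} = \left\lfloor \frac{\mathrm{PAGE\_SIZE} \cdot 2^{\mathrm{order}}}{\mathrm{size}} \right\rfloor,
\qquad
\left\lfloor \frac{4096 \cdot 2^{1}}{4096} \right\rfloor = 2,
\qquad
\left\lfloor \frac{4096 \cdot 2^{0}}{4096} \right\rfloor = 1 .
$$

Under the same assumptions, booting with slub_max_order=0 makes each kmalloc-4096 slab a single order-0 page holding exactly one object.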

tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-04 Thread Zhang, Yanmin
1) tbench shows about a 30% regression in kernel 2.6.23-rc4 compared with 2.6.22; 2.6.23-rc1 shows about a 10% regression. I investigated 2.6.22 and 2.6.23-rc4. 2) Testing environment: x86_64, quad-core, 2 physical processors, 8 cores in total, 8GB memory. The kernel enables CONFIG_SLUB=y and CONFIG_SLUB_DEBUG=y. 3)
