RE: Industry db benchmark result on recent 2.6 kernels

2005-04-01 Thread Chen, Kenneth W
Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM > Also, it would be absolutely wonderful to see a finer granularity (which > would likely also answer the stability question of the numbers). If you > can do this with the daily snapshots, that would be great. If it's not > easily

RE: Industry db benchmark result on recent 2.6 kernels

2005-04-01 Thread Chen, Kenneth W
Ingo Molnar wrote on Thursday, March 31, 2005 10:46 PM > before we get into complexities, i'd like to see whether it solves Ken's > performance problem. The attached patch (against BK-curr, but should > apply to vanilla 2.6.12-rc1 too) adds the autodetection feature. (For > ia64 i've hacked in a

RE: [RFC][PATCH] timers fixes/improvements

2005-04-01 Thread Chen, Kenneth W
Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Fri, 1 Apr 2005, Oleg Nesterov wrote: > > > > This patch replaces and updates 6 timer patches which are currently > > in -mm tree. This version does not play games with __TIMER_PENDING > > bit, so incremental patch is not suitable. It is against

RE: Linux Kernel Performance Testing

2005-04-01 Thread Chen, Kenneth W
Grecko OSCP wrote on Friday, April 01, 2005 10:22 AM I noticed yesterday a news article on Linux.org about more kernel performance testing being called for, and I decided it would be a nice project to try. I have 10 completely identical systems that can be used for this, and would like to get

RE: Industry db benchmark result on recent 2.6 kernels

2005-04-01 Thread Chen, Kenneth W
Paul Jackson wrote on Friday, April 01, 2005 5:45 PM Kenneth wrote: Paul, you definitely want to check this out on your large numa box. Interesting - thanks. I can get a kernel patched and booted on a big box easily enough. I don't know how to run an industry db benchmark, and benchmarks

RE: Industry db benchmark result on recent 2.6 kernels

2005-03-31 Thread Chen, Kenneth W
Ingo Molnar wrote on Thursday, March 31, 2005 8:52 PM > the current scheduler queue in -mm has some experimental bits as well > which will reduce the amount of balancing. But we cannot just merge them > an bloc right now, there's been too much back and forth in recent > kernels. The

RE: Industry db benchmark result on recent 2.6 kernels

2005-03-31 Thread Chen, Kenneth W
Linus Torvalds wrote on Thursday, March 31, 2005 12:09 PM > Btw, I realize that you can't give good oprofiles for the user-mode > components, but a kernel profile with even just single "time spent in user > mode" datapoint would be good, since a kernel scheduling problem might > just make caches

RE: Industry db benchmark result on recent 2.6 kernels

2005-03-31 Thread Chen, Kenneth W
Ingo Molnar wrote on Thursday, March 31, 2005 6:15 AM > is there any idle time on the system, in steady state (it's a sign of > under-balancing)? Idle balancing (and wakeup balancing) is one of the > things that got tuned back and forth alot. Also, do you know what the > total number of

RE: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Chen, Kenneth W
Nick Piggin wrote on Tuesday, March 29, 2005 5:32 PM > If it is doing a lot of mapping/unmapping (or fork/exit), then that > might explain why 2.6.11 is worse. > > Fortunately there are more patches to improve this on the way. Once benchmark reaches steady state, there is no mapping/unmapping

RE: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Chen, Kenneth W
Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM > The fact that it seems to fluctuate pretty wildly makes me wonder > how stable the numbers are. I can't resist myself from bragging. The high point in the fluctuation might be because someone is working hard trying to make 2.6 kernel run

RE: Industry db benchmark result on recent 2.6 kernels

2005-03-29 Thread Chen, Kenneth W
On Mon, 28 Mar 2005, Chen, Kenneth W wrote: > With that said, here goes our first data point along with some historical data > we have collected so far. > > 2.6.11 -13%  2.6.9 -6%  2.6.8 -23%  2.6.2 -1%  baseline (rhel3) Linus Torvalds wrote on Tuesday, March 29

RE: [patch] new fifo I/O elevator that really does nothing at all

2005-03-29 Thread Chen, Kenneth W
Jens Axboe wrote on Tuesday, March 29, 2005 12:04 PM > No such promise was ever made, noop just means it does 'basically > nothing'. It never meant FIFO in anyway, we cannot break the semantics > of block layer commands just for the hell of it. Acknowledged and understood, will try your patch

RE: [patch] use cheaper elv_queue_empty when unplug a device

2005-03-29 Thread Chen, Kenneth W
Nick Piggin wrote on Tuesday, March 29, 2005 1:20 AM > Speaking of which, I've had a few ideas lying around for possible > performance improvement in the block code. > > I haven't used a big disk array (or tried any simulation), but I'll > attach the patch if you're looking into that area. > >

RE: [patch] new fifo I/O elevator that really does nothing at all

2005-03-29 Thread Chen, Kenneth W
On Mon, Mar 28 2005, Chen, Kenneth W wrote: > The noop elevator is still too fat for db transaction processing > workload. Since the db application already merged all blocks before > sending it down, the I/O presented to the elevator are actually not > merge-able anymore. Since

RE: [patch] optimization: defer bio_vec deallocation

2005-03-29 Thread Chen, Kenneth W
> Dave Jones wrote on Monday, March 28, 2005 7:00 PM > > If you can't publish results from that certain benchmark due its stupid > > restrictions, Forgot to thank Dave earlier for his understanding. I can't even mention the 4 letter acronym for the benchmark. Sorry, I did not make the rule nor

RE: [patch] optimization: defer bio_vec deallocation

2005-03-28 Thread Chen, Kenneth W
On Mon, Mar 28, 2005 at 06:38:23PM -0800, Chen, Kenneth W wrote: > We have measured that the following patch gives a measurable performance gain > for the industry standard db benchmark. Comments? Dave Jones wrote on Monday, March 28, 2005 7:00 PM > If you can't publish results from tha

[patch] use cheaper elv_queue_empty when unplug a device

2005-03-28 Thread Chen, Kenneth W
This patch was posted last year and if I remember correctly, Jens said he is OK with the patch. In function __generic_unplug_device(), the kernel can use the cheaper elv_queue_empty() instead of the more expensive elv_next_request() to find whether the queue is empty or not. blk_run_queue can also
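
A minimal sketch of the idea described above, against 2.6-era block-layer names; this is an illustration of the substitution, not the literal patch hunk:

/*
 * Sketch: when unplugging, knowing whether the queue is non-empty is
 * enough to decide to kick request_fn; elv_queue_empty() answers that
 * more cheaply than walking elv_next_request().  (Trimmed sketch, not
 * the posted diff.)
 */
static void __generic_unplug_device(request_queue_t *q)
{
	if (!blk_remove_plug(q))		/* queue was not plugged */
		return;

	if (!elv_queue_empty(q))		/* cheaper than elv_next_request(q) */
		q->request_fn(q);
}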

[patch] optimization: defer bio_vec deallocation

2005-03-28 Thread Chen, Kenneth W
The kernel needs at least one bio and one bio_vec structure to process one I/O. For every I/O, the kernel also does two pairs of mempool_alloc/free, one for the bio and one for the bio_vec. Setting up and tearing down the bio_vec structure is not exactly cheap. bio_alloc_bs() does more things in that function other than
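
For reference, the per-I/O allocation pattern being described looks roughly like this; a simplified sketch of the 2.6-era bio allocation path, where bio_pool and bvec_pool are stand-in names for the mempools fs/bio.c maintains, not the deferral patch itself:

#include <linux/bio.h>
#include <linux/mempool.h>

/* stand-ins for the bio and bio_vec mempools owned by fs/bio.c */
static mempool_t *bio_pool, *bvec_pool;

/*
 * Every I/O pays two mempool_alloc() calls on setup and two
 * mempool_free() calls on completion -- one pair for the bio, one
 * for the bio_vec array.  The posted patch defers the bio_vec half.
 */
static struct bio *bio_alloc_sketch(int gfp_mask, int nr_iovecs)
{
	struct bio *bio = mempool_alloc(bio_pool, gfp_mask);	/* pair #1 */
	struct bio_vec *bvl;

	if (!bio)
		return NULL;
	bvl = mempool_alloc(bvec_pool, gfp_mask);		/* pair #2 */
	if (!bvl) {
		mempool_free(bio, bio_pool);
		return NULL;
	}
	bio->bi_io_vec = bvl;
	bio->bi_max_vecs = nr_iovecs;
	return bio;
}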

[patch] new fifo I/O elevator that really does nothing at all

2005-03-28 Thread Chen, Kenneth W
The noop elevator is still too fat for db transaction processing workload. Since the db application already merged all blocks before sending it down, the I/O presented to the elevator are actually not merge-able anymore. Since I/O are also random, we don't want to sort them either. However the
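
A sketch of what such a "really do nothing" FIFO elevator could look like: requests are queued in arrival order and handed back in the same order, with the merge and sort callbacks left out or stubbed to "no merge". Written as a hypothetical illustration against the 2.6.11-era elevator interface, not the posted patch:

#include <linux/blkdev.h>
#include <linux/list.h>

/* strict FIFO: always append, never sort */
static void elevator_fifo_add_request(request_queue_t *q,
				      struct request *rq, int where)
{
	list_add_tail(&rq->queuelist, &q->queue_head);
}

/* hand back whatever arrived first, nothing else */
static struct request *elevator_fifo_next_request(request_queue_t *q)
{
	if (!list_empty(&q->queue_head))
		return list_entry_rq(q->queue_head.next);
	return NULL;
}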

RE: Industry db benchmark result on recent 2.6 kernels

2005-03-28 Thread Chen, Kenneth W
On Mon, 2005-03-28 at 11:33 -0800, Chen, Kenneth W wrote: > We will be taking db benchmark measurements more frequently from now on with > latest kernel from kernel.org (and make these measurements on a fixed > interval). > By doing this, I hope to achieve two things: one is to track

Industry db benchmark result on recent 2.6 kernels

2005-03-28 Thread Chen, Kenneth W
The roller coaster ride continues for the 2.6 kernel on how it measures up in performance using the industry standard database transaction processing benchmark. We took a measurement on 2.6.11 and found it is 13% down from the baseline. We will be taking db benchmark measurements more frequently from

RE: [PATCH 0/5] timers: description

2005-03-26 Thread Chen, Kenneth W
Oleg Nesterov wrote on March 19, 2005 17:28:48 > These patches are updated version of 'del_timer_sync: proof of > concept' 2 patches. I changed schedule_timeout() to call the new del_timer_sync instead of the current del_singleshot_timer_sync in an attempt to stress this set of patches a bit more and
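
The stress tweak described amounts to a one-line swap in kernel/timer.c:schedule_timeout(); a sketch of the hunk, not the actual posted diff:

	/* kernel/timer.c, schedule_timeout(), after schedule() returns */
-	del_singleshot_timer_sync(&timer);
+	del_timer_sync(&timer);		/* exercise the new del_timer_sync() */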

RE: [PATCH 0/5] timers: description

2005-03-24 Thread Chen, Kenneth W
Oleg Nesterov wrote on Monday, March 21, 2005 6:19 AM > These patches are updated version of 'del_timer_sync: proof of concept' > 2 patches. Looks good performance wise. Took a quick micro benchmark measurement on a 16-node numa box (32-way). Results are pretty nice (to the expectation). Time

RE: re-inline sched functions

2005-03-24 Thread Chen, Kenneth W
task_timeslice() is not used in any true fastpath, if it > makes any difference then the performance difference must be some other > artifact. Chen, Kenneth W wrote on Friday, March 11, 2005 10:40 AM > OK, I'll re-measure. Yeah, I agree that this function is not in the fastpath. Ingo is right, re-measured on our benchmark setup and did not see any

RE: [PATCH 2.6.11] AIO panic on PPC64 caused byis_hugepage_only_range()

2005-03-22 Thread Chen, Kenneth W
On Mon, 2005-03-21 at 18:41, Andrew Morton wrote: > Did we fix this yet? > Daniel McNeil wrote on Tuesday, March 22, 2005 11:25 AM > Here's a patch against 2.6.11 that fixes the problem. > It changes is_hugepage_only_range() to take mm as an argument > and then changes the places that call it to
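
The shape of the interface change being discussed, sketched from the description above; the call site is illustrative and the full patch is not reproduced:

/* old form: the check cannot tell which mm it is being asked about */
int is_hugepage_only_range(unsigned long addr, unsigned long len);

/* new form: callers hand in the mm they are actually operating on */
int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
			   unsigned long len);

/* illustrative caller */
if (is_hugepage_only_range(current->mm, addr, len))
	return -EINVAL;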

RE: Fusion-MPT much faster as module

2005-03-22 Thread Chen, Kenneth W
On Mon, 21 Mar 2005, Andrew Morton wrote: > Holger, this problem remains unresolved, does it not? Have you done any > more experimentation? > > I must say that something funny seems to be happening here. I have two > MPT-based Dell machines, neither of which is using a modular driver: > >

RE: [patch] del_timer_sync scalability patch

2005-03-20 Thread Chen, Kenneth W
We did exactly the same thing about 10 months back. Nice to see that people independently came up with exactly the same solution we proposed back then. In fact, this patch is line-by-line identical to the one we posted. Hope Andrew is going to take the patch this time. See our original

RE: re-inline sched functions

2005-03-11 Thread Chen, Kenneth W
Ingo Molnar wrote on Friday, March 11, 2005 1:32 AM > > -static unsigned int task_timeslice(task_t *p) > > +static inline unsigned int task_timeslice(task_t *p) > > the patch looks good except this one - could you try to undo it and > re-measure? task_timeslice() is not used in any true fastpath,

re-inline sched functions

2005-03-10 Thread Chen, Kenneth W
This could be part of the unknown 2% performance regression with the db transaction processing benchmark. The four functions in the following patch used to be inline; they have been un-inlined since 2.6.7. We measured that re-inlining them on 2.6.9 improves performance for db transaction

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-10 Thread Chen, Kenneth W
Andrew Morton wrote on Thursday, March 10, 2005 12:31 PM > > > Fine-grained alignment is probably too hard, and it should fall back to > > > __blockdev_direct_IO(). > > > > > > Does it do the right thing with a request which is non-page-aligned, but > > > 512-byte aligned? > > > > > > readv

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-10 Thread Chen, Kenneth W
Andrew Morton wrote on Wednesday, March 09, 2005 8:10 PM > > 2.6.9 kernel is 6% slower compare to distributor's 2.4 kernel (RHEL3). > > Roughly > > 2% came from storage driver (I'm not allowed to say anything beyond that, > > there > > is a fix though). > > The codepaths are indeed longer in

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Andrew Morton wrote Wednesday, March 09, 2005 6:26 PM > What does "1/3 of the total benchmark performance regression" mean? One > third of 0.1% isn't very impressive. You haven't told us anything at all > about the magnitude of this regression. 2.6.9 kernel is 6% slower compare to distributor's

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Chen, Kenneth W wrote on Wednesday, March 09, 2005 5:45 PM > Andrew Morton wrote on Wednesday, March 09, 2005 5:34 PM > > What are these percentages? Total CPU time? The direct-io stuff doesn't > > look too bad. It's surprising that tweaking the direct-io submission code

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Andrew Morton wrote on Wednesday, March 09, 2005 2:45 PM > > > > > Did you generate a kernel profile? > > > > Top 40 kernel hot functions, percentage is normalized to kernel > > utilization. > > > > _spin_unlock_irqrestore  23.54% > > _spin_unlock_irq  19.27% > >

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Andrew Morton wrote on Wednesday, March 09, 2005 5:34 PM > What are these percentages? Total CPU time? The direct-io stuff doesn't > look too bad. It's surprising that tweaking the direct-io submission code > makes much difference. Percentage is relative to total kernel time. There are three

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
For people who are dying to see some q-tool profile, here is one. It's not a vanilla 2.6.9 kernel, but with patches in raw device to get around the DIO performance problem. - Ken Flat profile of CPU_CYCLES in hist#0: Each histogram sample counts as 255.337u seconds % time self cumul

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Jesse Barnes wrote on Wednesday, March 09, 2005 3:53 PM > > "Chen, Kenneth W" <[EMAIL PROTECTED]> writes: > > > Just to clarify here, these data need to be taken at grain of salt. A > > > high count in _spin_unlock_* functions do not automatically po

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Andi Kleen wrote on Wednesday, March 09, 2005 3:23 PM > > Just to clarify here, these data need to be taken at grain of salt. A > > high count in _spin_unlock_* functions do not automatically points to > > lock contention. It's one of the blind spot syndrome with timer based > > profile on ia64.

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Chen, Kenneth W wrote on Wednesday, March 09, 2005 1:59 PM > > Did you generate a kernel profile? > > Top 40 kernel hot functions, percentage is normalized to kernel utilization. > > _spin_unlock_irqrestore 23.54% > _spin_unlock_irq 19.2

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Andrew Morton wrote on Wednesday, March 09, 2005 12:05 PM > "Chen, Kenneth W" <[EMAIL PROTECTED]> wrote: > > Let me answer the questions in reverse order. We started with running > > industry standard transaction processing database benchmark on 2.6 kernel, >

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-09 Thread Chen, Kenneth W
Andrew Morton wrote on Tuesday, March 08, 2005 10:28 PM > But before doing anything else, please bench this on real hardware, > see if it is worth pursuing. Let me answer the questions in reverse order. We started with running industry standard transaction processing database benchmark on 2.6

Bug fix in slab.c:alloc_arraycache

2005-03-08 Thread Chen, Kenneth W
kmem_cache_alloc_node() is not capable of handling a NULL cachep pointer as its input argument. If I try to increase a slab limit by echoing a very large number into /proc/slabinfo, the kernel will panic in alloc_arraycache() because kmem_find_general_cachep() can actually return a NULL pointer if
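
The guard being described would look something like this inside mm/slab.c; an illustrative sketch of the fix using the 2.6-era slab helpers as I recall them, not the posted patch:

/*
 * Sketch: kmem_find_general_cachep() returns NULL when the requested
 * size exceeds the largest general cache, and kmem_cache_alloc_node()
 * cannot cope with a NULL cachep, so check the lookup before using it.
 */
static struct array_cache *alloc_arraycache(int cpu, int entries,
					    int batchcount)
{
	int memsize = sizeof(void *) * entries + sizeof(struct array_cache);
	kmem_cache_t *cachep;
	struct array_cache *nc;

	cachep = kmem_find_general_cachep(memsize, GFP_KERNEL);
	if (!cachep)
		return NULL;			/* instead of panicking */

	nc = kmem_cache_alloc_node(cachep, GFP_KERNEL, cpu_to_node(cpu));
	if (nc) {
		nc->avail = 0;
		nc->limit = entries;
		nc->batchcount = batchcount;
		nc->touched = 0;
	}
	return nc;
}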

RE: Direct io on block device has performance regression on 2.6.x kernel - fix sync I/O path

2005-03-08 Thread Chen, Kenneth W
Christoph Hellwig wrote on Tuesday, March 08, 2005 6:20 PM > this is not the blockdevice, but the obsolete raw device driver. Please > benchmark and if necessary fix the blockdevice O_DIRECT codepath instead > as the raw driver is slowly going away. From a performance perspective, can raw device

Direct io on block device has performance regression on 2.6.x kernel - fix AIO path

2005-03-08 Thread Chen, Kenneth W
This patch adds block device direct I/O for AIO path. 30% performance gain!! AIO (io_submit) 2.6.9 206,917 2.6.9+patches 268,484 - Ken Signed-off-by: Ken Chen <[EMAIL PROTECTED]> --- linux-2.6.9/drivers/char/raw.c 2005-03-08 17:22:07.0

Direct io on block device has performance regression on 2.6.x kernel - pseudo disk driver

2005-03-08 Thread Chen, Kenneth W
The pseudo disk driver that I used to stress the kernel I/O stack (anything above block layer, AIO/DIO/BIO). - Ken diff -Nur zero/blknull.c blknull/blknull.c --- zero/blknull.c 1969-12-31 16:00:00.0 -0800 +++ blknull/blknull.c 2005-03-03 19:04:07.0 -0800 @@ -0,0 +1,97 @@
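
A sketch of what such a pseudo "null" disk could look like against 2.6.9-era block APIs: it completes every bio on the spot, so only the layers above the block layer (AIO/DIO/BIO setup and completion) are exercised. This is an illustration, not the posted blknull.c; the names, capacity and the missing error/teardown handling are simplifications.

#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/genhd.h>
#include <linux/bio.h>

static int blknull_major;
static struct gendisk *blknull_disk;
static struct request_queue *blknull_queue;

/* complete every bio immediately -- no real I/O is ever issued */
static int blknull_make_request(request_queue_t *q, struct bio *bio)
{
	bio_endio(bio, bio->bi_size, 0);
	return 0;
}

static struct block_device_operations blknull_fops = {
	.owner = THIS_MODULE,
};

static int __init blknull_init(void)
{
	blknull_major = register_blkdev(0, "blknull");
	blknull_queue = blk_alloc_queue(GFP_KERNEL);
	blknull_disk  = alloc_disk(1);
	if (blknull_major < 0 || !blknull_queue || !blknull_disk)
		return -ENOMEM;			/* sketch: no unwinding */

	blk_queue_make_request(blknull_queue, blknull_make_request);
	blknull_disk->major = blknull_major;
	blknull_disk->first_minor = 0;
	blknull_disk->fops = &blknull_fops;
	blknull_disk->queue = blknull_queue;
	sprintf(blknull_disk->disk_name, "blknull0");
	set_capacity(blknull_disk, 1UL << 21);	/* 1 GB of fake 512-byte sectors */
	add_disk(blknull_disk);
	return 0;
}

static void __exit blknull_exit(void)
{
	del_gendisk(blknull_disk);
	put_disk(blknull_disk);
	unregister_blkdev(blknull_major, "blknull");
	/* queue teardown elided in this sketch */
}

module_init(blknull_init);
module_exit(blknull_exit);
MODULE_LICENSE("GPL");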

RE: Direct io on block device has performance regression on 2.6.x kernel

2005-03-08 Thread Chen, Kenneth W
OK, last one in the series: user level test programs that stress the kernel I/O stack. Pretty dull stuff. - Ken diff -Nur zero/aio_null.c blknull_test/aio_null.c --- zero/aio_null.c 1969-12-31 16:00:00.0 -0800 +++ blknull_test/aio_null.c 2005-03-08 00:46:17.0 -0800 @@
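
For completeness, a sketch of the kind of user-level loop such a test program would run: O_DIRECT reads against the null block device through libaio, counting completed round trips. The device path, I/O size and iteration count are placeholders; this is not the posted aio_null.c. Build with -laio.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <libaio.h>

#define IOSIZE 512

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;
	long i, done = 0;
	int fd = open("/dev/blknull0", O_RDONLY | O_DIRECT);	/* placeholder path */

	if (fd < 0 || posix_memalign(&buf, 4096, IOSIZE) ||
	    io_setup(1, &ctx) < 0) {
		fprintf(stderr, "setup failed\n");
		return 1;
	}

	for (i = 0; i < 1000000; i++) {		/* arbitrary iteration count */
		io_prep_pread(&cb, fd, buf, IOSIZE, 0);
		if (io_submit(ctx, 1, cbs) != 1)
			break;
		if (io_getevents(ctx, 1, 1, &ev, NULL) != 1)
			break;
		done++;				/* one null I/O round trip */
	}
	printf("completed %ld I/Os\n", done);
	io_destroy(ctx);
	return 0;
}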

Direct io on block device has performance regression on 2.6.x kernel - fix sync I/O path

2005-03-08 Thread Chen, Kenneth W
This patch adds block device direct I/O for synchronous path. I added in the raw device code to demo the performance effect. 48% performance gain!! synchronous I/O (pread/pwrite/read/write) 2.6.9 218,565 2.6.9+patches 323,016

Direct io on block device has performance regression on 2.6.x kernel

2005-03-08 Thread Chen, Kenneth W
I don't know where to start, but let me start with the bombshell: Direct I/O on a block device running a 2.6.x kernel is a lot SLOWER than on a 2.4 kernel! Processing a direct I/O request on 2.6 takes a lot more cpu time compared to the same I/O request running on a 2.4 kernel. The
