[patch] remove redundant iov segment check

2006-12-04 Thread Chen, Kenneth W
The access_ok() and negative-length checks on each iov segment in generic_file_aio_read/write are redundant: they have already been performed before calling down to these low-level generic functions. Vector I/O (both sync and async) is checked via rw_copy_check_uvector(). For single segment
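
For reference, here is a minimal sketch of the kind of per-segment validation being removed (a reconstruction of the pattern, not the verbatim 2.6 source):

    /* Reconstructed sketch: per-segment validation repeated in the
     * low-level path, even though callers have already run
     * rw_copy_check_uvector() over the same vector. */
    for (seg = 0; seg < nr_segs; seg++) {
            const struct iovec *iv = &iov[seg];

            if (unlikely((ssize_t)iv->iov_len < 0))   /* negative length */
                    return -EINVAL;
            if (unlikely(!access_ok(VERIFY_READ, iv->iov_base, iv->iov_len)))
                    return -EFAULT;                   /* bad user pointer */
    }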

[patch] speed up single bio_vec allocation

2006-12-04 Thread Chen, Kenneth W
On a 64-bit arch like x86_64, struct bio is 104 bytes. Since the bio slab is created with the SLAB_HWCACHE_ALIGN flag, there is usually spare memory available at the end of the bio. I think we can utilize that memory for bio_vec allocation. The purpose is not so much saving memory consumption for bio_vec,
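
A hedged sketch of the idea (field and slab names follow mainline, but this is not the posted patch): for a single-segment bio, point bi_io_vec at the cacheline padding that SLAB_HWCACHE_ALIGN leaves after struct bio, instead of going to the bvec pool:

    /* Sketch only: assumes the aligned slab object leaves at least
     * sizeof(struct bio_vec) of slack after the bio itself. */
    struct bio *bio = kmem_cache_alloc(bio_slab, gfp_mask);

    if (bio && nr_iovecs == 1) {
            bio->bi_io_vec = (struct bio_vec *)(bio + 1);  /* use the padding */
            bio->bi_max_vecs = 1;
    }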

RE: [patch] remove redundant iov segment check

2006-12-04 Thread Chen, Kenneth W
Zach Brown wrote on Monday, December 04, 2006 11:19 AM On Dec 4, 2006, at 8:26 AM, Chen, Kenneth W wrote: The access_ok() and negative-length checks on each iov segment in generic_file_aio_read/write are redundant. They have already been performed before calling down

RE: [patch] remove redundant iov segment check

2006-12-04 Thread Chen, Kenneth W
Andrew Morton wrote on Monday, December 04, 2006 11:36 AM On Mon, 4 Dec 2006 08:26:36 -0800 Chen, Kenneth W [EMAIL PROTECTED] wrote: So it's not possible to call down to generic_file_aio_read/write with an invalid iov segment. A patch is proposed to delete this redundant code. erp, please

RE: [patch] speed up single bio_vec allocation

2006-12-04 Thread Chen, Kenneth W
Jens Axboe wrote on Monday, December 04, 2006 12:07 PM On Mon, Dec 04 2006, Chen, Kenneth W wrote: On a 64-bit arch like x86_64, struct bio is 104 bytes. Since the bio slab is created with the SLAB_HWCACHE_ALIGN flag, there is usually spare memory available at the end of the bio. I think we can utilize

[patch] optimize o_direct on block device - v2

2006-12-04 Thread Chen, Kenneth W
This patch implements a block-device-specific .direct_IO method instead of going through the generic direct_io_worker for block devices. direct_io_worker is fairly complex because it needs to handle O_DIRECT on file systems, where it needs to perform block allocation, hole detection, extents file on
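
A minimal sketch of the wiring this describes; the blkdev_direct_IO body is left as a stub and the method signature follows the 2.6-era address_space_operations, so treat it as an illustration rather than the posted implementation:

    /* Sketch: give the block device its own .direct_IO so O_DIRECT
     * builds and submits bios directly, bypassing direct_io_worker(). */
    static ssize_t blkdev_direct_IO(int rw, struct kiocb *iocb,
                                    const struct iovec *iov,
                                    loff_t offset, unsigned long nr_segs)
    {
            /* translate the user iovec straight into bios ... */
            return 0;  /* body elided in this sketch */
    }

    static struct address_space_operations def_blk_aops = {
            .readpage  = blkdev_readpage,
            .writepage = blkdev_writepage,
            .direct_IO = blkdev_direct_IO,  /* the new fast path */
    };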

[patch] add an iterator index in struct pagevec

2006-12-04 Thread Chen, Kenneth W
pagevec is never expected to hold more than PAGEVEC_SIZE pages, so I think an unsigned char is enough to count them. This patch makes nr and cold unsigned char and also adds an iterator index. With that, the size can even be bumped up by 1, to 15. Signed-off-by: Ken Chen [EMAIL PROTECTED] diff -Nurp
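
A hedged sketch of the resulting structure; the iterator field name is a guess, since the posting is truncated here:

    struct pagevec {
            unsigned char nr;        /* count; never above PAGEVEC_SIZE */
            unsigned char cold;
            unsigned char idx;       /* new iterator index (name assumed) */
            struct page *pages[15];  /* PAGEVEC_SIZE bumped from 14 to 15 */
    };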

RE: [patch] add an iterator index in struct pagevec

2006-12-04 Thread Chen, Kenneth W
Andrew Morton wrote on Monday, December 04, 2006 9:45 PM On Mon, 4 Dec 2006 21:21:31 -0800 Chen, Kenneth W [EMAIL PROTECTED] wrote: pagevec is never expected to hold more than PAGEVEC_SIZE pages, so I think an unsigned char is enough to count them. This patch makes nr and cold unsigned char

RE: [rfc patch] optimize o_direct on block device

2006-12-01 Thread Chen, Kenneth W
Chris Mason wrote on Friday, December 01, 2006 7:37 AM > > It benefits from a shorter path length. It takes much less time to process > > one I/O request, both in the submit and completion paths. I always think in > > terms of how many instructions, or clock ticks, it takes to convert a user > >

RE: [rfc patch] optimize o_direct on block device

2006-11-30 Thread Chen, Kenneth W
Zach Brown wrote on Thursday, November 30, 2006 1:45 PM > > At that time, a patch was written for the raw device to demonstrate that > > large performance headroom is achievable (~20% speedup for a micro- > > benchmark and ~2% for a db transaction processing benchmark) with a > > tight I/O

[rfc patch] optimize o_direct on block device

2006-11-29 Thread Chen, Kenneth W
I've been complaining about O_DIRECT I/O processing being exceedingly complex and slow since March 2005; see the posting below: http://marc.theaimsgroup.com/?l=linux-kernel&m=111033309732261&w=2 At that time, a patch was written for the raw device to demonstrate that large performance headroom is achievable

RE: [rfc patch] Re: sched: incorrect argument used in task_hot()

2006-11-17 Thread Chen, Kenneth W
Mike Galbraith wrote on Friday, November 17, 2006 2:19 PM > > And a changelog, then we're all set! > > > > Oh. And a patch, too. > > Co-opt rq->timestamp_last_tick to maintain a cache_hot_time evaluation > reference timestamp at both tick and sched times to prevent said > reference, formerly

RE: [rfc patch] Re: sched: incorrect argument used in task_hot()

2006-11-17 Thread Chen, Kenneth W
Ingo Molnar wrote on Friday, November 17, 2006 11:21 AM > * Mike Galbraith <[EMAIL PROTECTED]> wrote: > > > One way to improve granularity, and eliminate the possibility of > > p->last_run being > rq->timestamp_last_tick, and thereby short- > > circuiting the evaluation of cache_hot_time, is to

RE: [rfc patch] Re: sched: incorrect argument used in task_hot()

2006-11-17 Thread Chen, Kenneth W
Mike Galbraith wrote on Friday, November 17, 2006 8:57 AM > On Tue, 2006-11-14 at 15:00 -0800, Chen, Kenneth W wrote: > > The argument used for task_hot in can_migrate_task() looks wrong: > > > > int can_migrate_task() > > { ... > >    if (task_hot(p, rq->timestamp_last_tick, sd
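
For context, a sketch of the check under discussion, reconstructed from the scheduler of that era (field and macro names as quoted in the thread, not verified against the exact tree): since the reference timestamp is only updated at tick time, it can lag behind p->last_run, the delta goes negative, and task_hot() is trivially false.

    /* Reconstructed sketch of the test task_hot() performs: */
    #define task_hot(p, now, sd)                            \
            ((long long)((now) - (p)->last_run) <           \
             (long long)((sd)->cache_hot_time))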

Prefetch kernel stacks to speed up context switch

2005-09-07 Thread Chen, Kenneth W
Repost of a previously discussed patch (from Jul 27, 2005). Ingo did the same thing for all arches with a 471-line patch. I'm still advocating this little 30-line patch, of which 6 lines introduce the prefetch_stack() generic interface. Andrew, please consider it for -mm inclusion, or advise me what I need to do to

RE: [PATCH 1/1] Implement shared page tables

2005-09-01 Thread Chen, Kenneth W
Dave McCracken wrote on Tuesday, August 30, 2005 3:13 PM > This patch implements page table sharing for all shared memory regions that > span an entire page table page. It supports sharing at multiple page > levels, depending on the architecture. In function pt_share_pte(): > +
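
A hedged sketch of the mechanism under review, with hypothetical helper names (the actual pt_share_pte() body is truncated in this archive): when a shared mapping covers an entire pte page, a second mm takes a reference on that page instead of populating its own copy.

    /* Illustration only, not Dave McCracken's code: */
    if (vma_covers_whole_pte_page(vma, addr)) {     /* hypothetical helper */
            get_page(pte_page);                     /* extra reference */
            pmd_populate(mm, pmd, pte_page);        /* share, don't copy */
    }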

RE: [PATCH 1/1] Implement shared page tables

2005-09-01 Thread Chen, Kenneth W
Dave McCracken wrote on Tuesday, August 30, 2005 3:13 PM > This patch implements page table sharing for all shared memory regions that > span an entire page table page. It supports sharing at multiple page > levels, depending on the architecture. > > > This version of the patch supports i386

RE: [sched, patch] better wake-balancing, #3

2005-08-08 Thread Chen, Kenneth W
Ingo Molnar wrote on Saturday, July 30, 2005 12:19 AM > * Nick Piggin <[EMAIL PROTECTED]> wrote: > > > here's an updated patch. It handles one more detail: on SCHED_SMT we > > > should check the idleness of siblings too. Benchmark numbers still > > > look good. > > > > Maybe. Ken hasn't measured the effect

FW: fix nohalt boot option

2005-08-08 Thread Chen, Kenneth W
"nohalt" option is currently broken on IPF (2.6.12 is the earliest kernel I looked, might be broken even earlier). - Ken -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chen, Kenneth W Sent: Monday, August 08, 2005 3:25 PM To: linux-ia64@vger.

RE: [RFC] Demand faulting for large pages

2005-08-08 Thread Chen, Kenneth W
Adam Litke wrote on Monday, August 08, 2005 3:17 PM > The reason for the VM_FAULT_SIGBUS default return is that I thought a > fault on a pte_present hugetlb page was an invalid/unhandled fault. > I'll have another think about races to the fault handler though. Two threads fault on the same pte,
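
A hedged sketch of the standard way such a race is closed, using era-appropriate but unverified helper names rather than the patch's actual code: after sleeping in the allocator, retake the lock and recheck the pte before installing.

    /* Illustrative pattern only: both racing threads may allocate, so
     * recheck under page_table_lock and let the loser drop its page. */
    spin_lock(&mm->page_table_lock);
    if (pte_none(*pte))
            set_pte_at(mm, address, pte, make_huge_pte(vma, page));
    else
            put_page(page);  /* lost the race; the pte is populated */
    spin_unlock(&mm->page_table_lock);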

RE: [RFC] Demand faulting for large pages

2005-08-05 Thread Chen, Kenneth W
round-robin from all NUMA nodes. Chen, Kenneth W wrote on Friday, August 05, 2005 2:34 PM > Spurious WARN_ON. Calls to hugetlb_pte_fault() are conditioned upon > if (is_vm_hugetlb_page(vma)) > > > > Broken here. Return VM_FAULT_SIGBUS when *pte is present?? Why can't you move all the logic

RE: [RFC] Demand faulting for large pages

2005-08-05 Thread Chen, Kenneth W
Adam Litke wrote on Friday, August 05, 2005 8:22 AM > +int hugetlb_pte_fault(struct mm_struct *mm, struct vm_area_struct *vma, > + unsigned long address, int write_access) > +{ > + int ret = VM_FAULT_MINOR; > + unsigned long idx; > + pte_t *pte; > + struct page

RE: [RFC] Demand faulting for large pages

2005-08-05 Thread Chen, Kenneth W
Adam Litke wrote on Friday, August 05, 2005 8:22 AM > Below is a patch to implement demand faulting for huge pages. The main > motivation for changing from prefaulting to demand faulting is so that > huge page allocations can follow the NUMA API. Currently, huge pages > are allocated round-robin

RE: Getting rid of SHMMAX/SHMALL ?

2005-08-04 Thread Chen, Kenneth W
Andi Kleen wrote on Thursday, August 04, 2005 3:54 PM > > This might be too low on large systems. We usually stress shm pretty hard > > for db applications and usually use more than 87% of total memory in just > > one shm segment. So I prefer either no limit or a tunable. > > With large systems you mean

RE: Getting rid of SHMMAX/SHMALL ?

2005-08-04 Thread Chen, Kenneth W
Andi Kleen wrote on Thursday, August 04, 2005 6:24 AM > I think we should just get rid of the per process limit and keep > the global limit, but make it auto tuning based on available memory. > That is still not very nice because that would likely keep it < available > memory/2, but I suspect

RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Chen, Kenneth W
Ingo Molnar wrote on Friday, July 29, 2005 4:26 AM > * Chen, Kenneth W <[EMAIL PROTECTED]> wrote: > > To demonstrate the problem, we turned off these two flags in the cpu > > sd domain and measured a stunning 2.15% performance gain! And > > deleting all the code in try_to_wake_up() pertaining to load

RE: Add prefetch switch stack hook in scheduler function

2005-07-29 Thread Chen, Kenneth W
Ingo Molnar wrote on Friday, July 29, 2005 1:36 AM > * Chen, Kenneth W <[EMAIL PROTECTED]> wrote: > > It generates slightly different code because the previous patch asks for a little > > over 5 cache lines' worth of bytes and it always goes to the for loop. > > ok - fix below. But i'm not that sure we want

RE: Add prefetch switch stack hook in scheduler function

2005-07-29 Thread Chen, Kenneth W
Ingo Molnar wrote on Friday, July 29, 2005 12:07 AM > the patch below unrolls the prefetch_range() loop manually, for up to 5 > cachelines prefetched. This patch, on top of the 4 previous patches, > should generate similar code to the assembly code in your original > patch. The full patch-series is:

RE: Add prefetch switch stack hook in scheduler function

2005-07-29 Thread Chen, Kenneth W
Keith Owens wrote on Friday, July 29, 2005 12:38 AM > BTW, for ia64 you may as well prefetch pt_regs, that is also quite > large. > > #define MIN_KERNEL_STACK_FOOTPRINT (IA64_SWITCH_STACK_SIZE + > IA64_PT_REGS_SIZE) This has to be done carefully, because you really don't want to overwhelm the number of

RE: Add prefetch switch stack hook in scheduler function

2005-07-29 Thread Chen, Kenneth W
Keith Owens wrote on Friday, July 29, 2005 12:46 AM > On Fri, 29 Jul 2005 00:22:43 -0700, > "Chen, Kenneth W" <[EMAIL PROTECTED]> wrote: > >On ia64, we have two kernel stacks, one for the outgoing task and one for > >the incoming task. For the outgoing task, we haven't called switch_to() yet. So the switch stack

RE: Add prefetch switch stack hook in scheduler function

2005-07-29 Thread Chen, Kenneth W
Ingo Molnar wrote on Friday, July 29, 2005 12:05 AM > --- linux.orig/kernel/sched.c > +++ linux/kernel/sched.c > @@ -2869,7 +2869,14 @@ go_idle: >* its thread_info, its kernel stack and mm: >*/ > prefetch(next->thread_info); > - prefetch(kernel_stack(next)); > + /* >

RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM > Chen, Kenneth W wrote: > >Well, that's exactly what I'm trying to do: make them not aggressive > >at all by not performing any load balance :-) The workload gets maximum > >benefit with zero aggressiveness. > > Unfortunately we can't forget about

RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM > I'd like to try making them less aggressive first if possible. Well, that's exactly what I'm trying to do: make them not aggressive at all by not performing any load balance :-) The workload gets maximum benefit with zero aggressiveness. -

RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM > Well pipes are just an example. It could be any type of communication. > What's more, even the synchronous wakeup uses the wake balancing path > (although that could be modified to only do wake balancing for synch > wakeups, I'd have to be

RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM > Wake balancing provides an opportunity to provide some input bias > into the load balancer. > > For example, if you started 100 pairs of tasks which communicate > through a pipe. On a 2 CPU system without wake balancing, probably > half of

Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE? SD_WAKE_AFFINE is not useful in conjunction with interrupt binding. In fact, it does more harm than good, causing detrimental process migration, destroying process cache affinity, etc. Also, SD_WAKE_BALANCE is giving us
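
A hedged illustration of the experiment described in this thread: clearing the two flags in the CPU-level sched-domain initializer. The surrounding fields are elided and the exact flag set varied by kernel version, so this is a fragment, not a patch.

    /* Inside the SD_CPU_INIT initializer, before the change:
     *   .flags = ... | SD_WAKE_AFFINE | SD_WAKE_BALANCE,
     * after dropping the two wakeup-balancing flags: */
    .flags = SD_LOAD_BALANCE
           | SD_BALANCE_NEWIDLE
           | SD_BALANCE_EXEC
           | SD_WAKE_IDLE,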

RE: Add prefetch switch stack hook in scheduler function

2005-07-28 Thread Chen, Kenneth W
> i.e. like the patch below. Boot-tested on x86. x86, x64 and ia64 have a > real kernel_stack() implementation, the other architectures all return > 'next'. (I've also cleaned up a couple of other things in the > prefetch-next area, see the changelog below.) > > Ken, would this patch generate

Add ia64 specific prefetch switch stack implementation

2005-07-27 Thread Chen, Kenneth W
This patch adds the ia64-specific implementation to prefetch the switch stack structure. It applies on top of the "add prefetch switch stack hook ..." patch posted earlier. Using my favorite industry-standard OLTP workload, we measured a 6.2X reduction in cache misses in the context switch code, which yielded

Add prefetch switch stack hook in scheduler function

2005-07-27 Thread Chen, Kenneth W
I would like to propose adding a prefetch switch stack hook in the scheduler function. For an architecture like ia64, the switch stack structure is fairly large (currently 528 bytes). For context-switch-intensive applications, we found that a significant number of cache misses occurs in switch_to()
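
A minimal sketch of what such a hook could look like, following the mainline arch-override convention; the guard macro name here is an assumption, not necessarily what the posted patch used.

    /* Generic no-op hook; an arch (e.g. ia64) overrides it to prefetch
     * the incoming task's switch-stack area before switch_to() runs. */
    #ifndef ARCH_HAS_PREFETCH_SWITCH_STACK
    static inline void prefetch_switch_stack(struct task_struct *next) { }
    #endif

    /* In schedule(), next to the existing prefetch of thread_info: */
    prefetch(next->thread_info);
    prefetch_switch_stack(next);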

RE: [announce] linux kernel performance project launch at sourceforge.net

2005-07-14 Thread Chen, Kenneth W
Alexey Dobriyan wrote on Thursday, July 14, 2005 3:34 PM > On Friday 15 July 2005 00:21, Chen, Kenneth W wrote: > > I'm pleased to announce that we have established a linux kernel > > performance project, hosted at sourceforge.net: > > > > http://kernel-perf.sourceforge.net Perhaps, some cool-looking

RE: [announce] linux kernel performance project launch at sourceforge.net

2005-07-14 Thread Chen, Kenneth W
[EMAIL PROTECTED] wrote on Thursday, July 14, 2005 3:18 PM > "Chen, Kenneth W" <[EMAIL PROTECTED]> writes: > > I'm pleased to announce that we have established a linux kernel > > performance project, hosted at sourceforge.net: > > > > http://kernel-perf.sourceforge.net That's very cool. Thanks a lot

RE: Processes stuck on D state on Dual Opteron

2005-04-12 Thread Chen, Kenneth W
Nick Piggin wrote on Tuesday, April 12, 2005 4:09 AM > Chen, Kenneth W wrote: > > I like the patch a lot and already did bench it on our db setup. However, > > I'm seeing a regression compared to a very very crappy patch (see > > attached, you can laugh at me for doing things like that :-). OK

RE: [patch] new fifo I/O elevator that really does nothing at all

2005-04-12 Thread Chen, Kenneth W
Chen, Kenneth W wrote on Tuesday, April 05, 2005 5:13 PM > Jens Axboe wrote on Tuesday, April 05, 2005 7:54 AM > > On Tue, Mar 29 2005, Chen, Kenneth W wrote: > > > Jens Axboe wrote on Tuesday, March 29, 2005 12:04 PM > > > > No such promise was ever made, noop

RE: Processes stuck on D state on Dual Opteron

2005-04-12 Thread Chen, Kenneth W
On Tue, Apr 12 2005, Nick Piggin wrote: > Actually the patches I have sent you do fix real bugs, but they also > make the block layer less likely to recurse into page reclaim, so it > may be eg. hiding the problem that Neil's patch fixes. Jens Axboe wrote on Tuesday, April 12, 2005 12:08 AM > Can

Prototype error in linux/debugfs.h

2005-04-12 Thread Chen, Kenneth W
Too lazy to write a patch: the inline debugfs function declarations for the following three functions disagree between CONFIG_DEBUG_FS and !CONFIG_DEBUG_FS. 4th-argument mismatch; looks like an obvious copy-n-paste error. u16, u32, and u32? static inline struct dentry *debugfs_create_u16(const
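
A hedged reconstruction of the mismatch being reported; the stub body is assumed, since the snippet is truncated, but the point is the 4th argument of the u16 variant:

    /* !CONFIG_DEBUG_FS stub as described: the u16 variant takes u32 *. */
    static inline struct dentry *debugfs_create_u16(const char *name,
                                                    mode_t mode,
                                                    struct dentry *parent,
                                                    u32 *value)  /* should be u16 * */
    {
            return ERR_PTR(-ENODEV);  /* stub body assumed */
    }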

RE: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

2005-04-07 Thread Chen, Kenneth W
Ingo Molnar wrote on Tuesday, April 05, 2005 11:46 PM > ok, the delay of 16 secs is a lot better. Could you send me the full > detection log, how stable is the curve? Full log attached. [uuencoded boot.log attachment omitted]

RE: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

2005-04-05 Thread Chen, Kenneth W
Ingo Molnar wrote on Monday, April 04, 2005 8:05 PM > > latest patch attached. Changes: > > - stabilized calibration even more, by using cache flushing > instructions to generate a predictable working set. The cache > flushing itself is not timed, it is used to create quiescent > cache state.

RE: [patch] new fifo I/O elevator that really does nothing at all

2005-04-05 Thread Chen, Kenneth W
Jens Axboe wrote on Tuesday, April 05, 2005 7:54 AM > On Tue, Mar 29 2005, Chen, Kenneth W wrote: > > Jens Axboe wrote on Tuesday, March 29, 2005 12:04 PM > > > No such promise was ever made, noop just means it does 'basically > > > nothing'. It never meant FIFO

RE: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

2005-04-05 Thread Chen, Kenneth W
Ingo Molnar wrote on Sunday, April 03, 2005 11:24 PM > great! How long does the benchmark take (hours?), and is there any way > to speed up the benchmarking (without hurting accuracy), so that > multiple migration-cost settings could be tried? Would it be possible to > try a few other values via

RE: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

2005-04-04 Thread Chen, Kenneth W
* Chen, Kenneth W <[EMAIL PROTECTED]> wrote: > The cache size information on ia64 is already available at the finger > tip. Here is a patch that I whipped up to set max_cache_size for ia64. Ingo Molnar wrote on Monday, April 04, 2005 4:38 AM > thanks - i've added this to my tree. i've attached

RE: [patch] sched: improve pinned task handling again!

2005-04-03 Thread Chen, Kenneth W
Siddha, Suresh B wrote on Friday, April 01, 2005 8:05 PM > On Sat, Apr 02, 2005 at 01:11:20PM +1000, Nick Piggin wrote: > > How important is this? Any application to real workloads? Even if > > not, I agree it would be nice to improve this more. I don't know > > if I really like this approach - I guess

RE: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

2005-04-03 Thread Chen, Kenneth W
Ingo Molnar wrote on Sunday, April 03, 2005 7:30 AM > how close are these numbers to the real worst-case migration costs on > that box? I booted your latest patch on a 4-way SMP box (1.5 GHz, 9MB ia64). This is what it produces. I think the estimate is excellent. [00]: -10.4(0) 10.4(0)

RE: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

2005-04-03 Thread Chen, Kenneth W
Ingo Molnar wrote on Saturday, April 02, 2005 11:04 PM > the default on ia64 (32MB) was way too large and caused the search to > start from 64MB. That can take a _long_ time. > > i've attached a new patch with your changes included, and a couple of > new things added: > > - removed the 32MB

RE: Industry db benchmark result on recent 2.6 kernels

2005-04-01 Thread Chen, Kenneth W
Paul Jackson wrote on Friday, April 01, 2005 5:45 PM > Kenneth wrote: > > Paul, you definitely want to check this out on your large numa box. > > Interesting - thanks. I can get a kernel patched and booted on a big > box easily enough. I don't know how to run an "industry db benchmark", > and

RE: Industry db benchmark result on recent 2.6 kernels

2005-04-01 Thread Chen, Kenneth W
the problem that would be an easy thing to fix for 2.6.12. Chen, Kenneth W wrote on Thursday, March 31, 2005 9:15 PM > Yes, we are increasing the number in our experiments. It's in the queue > and I should have a result soon. Hot off the press: bumping up cache_hot_time to 10ms on our db setup

RE: Linux Kernel Performance Testing

2005-04-01 Thread Chen, Kenneth W
Grecko OSCP wrote on Friday, April 01, 2005 10:22 AM > I noticed yesterday a news article on Linux.org about more kernel > performance testing being called for, and I decided it would be a nice > project to try. I have 10 completely identical systems that can be > used for this, and would like to
