Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote: > - nobody will look up the list because we do have the spinlock at this > point, so a destroyed list doesn't actually _matter_ to anybody I suppose that it'll be okay, provided I take care not to access a block for a task I've just woken up. > -

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread Linus Torvalds
On Tue, 24 Apr 2001, David Howells wrote: > > Yes but the "struct rwsem_waiter" batch would have to be entirely deleted from > the list before any of them are woken, otherwise the waking processes may > destroy their "rwsem_waiter" blocks before they are dequeued (this destruction > is not

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 02:07:47PM +0100, David Howells wrote: > It was my implementation that triggered it (I haven't tried it with yours), > but the bug occurred because the SUBL happened to make the change outside of > the spinlocked region in the slowpath at the same time as the wakeup

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> so you reproduced a deadlock with my patch applied, or you are saying > you discovered that case with one of your testcases? It was my implementation that triggered it (I haven't tried it with yours), but the bug occurred because the SUBL happened to make the change outside of the spinlocked

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 02:19:28PM +0200, Andrea Arcangeli wrote: > I'm starting the benchmarks of the C version and I will post a number update > and a new patch in a few minutes. (sorry for the below wrap around, just grow your terminal to read it straight) aa

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
There is a bug in both the C version and asm version of my rwsem and it is the slow path where I forgot to drop the _irq part from the spinlock calls ;) Silly bug. (I inherited it also in the asm fast path version because I started hacking the same C slow path) I caught it now because it locks

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 11:33:13AM +0100, David Howells wrote: > *grin* Fun ain't it... Try it on a dual athlon or P4 and the answer may come > out differently. compile with -mathlon and the compiler then should generate (%%eax) if that's faster even if the sem is a constant, that's a compiler

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 11:25:23AM +0100, David Howells wrote: > > I'd love to hear this sequence. Certainly regression testing never generated > > this sequence yet but yes that doesn't mean anything. Note that your slow > > path is very different than mine. > > One of my testcases fell over on

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> I see what you meant here and no, I'm not lucky, I thought about that. gcc x > 2.95.* seems smart enough to produce (%%eax) that you hardcoded when the > sem is not a constant (I'm not clobbering another register, if it does it's > stupid and I consider this a compiler mistake). It is a

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> I'd love to hear this sequence. Certainly regression testing never generated > this sequence yet but yes that doesn't mean anything. Note that your slow > path is very different than mine. One of my testcases fell over on it... > I don't feel the need of any xchg to enforce additional

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 09:56:11AM +0100, David Howells wrote: > | +: "+m" (sem->count), "+a" (sem) ^^ I think you were commenting on the +m not +a ok > > From what I've been

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote: > Note that the generic list structure already has support for "batching". > It only does it for multiple adds right now (see the "list_splice" > merging code), but there is nothing to stop people from doing it for > multiple deletions too. The code is

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> Ok I finished now my asm optimized rwsemaphores and I improved a little my > spinlock based one but without touching the icache usage. And I can break it. There's a very good reason that I changed __up_write() to use CMPXCHG instead of SUBL. I found a sequence of operations that locked up on

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote: > Note that the generic list structure already has support for "batching". > It only does it for multiple adds right now (see the "list_splice" > merging code), but there is nothing to stop people from doing it for > multiple deletions too. The code is something

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 09:56:11AM +0100, David Howells wrote: > | +: "+m" (sem->count), "+a" (sem) ^^ I think you were commenting on the +m not +a ok > > From what I've been told, you're

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> I see what you meant here and no, I'm not lucky, I thought about that. gcc x > 2.95.* seems smart enough to produce (%%eax) that you hardcoded when the > sem is not a constant (I'm not clobbering another register, if it does it's > stupid and I consider this a compiler mistake). It is a compiler

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 11:25:23AM +0100, David Howells wrote: > > I'd love to hear this sequence. Certainly regression testing never generated > > this sequence yet but yes that doesn't mean anything. Note that your slow > > path is very different than mine. > > One of my testcases fell over on it...

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 02:19:28PM +0200, Andrea Arcangeli wrote: > I'm starting the benchmarks of the C version and I will post a number update > and a new patch in a few minutes. (sorry for the below wrap around, just grow your terminal to read it straight) aa RW

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 02:07:47PM +0100, David Howells wrote: > It was my implementation that triggered it (I haven't tried it with yours), > but the bug occurred because the SUBL happened to make the change outside of > the spinlocked region in the slowpath at the same time as the wakeup routine

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread Linus Torvalds
On Tue, 24 Apr 2001, David Howells wrote: > > Yes but the "struct rwsem_waiter" batch would have to be entirely deleted from > the list before any of them are woken, otherwise the waking processes may > destroy their "rwsem_waiter" blocks before they are dequeued (this destruction > is not guarded by a

rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-23 Thread Andrea Arcangeli
On Mon, Apr 23, 2001 at 11:34:35PM +0200, Andrea Arcangeli wrote: > On Mon, Apr 23, 2001 at 09:35:34PM +0100, D . W . Howells wrote: > > This patch (made against linux-2.4.4-pre6) makes a number of changes to the > > rwsem implementation: > > > > (1) Everything in try #2 > > > > plus > > > >

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread Linus Torvalds
On Mon, 23 Apr 2001, D.W.Howells wrote: > > Linus, you suggested that the generic list handling stuff would be faster (2 > unconditional stores) than mine (1 unconditional store and 1 conditional > store and branch to jump round it). You are both right and wrong. The generic > code does two

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread Andrea Arcangeli
On Mon, Apr 23, 2001 at 09:35:34PM +0100, D . W . Howells wrote: > This patch (made against linux-2.4.4-pre6) makes a number of changes to the > rwsem implementation: > > (1) Everything in try #2 > > plus > > (2) Changes proposed by Linus for the generic semaphore code. > > (3) Ideas from

[PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread D . W . Howells
This patch (made against linux-2.4.4-pre6) makes a number of changes to the rwsem implementation: (1) Everything in try #2 plus (2) Changes proposed by Linus for the generic semaphore code. (3) Ideas from Andrea and how he implemented his semaphores. Linus, you suggested that the generic

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread Andrea Arcangeli
On Mon, Apr 23, 2001 at 09:35:34PM +0100, D . W . Howells wrote: > This patch (made against linux-2.4.4-pre6) makes a number of changes to the > rwsem implementation: > > (1) Everything in try #2 > > plus > > (2) Changes proposed by Linus for the generic semaphore code. > > (3) Ideas from Andrea and

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread Linus Torvalds
On Mon, 23 Apr 2001, D.W.Howells wrote: > > Linus, you suggested that the generic list handling stuff would be faster (2 > unconditional stores) than mine (1 unconditional store and 1 conditional > store and branch to jump round it). You are both right and wrong. The generic > code does two stores

rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-23 Thread Andrea Arcangeli
On Mon, Apr 23, 2001 at 11:34:35PM +0200, Andrea Arcangeli wrote: > On Mon, Apr 23, 2001 at 09:35:34PM +0100, D . W . Howells wrote: > > This patch (made against linux-2.4.4-pre6) makes a number of changes to the > > rwsem implementation: > > > > (1) Everything in try #2 > > > > plus > > > > (2) Changes