Re: [PATCH] rw_semaphores, optimisations try #4

2001-04-26 Thread David Howells
Andrea Arcangeli <[EMAIL PROTECTED]> wrote: > It seems more similar to my code btw (you finally killed the useless > cmpxchg ;). CMPXCHG ought to make things better by avoiding the XADD(+1)/XADD(-1) loop, however, I tried various combinations and XADD beats CMPXCHG significantly. Here's a quote
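
The XADD versus CMPXCHG trade-off mentioned above can be sketched in portable C11 atomics instead of i386 assembly. Everything here is an assumption made for illustration and is not taken from the patch: the count layout (non-negative means "that many readers", negative means "writer present"), the SKETCH_WRITE_BIAS value, the sketch_ names, and the trylock framing. atomic_fetch_add stands in for XADD and atomic_compare_exchange_weak for CMPXCHG.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical count layout: >= 0 is the number of readers,
 * a negative value means a writer holds the semaphore. */
#define SKETCH_WRITE_BIAS (-0x10000)

typedef struct {
    atomic_int count;
} sketch_rwsem;

/* XADD style: increment unconditionally, then back out if a writer
 * was already there. The back-out is the XADD(+1)/XADD(-1) pair. */
static bool sketch_read_trylock_xadd(sketch_rwsem *sem)
{
    assert(sem != NULL);
    int old = atomic_fetch_add(&sem->count, 1);
    if (old >= 0)
        return true;
    atomic_fetch_sub(&sem->count, 1);
    return false;
}

/* CMPXCHG style: only publish the increment if the count is still the
 * value we inspected, so no back-out is needed, but the loop may retry
 * under contention. */
static bool sketch_read_trylock_cmpxchg(sketch_rwsem *sem)
{
    assert(sem != NULL);
    int old = atomic_load(&sem->count);
    while (old >= 0) {
        /* On failure, old is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&sem->count, &old, old + 1))
            return true;
    }
    return false;
}
```

In the uncontended case the XADD variant is a single locked instruction with no retry, which is one plausible reason it could benchmark faster; the sketch makes no claim about the actual numbers reported in the thread.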

Re: [PATCH] rw_semaphores, optimisations try #4

2001-04-25 Thread Andrea Arcangeli
On Wed, Apr 25, 2001 at 09:06:38PM +0100, D . W . Howells wrote: > This patch (made against linux-2.4.4-pre6 + rwsem-opt3) somewhat improves > performance on the i386 XADD optimised implementation: It seems more similar to my code btw (you finally killed the useless cmpxchg ;). I only had a sho

[PATCH] rw_semaphores, optimisations try #4

2001-04-25 Thread D . W . Howells
This patch (made against linux-2.4.4-pre6 + rwsem-opt3) somewhat improves performance on the i386 XADD optimised implementation: A patch against -pre6 can be obtained too: ftp://infradead.org/pub/people/dwh/rwsem-pre6-opt4.diff Here's some benchmarks (take with a pinch of salt of cours

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote: > - nobody will look up the list because we do have the spinlock at this > point, so a destroyed list doesn't actually _matter_ to anybody I suppose that it'll be okay, provided I take care not to access a block for a task I've just woken up. > - list_

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread Linus Torvalds
On Tue, 24 Apr 2001, David Howells wrote: > > Yes but the "struct rwsem_waiter" batch would have to be entirely deleted from > the list before any of them are woken, otherwise the waking processes may > destroy their "rwsem_waiter" blocks before they are dequeued (this destruction > is not guard
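
The ordering concern quoted above (detach the whole batch before waking anyone, and never touch a waiter's block after waking it) can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel code: struct sketch_waiter, sketch_wake and sketch_wake_batch are hypothetical names, the list is a plain singly linked list, and "waking" is simulated by setting a flag and poisoning the block to mimic the woken task destroying it. The caller is assumed to hold whatever lock protects the list.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-task wait block. */
struct sketch_waiter {
    struct sketch_waiter *next;
    int *woken_flag;            /* stands in for the sleeping task */
};

/* Simulated wakeup: the woken task may destroy its block at once,
 * which is modelled here by poisoning the block's fields. */
static void sketch_wake(struct sketch_waiter *w)
{
    int *flag = w->woken_flag;
    w->next = NULL;
    w->woken_flag = NULL;
    *flag = 1;
}

/* Wake up to n waiters from the front of the list. Returns how many
 * were woken. */
static int sketch_wake_batch(struct sketch_waiter **list_head, int n)
{
    struct sketch_waiter *batch = *list_head;
    struct sketch_waiter *w = batch, *last = NULL;
    int count = 0;

    /* Step 1: find the end of the batch. */
    while (w != NULL && count < n) {
        last = w;
        w = w->next;
        count++;
    }
    if (last == NULL)
        return 0;               /* empty list or n <= 0 */

    /* Step 2: detach the whole batch before waking anything. */
    *list_head = w;
    last->next = NULL;

    /* Step 3: wake each waiter, reading next BEFORE the wakeup,
     * because the block must not be accessed afterwards. */
    for (w = batch; w != NULL; ) {
        struct sketch_waiter *next = w->next;
        sketch_wake(w);
        w = next;
    }
    return count;
}
```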

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 02:07:47PM +0100, David Howells wrote: > It was my implementation that triggered it (I haven't tried it with yours), > but the bug occurred because the SUBL happened to make the change outside of > the spinlocked region in the slowpath at the same time as the wakeup routine

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> so you reproduced a deadlock with my patch applied, or you are saying > you discovered that case with one of you testcases? It was my implementation that triggered it (I haven't tried it with yours), but the bug occurred because the SUBL happened to make the change outside of the spinlocked reg

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 02:19:28PM +0200, Andrea Arcangeli wrote: > I'm starting the benchmarks of the C version and I will post a number update > and a new patch in a few minutes. (sorry for the below wrap around, just grow your terminal to read it straight) aa RW

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
There is a bug in both the C version and the asm version of my rwsem: in the slow path I forgot to drop the _irq part from the spinlock calls ;) Silly bug. (I inherited it also in the asm fast path version because I started hacking from the same C slow path) I caught it now because it locks

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 11:33:13AM +0100, David Howells wrote: > *grin* Fun ain't it... Try it on a dual athlon or P4 and the answer may come > out differently. compile with -mathlon and the compiler then should generate (%%eax) if that's faster even if the sem is a constant, that's a compiler is

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 11:25:23AM +0100, David Howells wrote: > > I'd love to hear this sequence. Certainly regression testing never generated > > this sequence yet but yes that doesn't mean anything. Note that your slow > > path is very different than mine. > > One of my testcases fell over on

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> I see what you meant here and no, I'm not lucky, I thought about that. gcc x > 2.95.* seems smart enough to produce (%%eax) that you hardcoded when the > sem is not a constant (I'm not clobbering another register, if it does it's > stupid and I consider this a compiler mistake). It is a compil

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> I'd love to hear this sequence. Certainly regression testing never generated > this sequence yet but yes that doesn't mean anything. Note that your slow > path is very different than mine. One of my testcases fell over on it... > I don't feel the need of any xchg to enforce additional serializ

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread Andrea Arcangeli
On Tue, Apr 24, 2001 at 09:56:11AM +0100, David Howells wrote: > | +: "+m" (sem->count), "+a" (sem) ^^ I think you were commenting on the +m not +a ok > > From what I've been told,

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-24 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote: > Note that the generic list structure already has support for "batching". > It only does it for multiple adds right now (see the "list_splice" > merging code), but there is nothing to stop people from doing it for > multiple deletions too. The code is som

Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-24 Thread David Howells
> Ok I finished now my asm optimized rwsemaphores and I improved a little my > spinlock based one but without touching the icache usage. And I can break it. There's a very good reason that I changed __up_write() to use CMPXCHG instead of SUBL. I found a sequence of operations that locked up on thi

rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]

2001-04-23 Thread Andrea Arcangeli
On Mon, Apr 23, 2001 at 11:34:35PM +0200, Andrea Arcangeli wrote: > On Mon, Apr 23, 2001 at 09:35:34PM +0100, D . W . Howells wrote: > > This patch (made against linux-2.4.4-pre6) makes a number of changes to the > > rwsem implementation: > > > > (1) Everything in try #2 > > > > plus > > > >

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread Linus Torvalds
On Mon, 23 Apr 2001, D.W.Howells wrote: > > Linus, you suggested that the generic list handling stuff would be faster (2 > unconditional stores) than mine (1 unconditional store and 1 conditional > store and branch to jump round it). You are both right and wrong. The generic > code does two stor
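
The "two unconditional stores" versus "one unconditional store plus one conditional store and branch" comparison can be sketched as follows. The first half mirrors the well-known deletion step of a circular doubly linked list. The second half is only a guess at the kind of structure being compared against: a singly linked queue with a tail pointer, where every name (sketch_queue, sketch_dequeue, and so on) is hypothetical and not taken from the patch.

```c
#include <stddef.h>

/* Circular doubly linked list: deletion is two unconditional stores. */
struct sketch_node {
    struct sketch_node *next, *prev;
};

static void sketch_list_init(struct sketch_node *head)
{
    head->next = head;
    head->prev = head;
}

static void sketch_list_add_tail(struct sketch_node *n,
                                 struct sketch_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void sketch_list_del(struct sketch_node *entry)
{
    entry->next->prev = entry->prev;   /* store 1, unconditional */
    entry->prev->next = entry->next;   /* store 2, unconditional */
}

/* Hypothetical singly linked queue with a tail pointer: dequeue is one
 * unconditional store plus a conditional store guarded by a branch. */
struct sketch_item {
    struct sketch_item *next;
};

struct sketch_queue {
    struct sketch_item *head;
    struct sketch_item **tail;
};

static void sketch_queue_init(struct sketch_queue *q)
{
    q->head = NULL;
    q->tail = &q->head;
}

static void sketch_enqueue(struct sketch_queue *q, struct sketch_item *it)
{
    it->next = NULL;
    *q->tail = it;
    q->tail = &it->next;
}

static struct sketch_item *sketch_dequeue(struct sketch_queue *q)
{
    struct sketch_item *it = q->head;
    if (it == NULL)
        return NULL;
    q->head = it->next;                /* unconditional store */
    if (q->head == NULL)               /* branch ... */
        q->tail = &q->head;            /* ... around a conditional store */
    return it;
}
```

The circular list avoids the branch because the head node itself acts as a sentinel, so there is never an "empty" special case to patch up on deletion.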

Re: [PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread Andrea Arcangeli
On Mon, Apr 23, 2001 at 09:35:34PM +0100, D . W . Howells wrote: > This patch (made against linux-2.4.4-pre6) makes a number of changes to the > rwsem implementation: > > (1) Everything in try #2 > > plus > > (2) Changes proposed by Linus for the generic semaphore code. > > (3) Ideas from A

[PATCH] rw_semaphores, optimisations try #3

2001-04-23 Thread D . W . Howells
This patch (made against linux-2.4.4-pre6) makes a number of changes to the rwsem implementation: (1) Everything in try #2 plus (2) Changes proposed by Linus for the generic semaphore code. (3) Ideas from Andrea and how he implemented his semaphores. Linus, you suggested that the generic l

Re: [PATCH] rw_semaphores, optimisations

2001-04-22 Thread Andrea Arcangeli
On Mon, Apr 23, 2001 at 03:04:41AM +0200, Andrea Arcangeli wrote: > that is supposed to be a performance optimization, I do the same too in my code. ah no I see what you mean, yes you are hurt by that. I'm waiting for your try #3 against pre6, by that time I hope to be able to make a run of the ben

Re: [PATCH] rw_semaphores, optimisations

2001-04-22 Thread Andrea Arcangeli
On Sun, Apr 22, 2001 at 11:52:29PM +0100, D . W . Howells wrote: > Hello Andrea, > > Interesting benchmarks... did you compile the test programs with "make > SCHED=yes" by any chance? Also what other software are you running? No I never tried the SCHED=yes. However in my modification of the rws

Re: [PATCH] rw_semaphores, optimisations

2001-04-22 Thread D . W . Howells
Hello Andrea, Interesting benchmarks... did you compile the test programs with "make SCHED=yes" by any chance? Also what other software are you running? The reason I ask is that running a full blown KDE setup running in the background, I get the following numbers on the rwsem-ro test (XADD opt

Re: [PATCH] rw_semaphores, optimisations

2001-04-22 Thread Andrea Arcangeli
On Sun, Apr 22, 2001 at 09:07:03PM +0200, Andrea Arcangeli wrote: > On Sun, Apr 22, 2001 at 01:27:20AM +0100, D . W . Howells wrote: btw, I noticed I answered your previous email but for my benchmarks I really used your latest #try2 posted today at 13 (not last night at 1am), just to avoid mistake

Re: [PATCH] rw_semaphores, optimisations

2001-04-22 Thread Andrea Arcangeli
On Sun, Apr 22, 2001 at 01:27:20AM +0100, D . W . Howells wrote: > This patch (made against linux-2.4.4-pre6) makes a number of changes to the > rwsem implementation: > > (1) Fixes a subtle contention bug between up_write and the down_* functions. > > (2) Optimises the i386 fastpath implementa