Re: [PATCH] 4th try: i386 rw_semaphores fix

2001-04-12 Thread D. W. Howells




Andrew Morton wrote:
> It still doesn't compile with gcc-2.91.66 because of the "#define
> rwsemdebug(FMT, ...)" thing. What can we do about this? 

Hmmm... It probably needs to be made conditional on the compiler version, by 
means of the macros that hold the gcc version numbers (__GNUC__ and 
__GNUC_MINOR__).

> I cooked up a few more tests, generally beat the thing 
> up. This code works. Good stuff. I'll do some more testing 
> later this week - put some delays inside the semaphore code 
> itself to stretch the timing windows, but I don't see how 
> it can break it. 

Excellent. I tried to keep it as simple as possible to make it easier to test 
and follow.

> Manfred says the rep;nop should come *before* the memory 
> operation, but I don't know if he's been heard... 

Probably... It's copied directly out of the spinlock stuff. I noticed it at 
the time, but it didn't stick in my memory.

> The spin_lock_irqsave() in up_read/up_write shouldn't be 
> necessary plus it puts the readers onto the 
> waitqueue with add_wait_queue_exclusive, so an up_write 
> will only release one reader.

Not so... have you looked at the new __wait_ctx_common() function? It 
actually ignores the exclusive flag since it uses other criteria as to whom 
to wake. All the add_wait_queue_exclusive() function does is put the process 
on the _back_ of the queue as far as rwsems are concerned.

> Other architectures need work. 
> If they can live with a spinlock in the fastpath, then 
> the code at http://www.uow.edu.au/~andrewm/linux/rw_semaphore.tar.gz 
> is bog-simple and works.

Sorry to pre-empt you, but have you seen my "advanced" patch? I sent it to 
the list in an email with the subject:

[PATCH] i386 rw_semaphores, general abstraction patch

I added a general fallback implementation with the inline implementation 
functions written wholly in C and using spinlocks.

> Now I think we should specify the API a bit: 

Agreed... I'll think more on it on Tuesday, though... after Easter.

Your points, however, look fairly reasonable.

> Perhaps the writer-wakes-multiple 
> readers stuff isn't happening

It does. Using my test program & module, try:

driver -5 & sleep 1; driver 100 &

Keep doing "ps", and you'll see all the reader processes jump into the S 
state at the same time, once all the writers have finished. You may need to 
increase the delay between the ups and downs in the module to see it clearly.

David Howells
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] 4th try: i386 rw_semaphores fix

2001-04-12 Thread Andrew Morton

David Howells wrote:
> 
> Here's the RW semaphore patch attempt #4. This fixes the bugs that Andrew
> Morton's test cases showed up.
> 


It still doesn't compile with gcc-2.91.66 because of the
"#define rwsemdebug(FMT, ...)" thing.  What can we do
about this?

I cooked up a few more tests, generally beat the thing
up.  This code works.  Good stuff.  I'll do some more testing
later this week - put some delays inside the semaphore code
itself to stretch the timing windows, but I don't see how
it can break it.

Manfred says the rep;nop should come *before* the memory
operation, but I don't know if he's been heard...

parisc looks OK.  ppc tried hard, but didn't get it
right.  The spin_lock_irqsave() in up_read/up_write shouldn't be
necessary plus it puts the readers onto the
waitqueue with add_wait_queue_exclusive, so an up_write
will only release one reader.  Looks like they'll end
up with stuck readers.  Other architectures need work.
If they can live with a spinlock in the fastpath, then
the code at http://www.uow.edu.au/~andrewm/linux/rw_semaphore.tar.gz
is bog-simple and works.



Now I think we should specify the API a bit:

* A task is not allowed to down_read() the same rwsem
more than once.  Otherwise another task can
do a down_write() in the middle and cause deadlock.
(I don't think this has been specified for read_lock()
and write_lock().  There's no such deadlock on the
x86 implementation of these.  Other architectures?
Heaven knows.  They're probably OK).

* up_read() and up_write() may *not* be called from
interrupt context.

The latter is just a suggestion.  It looks like your
code is safe for interrupt-context upping - but it
isn't tested for this.  The guys who are going to
support rwsems for architectures which have less
rich atomic ops are going to *need* to know the rules
here.   Let's tell 'em.

If we get agreement on these points, then please
drop a comment in there and I believe we're done.

In another email you said:

> schedule() disabled:
>  reads taken: 585629
>  writes taken: 292997

That's interesting.  With two writer threads and
four reader threads hammering the rwsem, the write/read
grant ratio is 0.5003.  Neat, but I'd have expected
readers to do better.  Perhaps the writer-wakes-multiple
readers stuff isn't happening.  I'll test for this.


There was some discussion regarding contention rates a few
days back.  I did some instrumentation on the normal semaphores.
The highest contention rate I could observe was during a `make
-j3 bzImage' on dual CPU.  With this workload, down() enters
__down() 2% of the time.  That's a heck of a lot.  It means that
the semaphore slow path is by a very large margin the most
expensive part.

The most contention was on i_sem in real_lookup(), then pipe_sem
and vfork_sem.  ext2_lookup is slow; no news there.

But tweaking the semaphore slowpath won't help here
because when a process slept in __down(), it was always
the only sleeper.

With `dbench 12' the contention rate was much lower
(0.01%).  But when someone slept, there was almost
always another sleeper, so the extra wake_up() at
the end of __down() is worth attention.

I don't have any decent multithread workloads which would
allow me to form an opinion on whether we need to optimise
the slowpath, or to more finely thread the contention areas.
Could be with some workloads the contention is significant and
threading is painful. I'll have another look at it sometime...




[PATCH] 4th try: i386 rw_semaphores fix

2001-04-11 Thread David Howells

Here's the RW semaphore patch attempt #4. This fixes the bugs that Andrew
Morton's test cases showed up.

It simplifies the __wake_up_ctx_common() function and adds an iterative clause
to the end of rwsem_wake().

David


diff -uNr linux-2.4.3/arch/i386/config.in linux-rwsem/arch/i386/config.in
--- linux-2.4.3/arch/i386/config.in Thu Apr  5 14:44:04 2001
+++ linux-rwsem/arch/i386/config.in Wed Apr 11 08:38:04 2001
@@ -47,11 +47,13 @@
 
 if [ "$CONFIG_M386" = "y" ]; then
define_bool CONFIG_X86_CMPXCHG n
+   define_bool CONFIG_X86_XADD n
define_int  CONFIG_X86_L1_CACHE_SHIFT 4
 else
define_bool CONFIG_X86_WP_WORKS_OK y
define_bool CONFIG_X86_INVLPG y
define_bool CONFIG_X86_CMPXCHG y
+   define_bool CONFIG_X86_XADD y
define_bool CONFIG_X86_BSWAP y
define_bool CONFIG_X86_POPAD_OK y
 fi
diff -uNr linux-2.4.3/arch/i386/kernel/semaphore.c linux-rwsem/arch/i386/kernel/semaphore.c
--- linux-2.4.3/arch/i386/kernel/semaphore.c    Thu Apr  5 14:38:34 2001
+++ linux-rwsem/arch/i386/kernel/semaphore.c    Wed Apr 11 22:30:59 2001
@@ -14,7 +14,6 @@
  */
 #include <linux/config.h>
 #include <linux/sched.h>
-
 #include <asm/semaphore.h>
 
 /*
@@ -179,6 +178,7 @@
  * value..
  */
 asm(
+".text\n"
 ".align 4\n"
 ".globl __down_failed\n"
 "__down_failed:\n\t"
@@ -193,6 +193,7 @@
 );
 
 asm(
+".text\n"
 ".align 4\n"
 ".globl __down_failed_interruptible\n"
 "__down_failed_interruptible:\n\t"
@@ -205,6 +206,7 @@
 );
 
 asm(
+".text\n"
 ".align 4\n"
 ".globl __down_failed_trylock\n"
 "__down_failed_trylock:\n\t"
@@ -217,6 +219,7 @@
 );
 
 asm(
+".text\n"
 ".align 4\n"
 ".globl __up_wakeup\n"
 "__up_wakeup:\n\t"
@@ -232,197 +235,292 @@
 
 asm(
 "
+.text
 .align 4
 .globl __down_read_failed
 __down_read_failed:
pushl   %edx
pushl   %ecx
-   jnc 2f
-
-3: call    down_read_failed_biased
-
-1: popl    %ecx
+   call    down_read_failed
+   popl    %ecx
    popl    %edx
ret
-
-2: call    down_read_failed
-   " LOCK "subl    $1,(%eax)
-   jns 1b
-   jnc 2b
-   jmp 3b
 "
 );
 
 asm(
 "
+.text
 .align 4
 .globl __down_write_failed
 __down_write_failed:
pushl   %edx
pushl   %ecx
-   jnc 2f
-
-3: call    down_write_failed_biased
-
-1: popl    %ecx
+   call    down_write_failed
+   popl    %ecx
    popl    %edx
ret
-
-2: call    down_write_failed
-   " LOCK "subl    $" RW_LOCK_BIAS_STR ",(%eax)
-   jz  1b
-   jnc 2b
-   jmp 3b
 "
 );
 
-struct rw_semaphore *FASTCALL(rwsem_wake_readers(struct rw_semaphore *sem));
-struct rw_semaphore *FASTCALL(rwsem_wake_writer(struct rw_semaphore *sem));
+asm(
+"
+.text
+.align 4
+.globl __rwsem_wake
+__rwsem_wake:
+   pushl   %edx
+   pushl   %ecx
+   call    rwsem_wake
+   popl    %ecx
+   popl    %edx
+   ret
+"
+);
 
-struct rw_semaphore *FASTCALL(down_read_failed_biased(struct rw_semaphore *sem));
-struct rw_semaphore *FASTCALL(down_write_failed_biased(struct rw_semaphore *sem));
+struct rw_semaphore *FASTCALL(rwsem_wake(struct rw_semaphore *sem));
 struct rw_semaphore *FASTCALL(down_read_failed(struct rw_semaphore *sem));
 struct rw_semaphore *FASTCALL(down_write_failed(struct rw_semaphore *sem));
 
-struct rw_semaphore *down_read_failed_biased(struct rw_semaphore *sem)
+/*
+ * implement exchange and add functionality
+ */
+static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
 {
-   struct task_struct *tsk = current;
-   DECLARE_WAITQUEUE(wait, tsk);
-
-   add_wait_queue(&sem->wait, &wait);  /* put ourselves at the head of the list */
+   int tmp = delta;
 
-   for (;;) {
-   if (sem->read_bias_granted && xchg(&sem->read_bias_granted, 0))
-   break;
-   set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-   if (!sem->read_bias_granted)
-   schedule();
-   }
-
-   remove_wait_queue(&sem->wait, &wait);
-   tsk->state = TASK_RUNNING;
+#ifndef CONFIG_USING_SPINLOCK_BASED_RWSEM
+   __asm__ __volatile__(
+   LOCK_PREFIX "xadd %0,(%1)"
+   : "+r"(tmp)
+   : "r"(sem)
+   : "memory");
+
+#else
+
+   __asm__ __volatile__(
+   "# beginning rwsem_atomic_update\n\t"
+#ifdef CONFIG_SMP
+LOCK_PREFIX"  decb  "RWSEM_SPINLOCK_OFFSET_STR"(%1)\n\t" /* try to grab the spinlock */
+   "  js3f\n" /* jump if failed */
+   "1:\n\t"
+#endif
+   "  xchgl %0,(%1)\n\t" /* retrieve the old value */
+   "  addl  %0,(%1)\n\t" /* add 0x0001, result in memory */
+#ifdef CONFIG_SMP
+   "  movb  $1,"RWSEM_SPINLOCK_OFFSET_STR"(%1)\n\t" /* release the spinlock */
+#endif
+   ".section .text.lock,\"ax\"\n"
+#ifdef CONFIG_SMP
+   "3:\n\t" /* spin on the spinlock till we get it */
+   "  cmpb  $0,"RWSEM_SPINLOCK_OFFSET_STR"(%1)\n\t"
+   "  rep;nop   
