Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-06 Thread Ivan Kokshaysky
> > If you do > > (perhaps to coordinate with devices) then the barriers are required. > > For IO space access mb's are required, but ll/sc are of no use, AFAIK. Ugh. You are right, of course. I forgot that drivers are also using atomic.h, and the intelligent device could be counted as another

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-05 Thread Ivan Kokshaysky
On Fri, May 04, 2001 at 02:13:18PM -0700, Richard Henderson wrote: > Eh? Would you give me an example that isn't working properly? Sure. bar.c: - extern void rarely_executed_code(void); static inline void foo_no_be(void) { int ret; __asm__ __volatile__("nop\n":

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-05 Thread Ivan Kokshaysky
On Fri, May 04, 2001 at 02:12:40PM -0700, Richard Henderson wrote: > > - removed some mb's for non-SMP > > This isn't correct. Either you need atomic updates or you don't. > If you don't, then you shouldn't be using ll/sc at all. I don't think so. On a single CPU system we need atomic updates

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread Richard Henderson
On Thu, May 03, 2001 at 07:47:47PM +0400, Ivan Kokshaysky wrote: > Initially I tried to use __builtin_expect in the rwsem.h, but found > that it doesn't help at all in the small inline functions - it works > as expected only in a reasonably large block of code. Eh? Would you give me an example

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread Richard Henderson
On Thu, May 03, 2001 at 07:47:47PM +0400, Ivan Kokshaysky wrote: > - removed some mb's for non-SMP This isn't correct. Either you need atomic updates or you don't. If you don't, then you shouldn't be using ll/sc at all. If you do (perhaps to coordinate with devices) then the barriers are
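
The disagreement above is about whether an atomic update still needs memory barriers on a non-SMP build. As a rough, portable analogue only, assuming C11 `<stdatomic.h>` rather than the Alpha assembly actually under review: the retry loop below stands in for an ll/sc sequence, and the explicit fence stands in for `mb`. The helper name `sketch_atomic_add_return` is hypothetical and does not come from the patch.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper, not from the patch: add `delta` to *v and
 * return the new value.
 *
 * The weak compare-exchange may fail spuriously, much like a
 * store-conditional, so it sits in a retry loop. The update itself
 * is relaxed; ordering comes only from the fence that follows.
 */
static long sketch_atomic_add_return(atomic_long *v, long delta)
{
    long old = atomic_load_explicit(v, memory_order_relaxed);

    /* On failure, `old` is refreshed with the current value of *v. */
    while (!atomic_compare_exchange_weak_explicit(
               v, &old, old + delta,
               memory_order_relaxed, memory_order_relaxed))
        ;

    /* Stand-in for `mb`: orders the update against later accesses. */
    atomic_thread_fence(memory_order_seq_cst);

    return old + delta;
}
```

Whether the fence may be dropped when only one CPU is present is exactly the point in dispute in this thread; the sketch keeps it and takes no side.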

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread Andrea Arcangeli
On Fri, May 04, 2001 at 09:02:33PM +0400, Ivan Kokshaysky wrote: > But I can't imagine how this "feature" could be useful in a real life :-) It will be required by the time we can fork more than 2^16 tasks (which I'm wondering if it could be just the case if you use CLONE_PID as root, I didn't

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread Ivan Kokshaysky
On Fri, May 04, 2001 at 04:33:59PM +0200, Andrea Arcangeli wrote: > the 2^16 limit is not a per-user limit it is a global one so the max > user process ulimit is irrelevant. > > Only the number of pid and the max number of tasks supported by the > architecture is a relevant limit for this.

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread Ivan Kokshaysky
On Fri, May 04, 2001 at 10:22:53AM +0100, David Howells wrote: > I don't know whether it will (a) compile, or (b) work... I don't have an alpha > to play with. Neither (a) nor (b) ;-) Corrected asm-alpha/rwsem.h attached. Also small fix for lib/rwsem.c -- RWSEM_WAITING_BIAS-RWSEM_ACTIVE_BIAS

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread Andrea Arcangeli
On Fri, May 04, 2001 at 01:15:28PM +0400, Ivan Kokshaysky wrote: > However, there are 3 reasons why I prefer 16-bit counters: I assume you mean 32bit counter. (that gives max 2^16 sleepers) > a. "max user processes" ulimit is much lower than 64K anyway; the 2^16 limit is not a per-user limit
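
The arithmetic behind the "32bit counter gives max 2^16 sleepers" remark can be sketched as follows. This assumes the count is split into two equal halves, the low half counting active lockers and the high half tracking waiters; that split, and all `sketch_` names, are illustrative assumptions rather than definitions copied from lib/rwsem.c or asm-alpha/rwsem.h.

```c
#include <stdint.h>

/*
 * Assumed layout (illustrative only): a signed count of `count_bits`
 * bits, low half = active lockers, high half = waiters.
 */

/* Number of distinct values one half of the count can hold. */
static uint64_t sketch_max_sleepers(unsigned count_bits)
{
    return UINT64_C(1) << (count_bits / 2);
}

/* Mask selecting the "active" half of the count. */
static int64_t sketch_active_mask(unsigned count_bits)
{
    return (INT64_C(1) << (count_bits / 2)) - 1;
}

/* Negative bias that moves the count into the "waiting" half. */
static int64_t sketch_waiting_bias(unsigned count_bits)
{
    return -(INT64_C(1) << (count_bits / 2));
}
```

Under that assumption a 32-bit count leaves 16 bits per half (2^16), while a 64-bit `signed long` on a 64-bit architecture leaves 32 bits per half (2^32), which is the difference being discussed.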

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread Ivan Kokshaysky
On Fri, May 04, 2001 at 10:22:53AM +0100, David Howells wrote: > Hello Ivan, Hello David! > I don't know whether it will (a) compile, or (b) work... I don't have an alpha > to play with. It looks ok at a first glance, I can try it today. > I also don't know the alpha function calling

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread David Howells
Hello Ivan, One reason I picked "signed long" as the count type in the lib/rwsem.c is that this would be 64 bits on a 64-bit arch such as the alpha. So I've taken your idea for include/asm-alpha/rwsem.h and modified it a little. You'll find it attached at the bottom. I don't know whether it

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-04 Thread Ivan Kokshaysky
On Thu, May 03, 2001 at 07:28:48PM +0200, Andrea Arcangeli wrote: > I'd love if you could port it on top of this one and to fix it so that > it can handle up to 2^32 sleepers and not only 2^16 like we have to do > on the 32bit archs to get good performance: > >

Re: [patch] 2.4.4 alpha semaphores optimization

2001-05-03 Thread Andrea Arcangeli
On Thu, May 03, 2001 at 07:47:47PM +0400, Ivan Kokshaysky wrote: > Initially I tried to use __builtin_expect in the rwsem.h, but found > that it doesn't help at all in the small inline functions - it works > as expected only in a reasonably large block of code. Converting these > functions into

[patch] 2.4.4 alpha semaphores optimization

2001-05-03 Thread Ivan Kokshaysky
Initially I tried to use __builtin_expect in the rwsem.h, but found that it doesn't help at all in the small inline functions - it works as expected only in a reasonably large block of code. Converting these functions into the macros won't help, as callers are inline functions also. On the other
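
For readers unfamiliar with the builtin, here is a minimal sketch of how `__builtin_expect` is typically used to mark a slow path inside a small inline function, which is the pattern this post says did not produce the expected code layout. It assumes GCC or a compatible compiler. `sketch_unlikely`, `sketch_dec_and_test`, and the stub body of `rarely_executed_code` are hypothetical and not taken from rwsem.h.

```c
/* Hint to the compiler that the condition is expected to be false. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Counts slow-path entries so the behaviour can be observed. */
static int slow_path_calls;

/* Stub for the out-of-line slow path. */
static void rarely_executed_code(void)
{
    slow_path_calls++;
}

/*
 * Hypothetical fast path: decrement the count and fall into the
 * slow path only when it goes negative. Returns 1 if the slow
 * path ran, 0 otherwise.
 */
static inline int sketch_dec_and_test(int *count)
{
    if (sketch_unlikely(--(*count) < 0)) {
        rarely_executed_code();
        return 1;
    }
    return 0;
}
```

The hint only affects how the compiler lays out and schedules the branch; the function returns the same results with or without it, so any benefit has to be checked in the generated assembly.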
