On Wed, Oct 11, 2017 at 03:15:42PM +0200, Martin Pieuchot wrote:
> On 10/10/17(Tue) 20:19, Mateusz Guzik wrote:
> > On Tue, Oct 10, 2017 at 10:15:48AM +0200, Martin Pieuchot wrote:
> > > Hello Mateusz,
> > >
> > > On 09/10/17(Mon) 21:43, Mateusz Guzik wrote:
> > > > I was looking at rw lock code out of curiosity and noticed you always do
> > > > membar_enter which on MP-enabled amd64 kernel translates to mfence.
> > > > This makes the entire business a little bit slower.
> > > >
> > > > Interestingly you already have relevant macros for amd64:
> > > > #define membar_enter_after_atomic() __membar("")
> > > > #define membar_exit_before_atomic() __membar("")
> > >
> > > If you look at the history, you'll see that these macros didn't exist
> > > when membar_* were added to rw locks.
> > >
> > 
> > I know. The addition of such macros sounds like an excellent opportunity
> > to examine existing users. I figured rw locks were omitted by accident,
> > as the original posting explicitly mentions mutexes.
> > 
> > https://marc.info/?l=openbsd-tech&m=149555411827749&w=2
> > 
> > > > And there is even a default variant for archs which don't provide one.
> > > > I guess the switch should be easy.
> > >
> > > Then please do it :)  A lot of stuff is easy on OpenBSD, but there are
> > > not enough of us.
> > >
> > > Do it, test it, explain it, get oks and commit it 8)
> > >
> > 
> > As noted earlier, the patch is trivial. I have no easy means to even
> > compile-test it, though, as I'm not using OpenBSD; I only had a look out
> > of curiosity.
> 
> Why don't you start using it?  Testing is what makes the difference.
> Many of us have ideas of how to improve the kernel but we're lacking
> the time to test & debug them all.
> 

I'm playing with concurrency as a hobby, and SMP-ifying single-threaded
code is above the pain threshold I'm willing to accept in my spare time.

> Sometimes a "correct" change exposes a bug and we try not to break
> -current, so we cannot afford to commit an untested diff.
> 

I definitely don't advocate for just committing stuff. I do state, though,
that there should be no binary change for archs other than i386 and amd64
(== nothing to test there). Given the nature of the change for these two,
it really looks like it just fell through the cracks earlier; I only wanted
to point it out in the hope that someone finds the cycles to pick it up and
test it on the relevant platforms.
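
For reference, the barrier side of it boils down to something like the
following (just a sketch to show the pattern, not a tested diff; the exact
surrounding code in kern_rwlock.c may differ):

 	/* acquire side: right after the CAS on the owner word succeeds */
-	membar_enter();
+	membar_enter_after_atomic();

 	/* release side: right before the CAS that drops the lock */
-	membar_exit();
+	membar_exit_before_atomic();

On amd64 the *_after_atomic/*_before_atomic variants are just a compiler
barrier (the __membar("") quoted above), since the locked instruction
already provides the ordering; archs that don't define them fall back to
plain membar_enter()/membar_exit(), which is why there should be no binary
change anywhere else.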

> > Nonetheless, here is the diff, which can be split in two. One part deals
> > with the barriers, the other cleans up rw_status.
> >
> > [...]
> >
> > Read from the lock only once in rw_status. The lock word is marked
> > volatile, which forces a re-read on each access. There is no correctness
> > issue here, but the assembly is worse than it needs to be, and under
> > contention this adds avoidable cache traffic.
> 
> Here's the diff that cleans up rw_status.  The difference on amd64 with
> clang 4.0 is the following:
> 
>       48 8b 0f                mov    (%rdi),%rcx
>       f6 c1 04                test   $0x4,%cl
>       75 0c                   jne    738 <rrw_status+0x18>
>       31 c0                   xor    %eax,%eax
> -     48 83 3f 00             cmpq   $0x0,(%rdi)
> +     48 85 c9                test   %rcx,%rcx
> 
> ok?


There should be another change past rrw_status+0x18 as well: with the owner
cached in a local, the RW_PROC() comparison in the inlined rw_status no
longer needs to re-read the lock word, so the final load from (%rdi) below
should turn into a register operation:
ffffffff812557e8:       65 48 8b 04 25 08 00    mov    %gs:0x8,%rax
ffffffff812557ef:       00 00
ffffffff812557f1:       48 8b 80 90 02 00 00    mov    0x290(%rax),%rax
ffffffff812557f8:       48 33 07                xor    (%rdi),%rax
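
To spell out what the compiler is and isn't allowed to do here, a
stand-alone illustration (generic C, not the kernel code; the names are
made up and 0x04 just mirrors the WRLOCK bit visible in the test above):

volatile unsigned long lock_word;

int
status_two_loads(void)
{
	if (lock_word & 0x04)		/* first load */
		return (2);
	if (lock_word)			/* volatile forces a second load */
		return (1);
	return (0);
}

int
status_one_load(void)
{
	unsigned long owner = lock_word;	/* single load, value reused */

	if (owner & 0x04)
		return (2);
	if (owner)
		return (1);
	return (0);
}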

> 
> Index: kern/kern_rwlock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
> retrieving revision 1.30
> diff -u -p -r1.30 kern_rwlock.c
> --- kern/kern_rwlock.c        12 Aug 2017 23:27:44 -0000      1.30
> +++ kern/kern_rwlock.c        11 Oct 2017 12:59:19 -0000
> @@ -312,13 +312,15 @@ _rw_exit(struct rwlock *rwl LOCK_FL_VARS
>  int
>  rw_status(struct rwlock *rwl)
>  {
> -     if (rwl->rwl_owner & RWLOCK_WRLOCK) {
> -             if (RW_PROC(curproc) == RW_PROC(rwl->rwl_owner))
> +     unsigned long owner = rwl->rwl_owner;
> +
> +     if (owner & RWLOCK_WRLOCK) {
> +             if (RW_PROC(curproc) == RW_PROC(owner))
>                       return RW_WRITE;
>               else
>                       return RW_WRITE_OTHER;
>       }
> -     if (rwl->rwl_owner)
> +     if (owner)
>               return RW_READ;
>       return (0);
>  }

-- 
Mateusz Guzik
