Questions about rwlock(9) memory ordering semantics:

1. Is rw_tryupgrade an acquire operation, so that all prior reader
   critical sections happen-before everything sequenced after it?  Or
   does it only order stores issued by the caller, so that loads in the
   following write section can be speculatively executed in the
   preceding read section?

2. Is rw_downgrade a release operation, so that whatever is sequenced
   before it happens-before any subsequent reader critical section?  Or
   does it only order stores issued by the caller, so that loads in the
   preceding write section can be deferred into the following read
   section?

The man page isn't clear on this.

Here's an example to illustrate the difference.  Consider the following
data structure:

	struct foo {
		krwlock_t lock;
		volatile unsigned nreaders;
		...
	};

Assume every reader increments nreaders on entry, and decrements
nreaders on exit:

	rw_enter(&foo->lock, RW_READER);
	atomic_inc_uint(&foo->nreaders);
	...
	atomic_dec_uint(&foo->nreaders);
	rw_exit(&foo->lock);

Under that premise, a writer always sees nreaders as zero.  So the
following assertions should never fail:

	rw_enter(&foo->lock, RW_WRITER);
	KASSERT(atomic_load_relaxed(&foo->nreaders) == 0);
	...
	KASSERT(atomic_load_relaxed(&foo->nreaders) == 0);
	rw_exit(&foo->lock);

I believe this is guaranteed as rwlock(9) is currently implemented.
(Whether it is _intended_ to be guaranteed, I'm not sure.)

However, very similar assertions may fail, in principle, when using
rw_tryupgrade or rw_downgrade, as rwlock(9) is currently implemented:

	/* upgrade reader to writer */
	rw_enter(&foo->lock, RW_READER);
	atomic_inc_uint(&foo->nreaders);
	...
	atomic_dec_uint(&foo->nreaders);
	if (rw_tryupgrade(&foo->lock)) {
		/* we're a writer now */
		KASSERT(atomic_load_relaxed(&foo->nreaders) == 0);
		...
	}
	rw_exit(&foo->lock);

	/* downgrade writer to reader */
	rw_enter(&foo->lock, RW_WRITER);
	...
	unsigned nreaders = atomic_load_relaxed(&foo->nreaders);
	rw_downgrade(&foo->lock);
	rw_exit(&foo->lock);
	KASSERT(nreaders == 0);

The reason these can fail is that rw_tryupgrade and rw_downgrade only
issue store-before-store barriers (membar_producer), which place no
restrictions on when the CPU can load foo->nreaders relative to
rw_tryupgrade or rw_downgrade.  So, as rwlock(9) is currently
implemented:

- In the upgrade example, the CPU can speculatively load foo->nreaders
  before rw_tryupgrade, and thereby witness a nonzero value from before
  the last reader has decremented it to zero and exited the read
  section (quickly enough for rw_tryupgrade to succeed, of course).

- In the downgrade example, the CPU can defer the load of foo->nreaders
  past rw_downgrade, and thereby witness a nonzero value from one of
  the readers that rw_downgrade allows to begin executing.

Even though the reader here is not technically _just_ a `reader', in
that it issues stores as well as loads, I think these possibilities are
extremely counterintuitive and possibly dangerous, especially for a
relatively high-level API like rwlock(9) that prioritizes ease of use
over maximal parallelism -- merely taking a read lock can lead to
shared memory contention, which is why we also have harder-to-use but
cheaper options like pserialize(9), psref(9), and localcount(9).

I'm open to other opinions; perhaps it is intended that loads in a
writer can bleed into an adjacent read section, and that readers aren't
supposed to issue stores anyway, and perhaps there's a good reason for
all this.  But absent a good reason, I think rw_downgrade should be a
release operation, and rw_tryupgrade should be an acquire operation,
just like rw_exit and rw_enter.  Which means they need to use
membar_release and membar_acquire inside, not membar_producer (and we
need to issue pullups).