On 19 Nov, 2014, at 01:53, Taylor R Campbell 
<campbell+netbsd-tech-k...@mumble.net> wrote:
>   Date: Tue, 18 Nov 2014 23:13:34 +0900
>   From: Masao Uebayashi <uebay...@gmail.com>
> 
>   In pserialize_perform(), context switches are made on all CPUs.  After
>   pserialize_perform(), all readers on all CPUs see the updated data.
>   The old data item can then be safely destroyed.
> 
>   In the TAILQ case, where readers iterate a list by TAILQ_FOREACH(),
>   TAILQ_REMOVE() is safely used as the update operation, because:
> 
>   - Readers only see tqe_next in TAILQ_FOREACH(), and
>   - Pointer assignment (done in TAILQ_REMOVE()) is atomic.
> 
>   If this is correct, pserialize(9) should be updated to be clearer;
>   probably this TAILQ example is worth being added there.
> 
> That is correct.
> 
> The one tricky detail is that after a reader has fetched the tqe_next
> pointer, it must issue a membar_consumer before dereferencing the
> pointer: otherwise there is no guarantee about the order in which the
> CPU will fetch the tqe_next pointer and its contents (which it may
> have cached).

I don't think it is correct to use a membar_consumer().  The reads done
by dereferencing tqe_next are dependent reads; that is, they cannot be
issued until the read of tqe_next has completed.  This is sufficient for
the reader to observe the write ordering, without a read barrier, on
every machine except a multiprocessor DEC Alpha.  See, e.g., section 7.1
of this:

    http://www.puppetmastertrading.com/images/hwViewForSwHackers.pdf

Since most machines don't need any barrier here, adding a
membar_consumer() would be extremely inefficient: it would make all
machines pay for an idiosyncrasy unique to the DEC Alpha.
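
To make the pattern concrete, here is a minimal sketch in portable C11
rather than kernel code.  The struct and function names are hypothetical
and are not taken from NetBSD.  The writer fully initialises a node and
then publishes it with a release store (standing in for membar_producer()
followed by the pointer assignment); the reader loads the pointer and
dereferences it, relying only on the data dependency.
memory_order_consume is the C11 spelling of that reliance, although as
far as I know current compilers simply treat it as acquire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node; illustrative only, not the TAILQ layout. */
struct node {
	int value;
	_Atomic(struct node *) next;
};

/*
 * Writer side: initialise the new node completely, then publish it
 * after prev with a release store so the contents are ordered before
 * the pointer becomes visible to readers.
 */
void
publish_after(struct node *prev, struct node *n, int value)
{
	assert(prev != NULL && n != NULL);
	n->value = value;
	atomic_store_explicit(&n->next,
	    atomic_load_explicit(&prev->next, memory_order_relaxed),
	    memory_order_relaxed);
	atomic_store_explicit(&prev->next, n, memory_order_release);
}

/*
 * Reader side: each n->value read depends on the pointer load that
 * produced n, so no separate read barrier is written here.
 */
int
sum_list(struct node *head)
{
	int sum = 0;
	struct node *n;

	for (n = atomic_load_explicit(&head->next, memory_order_consume);
	    n != NULL;
	    n = atomic_load_explicit(&n->next, memory_order_consume))
		sum += n->value;
	return sum;
}
```

The sketch only shows where the ordering requirement sits; it does not
model pserialize(9) or the deferred destruction of removed items.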

So the question is whether the concurrent reader code we write is
expected to run on a multiprocessor DEC Alpha.  If it is, then there is
a missing memory barrier primitive that is required for dependent reads,
one that does something on a DEC Alpha and nothing on any other machine.
If code like this doesn't need to run on a multiprocessor DEC Alpha,
however, then the code is correct with no read barrier.
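
For illustration, such a primitive might look roughly like the sketch
below.  The name dep_read_barrier() is made up for this example and is
not an existing interface; the Alpha branch assumes GCC-style inline
assembly and the "mb" instruction, and every other machine gets an empty
statement.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical dependent-read barrier: a real memory barrier on Alpha,
 * nothing at all elsewhere.
 */
#if defined(__alpha__)
#define	dep_read_barrier()	__asm__ __volatile__("mb" ::: "memory")
#else
#define	dep_read_barrier()	do { } while (0)
#endif

/* Hypothetical item type, for illustration only. */
struct item {
	int value;
	struct item *next;
};

/*
 * Fetch the published pointer, apply the dependent-read barrier, and
 * only then dereference.  Returns -1 if the list is empty.
 */
int
first_value(struct item *volatile *headp)
{
	struct item *it;

	assert(headp != NULL);
	it = *headp;		/* read the pointer once */
	dep_read_barrier();	/* orders the dependent read on Alpha */
	return it != NULL ? it->value : -1;
}
```

A reader iterating a list would apply the same step after each fetch of
the next pointer.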

Dennis Ferguson
