Date: Mon, 24 Nov 2014 18:36:43 +0100
   From: Rhialto <rhia...@falu.nl>

   I must admit that it is not clear to me how on all these architectures
   the CPU is to know in general that *p depends on p = *pp.

Issuing membar_producer on CPU 1 may flush its store buffer to RAM and
force all other CPUs to discard their caches for everything in its
store buffer.

Then when CPU 2 executes p = *pp, if it sees the new version of *pp,
it must already have discarded any stale cached copy of *p, so when it
executes v = *p, it will see the value that CPU 1 stored in *p before
assigning *pp = p.
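
For concreteness, here is a rough sketch of that pointer-publish
pattern in portable C11 rather than with the membar_ops themselves.
The assumptions are mine: <stdatomic.h> stands in for membar_producer
(a release store is somewhat stronger than a store-store barrier), and
the names slot, shared, publish_item, and read_item are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct item {
        int value;
};

static struct item slot;                /* hypothetical payload, the *p above */
static _Atomic(struct item *) shared;   /* hypothetical pointer, the *pp above */

/* CPU 1: fill in the payload, then publish the pointer to it. */
void
publish_item(int v)
{
        slot.value = v;
        /* Release store: the payload store is ordered before the pointer store. */
        atomic_store_explicit(&shared, &slot, memory_order_release);
}

/* CPU 2: load the pointer, then dereference it.  Returns -1 if unpublished. */
int
read_item(void)
{
        /*
         * Consume load: only loads that depend on p are ordered after it,
         * which is the data-dependency case discussed above.
         */
        struct item *p = atomic_load_explicit(&shared, memory_order_consume);

        if (p == NULL)
                return -1;
        return p->value;
}
```

As far as I know, current compilers simply treat memory_order_consume
as memory_order_acquire, so this is no cheaper than an acquire load in
practice; it is only meant to show where the dependency sits.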

That doesn't address control-dependent loads, which is why even CPUs
that do the above may require membar_consumer for control-dependent
loads.  Specifically, if CPU 1 does

        value[i] = 5;
        membar_producer();
        ok[i] = 1;

then because there's no data dependency, CPU 2 might still reorder

        if (ok[i])
                v = value[i];

into

        tmp = value[i];
        if (ok[i])
                v = tmp;

just by executing instructions out of order, with no cache involved.
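
A sketch of the paired barriers for this case, again in C11 as a
stand-in and not the kernel API: the release fence plays the role of
membar_producer and the acquire fence the role of membar_consumer.
NSLOTS, set_value, and get_value are hypothetical names; value and ok
are the arrays from the example above.

```c
#include <stdatomic.h>

#define NSLOTS  4                       /* hypothetical table size */

static int value[NSLOTS];
static atomic_int ok[NSLOTS];

/* CPU 1: store the value, then raise the flag. */
void
set_value(int i, int v)
{
        value[i] = v;
        atomic_thread_fence(memory_order_release);  /* like membar_producer */
        atomic_store_explicit(&ok[i], 1, memory_order_relaxed);
}

/* CPU 2: returns 1 and stores the value in *v if ok[i] is set, else 0. */
int
get_value(int i, int *v)
{
        if (!atomic_load_explicit(&ok[i], memory_order_relaxed))
                return 0;
        /* Without this fence the load of value[i] may be hoisted above the test. */
        atomic_thread_fence(memory_order_acquire);  /* like membar_consumer */
        *v = value[i];
        return 1;
}
```

The acquire fence sits between the test of ok[i] and the load of
value[i], which is exactly the reordering shown above that it is meant
to forbid.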
