On Wed, 2024-03-27 at 08:40 +0100, Jan Beulich wrote:
...

> > > > > > +/* This is required to provide a full barrier on success.
> > > > > > */
> > > > > > +static inline int atomic_add_unless(atomic_t *v, int a,
> > > > > > int u)
> > > > > > +{
> > > > > > +       int prev, rc;
> > > > > > +
> > > > > > +    asm volatile (
> > > > > > +        "0: lr.w     %[p],  %[c]\n"
> > > > > > +        "   beq      %[p],  %[u], 1f\n"
> > > > > > +        "   add      %[rc], %[p], %[a]\n"
> > > > > > +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> > > > > > +        "   bnez     %[rc], 0b\n"
> > > > > > +        RISCV_FULL_BARRIER
> > > > > 
> > > > > With this and no .aq on the load, why the .rl on the store?
> > > > It is something that LKMM requires [1].
> > > > 
> > > > This is not fully clear to me what is so specific in LKMM, but
> > > > accoring
> > > > to the spec:
> > > >    Ordering Annotation Fence-based Equivalent
> > > >    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
> > > >    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
> > > >    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
> > > >    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
> > > >    amo<op>.aq          amo<op>; fence r,rw
> > > >    amo<op>.rl          fence rw,w; amo<op>
> > > >    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
> > > >    Table 2.2: Mappings from .aq and/or .rl to fence-based
> > > > equivalents.
> > > >    An alternative mapping places a fence rw,rw after the
> > > > existing 
> > > >    s{b|h|w|d|c} mapping rather than at the front of the
> > > >    l{b|h|w|d|r} mapping.
> > > 
> > > I'm afraid I can't spot the specific case in this table. None of
> > > the
> > > stores in the right column have a .rl suffix.
> > Yes, it is expected.
> > 
> > I am reading this table as (f.e.) amo<op>.rl is an equivalent of
> > fence
> > rw,w; amo<op>. (without .rl) 
> 
> In which case: How does quoting the table answer my original
> question?
Agree, it is starting to be confusing, so let me give an answer to your
question below.

> 
> > > >    It is also safe to translate any .aq, .rl, or .aqrl
> > > > annotation
> > > > into
> > > >    the fence-based snippets of
> > > >    Table 2.2. These can also be used as a legal implementation
> > > > of
> > > >    l{b|h|w|d} or s{b|h|w|d} pseu-
> > > >    doinstructions for as long as those instructions are not
> > > > added
> > > > to
> > > >    the ISA.
> > > > 
> > > > So according to the spec, it should be:
> > > >  sc.w ...
> > > >  RISCV_FULL_BARRIER.
> > > > 
> > > > Considering [1] and how this code looks before, it seems to me
> > > > that
> > > > it
> > > > is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.
> > > 
> > > Here you say "or". Then why dos the code use sc.?.rl _and_ a
> > > fence?
> > I confused this line with amo<op>.aqrl, so based on the table 2.2
> > above
> > s{b|h|w|d|c}.aqrl is transformed to "fence rw,w; s{b|h|w|d|c}", but
> > Linux kernel decided to strengthen it with full barrier:
> >    -              "0:\n\t"
> >    -              "lr.w.aqrl  %[p],  %[c]\n\t"
> >    -              "beq        %[p],  %[u], 1f\n\t"
> >    -              "add       %[rc],  %[p], %[a]\n\t"
> >    -              "sc.w.aqrl %[rc], %[rc], %[c]\n\t"
> >    -              "bnez      %[rc], 0b\n\t"
> >    -              "1:"
> >    +               "0:     lr.w     %[p],  %[c]\n"
> >    +               "       beq      %[p],  %[u], 1f\n"
> >    +               "       add      %[rc], %[p], %[a]\n"
> >    +               "       sc.w.rl  %[rc], %[rc], %[c]\n"
> >    +               "       bnez     %[rc], 0b\n"
> >    +               "       fence    rw, rw\n"
> >    +               "1:\n"
> > As they have the following issue:
> >    implementations of atomics such as atomic_cmpxchg() and
> >    atomic_add_unless() rely on LR/SC pairs,
> >    which do not give full-ordering with .aqrl; for example, current
> >    implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
> >    below to end up with the state indicated in the "exists" clause.
> 
> Is that really "current implementations", not "the abstract model"?
> If so, the use of an explicit fence would be more like a workaround
> (and would hence want commenting to that effect).
> 
> In neither case can I see my original question answered: Why the .rl
> on the store when there is a full fence later?
The good explanation for that was provided in the commit addressing a
similar issue for ARM64 [
https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.dea...@arm.com/
].
The same holds true for RISC-V since ARM also employs WMO.

I would also like to mention another point, as I indicated in another
email thread
[ https://lists.xen.org/archives/html/xen-devel/2024-03/msg01582.html ]
, that now this fence can be omitted and .aqrl can be used instead.

This was confirmed by Dan (the author of the RVWMO spec)
[https://lore.kernel.org/linux-riscv/41e01514-74ca-84f2-f5cc-2645c444f...@nvidia.com/
]

I hope this addresses your original question. Does it?

~ Oleksii


Reply via email to