On Thu, 2024-03-21 at 13:07 +0100, Jan Beulich wrote:
> On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > The header was taken from Linux kernel 6.4.0-rc1.
> > 
> > Additionally, the following updates were made:
> > * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
> >   access.
> > * replace tabs with spaces
> > * replace __* variables with *__
> > * introduce generic version of xchg_* and cmpxchg_*.
> > * drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
> 
> With this, ...
> 
> > * drop barriers and use instruction suffixes instead (.aq, .rl, .aqrl)
> > 
> > The implementation of the 4- and 8-byte cases was updated according to
> > the spec:
> > ```
> >               ....
> > Linux Construct         RVWMO AMO Mapping
> > atomic <op> relaxed     amo<op>.{w|d}
> > atomic <op> acquire     amo<op>.{w|d}.aq
> > atomic <op> release     amo<op>.{w|d}.rl
> > atomic <op>             amo<op>.{w|d}.aqrl
> >
> > Linux Construct         RVWMO LR/SC Mapping
> > atomic <op> relaxed     loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
> > atomic <op> acquire     loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
> > atomic <op> release     loop: lr.{w|d}; <op>; sc.{w|d}.aqrl∗; bnez loop OR
> >                         fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}∗; bnez loop
> > atomic <op>             loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
> >
> > Table A.5: Mappings from Linux memory primitives to RISC-V primitives
> > 
> > ```
> 
> ... I consider quoting this table in full, without any further remarks,
> as confusing: Three of the lines each are inapplicable now, aiui.
I'll shorten the table then.

> 
> Further what are the two * telling us? Quite likely they aren't there
> just accidentally.
> 
> Finally, why sc.{w|d}.aqrl when in principle one would expect just
> sc.{w|d}.rl?
Because, according to the last line of table A.5:
    atomic <op>             loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
the sc.{w|d}.aqrl form is used there, so I decided to stick to what is
mentioned in the table.
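
For reference, this is roughly how that last mapping would look as an
inline-asm loop (purely an illustrative sketch with a made-up helper
name, not code from the patch; shown for an atomic add):

   /* Fully ordered atomic add following the last line of Table A.5:
    * lr.w.aq provides the acquire half, sc.w.aqrl the release half,
    * making the read-modify-write fully ordered. */
   static inline void atomic_add_fully_ordered(int i, volatile int *v)
   {
       int tmp;

       asm volatile (
           "0: lr.w.aq   %0, %1\n"
           "   add       %0, %0, %2\n"
           "   sc.w.aqrl %0, %0, %1\n"
           "   bnez      %0, 0b\n"
           : "=&r" (tmp), "+A" (*v)
           : "r" (i)
           : "memory" );
   }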

> 
> > +})
> > +
> > +static always_inline unsigned long __xchg(volatile void *ptr, unsigned long new, int size)
> > +{
> > +    unsigned long ret;
> > +
> > +    switch ( size )
> > +    {
> > +    case 1:
> > +        ret = emulate_xchg_1_2((volatile uint8_t *)ptr, new, ".aq", ".aqrl");
> > +        break;
> > +    case 2:
> > +        ret = emulate_xchg_1_2((volatile uint16_t *)ptr, new, ".aq", ".aqrl");
> > +        break;
> > +    case 4:
> > +        __amoswap_generic((volatile uint32_t *)ptr, new, ret, ".w.aqrl");
> > +        break;
> > +#ifndef CONFIG_32BIT
> 
> There's no 32BIT Kconfig symbol; all we have is a 64BIT one.
I meant CONFIG_RISCV_32 here.
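I.e. that hunk would then read (everything else unchanged):

   #ifndef CONFIG_RISCV_32
       case 8:
           __amoswap_generic((volatile uint64_t *)ptr, new, ret, ".d.aqrl");
           break;
   #endif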

> 
> > +    case 8:
> > +        __amoswap_generic((volatile uint64_t *)ptr, new, ret, ".d.aqrl");
> > +        break;
> > +#endif
> > +    default:
> > +        STATIC_ASSERT_UNREACHABLE();
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +#define xchg(ptr, x) \
> > +({ \
> > +    __typeof__(*(ptr)) n_ = (x); \
> > +    (__typeof__(*(ptr))) \
> > +        __xchg((ptr), (unsigned long)(n_), sizeof(*(ptr))); \
> 
> Nit: While excess parentheses "only" harm readability, they would
> nevertheless better be omitted (here: the first argument passed).
> 
> > +})
> > +
> > +#define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx)      \
> > + ({ \
> > +    register unsigned int rc; \
> 
> Nit: We don't normally use "register", unless accompanied by asm()
> tying a variable to a specific one.
> 
> > +    __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
> > +    __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
> 
> The casts aren't very nice to have here; I take they're needed for
> cmpxchg_ptr() to compile?
Not really, I have not faced a compilation issue.
The reason the casts were added is that the lr instruction places a
sign-extended value in the destination register, while without the
casts the values for old and new were generated without sign extension,
so, for example:
   uint32_t u32 = 0xbbbbbbbb;
   cmpxchg(&u32, 0xbbbbbbbb, 0xCCCCCCCC);
will fail, because after:
       "0: lr" lr_sfx " %0, %2\n"
%0 will hold 0xFFFFFFFFBBBBBBBB while %3 holds 0xBBBBBBBB, so in:
       "   bne  %0, %z3, 1f\n"
%0 and %3 are always unequal whenever the most significant bit of the
value read from %2 is 1.
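
To show the effect in plain C (just a host-side illustration of the
widening on a 64-bit machine; the program and its names are made up for
this example, not code from the patch):

   #include <stdint.h>
   #include <stdio.h>

   int main(void)
   {
       uint32_t u32 = 0xbbbbbbbb;

       /* What lr.w effectively yields: the 32-bit value sign-extended
        * into a 64-bit register. */
       uint64_t loaded = (uint64_t)(int64_t)(int32_t)u32;

       /* The "old" comparison operand as a plain unsigned long, i.e.
        * without going through a cast to the pointed-to type. */
       uint64_t old = 0xbbbbbbbbUL;

       /* Prints loaded=0xffffffffbbbbbbbb old=0xbbbbbbbb equal=0, so the
        * bne in the LR/SC loop would always be taken. */
       printf("loaded=%#lx old=%#lx equal=%d\n",
              (unsigned long)loaded, (unsigned long)old, loaded == old);

       return 0;
   }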

But now I realize it would be better just to use a mask and not depend
on the compiler's code generation, i.e. do it along the following lines
(of course, the macro should also be updated according to the remarks
you gave me):
   #define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
   ({ \
       register unsigned int rc; \
       unsigned long mask = GENMASK((sizeof(*(ptr)) * BITS_PER_BYTE) - 1, 0); \
       asm volatile ( \
           "0: lr" lr_sfx " %0, %2\n" \
           "   and  %0, %0, %z[mask]\n" \
           "   bne  %0, %z3, 1f\n" \
           "   sc" sc_sfx " %1, %z4, %2\n" \
           "   bnez %1, 0b\n" \
           "1:\n" \
           : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
           : "rJ" (old), "rJ" (new), [mask] "rJ" (mask) \
           : "memory" ); \
   })
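
And then, mirroring __xchg() above, I'd expect it to be used from a
size-dispatching helper roughly like this (only a sketch: the lr/sc
suffix strings are illustrative and the 1/2-byte emulation path is
omitted here):

   static always_inline unsigned long __cmpxchg(volatile void *ptr,
                                                unsigned long old,
                                                unsigned long new,
                                                int size)
   {
       unsigned long ret;

       switch ( size )
       {
       case 4:
           __generic_cmpxchg((volatile uint32_t *)ptr, old, new, ret,
                             ".w.aq", ".w.aqrl");
           break;
   #ifndef CONFIG_RISCV_32
       case 8:
           __generic_cmpxchg((volatile uint64_t *)ptr, old, new, ret,
                             ".d.aq", ".d.aqrl");
           break;
   #endif
       default:
           STATIC_ASSERT_UNREACHABLE();
       }

       return ret;
   }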
   
> 
> > +    asm volatile( \
> 
> Nit: Missing blank once again. Would be really nice if you could go
> through and sort this uniformly for the series.
> 
> > +        "0: lr" lr_sfx " %0, %2\n" \
> > +        "   bne  %0, %z3, 1f\n" \
> > +        "   sc" sc_sfx " %1, %z4, %2\n" \
> > +        "   bnez %1, 0b\n" \
> > +        "1:\n" \
> > +        : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
> > +        : "rJ" (old__), "rJ" (new__) \
> 
> Please could I talk you into using named operands here, too?
Sure, I will add them.
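E.g. something along these lines, combined with the mask variant above
(just a sketch of the operand naming, the rest of the macro unchanged):

       asm volatile ( \
           "0: lr" lr_sfx " %[ret], %[addr]\n" \
           "   and  %[ret], %[ret], %z[mask]\n" \
           "   bne  %[ret], %z[old], 1f\n" \
           "   sc" sc_sfx " %[rc], %z[new], %[addr]\n" \
           "   bnez %[rc], 0b\n" \
           "1:\n" \
           : [ret] "=&r" (ret), [rc] "=&r" (rc), [addr] "+A" (*ptr) \
           : [old] "rJ" (old), [new] "rJ" (new), [mask] "rJ" (mask) \
           : "memory" ); \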

~ Oleksii

