On 22.03.2024 11:32, Oleksii wrote:
> On Thu, 2024-03-21 at 13:07 +0100, Jan Beulich wrote:
>> On 15.03.2024 19:06, Oleksii Kurochko wrote:
>>> The header was taken from Linux kernel 6.4.0-rc1.
>>>
>>> Additionally, the following were updated:
>>> * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
>>>   access.
>>> * replace tabs with spaces
>>> * replace __* variables with *__
>>> * introduce generic versions of xchg_* and cmpxchg_*.
>>> * drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
>>
>> With this, ...
>>
>>> * drop barriers and use instruction suffixes instead ( .aq, .rl,
>>>   .aqrl )
>>>
>>> Implementation of the 4- and 8-byte cases was updated according to
>>> the spec:
>>> ```
>>> ....
>>> Linux Construct      RVWMO AMO Mapping
>>> atomic <op> relaxed  amo<op>.{w|d}
>>> atomic <op> acquire  amo<op>.{w|d}.aq
>>> atomic <op> release  amo<op>.{w|d}.rl
>>> atomic <op>          amo<op>.{w|d}.aqrl
>>>
>>> Linux Construct      RVWMO LR/SC Mapping
>>> atomic <op> relaxed  loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
>>> atomic <op> acquire  loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
>>> atomic <op> release  loop: lr.{w|d}; <op>; sc.{w|d}.aqrl∗; bnez loop
>>>                      OR
>>>                      fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}∗;
>>>                      bnez loop
>>> atomic <op>          loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl;
>>>                      bnez loop
>>>
>>> Table A.5: Mappings from Linux memory primitives to RISC-V
>>> primitives
>>> ```
>>
>> ... I consider quoting this table in full, without any further
>> remarks, confusing: Three of the lines each are inapplicable now,
>> aiui.
> I'll shorten the table then.
>
>> Further, what are the two ∗ telling us? Quite likely they aren't
>> there just accidentally.
>>
>> Finally, why sc.{w|d}.aqrl when in principle one would expect just
>> sc.{w|d}.rl?
> Because according to the last line of table A.5:
>   atomic <op>          loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl;
> the sc.{w|d}.aqrl form is used there, so I decided to stick to what
> is mentioned in the table.
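(As a concrete illustration of the table's last row: a fully-ordered
4-byte cmpxchg built on the lr.w.aq / sc.w.aqrl mapping could look like
the sketch below. This is illustrative only; the function name and the
plain "r" constraints are assumptions, not the patch's actual code.)

```c
#include <stdint.h>

/* Sketch of a fully-ordered 32-bit cmpxchg per Table A.5's last row:
 * loop: lr.w.aq; <op>; sc.w.aqrl; bnez loop. */
static inline uint32_t cmpxchg32_fully_ordered(volatile uint32_t *ptr,
                                               uint32_t old, uint32_t new)
{
    uint32_t ret, rc;

    asm volatile (
        "0: lr.w.aq   %0, %2\n"      /* load-reserved, acquire ordering */
        "   bne       %0, %3, 1f\n"  /* bail out if *ptr != old */
        "   sc.w.aqrl %1, %4, %2\n"  /* store-conditional, full ordering */
        "   bnez      %1, 0b\n"      /* retry if the reservation was lost */
        "1:\n"
        : "=&r" (ret), "=&r" (rc), "+A" (*ptr)
        : "r" (old), "r" (new)
        : "memory" );

    return ret;  /* the value observed in *ptr before the attempt */
}
```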
I understand it's mentioned that way in the table. But it being that
way is not explained anywhere. Hence my "Why?"

>>> +    __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
>>> +    __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
>>
>> The casts aren't very nice to have here; I take it they're needed
>> for cmpxchg_ptr() to compile?
> Not really, I have not faced a compilation issue.
> The reason they were added is that the lr instruction places the
> sign-extended value in the destination register, but without the
> casts the values for old and new were generated without sign
> extension, so, for example:
>   u32 = 0xbbbbbbbb;
>   cmpxchg(&u32, 0xbbbbbbbb, 0xCCCCCCCC);
> will fail, because after
>   "0: lr" lr_sfx " %0, %2\n"
> %0 will hold 0xffffffffbbbbbbbb while %3 will hold 0xbbbbbbbb, so in
>   "   bne %0, %z3, 1f\n"
> %0 and %3 are always unequal whenever the most significant bit of
> the value read from %2 is 1.

I'm afraid I don't follow: It's the type conversion you're after, but
that would happen also with the casts omitted.

> But now I realized that it would be better just to use a mask and
> not depend on compiler code generation, so it would be better to do
> it in the following way (of course, the macro should be updated
> according to the remarks you gave me):
>
> #define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
> ({ \
>     register unsigned int rc; \
>     unsigned long mask = \
>         GENMASK(sizeof(*(ptr)) * BITS_PER_BYTE - 1, 0); \
>     asm volatile ( \
>         "0: lr" lr_sfx " %0, %2\n" \
>         "   and %0, %0, %z[mask]\n" \
>         "   bne %0, %z3, 1f\n" \
>         "   sc" sc_sfx " %1, %z4, %2\n" \
>         "   bnez %1, 0b\n" \
>         "1:\n" \
>         : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
>         : "rJ" (old), "rJ" (new), [mask] "rJ" (mask) \
>         : "memory" ); \
> })

It'll be up to you whether to switch to such an alternative.

Jan
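For concreteness, the sign-extension behaviour discussed above can be
reproduced in plain user-space C (a minimal sketch mimicking, on a
64-bit machine, what lr.w does to the destination register; this is not
the Xen code):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t u32 = 0xbbbbbbbb;

    /* lr.w sign-extends the loaded 32-bit value into the 64-bit register. */
    uint64_t loaded   = (uint64_t)(int64_t)(int32_t)u32; /* 0xffffffffbbbbbbbb */
    uint64_t expected = u32;                             /* 0x00000000bbbbbbbb */

    /* Without masking (or a matching sign extension of 'old'), the bne
     * always fires for values whose top bit is set. */
    printf("bne taken: %s\n", loaded != expected ? "yes" : "no");

    /* Masking the loaded value to the operand width, as in the macro
     * above, makes the comparison meaningful again. */
    uint64_t mask = 0xffffffffUL;                        /* GENMASK(31, 0) */
    printf("bne taken after masking: %s\n",
           (loaded & mask) != expected ? "yes" : "no");

    return 0;
}
```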