Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Wilco Dijkstra
Hi Richard, >> Benchmarking showed that LSE and LSE2 RMW atomics have similar performance >> once >> the atomic is acquire, release or both. Given there is already a significant >> overhead due >> to the function call, PLT indirection and argument setup, it doesn't make >> sense to add >> extra

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi, > >>> Is there no benefit to using SWPPL for RELEASE here? Similarly for the >>> others. >> >> We started off implementing all possible memory orderings available. >> Wilco saw value in merging less restricted orderings into more >> restricted ones - mainly to reduce

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Wilco Dijkstra
Hi, >> Is there no benefit to using SWPPL for RELEASE here?  Similarly for the >> others. > > We started off implementing all possible memory orderings available. > Wilco saw value in merging less restricted orderings into more > restricted ones - mainly to reduce codesize in less frequently use

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Victor Do Nascimento
On 1/5/24 11:47, Richard Sandiford wrote: Victor Do Nascimento writes: The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with orig

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-05 Thread Richard Sandiford
Victor Do Nascimento writes: > The armv9.4-a architectural revision adds three new atomic operations > associated with the LSE128 feature: > > * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit > value held in a pair of registers, with original data loaded into > the same 2 regi

[PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-02 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of