Hi Richard,
>> Benchmarking showed that LSE and LSE2 RMW atomics have similar
>> performance once the atomic is acquire, release or both. Given there
>> is already a significant overhead due to the function call, PLT
>> indirection and argument setup, it doesn't make sense to add extra
Wilco Dijkstra writes:
> Hi,
>
>>> Is there no benefit to using SWPPL for RELEASE here? Similarly for the
>>> others.
>>
>> We started off implementing all possible memory orderings available.
>> Wilco saw value in merging less restricted orderings into more
>> restricted ones - mainly to reduce
Hi,
>> Is there no benefit to using SWPPL for RELEASE here? Similarly for the
>> others.
>
> We started off implementing all possible memory orderings available.
> Wilco saw value in merging less restricted orderings into more
> restricted ones - mainly to reduce codesize in less frequently use
On 1/5/24 11:47, Richard Sandiford wrote:
> Victor Do Nascimento writes:
>> The armv9.4-a architectural revision adds three new atomic operations
>> associated with the LSE128 feature:
>>
>> * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
>> value held in a pair of registers, with orig
Victor Do Nascimento writes:
> The armv9.4-a architectural revision adds three new atomic operations
> associated with the LSE128 feature:
>
> * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
> value held in a pair of registers, with original data loaded into
> the same 2 regi
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:
* LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
value held in a pair of registers, with original data loaded into
the same 2 registers.
* LDSETP - Atomic OR (bitset) of