[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread alpha.and.omega.programmer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

--- Comment #8 from AlphaOmegaProgrammer  ---
> So what about this untested patch?

I've spent about an hour trying to find flaws in the idea, and I can't think of
any. I applied the patch and it fixed the deadlock for me in my test code.

[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

--- Comment #7 from Jakub Jelinek  ---
Created attachment 61110
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61110&action=edit
gcc15-pr119796.patch

So what about this untested patch?
Leaving aside whether we really need to lock more than a single lock, at least
for GCC 15 and perhaps backports that doesn't feel like an option.
Note, the patch will lock no locks for n == 0 on a cacheline boundary, I guess
I could add there a special case for that, but with 0 size nothing should be
accessed, so I think no locks are needed.

[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
I think libat_lock_n/libat_unlock_n should just count how many of the locks it
will need, and if there is a wrap around first lock the mutexes at the start of
the array and then just fallthrough into the current code which would lock
those at the end of the array and instead of h = 0; break;

But there is another problem, I don't see it taking into account ptr position
within WATCH_SIZE.
Currently libat_lock_n/libat_unlock_n will lock single mutex for n in [0,
WATCH_SIZE],
2 mutexes for n in [WATCH_SIZE + 1, 2 * WATCH_SIZE] and so on.
Regardless how many cachelines it actually crosses.
One could call it on ((uintptr_t) ptr) % WATCH_SIZE == 0 (in that case what the
code does right now is reasonable, I hope nothing calls it for n == 0) or on
((uintptr_t) ptr) % WATCH_SIZE == 32 or on ((uintptr_t) ptr) % WATCH_SIZE ==
63.
I'd think that for the second case only sizes <= 32 should result in a single
lock, then [33, 96] with 2 locks, etc., and for the third case [0, 1] in a
single lock, then [2, 65] 2 locks, ...

[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread alpha.and.omega.programmer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

--- Comment #5 from AlphaOmegaProgrammer  ---
> Please see https://gcc.gnu.org/contribute.html for how to contribute, which
> includes sending a real patch

My apologies, this is my first time trying to contribute. I'll do this.


> This can only occur for atomic operations on huge objects though, right?

With the current implementation, it can theoretically happen on objects roughly
of size 4096 bytes / # of competing threads.


> it seems that the deadlock could be avoided by always locking the mutexes
> in the same order, i.e. start with the lowest address in the array, rather
> than starting with higher addresses and then wrapping around to a lower 
> address.

I thought about this, but the conclusion that I came to is that there are only
2 possible approaches:
* You can do what the current implementation does and index with the low bits
of the memory address.
* You can try to divide the entire memory space into buckets by indexing with
the high bits of the memory address.

The first approach obviously is prone to deadlocks using the method I've
demonstrated.

The second approach would allow for indexing sequentially and avoid wrapping
the index around, but now if your object is smaller than the total memory space
/ total # of mutexes, then one thread effectively blocks every other thread
trying to touch memory in that entire region of memory, which effectively makes
it a global lock anyway.


> That's not true, std::lock() uses an algorithm that works for that case.

> See https://howardhinnant.github.io/dining_philosophers.html for details on 
> the problem.

I will look into these before submitting a proper patch.

Thanks,
AOP

[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2025-04-14
   Keywords||wrong-code

[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

--- Comment #4 from Jonathan Wakely  ---
(In reply to AlphaOmegaProgrammer from comment #0)
> The problem with the current implementation is that it relies on a hash of
> the lower 12 bits of the pointer address to index an array of mutexes, and
> because the index can wrap around the array, 2 threads trying to perform an
> atomic operation on data requiring more than half of the locks in the array
> will have potential overlap in the range of locks they attempt to acquire.

This can only occur for atomic operations on huge objects though, right? They
should work without deadlock, but it's not a very important use case.

> Because of the wrap around, the competing threads may start locking threads
> at the end of the other thread's range, which causes a deadlock since
> neither thread would be able to acquire all of the locks it needs.
> 
> Any locking scheme that attempts to be performant by using multiple mutexes
> can not be threadsafe because of race conditions,

That's not true, std::lock() uses an algorithm that works for that case.

I don't know why libatomic locks multiple mutexes for a single object, but it
seems that the deadlock could be avoided by always locking the mutexes in the
same order, i.e. start with the lowest address in the array, rather than
starting with higher addresses and then wrapping around to a lower address.

There are more efficient algorithms, as used in the std::lock in libstdc++. See
https://howardhinnant.github.io/dining_philosophers.html for details on the
problem.

Using a single global mutex for all atomics scales very poorly. For the vast
majority of atomic operations, only a single mutex is locked and no deadlock
can occur. Pessimizing those operations by serializing them all because of a
bug when using atomics on huge objects is not going to be acceptable.

[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

--- Comment #3 from Jonathan Wakely  ---
(In reply to AlphaOmegaProgrammer from comment #1)
> Created attachment 61108 [details]
> Current implementation of locking algorithm
> 
> This file can be found at /libatomic/config/posix/lock.c

You don't need to attach the file here, it's in the gcc source tree.

(In reply to AlphaOmegaProgrammer from comment #2)
> Created attachment 61109 [details]
> Proposed Patch
> 
> This is my proposed patch. I've tested it and my test case no longer
> deadlocks, and none of my previous tests that did not deadlock seem to be
> affected. I haven't read through all of the contributing rules though and I
> don't have the time or knowledge to test this thoroughly, but I don't expect
> a relatively simple change like this should have any unknown impacts.

This isn't a patch, it's another copy of the file (presumably with some
changes).

Please see https://gcc.gnu.org/contribute.html for how to contribute, which
includes sending a real patch (not a copy of the modified file) to the mailing
list for review, thanks

[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread alpha.and.omega.programmer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

--- Comment #2 from AlphaOmegaProgrammer  ---
Created attachment 61109
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61109&action=edit
Proposed Patch

This is my proposed patch. I've tested it and my test case no longer deadlocks,
and none of my previous tests that did not deadlock seem to be affected. I
haven't read through all of the contributing rules though and I don't have the
time or knowledge to test this thoroughly, but I don't expect a relatively
simple change like this should have any unknown impacts.

[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support

2025-04-14 Thread alpha.and.omega.programmer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796

--- Comment #1 from AlphaOmegaProgrammer  ---
Created attachment 61108
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61108&action=edit
Current implementation of locking algorithm

This file can be found at /libatomic/config/posix/lock.c