[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 --- Comment #8 from AlphaOmegaProgrammer --- > So what about this untested patch? I've spent about an hour trying to find flaws in the idea, and I can't think of any. I applied the patch and it fixed the deadlock for me in my test code.
[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 --- Comment #7 from Jakub Jelinek --- Created attachment 61110 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61110&action=edit gcc15-pr119796.patch So what about this untested patch? Leaving aside whether we really need to lock more than a single lock, at least for GCC 15 and perhaps backports that doesn't feel like an option. Note, the patch will lock no locks for n == 0 on a cacheline boundary, I guess I could add there a special case for that, but with 0 size nothing should be accessed, so I think no locks are needed.
[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #6 from Jakub Jelinek --- I think libat_lock_n/libat_unlock_n should just count how many of the locks it will need, and if there is a wrap around first lock the mutexes at the start of the array and then just fallthrough into the current code which would lock those at the end of the array and instead of h = 0; break; But there is another problem, I don't see it taking into account ptr position within WATCH_SIZE. Currently libat_lock_n/libat_unlock_n will lock single mutex for n in [0, WATCH_SIZE], 2 mutexes for n in [WATCH_SIZE + 1, 2 * WATCH_SIZE] and so on. Regardless how many cachelines it actually crosses. One could call it on ((uintptr_t) ptr) % WATCH_SIZE == 0 (in that case what the code does right now is reasonable, I hope nothing calls it for n == 0) or on ((uintptr_t) ptr) % WATCH_SIZE == 32 or on ((uintptr_t) ptr) % WATCH_SIZE == 63. I'd think that for the second case only sizes <= 32 should result in a single lock, then [33, 96] with 2 locks, etc., and for the third case [0, 1] in a single lock, then [2, 65] 2 locks, ...
[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 --- Comment #5 from AlphaOmegaProgrammer --- > Please see https://gcc.gnu.org/contribute.html for how to contribute, which > includes sending a real patch My apologies, this is my first time trying to contribute. I'll do this. > This can only occur for atomic operations on huge objects though, right? With the current implementation, it can theoretically happen on objects roughly of size 4096 bytes / # of competing threads. > it seems that the deadlock could be avoided by always locking the mutexes > in the same order, i.e. start with the lowest address in the array, rather > than starting with higher addresses and then wrapping around to a lower > address. I thought about this, but the conclusion that I came to is that there are only 2 possible approaches: * You can do what the current implementation does and index with the low bits of the memory address. * You can try to divide the entire memory space into buckets by indexing with the high bits of the memory address. The first approach obviously is prone to deadlocks using the method I've demonstrated. The second approach would allow for indexing sequentially and avoid wrapping the index around, but now if your object is smaller than the total memory space / total # of mutexes, then one thread effectively blocks every other thread trying to touch memory in that entire region of memory, which effectively makes it a global lock anyway. > That's not true, std::lock() uses an algorithm that works for that case. > See https://howardhinnant.github.io/dining_philosophers.html for details on > the problem. I will look into these before submitting a proper patch. Thanks, AOP
[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2025-04-14 Keywords||wrong-code
[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 --- Comment #4 from Jonathan Wakely --- (In reply to AlphaOmegaProgrammer from comment #0) > The problem with the current implementation is that it relies on a hash of > the lower 12 bits of the pointer address to index an array of mutexes, and > because the index can wrap around the array, 2 threads trying to perform an > atomic operation on data requiring more than half of the locks in the array > will have potential overlap in the range of locks they attempt to acquire. This can only occur for atomic operations on huge objects though, right? They should work without deadlock, but it's not a very important use case. > Because of the wrap around, the competing threads may start locking threads > at the end of the other thread's range, which causes a deadlock since > neither thread would be able to acquire all of the locks it needs. > > Any locking scheme that attempts to be performant by using multiple mutexes > can not be threadsafe because of race conditions, That's not true, std::lock() uses an algorithm that works for that case. I don't know why libatomic locks multiple mutexes for a single object, but it seems that the deadlock could be avoided by always locking the mutexes in the same order, i.e. start with the lowest address in the array, rather than starting with higher addresses and then wrapping around to a lower address. There are more efficient algorithms, as used in the std::lock in libstdc++. See https://howardhinnant.github.io/dining_philosophers.html for details on the problem. Using a single global mutex for all atomics scales very poorly. For the vast majority of atomic operations, only a single mutex is locked and no deadlock can occur. Pessimizing those operations by serializing them all because of a bug when using atomics on huge objects is not going to be acceptable.
[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 --- Comment #3 from Jonathan Wakely --- (In reply to AlphaOmegaProgrammer from comment #1) > Created attachment 61108 [details] > Current implementation of locking algorithm > > This file can be found at /libatomic/config/posix/lock.c You don't need to attach the file here, it's in the gcc source tree. (In reply to AlphaOmegaProgrammer from comment #2) > Created attachment 61109 [details] > Proposed Patch > > This is my proposed patch. I've tested it and my test case no longer > deadlocks, and none of my previous tests that did not deadlock seem to be > affected. I haven't read through all of the contributing rules though and I > don't have the time or knowledge to test this thoroughly, but I don't expect > a relatively simple change like this should have any unknown impacts. This isn't a patch, it's another copy of the file (presumably with some changes). Please see https://gcc.gnu.org/contribute.html for how to contribute, which includes sending a real patch (not a copy of the modified file) to the mailing list for review, thanks
[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 --- Comment #2 from AlphaOmegaProgrammer --- Created attachment 61109 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61109&action=edit Proposed Patch This is my proposed patch. I've tested it and my test case no longer deadlocks, and none of my previous tests that did not deadlock seem to be affected. I haven't read through all of the contributing rules though and I don't have the time or knowledge to test this thoroughly, but I don't expect a relatively simple change like this should have any unknown impacts.
[Bug libstdc++/119796] Atomic Operations Can Deadlock Without Hardware Support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119796 --- Comment #1 from AlphaOmegaProgrammer --- Created attachment 61108 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61108&action=edit Current implementation of locking algorithm This file can be found at /libatomic/config/posix/lock.c