Re: [PATCH] handle unwind tables that are embedded within unwinding code, [PR111731]

2024-03-22 Thread Thomas Neumann
libgcc/ChangeLog:  PR libgcc/111731  * unwind-dw2-fde.c: Split unwind ranges if they contain the  unwind table. And what I'd suggest is committing to the trunk now, then waiting a week or two before backporting to gcc-13. I will do that, thanks for looking at the patch. Best

[PATCH] handle unwind tables that are embedded within unwinding code, [PR111731]

2024-03-15 Thread Thomas Neumann
Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731 Given that this is a regression, is this okay for gcc 13 and mainline? The unwinding mechanism registers both the code range and the unwind table itself within a b-tree lookup structure. That data structure assumes that is

[PATCH] preserve base pointer for __deregister_frame [PR110956]

2023-08-10 Thread Thomas Neumann via Gcc-patches
Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110956 Rainer Orth successfully tested the patch on Solaris with a full bootstrap. Some uncommon unwinding table encodings need to access the base pointer for address computations. We do not have that information in calls to

PING: [PATCH] fix radix sort on 32bit platforms [PR109670]

2023-06-02 Thread Thomas Neumann via Gcc-patches
Summary: The radix sort did not handle the uppermost byte correctly, which sometimes broke win32 exceptions. Bugzilla #109670. The reporter confirmed that the patch fixes the bug. See: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618000.html Best Thomas

PING^2: [PATCH] release the sorted FDE array when deregistering a frame [PR109685]

2023-06-02 Thread Thomas Neumann via Gcc-patches
Summary: The old linear scan logic called free while searching the list of frames. The atomic fast path finds the frame quickly, but forgot the free call. This patches adds the missing free. Bugzilla #109685. See: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619026.html Best Thomas

[PATCH v2] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-19 Thread Thomas Neumann via Gcc-patches
Am 19.05.23 um 19:26 schrieb Jeff Law: See: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617245.html I think this needs an update given the other changes in this space. jeff I have included the updated the patch below. The atomic fastpath bypasses the code that releases the sort

Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-15 Thread Thomas Neumann via Gcc-patches
will check that everything works on atomic and non-atomic platforms and commit the trivial move then. Sorry for the breakage. Best Thomas From 550dc27f547a067e96137adeb85148d8a84c81a0 Mon Sep 17 00:00:00 2001 From: Thomas Neumann Date: Mon, 15 May 2023 14:59:22 +0200 Subject: [PATCH] fix assert

Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-14 Thread Thomas Neumann via Gcc-patches
even building the flatbuffers repo externally using cmake at the same revision didn't reproduce it. that said, i have attached a dockerfile that shows you what /does/ fail, under the specific conditions. but there is no unpatched libgcc_s, so you'll have to do that part yourself, and build a

Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-14 Thread Thomas Neumann via Gcc-patches
Dear Sören, we ran into a regression introduced by these changes. The regression manifests itself in a failing assertion in __deregister_frame_info_bases. The assertion failure was observed while using Chromium's `flatc` build system tool. The failing assertion is: unwind-dw2-fde.c:281

PING: [PATCH] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-12 Thread Thomas Neumann via Gcc-patches
Summary: The old linear scan logic called free while searching the list of frames. The atomic fast path finds the frame quickly, but forgot the free call. This patches adds the missing free. Bugzilla #109685. See: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617245.html Best Thomas

[PATCH] fix radix sort on 32bit platforms [PR109670]

2023-05-10 Thread Thomas Neumann via Gcc-patches
The radix sort uses two buffers, a1 for input and a2 for output. After every digit the role of the two buffers is swapped. When terminating the sort early the code made sure the output was in a2. However, when we run out of bits, as can happen on 32bit platforms, the sorted result was in a1, was

[PATCH] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-02 Thread Thomas Neumann via Gcc-patches
The atomic fastpath bypasses the code that releases the sort array which was lazily allocated during unwinding. We now check after deregistering if there is an array to free. libgcc/ChangeLog: * unwind-dw2-fde.c: Free sort array in atomic fast path. --- libgcc/unwind-dw2-fde.c | 6

[PATCH] initialize fde objects lazily

2022-12-09 Thread Thomas Neumann via Gcc-patches
When registering an unwind frame with __register_frame_info_bases we currently initialize that fde object eagerly. This has the advantage that it is immutable afterwards and we can safely access it from multiple threads, but it has the disadvantage that we pay the initialization cost even if the

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-22 Thread Thomas Neumann via Gcc-patches
Would it be possible to trigger lazy registration if the version is read as a zero? This would not introduce any additional atomic instructions on the fast path. yes, that is possible. The main problem is the transition from lazy to non-lazy mode when the first exception is thrown. We must

[PATCH] speed up end_fde_sort using radix sort

2022-11-22 Thread Thomas Neumann via Gcc-patches
When registering a dynamic unwinding frame the fde list is sorted. Previously, we split the list into a sorted and an unsorted part, sorted the later using heap sort, and merged both. That can be quite slow due to the large number of (expensive) comparisons. This patch replaces that logic with a

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches
Hi, When dynamically linking a fast enough machine hides the latency, but when Statically linking or on slower devices this change caused a 5x increase in Instruction count and 2x increase in cycle count before getting to main. I have looked at ways to fix that. The problem is that with

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches
It's easy to reproduce on x86 as well. As a testcase: #include int main(int argc, char** argv) { return 0; } And just compile with: g++ -O1 hello.cpp -static -o hello.exe. thanks, I will take a look. Best Thomas

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches
Hi, When dynamically linking a fast enough machine hides the latency, but when Statically linking or on slower devices this change caused a 5x increase in Instruction count and 2x increase in cycle count before getting to main. This has been quite noticeable on smaller devices. Is there a

Re: [PATCH] Avoid depending on destructor order

2022-09-26 Thread Thomas Neumann via Gcc-patches
Hi Iain, You might also want to include Rainer’s patch, AFAIR patches to fix bootstrap are allowed to proceed as an exception to the usual rules, I was not aware of that. I have pushed the patch below now (including Rainer's change), I will update the code if requested. Best Thomas

Re: [PATCH] Avoid depending on destructor order

2022-09-26 Thread Thomas Neumann via Gcc-patches
Hi Claudiu, This change prohibits compiling of ARC backend: + gcc_assert (in_shutdown || ob); in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined, while gcc_assert is outside of any ifdef. Please can you revisit this line and change it accordingly. I have a patch ready, I am

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread Thomas Neumann via Gcc-patches
+static const bool in_shutdown = false; I'll let Jason or others decide if this is the right solution.  It seems that in_shutdown also could be declared outside the #ifdef and initialized as "false". sure, either is fine. Moving it outside the #ifdef wastes one byte in the executable

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread Thomas Neumann via Gcc-patches
This patch broke bootstrap on AIX and probably other targets. #ifdef ATOMIC_FDE_FAST_PATH #include "unwind-dw2-btree.h" static struct btree registered_frames; static bool in_shutdown; ... #else in_shutdown only is defined for ATOMIC_FDE_FAST_PATH but used in code / asserts not protected by

[PATCH] Avoid depending on destructor order

2022-09-19 Thread Thomas Neumann via Gcc-patches
In some scenarios (e.g., when mixing gcc and clang code), it can happen that frames are deregistered after the lookup structure has already been destroyed. That in itself would be fine, but it triggers an assert in __deregister_frame_info_bases that expects to find the frame. To avoid that, we

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Thomas Neumann via Gcc-patches
At least when building LibreOffice with Clang (16 trunk) with ASan and UBsan enabled against libstdc++ (with --gcc-toolchain and LD_LIBRARY_PATH to pick up a libstdc++ trunk build including this change at build and run-time), at least one of the LibreOffice tests executed during the build

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Thomas Neumann via Gcc-patches
I haven't debugged this in any way, nor checked whether it only impacts exactly my below scenario, but noticed the following: At least when building LibreOffice with Clang (16 trunk) with ASan and UBsan enabled against libstdc++ (with --gcc-toolchain and LD_LIBRARY_PATH to pick up a libstdc++

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-18 Thread Thomas Neumann via Gcc-patches
Hi Dimitar, This patch broke avr and pru-elf cross builds: gcc/libgcc/unwind-dw2-fde.c:680:28: error: unknown type name ‘uintptr_t’ 680 |uintptr_t *range) Should uintptr_t be replaced with __UINTPTR_TYPE__? Such change fixes the above broken builds for me. But

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-18 Thread Thomas Neumann via Gcc-patches
Hi Dimitar, This patch broke avr and pru-elf cross builds: gcc/libgcc/unwind-dw2-fde.c:680:28: error: unknown type name ‘uintptr_t’ 680 |uintptr_t *range) Should uintptr_t be replaced with __UINTPTR_TYPE__? Such change fixes the above broken builds for me. But

[PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-16 Thread Thomas Neumann via Gcc-patches
. */ +/* Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Thomas Neumann + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version

Re: [PATCH v3] eliminate mutex in fast path of __register_frame

2022-09-12 Thread Thomas Neumann via Gcc-patches
Thanks for your feedback, I will update the patch in the next few days, addressing the comments and reorganizing classify_object_over_fdes. Concerning your question: + +restart: +  struct btree_node *iter; +  uintptr_t lock; +  { +    // Accessing the root node requires defending against

PING^2: [PATCH v3] eliminate mutex in fast path of __register_frame

2022-08-26 Thread Thomas Neumann via Gcc-patches
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a global mutex for frame registration and unwinding. This can lead to very poor performance on machines with high core counts. This patch organizes the frames in a b-tree with read-optimized synchronization instead, which

PING: [PATCH v3] eliminate mutex in fast path of __register_frame

2022-07-27 Thread Thomas Neumann via Gcc-patches
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a global mutex for frame registration and unwinding. This can lead to very poor performance on machines with high core counts. This patch organizes the frames in a b-tree with read-optimized synchronization instead, which

[PATCH v3] eliminate mutex in fast path of __register_frame

2022-06-26 Thread Thomas Neumann via Gcc-patches
..3b2b6871b46 --- /dev/null +++ b/libgcc/unwind-dw2-btree.h @@ -0,0 +1,953 @@ +/* Lock-free btree for manually registered unwind frames */ +/* Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Thomas Neumann + +This file is part of GCC. + +GCC is free software; you can redistribute

PING: [PATCH v2] eliminate mutex in fast path of __register_frame

2022-05-09 Thread Thomas Neumann via Gcc-patches
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a global mutex for frame registration and unwinding. This can lead to very poor performance on machines with high core counts. This patch organizes the frames in a b-tree with read-optimized synchronization instead, which

[PATCH v2] eliminate mutex in fast path of __register_frame

2022-03-04 Thread Thomas Neumann via Gcc-patches
/unwind-dw2-btree.h b/libgcc/unwind-dw2-btree.h new file mode 100644 index 000..0b4835a439a --- /dev/null +++ b/libgcc/unwind-dw2-btree.h @@ -0,0 +1,952 @@ +/* Lock-free btree for manually registered unwind frames */ +/* Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Thomas

Re: [PATCH] eliminate mutex in fast path of __register_frame

2022-03-03 Thread Thomas Neumann via Gcc-patches
We may have to add a new interface. In some other cases, I've seen errno being used for error reporting, but that requires changes in applications to recognize the error. It's probably better to crash here than to fail mysteriously later. Out of curiosity, how many times do you call the

Re: [PATCH] eliminate mutex in fast path of __register_frame

2022-03-02 Thread Thomas Neumann via Gcc-patches
+// Common logic for version locks +struct version_lock +{ + // The lock itself + uintptr_t version_lock; +}; version_lock must not overflow, right? This means we need a wider counter on 32-bit, too. glibc contains a 62-bit counter that it uses for its own data structure. an overflow

[PATCH] eliminate mutex in fast path of __register_frame

2022-03-01 Thread Thomas Neumann via Gcc-patches
100644 index 000..b16bb9994f8 --- /dev/null +++ b/libgcc/unwind-dw2-btree.h @@ -0,0 +1,870 @@ +/* Lock-free btree for manually registered unwind frames */ +/* Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Thomas Neumann + +This file is part of GCC. + +GCC is free software