[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #15 from Liam Powell --- I think PR ada/123990 becomes fixable once this is fixed, as Maybe_Reraise_Abort or similar can then be called at the start of any exception handler.
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #14 from Liam Powell --- Created attachment 63863 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63863&action=edit fix with debug print
I have been running the attached fix on my local compiler to look for any other code that may be impacted by the previous fix. It prints to stderr every time the fix is triggered. If anyone else has a big project with lots of ATCs, testing would be useful.
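For anyone looking for code of the affected shape, here is a minimal sketch of a nested select-then-abort, written from the bug title alone. It is not the reporter's test case, the delay values are arbitrary, and the thread does not confirm that this exact program triggers the bug on any particular GCC version.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Nested_ATC_Sketch is
begin
   select
      delay 1.0;                       --  outer trigger
      Put_Line ("outer trigger completed");
   then abort
      select
         delay 10.0;                   --  inner trigger, chosen to fire later
         Put_Line ("inner trigger completed");
      then abort
         loop
            delay 0.1;                 --  gives the abort a completion point
         end loop;
      end select;
      --  Still inside the outer abortable part. According to the bug
      --  title, this statement should not run once the outer trigger
      --  has completed.
      Put_Line ("after inner select");
   end select;
end Nested_ATC_Sketch;
```

If the line after the inner select is printed, the abort raised for the outer level was treated as if it belonged to the inner one.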
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #13 from Liam Powell --- I did end up looking at this a bit more. What caused this to break in the first place was the switch to zero cost exceptions. It's possible to restore the original behaviour, but it's not trivial. Just restoring Undefer_Abort to the exception handler and setting the correct deferral level is not enough, as the delicate ping-pong between Exit_One_ATC_Level and Undefer_Abort still breaks in my examples after the first or second cycle. After more digging through the code, I still can't come up with any reason why they did it that way rather than my way.
Also, a plain Undefer_Abort in the exception handler won't work if you do try to restore the old behaviour. The reason it used to work is that SJLJ and ZCX used to have different semantics around exceptions in exception handlers: SJLJ would defer aborts in exception handlers, while ZCX would leave them alone. This was changed in 2020 in commit 05e59503c6e57851104649d8781727c4571a8b2c, so now neither option has aborts deferred.
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #12 from Liam Powell --- I narrowed this down a bit more to 4.2.4 and 4.3.3, since I have binaries for those. The -gnatG output is effectively identical between these versions, so the issue is somewhere in the runtime; however, I can't see anything relevant in the runtime diff when searching for keywords.
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #11 from Liam Powell --- I did one more bit of testing and confirmed this is a regression. The bug does not occur on 3.2.3 but does occur on 4.8.2.
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #10 from Eric Botcazou --- Thanks for the investigation. To be honest, I'm not a specialist of the tasking runtime so I cannot really comment at this point; I'll need to dig in first.
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #9 from Liam Powell --- Last thing I want to note, and then I'm giving up on trying to fully understand what's going on here. The original Ada support commit has this comment (still present today) regarding the ATC_Hack variable:
> -- The solution really belongs in the Abort_Signal handler
> -- for async. entry calls. The present hack is very
> -- fragile. It relies that the very next point after
> -- Exit_One_ATC_Level at which the task becomes abortable
> -- will be the call to Undefer_Abort in the
> -- Abort_Signal handler.
It seems like what they're talking about here is exactly what I have done in my patch. I can't figure out why they didn't just do what I've done, since it's much simpler than this whole dance with Abort_Undefer. There must be some more complexity to this that my patch doesn't address correctly.
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #8 from Liam Powell --- Looking at this more, it appears that nested ATCs likely worked in the very first commit which added Ada support. The Abort_Signal exception handler used to call Undefer_Abort, which would eventually lead to Exit_One_ATC_Level, where you'll still find this comment today:
> -- Force the next Undefer_Abort to re-raise Abort_Signal
It appears this was broken a long time ago. I haven't tracked down exactly when, but it was before GCC 8.2. At a much later point the call to Undefer_Abort was removed, as it did nothing at that time. Presumably the breakage occurred after e9906cbf174623cc53b32ad2a0f6d603d6f975b5, as this would have hidden whatever causes Deferral_Level to be set to zero, which is what causes the procedure to do nothing.
So really my patch shouldn't be used and the original behaviour should be restored; however, I cannot figure out what's different from the first Ada commit that broke it. Specifically, I cannot find where the Deferral_Level is set to non-zero, but I know it has to be there, as there's an assertion to that effect in Undefer_Abort.
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #7 from Liam Powell --- Created attachment 63570 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63570&action=edit possible fix
I think I've fixed this; refer to the attached patch. The previous comment appears unrelated, but I'm still not sure if it's correct or not.
The issue here is that any level of ATC may raise an Abort_Signal, but Build_Abort_Block always assumes that a caught Abort_Signal belongs to the innermost ATC. All the runtime work for tracking ATC levels is in place with Pending_ATC_Level and ATC_Nesting_Level, so the fix here is straightforward: just add a procedure to the runtime which raises Abort_Signal if Pending_ATC_Level < ATC_Nesting_Level, and call it from the Build_Abort_Block exception handler. I've placed this procedure inside System.Tasking.Stages, which is almost certainly the wrong place, but I don't know what the right place is.
Testing with the examples above: the first example prints nothing, as you'd expect. The second example prints AB and does not raise a Tasking_Error, which I'm pretty sure is correct, but I have not dug into the RM to check.
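The level check described in this comment can be sketched as a standalone model. This is not the attached patch and does not touch the real runtime. Task_Model, Model_Abort_Signal and Handle are hypothetical names. Maybe_Reraise_Abort borrows the name floated in comment #15. The level values are illustrative only and are not taken from a real execution.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure ATC_Level_Check_Sketch is

   --  Stand-in for the runtime's Abort_Signal, which user code
   --  cannot name directly (assumption made for this model only).
   Model_Abort_Signal : exception;

   --  Only the two fields named in the comment, simplified to Natural.
   type Task_Model is record
      ATC_Nesting_Level : Natural;  --  ATC depth the task is currently at
      Pending_ATC_Level : Natural;  --  level the pending abort unwinds to
   end record;

   --  Re-raise when the pending abort targets a level outside the
   --  current one, i.e. Pending_ATC_Level < ATC_Nesting_Level.
   procedure Maybe_Reraise_Abort (Self : Task_Model) is
   begin
      if Self.Pending_ATC_Level < Self.ATC_Nesting_Level then
         raise Model_Abort_Signal;
      end if;
   end Maybe_Reraise_Abort;

   --  Models what an abort-block handler would do with the check first.
   procedure Handle (Label : String; Self : Task_Model) is
   begin
      Maybe_Reraise_Abort (Self);
      Put_Line (Label & ": abort belongs to this level, continue");
   exception
      when Model_Abort_Signal =>
         Put_Line (Label & ": abort targets an outer level, propagate");
   end Handle;

begin
   Handle ("levels equal",
           (ATC_Nesting_Level => 1, Pending_ATC_Level => 1));
   Handle ("pending is outer",
           (ATC_Nesting_Level => 1, Pending_ATC_Level => 0));
end ATC_Level_Check_Sketch;
```

In the model, only the second call propagates. That is the case the comment says Build_Abort_Block mishandles, by treating a caught Abort_Signal as belonging to the innermost ATC.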
[Bug ada/116832] code after select-then-abort in abortable part executes when outer select-then-abort completes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116832 --- Comment #6 from Liam Powell --- I have not worked out exactly what's going on here; however, there is a very suspect line in System.Tasking.Utilities.Cancel_Queued_Entry_Calls. Specifically, `Entry_Call.State := Done` seems like it should be `Entry_Call.State := Cancelled`. Changing this alone does not fix the issue, but it could be related, as there are various checks for cancellation in Exp_Ch9. Done is documented as:
-- Done indicates the call has been completed, without cancellation, ...
