Hi! On 2024-06-28T00:41:54+0200, I wrote: > On 2024-06-27T23:20:18+0200, I wrote: >> On 2024-06-27T22:27:21+0200, I wrote: >>> On 2024-06-27T18:49:17+0200, I wrote: >>>> On 2023-10-24T19:49:10+0100, Richard Sandiford <richard.sandif...@arm.com> >>>> wrote: >>>>> This patch adds a combine pass that runs late in the pipeline. >>> >>> [After sending, I realized I replied to a previous thread of this work.] >>> >>>> I've beek looking a bit through recent nvptx target code generation >>>> changes for GCC target libraries, and thought I'd also share here my >>>> findings for the "late-combine" changes in isolation, for nvptx target. >>>> >>>> First the unexpected thing: >>> >>> So much for "unexpected thing" -- next level of unexpected here... >>> Appreciated if anyone feels like helping me find my way through this, but >>> I totally understand if you've got other things to do. >> >> OK, I found something already. (Unexpectedly quickly...) ;-) >> >>>> there are a few cases where we now see unused >>>> registers get declared > >> But in fact, for both cases > > Now tested: 's%both%all'. :-) > >> the unexpected difference goes away if after >> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.
The following will be unnecessary assuming that Richard's proposed "Give fast DCE a separate dirty flag" gets accepted, but may still be useful if we follow through with the idea to enable (parts of) 'pass_postreload' for nvptx (as discussing with Roger), so, for later: >> The following makes these two cases work, but evidently needs a lot more >> analysis: a lot of other passes are enabled that may be anything between >> beneficial and harmful for 'targetm.no_register_allocation'/nvptx. >> >> --- gcc/passes.cc >> +++ gcc/passes.cc >> @@ -676,17 +676,17 @@ const pass_data pass_data_postreload = >> class pass_postreload : public rtl_opt_pass >> { >> public: >> pass_postreload (gcc::context *ctxt) >> : rtl_opt_pass (pass_data_postreload, ctxt) >> {} >> >> /* opt_pass methods: */ >> - bool gate (function *) final override { return reload_completed; } >> + bool gate (function *) final override { return reload_completed || >> targetm.no_register_allocation; } >> --- gcc/regcprop.cc >> +++ gcc/regcprop.cc >> @@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass >> public: >> pass_cprop_hardreg (gcc::context *ctxt) >> : rtl_opt_pass (pass_data_cprop_hardreg, ctxt) >> {} >> >> /* opt_pass methods: */ >> bool gate (function *) final override >> { >> - return (optimize > 0 && (flag_cprop_registers)); >> + return (optimize > 0 && flag_cprop_registers && >> !targetm.no_register_allocation); >> } > > Also, that quickly ICEs; more '[...] && !targetm.no_register_allocation' > are needed elsewhere, at least. > > The following simpler thing, however, does work; move 'pass_fast_rtl_dce' > out of 'pass_postreload': > > --- gcc/passes.cc > +++ gcc/passes.cc > @@ -677,14 +677,15 @@ class pass_postreload : public rtl_opt_pass > { > public: > pass_postreload (gcc::context *ctxt) > : rtl_opt_pass (pass_data_postreload, ctxt) > {} > > /* opt_pass methods: */ > + opt_pass * clone () final override { return new pass_postreload > (m_ctxt); } > bool gate (function *) final override { return reload_completed; } > > }; // class pass_postreload > --- gcc/passes.def > +++ gcc/passes.def > @@ -529,7 +529,10 @@ along with GCC; see the file COPYING3. If not see > NEXT_PASS (pass_regrename); > NEXT_PASS (pass_fold_mem_offsets); > NEXT_PASS (pass_cprop_hardreg); > - NEXT_PASS (pass_fast_rtl_dce); > + POP_INSERT_PASSES () > + NEXT_PASS (pass_fast_rtl_dce); > + NEXT_PASS (pass_postreload); > + PUSH_INSERT_PASSES_WITHIN (pass_postreload) > NEXT_PASS (pass_reorder_blocks); > NEXT_PASS (pass_leaf_regs); > NEXT_PASS (pass_split_before_sched2); > > This (only) cleans up "the mess that 'pass_late_combine' created"; no > further changes in GCC target libraries for nvptx. (For avoidance of > doubt: "mess" is a great exaggeration here.) But that then disturbs non-nvptx targets; see (prerequisite) <https://inbox.sourceware.org/87ed8htdwy....@euler.schwinge.ddns.net> "Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'" for why. Then, see the attached -- just for later, for now -- "WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 'pass_late_compilation'" for how to make this work properly. (This also puts back 'pass_fast_rtl_dce' into 'pass_late_compilation' instead of running it unconditionally, in order to not change any behavior in that regard.) Grüße Thomas >>> But: should we expect '-fno-late-combine-instructions' vs. >>> '-flate-combine-instructions' to behave in the same way? (After all, >>> '%r22' remains unused also with '-flate-combine-instructions', and >>> doesn't need to be emitted.) This could, of course, also be a nvptx back >>> end issue? >>> >>> I'm happy to supply any dump files etc. Also, 'tmp-libc_a-lnumeric.i.xz' >>> is attached if you'd like to reproduce this with your own nvptx target >>> 'cc1': >>> >>> $ [...]/configure --target=nvptx-none --enable-languages=c >>> $ make -j12 all-gcc >>> $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase >>> tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin -o >>> tmp-libc_a-lnumeric.s -fdump-rtl-all # -fno-late-combine-instructions >>> >>> >>> Grüße >>> Thomas
>From ef14e15c3255059f374e04a47d838e9c98c9da2c Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tschwi...@baylibre.com> Date: Fri, 28 Jun 2024 00:41:54 +0200 Subject: [PATCH] WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 'pass_late_compilation' id:87ed8i2ekt....@euler.schwinge.ddns.net --- gcc/passes.cc | 8 ++++++++ gcc/passes.def | 6 ++++++ 2 files changed, 14 insertions(+) diff --git a/gcc/passes.cc b/gcc/passes.cc index e444b462113..1cdd4a77f5b 100644 --- a/gcc/passes.cc +++ b/gcc/passes.cc @@ -685,6 +685,10 @@ public: {} /* opt_pass methods: */ + opt_pass *clone () final override + { + return new pass_postreload (m_ctxt); + } bool gate (function *) final override { if (reload_completed) @@ -728,6 +732,10 @@ public: {} /* opt_pass methods: */ + opt_pass *clone () final override + { + return new pass_late_compilation (m_ctxt); + } bool gate (function *) final override { return reload_completed || targetm.no_register_allocation; diff --git a/gcc/passes.def b/gcc/passes.def index 72198bc4c4e..cb221438a1e 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -529,7 +529,13 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_regrename); NEXT_PASS (pass_fold_mem_offsets); NEXT_PASS (pass_cprop_hardreg); + POP_INSERT_PASSES () + NEXT_PASS (pass_late_compilation); + PUSH_INSERT_PASSES_WITHIN (pass_late_compilation) NEXT_PASS (pass_fast_rtl_dce); + POP_INSERT_PASSES () + NEXT_PASS (pass_postreload); + PUSH_INSERT_PASSES_WITHIN (pass_postreload) NEXT_PASS (pass_reorder_blocks); NEXT_PASS (pass_leaf_regs); NEXT_PASS (pass_split_before_sched2); -- 2.34.1