[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

Bug 117467 depends on bug 116758, which changed state.

Bug 116758 Summary: [15 Regression] 25-40% binary size increase and up to 177% compile time increase for SPEC CPU wrf with Ofast since r15-3529-g506417dbc8b1cb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116758

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467 --- Comment #13 from Richard Biener --- This is somewhat mitigated now but the actual inefficiency is still there.
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

--- Comment #12 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:03faac507913803de76eab04fd74e754c70aa8c4

commit r15-6793-g03faac507913803de76eab04fd74e754c70aa8c4
Author: Richard Biener
Date:   Fri Jan 10 12:30:29 2025 +0100

    rtl-optimization/117467 - limit ext-dce memory use

    The following puts in a hard limit on ext-dce because it might end up
    requiring memory on the order of the number of basic blocks times
    the number of pseudo registers.  The limiting follows what GCSE based
    passes do and thus I re-use --param max-gcse-memory here.

    This doesn't in any way address the implementation issues of the pass,
    but it reduces the memory-use when compiling the
    module_first_rk_step_part1.F90 TU from 521.wrf_r from 25GB to 1GB.

            PR rtl-optimization/117467
            PR rtl-optimization/117934
            * ext-dce.cc (ext_dce_execute): Do nothing if a memory allocation
            estimate exceeds what is allowed by --param max-gcse-memory.
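
The guard described in this commit can be pictured roughly as follows. This is only a sketch in Python, not the code in ext-dce.cc: the function names are hypothetical, the limit value is made up for illustration, and the cost model (one byte per pseudo register per basic block) is an assumption derived from the "4 bits per reg, 50% overhead" figure in comment #11.

```python
# Hypothetical sketch of a "skip the pass when the estimate is too large" guard.
# ASSUMPTION: about one byte of bitmap storage per pseudo register per basic
# block; this is not taken from the actual patch.

def estimate_kb(n_basic_blocks: int, n_regs: int) -> int:
    """Estimated memory in kB under the one-byte-per-reg-per-block assumption."""
    return n_basic_blocks * n_regs // 1024


def should_skip_pass(n_basic_blocks: int, n_regs: int, limit_kb: int) -> bool:
    """Return True when the estimate exceeds the configured limit (in kB)."""
    return estimate_kb(n_basic_blocks, n_regs) > limit_kb


# Figures quoted in comment #11 for module_first_rk_step_part1.fppized.f90.
blocks, regs = 32042, 262610
illustrative_limit_kb = 1_000_000  # made-up limit, NOT GCC's default

print(estimate_kb(blocks, regs), should_skip_pass(blocks, regs, illustrative_limit_kb))
```

With these inputs the estimate is far above the illustrative limit, so the pass would be skipped; a small function stays well under it and runs as before.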
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

--- Comment #11 from Richard Biener ---
One issue with the dataflow problem is that it doesn't fit what df_simple_dataflow expects. Using

  static bool
  ext_dce_rd_confluence_n (edge)
  {
    return true;
  }

will cause _all_ blocks to be iterated all the time. The idea is that ext_dce_rd_transfer_n would compute live_in from live_out (rather than mangling both into its single 'livein' bitmap), and that ext_dce_rd_confluence_n would compute live_out from the successors' live_in, returning true only if live_out changed.

That this bug is now deferred to stage4 makes a proper complete rewrite (sic!) hardly possible. Most of the problem looks like computing LR, but we're intermingling this with using LR as it evolves throughout the BB with defs to do the actual ext-dce. Why's this not simply using DF LR and doing a _single_ backward walk performing the ext-dce?!

As said, I think this pass needs to be re-done from scratch, eventually just killed off again for now (not to mention it's the triple duplicate of similar functionality elsewhere...).

Alternatively, it looks like memory should grow linearly with max_reg_num * 4 * n_basic_blocks_for_fn, so disabling the pass when this becomes large is necessary. OTOH I can hardly see how this would get us to 25GB, so something else might be broken here. For module_first_rk_step_part1.fppized.f90 we have max_reg_num == 262610 and last_basic_block is 32042; with full 'livein' this would amount to around 8GB of bitmap memory (4 bits per reg, 50% overhead).

I have a patch limiting us based on this, like we do limit GCSE based passes. We already do

Warning: const/copy propagation disabled: 36613 basic blocks and 247372 registers; increase '--param max-gcse-memory' above 1105827 [-Wdisabled-optimization]
module_first_rk_step_part1.fppized.f90:1315:36: Warning: PRE disabled: 36613 basic blocks and 247372 registers; increase '--param max-gcse-memory' above 1105827 [-Wdisabled-optimization]
module_first_rk_step_part1.fppized.f90:1315:36: Warning: const/copy propagation disabled: 36613 basic blocks and 247372 registers; increase '--param max-gcse-memory' above 1105827 [-Wdisabled-optimization]
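
The two sizes mentioned in this comment can be checked with a little arithmetic. The Python sketch below makes two assumptions that are not stated in the thread: that the GCSE-style estimate is one bit per register per basic block, rounded up to 64-bit words and reported in kB, and that "50% overhead" means half of the total 'livein' storage is overhead. Both were chosen because they reproduce the figures quoted above; the function names are hypothetical.

```python
def gcse_style_estimate_kb(n_blocks: int, n_regs: int, word_bits: int = 64) -> int:
    """ASSUMED model: one bit per register per block, rounded up to whole words."""
    words_per_block = -(-n_regs // word_bits)  # ceiling division
    total_bytes = n_blocks * words_per_block * (word_bits // 8)
    return total_bytes // 1024


def livein_estimate_bytes(n_blocks: int, n_regs: int, bits_per_reg: int = 4) -> int:
    """ASSUMED model: payload bits doubled, i.e. half of the storage is overhead."""
    payload_bits = n_blocks * n_regs * bits_per_reg
    return payload_bits * 2 // 8


# Numbers from the -Wdisabled-optimization warnings.
print(gcse_style_estimate_kb(36613, 247372))

# Numbers given for the 'livein' bitmaps (last_basic_block, max_reg_num).
print(livein_estimate_bytes(32042, 262610) / 1e9)
```

Under these assumptions the first value matches the threshold printed in the warnings, and the second comes out at roughly 8.4e9 bytes, in line with the "around 8GB" estimate and well short of the observed 25GB.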
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

--- Comment #10 from GCC Commits ---
The master branch has been updated by Andrew Macleod:

https://gcc.gnu.org/g:c7fd6c4369ef1a009b40c1787ea9d2dad2cf449f

commit r15-6000-gc7fd6c4369ef1a009b40c1787ea9d2dad2cf449f
Author: Andrew MacLeod
Date:   Sat Nov 23 14:05:54 2024 -0500

    Only add inferred ranges if they change the value.

    Do not add an inferred range if it is already incorporated in the
    current range of an SSA_NAME.

            PR tree-optimization/117467
            * gimple-range-infer.cc (infer_range_manager::add_ranges): Check
            range_of_expr to see if the inferred range is needed.
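
The idea behind this change can be illustrated with plain integer intervals. The sketch below is a simplified Python model, not ranger's API: the class, its method, and the tuple representation of a range are all hypothetical, and empty intersections are not treated specially.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # closed interval [lo, hi]; a stand-in for a real range type


def intersect(a: Range, b: Range) -> Range:
    """Intersection of two closed intervals (may be empty if lo > hi)."""
    return (max(a[0], b[0]), min(a[1], b[1]))


class InferredRanges:
    """Hypothetical store that records only inferred ranges that add information."""

    def __init__(self) -> None:
        self.recorded: Dict[str, Range] = {}

    def add_range(self, name: str, current: Range, inferred: Range) -> bool:
        # If intersecting with the inferred range leaves the current range
        # unchanged, the inference is already incorporated: record nothing.
        if intersect(current, inferred) == current:
            return False
        self.recorded[name] = inferred
        return True


mgr = InferredRanges()
print(mgr.add_range("x_1", (1, 10), (0, 100)))   # already implied by current range
print(mgr.add_range("y_2", (0, 100), (1, 100)))  # narrows the range, so recorded
```

Skipping redundant entries keeps the per-name bookkeeping small, which matters when a function has very many names and blocks.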
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

--- Comment #9 from GCC Commits ---
The master branch has been updated by Andrew Macleod:

https://gcc.gnu.org/g:48eda34624fe5de050ae5ee38a360155ab188c39

commit r15-5998-g48eda34624fe5de050ae5ee38a360155ab188c39
Author: Andrew MacLeod
Date:   Mon Nov 25 09:50:33 2024 -0500

    Do not calculate an entry range for invariant names.

    If an SSA_NAME is invariant, do not calculate an on_entry value.

            PR tree-optimization/117467
            * gimple-range-cache.cc (ranger_cache::entry_range): Do not invoke
            range_from_dom for invariant ssa-names.
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

Sam James changed:

           What    |Removed                       |Added
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |law at gcc dot gnu.org

--- Comment #8 from Sam James ---
Assigning based on
https://inbox.sourceware.org/gcc-patches/6017e9f1-0e5d-4261-97e5-238442bb4...@gmail.com/.
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

--- Comment #7 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:7a07de2c60b3c513b6aef206e9b55b3ffefe8b39

commit r15-5008-g7a07de2c60b3c513b6aef206e9b55b3ffefe8b39
Author: Richard Biener
Date:   Thu Nov 7 09:23:03 2024 +0100

    rtl-optimization/117467 - 33% compile-time in rest of compilation

    ext-dce uses TV_NONE, that's not OK for a pass taking 33% compile-time.
    The following adds a timevar to it for proper blaming.

            PR rtl-optimization/117467
            * timevar.def (TV_EXT_DCE): New.
            * ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

--- Comment #6 from Andrew Pinski ---
(In reply to Richard Biener from comment #5)
> So confirmed the 25GB memory use is ext-dce, with -fno-ext-dce memory use is
> down to 3GB.  The time report then shows VRP as offender:
>
>  tree VRP                 :  76.20 ( 23%)   125M (  4%)
>  dominator optimization   :  28.30 (  8%)    84M (  3%)
>
> given 25GB memory use is going to trash most machines this is P1.
>
> The testcase is quite small but has lots of calls with lots of arguments
> that might or might not invoke fortran array copying, it's probably
> difficult to reduce sensibly (it has lots of module USEs).  Jeff and Andrew
> should have access to SPEC, so I won't spend time trying at this point.

Note I think PR 116758 is recording the ranger/DOM/VRP side of things too.
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

Richard Biener changed:

           What    |Removed                     |Added
   Last reconfirmed|                            |2024-11-07
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #5 from Richard Biener ---
So confirmed the 25GB memory use is ext-dce, with -fno-ext-dce memory use is
down to 3GB.  The time report then shows VRP as offender:

 tree VRP                 :  76.20 ( 23%)   125M (  4%)
 dominator optimization   :  28.30 (  8%)    84M (  3%)

given 25GB memory use is going to trash most machines this is P1.

The testcase is quite small but has lots of calls with lots of arguments
that might or might not invoke fortran array copying, it's probably
difficult to reduce sensibly (it has lots of module USEs).  Jeff and Andrew
should have access to SPEC, so I won't spend time trying at this point.
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

Richard Biener changed:

           What    |Removed                     |Added
                 CC|                            |amacleod at redhat dot com

--- Comment #4 from Richard Biener ---
- 28.13% 28.04% 590285 f951 f951 [.] bitmap_bit_p(bitmap_head const*, int)
+ 8.28% _start
- 2.79% gimple_simplify_PLUS_EXPR(gimple_match_op*, gimple**, tree_node* (*)(tree_node*), code_helper, tree_node*,
- 2.78% gimple_resimplify2(gimple**, gimple_match_op*, tree_node* (*)(tree_node*))
- gimple_simplify_MULT_EXPR(gimple_match_op*, gimple**, tree_node* (*)(tree_node*), code_helper, tree_node*,
- 2.26% pta_valueize(tree_node*)
range_query::value_of_expr(tree_node*, gimple*)
+ gimple_ranger::range_of_expr(vrange&, tree_node*, gimple*)
+ 2.36% gimple_ranger::range_of_stmt(vrange&, gimple*, tree_node*)
+ 2.23% gimple_simplify_POINTER_PLUS_EXPR(gimple_match_op*, gimple**, tree_node* (*)(tree_node*), code_helper, tree
+ 2.07% fold_using_range::fold_stmt(vrange&, gimple*, fur_source&, tree_node*)
+ 2.03% execute_ranger_vrp(function*, bool)

this seems in the end all related to prange and pta_valueize?
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

--- Comment #3 from Richard Biener ---
(In reply to Richard Biener from comment #2)
> - 32.11% (anonymous namespace)::pass_ext_dce::execute(function*)
> - ext_dce_execute()
> - 32.10% df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)
> - 32.08% ext_dce_rd_transfer_n(int)
> + 14.75% ext_dce_process_uses(rtx_insn*, rtx_def*, bitmap_head*, bool)
> + 8.18% bitmap_ior_into(bitmap_head*, bitmap_head const*)
> + 4.49% ext_dce_process_sets(rtx_insn*, rtx_def*, bitmap_head*)
> 3.34% bitmap_copy(bitmap_head*, bitmap_head const*)
> 1.31% bitmap_equal_p(bitmap_head const*, bitmap_head const*)
>
> likely (unverified) also the source of 25GB memory use.
>
> The DF problem seems seriously unoptimized - it lacks a separate "local"
> compute step (the ext_dce_process_sets part that populates live_tmp
> _per insn_!).

That is, usually the transfer function is the IOR of input and appropriate
IOR/whatever of the (cached!) local compute result.
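
For readers unfamiliar with the "local compute" idea, here is a textbook backward liveness solver in Python that follows the shape described above. It is a sketch only: it does not model ext-dce's per-register bit groups or GCC's DF framework, and all names in it are hypothetical. The per-block use/def sets are computed once and cached, the transfer function is then just set operations on the cached result, and a block's predecessors are re-queued only when its live-in set actually changes.

```python
from collections import deque


def local_sets(insns):
    """One-time local step: upward-exposed uses and defs of a block.

    insns is a list of (defs, uses) pairs in program order.
    """
    use, kill = set(), set()
    for defs, uses in insns:
        use |= set(uses) - kill   # uses not preceded by a def in this block
        kill |= set(defs)
    return use, kill


def liveness(blocks, succs):
    """blocks: {name: [(defs, uses), ...]}, succs: {name: [successor names]}."""
    local = {b: local_sets(insns) for b, insns in blocks.items()}  # cached
    preds = {b: [] for b in blocks}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)

    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    work = deque(blocks)
    while work:
        b = work.popleft()
        # Confluence: live_out is the union of the successors' live_in.
        out = set()
        for s in succs.get(b, []):
            out |= live_in[s]
        live_out[b] = out
        # Transfer: only set operations on the cached local result.
        use, kill = local[b]
        new_in = use | (out - kill)
        if new_in != live_in[b]:      # propagate only on change
            live_in[b] = new_in
            work.extend(preds[b])
    return live_in, live_out


blocks = {
    "A": [({"a"}, set())],                         # a = ...
    "B": [({"b"}, {"a"}), (set(), {"b", "c"})],    # b = f(a); use(b, c)
}
succs = {"A": ["B"], "B": ["B"]}                   # B loops back to itself
live_in, live_out = liveness(blocks, succs)
print(live_in, live_out)
```

In this formulation no instruction is rescanned during iteration; the per-insn work happens once in local_sets.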
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

Richard Biener changed:

           What    |Removed                     |Added
                 CC|                            |law at gcc dot gnu.org

--- Comment #2 from Richard Biener ---
- 32.11% (anonymous namespace)::pass_ext_dce::execute(function*)
- ext_dce_execute()
- 32.10% df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)
- 32.08% ext_dce_rd_transfer_n(int)
+ 14.75% ext_dce_process_uses(rtx_insn*, rtx_def*, bitmap_head*, bool)
+ 8.18% bitmap_ior_into(bitmap_head*, bitmap_head const*)
+ 4.49% ext_dce_process_sets(rtx_insn*, rtx_def*, bitmap_head*)
3.34% bitmap_copy(bitmap_head*, bitmap_head const*)
1.31% bitmap_equal_p(bitmap_head const*, bitmap_head const*)

likely (unverified) also the source of 25GB memory use.

The DF problem seems seriously unoptimized - it lacks a separate "local"
compute step (the ext_dce_process_sets part that populates live_tmp
_per insn_!).
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

--- Comment #1 from Richard Biener ---
Samples: 2M of event 'cycles:Pu', Event count (approx.): 2183019518772
Overhead  Samples  Command  Shared Object  Symbol
  29.34%   627170  f951     f951           [.] bitmap_bit_p(bitmap_head const*, int)
  10.68%   231516  f951     f951           [.] bitmap_set_bit(bitmap_head*, int)
   5.62%   122003  f951     f951           [.] bitmap_set_range(bitmap_head*, unsigned int, unsigned int) [
   5.23%   113260  f951     f951           [.] bitmap_list_insert_element_after(bitmap_head*, bitmap_elemen
   4.23%    90654  f951     f951           [.] df_count_refs(bool, bool, bool)
   4.03%    87072  f951     f951           [.] bitmap_ior_into(bitmap_head*, bitmap_head const*)
   3.53%    77003  f951     f951           [.] bitmap_copy(bitmap_head*, bitmap_head const*)
   2.78%    59162  f951     f951           [.] bitmap_and_compl_into(bitmap_head*, bitmap_head const*)
   1.60%    34193  f951     f951           [.] lra_remat()

yay.

+ 32.11% (anonymous namespace)::pass_ext_dce::execute(function*)

is the "rest of compilation", fixing that.
[Bug tree-optimization/117467] [15 Regression] 521.wrf_r again explodes memory/compile-time wise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467

Richard Biener changed:

           What    |Removed                     |Added
   Target Milestone|---                         |15.0
           Keywords|                            |compile-time-hog,
                   |                            |memory-hog