https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91895
            Bug ID: 91895
           Summary: Compile the code with -O1 or -O2 is slower than with -O3 and -Os
           Product: gcc
           Version: 4.6.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hehaochen at hotmail dot com
  Target Milestone: ---

Compiling the code below with gcc (GCC) 4.6.4 gives the following timings:

# time gcc -O1 test.cpp
real    0m18.180s
user    0m16.368s
sys     0m1.812s

# time gcc -O2 test.cpp
real    0m40.856s
user    0m38.808s
sys     0m2.047s

# time gcc -O3 test.cpp
real    0m9.954s
user    0m9.115s
sys     0m0.840s

Interestingly, with -Os we get:

# time gcc -Os test.cpp
real    0m0.051s
user    0m0.034s
sys     0m0.017s

According to
https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Optimize-Options.html#Optimize-Options
-O3 turns on everything that -O1 and -O2 enable, so I would expect -O3 to be
the slowest to compile, not the fastest of the three.

--------------------------------------------
int a, b, c, d, e;
int foo()
{
  int x = 0;
  for (a = 0; a < 17; a++)
    for (b = 0; b < 17; b++)
      for (c = 0; c < 17; c++) // change 17 to 18 or larger and the problem is "solved"
        for (d = 0; d < 17; d++)
          x++;
}

int main ()
{
  foo();
}
--------------------------------------------

Another interesting thing: if you make any of the iteration counts larger, the
problem vanishes. If you make an iteration count smaller, the slowdown is
reduced.
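To make the dependence on the iteration count easier to explore, the loop bound can be turned into a compile-time parameter. The sketch below is a hypothetical variant of the test case, not the file that was timed in this report: the macro `N` and the `return x;` statement are additions, and it has not been verified that this variant reproduces the slowdown.

```cpp
// Hypothetical variant of test.cpp. N is an assumed macro that can be
// overridden on the command line, e.g. -DN=18.
#ifndef N
#define N 17
#endif

int a, b, c, d, e;

// Same four-level loop nest as the original test case, with global loop
// counters. Unlike the original, it returns x so the count can be checked.
int foo()
{
    int x = 0;
    for (a = 0; a < N; a++)
        for (b = 0; b < N; b++)
            for (c = 0; c < N; c++)
                for (d = 0; d < N; d++)
                    x++;
    return x;  // N*N*N*N increments, i.e. 83521 for N == 17
}
```

Compiling this with `-c` (for example `time g++ -c -O2 -DN=18 variant.cpp`, where `variant.cpp` is a placeholder file name) times the compiler alone; linking would additionally need a `main` like the one in the original test case.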
The compile time is dominated by the "tree VRP" and "tree reassociation" passes:

--------------------------------------------
+++ slow +++
--------------------------------------------
time gcc -O1 -ftime-report test.cpp

Execution times (seconds)
 name lookup           :   0.02 ( 0%) usr   0.03 ( 2%) sys   0.04 ( 0%) wall      83 kB ( 0%) ggc
 tree CFG cleanup      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       6 kB ( 0%) ggc
 tree SSA rewrite      :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       2 kB ( 0%) ggc
 tree SSA other        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :   0.50 ( 3%) usr   0.38 (21%) sys   0.85 ( 5%) wall       6 kB ( 0%) ggc
 tree operand scan     :   0.32 ( 2%) usr   0.53 (30%) sys   0.79 ( 4%) wall    9894 kB (19%) ggc
 dominator optimization:   0.19 ( 1%) usr   0.19 (11%) sys   0.40 ( 2%) wall       5 kB ( 0%) ggc
 tree CCP              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       1 kB ( 0%) ggc
 tree reassociation    :  14.78 (90%) usr   0.03 ( 2%) sys  14.82 (82%) wall       2 kB ( 0%) ggc
 tree aggressive DCE   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 complete unrolling    :   0.41 ( 3%) usr   0.60 (34%) sys   1.09 ( 6%) wall   41245 kB (78%) ggc
 tree rename SSA copies:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa            :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation   :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 repair loop structures:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 TOTAL                 :  16.35             1.78            18.13              52963 kB

real    0m18.180s
user    0m16.368s
sys     0m1.812s

--------------------------------------------
+++ VERY SLOW +++
--------------------------------------------
time gcc -O2 -ftime-report test.cpp

Execution times (seconds)
 name lookup           :   0.02 ( 0%) usr   0.02 ( 1%) sys   0.05 ( 0%) wall      83 kB ( 0%) ggc
 tree CFG cleanup      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       6 kB ( 0%) ggc
 tree VRP              :  22.39 (58%) usr   0.60 (30%) sys  22.83 (56%) wall    4513 kB ( 8%) ggc
 tree SSA rewrite      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       2 kB ( 0%) ggc
 tree SSA incremental  :   0.51 ( 1%) usr   0.32 (16%) sys   1.03 ( 3%) wall       6 kB ( 0%) ggc
 tree operand scan     :   0.58 ( 1%) usr   0.34 (17%) sys   0.94 ( 2%) wall    9894 kB (17%) ggc
 dominator optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 tree reassociation    :  14.77 (38%) usr   0.02 ( 1%) sys  14.80 (36%) wall       2 kB ( 0%) ggc
 tree aggressive DCE   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       7 kB ( 0%) ggc
 complete unrolling    :   0.40 ( 1%) usr   0.69 (34%) sys   1.03 ( 3%) wall   41248 kB (72%) ggc
 tree rename SSA copies:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa            :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       3 kB ( 0%) ggc
 rest of compilation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 repair loop structures:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 TOTAL                 :  38.78             2.02            40.81              57496 kB

real    0m40.856s
user    0m38.808s
sys     0m2.047s

--------------------------------------------
+++ faster than O1/O2, but still slow +++
--------------------------------------------
time gcc -O3 -ftime-report test.cpp

Execution times (seconds)
 ipa cp                :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 name lookup           :   0.02 ( 0%) usr   0.03 ( 4%) sys   0.01 ( 0%) wall      83 kB ( 0%) ggc
 tree CFG cleanup      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 tree VRP              :   1.34 (15%) usr   0.07 ( 9%) sys   1.41 (14%) wall     929 kB ( 3%) ggc
 tree SSA rewrite      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       2 kB ( 0%) ggc
 tree SSA incremental  :   0.26 ( 3%) usr   0.22 (27%) sys   0.50 ( 5%) wall       6 kB ( 0%) ggc
 tree operand scan     :   0.19 ( 2%) usr   0.18 (22%) sys   0.31 ( 3%) wall    4966 kB (17%) ggc
 dominator optimization:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 tree reassociation    :   7.03 (77%) usr   0.02 ( 2%) sys   7.04 (71%) wall       2 kB ( 0%) ggc
 complete unrolling    :   0.21 ( 2%) usr   0.27 (33%) sys   0.52 ( 5%) wall   20656 kB (73%) ggc
 out of ssa            :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 unaccounted todo      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 repair loop structures:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       1 kB ( 0%) ggc
 TOTAL                 :   9.10             0.81             9.91              28392 kB

real    0m9.954s
user    0m9.115s
sys     0m0.840s
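
As a quick cross-check of the reports above, the combined share of "tree VRP" and "tree reassociation" in the total user time can be computed directly from the listed figures. The snippet is only an illustrative sketch: the `Report` struct and the function names are hypothetical, and the numbers are copied by hand from the usr columns and the TOTAL rows.

```cpp
#include <cstdio>

// User-time figures in seconds, copied from the -ftime-report output above.
struct Report {
    const char *level;  // optimization flag
    double vrp;         // "tree VRP" usr; 0 where the pass is not listed
    double reassoc;     // "tree reassociation" usr
    double total;       // TOTAL usr
};

const Report reports[] = {
    {"-O1",  0.00, 14.78, 16.35},
    {"-O2", 22.39, 14.77, 38.78},
    {"-O3",  1.34,  7.03,  9.10},
};

// Fraction of total user time spent in the two passes combined.
double hot_share(const Report &r)
{
    return (r.vrp + r.reassoc) / r.total;
}

// Print one line per optimization level.
void print_shares()
{
    for (unsigned i = 0; i < sizeof reports / sizeof reports[0]; ++i)
        std::printf("%s: %.1f%% of user time in VRP + reassociation\n",
                    reports[i].level, 100.0 * hot_share(reports[i]));
}
```

Calling `print_shares()` from a `main` shows that at every level the two passes together account for roughly 90% or more of the user time, which is why the other rows can be ignored when looking for the cause.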