https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91895

            Bug ID: 91895
           Summary: Compile the code with -O1 or -O2 is slower than with
                    -O3 and -Os
           Product: gcc
           Version: 4.6.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hehaochen at hotmail dot com
  Target Milestone: ---

Compiling the code below with gcc (GCC) 4.6.4 gives the following timings:
# time gcc -O1 test.cpp 
  real    0m18.180s
  user    0m16.368s
  sys     0m1.812s

# time gcc -O2 test.cpp 
  real    0m40.856s
  user    0m38.808s
  sys     0m2.047s

# time gcc -O3 test.cpp 
  real    0m9.954s
  user    0m9.115s
  sys     0m0.840s

Interestingly, we get:

# time gcc -Os test.cpp 
  real    0m0.051s
  user    0m0.034s
  sys     0m0.017s

According to
https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Optimize-Options.html#Optimize-Options
-O3 enables all the optimizations of -O1 and -O2, so I would expect -O3 to be
the slowest to compile.

--------------------------------------------
int a, b, c, d, e; 

int foo() {
    int x = 0;
    for (a = 0; a < 17; a++)
        for (b = 0; b < 17; b++)
            for (c = 0; c < 17; c++)
                // change 17 to 18 or larger, the problem "solved"
                for (d = 0; d < 17; d++) 
                        x++;
}


int main ()
{
    foo();
}
--------------------------------------------

Another interesting thing: if you make any of the iteration counts larger, the
problem vanishes. If you make the iteration counts smaller, the slowdown is
reduced.
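
For a sense of scale, the sketch below computes how many times the innermost
statement of such a loop nest runs. It is only arithmetic on the nest: it does
not reproduce the slowdown and says nothing about which GCC threshold is
involved, since that is not established here. The name nest_trip_count is
made up for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the test case): number of times the
// innermost statement runs for `depth` perfectly nested loops, each
// counting from 0 to bound - 1.
std::uint64_t nest_trip_count(std::uint64_t bound, unsigned depth) {
    std::uint64_t total = 1;
    for (unsigned i = 0; i < depth; ++i) {
        total *= bound;  // each nesting level multiplies the count by bound
    }
    return total;
}
```

For the test case, nest_trip_count(17, 4) is 83521; with a bound of 18 it
is 104976.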

Most of the compile time is spent in "tree VRP" and "tree reassociation", as
the -ftime-report output for each level shows:
--------------------------------------------
+++                slow                  +++
--------------------------------------------
 time gcc -O1 -ftime-report test.cpp 

Execution times (seconds)
 name lookup           :   0.02 ( 0%) usr   0.03 ( 2%) sys   0.04 ( 0%) wall      83 kB ( 0%) ggc
 tree CFG cleanup      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       6 kB ( 0%) ggc
 tree SSA rewrite      :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       2 kB ( 0%) ggc
 tree SSA other        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :   0.50 ( 3%) usr   0.38 (21%) sys   0.85 ( 5%) wall       6 kB ( 0%) ggc
 tree operand scan     :   0.32 ( 2%) usr   0.53 (30%) sys   0.79 ( 4%) wall    9894 kB (19%) ggc
 dominator optimization:   0.19 ( 1%) usr   0.19 (11%) sys   0.40 ( 2%) wall       5 kB ( 0%) ggc
 tree CCP              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       1 kB ( 0%) ggc
 tree reassociation    :  14.78 (90%) usr   0.03 ( 2%) sys  14.82 (82%) wall       2 kB ( 0%) ggc
 tree aggressive DCE   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 complete unrolling    :   0.41 ( 3%) usr   0.60 (34%) sys   1.09 ( 6%) wall   41245 kB (78%) ggc
 tree rename SSA copies:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa            :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation   :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 repair loop structures:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 TOTAL                 :  16.35             1.78            18.13              52963 kB

real    0m18.180s
user    0m16.368s
sys     0m1.812s


--------------------------------------------
+++              VERY SLOW               +++
--------------------------------------------

time gcc -O2 -ftime-report test.cpp 

Execution times (seconds)
 name lookup           :   0.02 ( 0%) usr   0.02 ( 1%) sys   0.05 ( 0%) wall      83 kB ( 0%) ggc
 tree CFG cleanup      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       6 kB ( 0%) ggc
 tree VRP              :  22.39 (58%) usr   0.60 (30%) sys  22.83 (56%) wall    4513 kB ( 8%) ggc
 tree SSA rewrite      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       2 kB ( 0%) ggc
 tree SSA incremental  :   0.51 ( 1%) usr   0.32 (16%) sys   1.03 ( 3%) wall       6 kB ( 0%) ggc
 tree operand scan     :   0.58 ( 1%) usr   0.34 (17%) sys   0.94 ( 2%) wall    9894 kB (17%) ggc
 dominator optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 tree reassociation    :  14.77 (38%) usr   0.02 ( 1%) sys  14.80 (36%) wall       2 kB ( 0%) ggc
 tree aggressive DCE   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       7 kB ( 0%) ggc
 complete unrolling    :   0.40 ( 1%) usr   0.69 (34%) sys   1.03 ( 3%) wall   41248 kB (72%) ggc
 tree rename SSA copies:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa            :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       3 kB ( 0%) ggc
 rest of compilation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 repair loop structures:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 TOTAL                 :  38.78             2.02            40.81              57496 kB

real    0m40.856s
user    0m38.808s
sys     0m2.047s


--------------------------------------------
+++  faster than O1/O2, but still slow   +++
--------------------------------------------

 time gcc -O3 -ftime-report test.cpp 

Execution times (seconds)
 ipa cp                :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 name lookup           :   0.02 ( 0%) usr   0.03 ( 4%) sys   0.01 ( 0%) wall      83 kB ( 0%) ggc
 tree CFG cleanup      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 tree VRP              :   1.34 (15%) usr   0.07 ( 9%) sys   1.41 (14%) wall     929 kB ( 3%) ggc
 tree SSA rewrite      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       2 kB ( 0%) ggc
 tree SSA incremental  :   0.26 ( 3%) usr   0.22 (27%) sys   0.50 ( 5%) wall       6 kB ( 0%) ggc
 tree operand scan     :   0.19 ( 2%) usr   0.18 (22%) sys   0.31 ( 3%) wall    4966 kB (17%) ggc
 dominator optimization:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 tree reassociation    :   7.03 (77%) usr   0.02 ( 2%) sys   7.04 (71%) wall       2 kB ( 0%) ggc
 complete unrolling    :   0.21 ( 2%) usr   0.27 (33%) sys   0.52 ( 5%) wall   20656 kB (73%) ggc
 out of ssa            :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       5 kB ( 0%) ggc
 unaccounted todo      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 repair loop structures:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       1 kB ( 0%) ggc
 TOTAL                 :   9.10             0.81             9.91              28392 kB

real    0m9.954s
user    0m9.115s
sys     0m0.840s
