AMD Athlon64 4800+ (dual core, 1MB L2 cache each, SSE, SSE2, SSE3), 4GB DDR400, with the following options:

[LTO]   gfortran -march=native -ffast-math -funroll-loops -flto -O3
[noLTO] gfortran -march=opteron -ffast-math -funroll-loops -ftree-loop-linear -ftree-vectorize -msse3
When running the Polyhedron Benchmark suite, the LTO version beats the non-LTO version in the geometric mean: 23.83s [99%] with LTO vs. 23.91s [100%] without (http://www.polyhedron.co.uk/MFL6VW74649). However, the capacita benchmark is significantly slower, and a few others are also slightly slower:

            [noLTO]        [LTO]
capacita    81.54 [100]    87.41 [107]
mdbx        19.49 [100]    20.07 [102]
rnflow      34.91 [100]    36.04 [103]
test_fpu    21.66 [100]    22.40 [103]

(The largest performance gain is for aermod (35.14 [100] vs. 31.46 [89]), followed by induct (36.51 [100] vs. 35.18 [96]); the remaining benchmarks are 0% to 2% faster with LTO.)

--
Summary: 7% slower runtime with -flto than without
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: burnus at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41578
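The bracketed figures in the timings above express each run time as a percentage of the noLTO time. The following is a minimal Python sketch that reproduces them from the reported seconds. It rests on one assumption, inferred from the numbers rather than stated in the report: the percentages appear to be truncated, not rounded. For example, mdbx at 20.07/19.49 ≈ 102.98% is listed as [102], and the geometric mean at 23.83/23.91 ≈ 99.67% is listed as [99]. The names `TIMINGS` and `relative_percent` are hypothetical, chosen for illustration only.

```python
import math

# Timings in seconds as reported in this bug; each tuple is (noLTO, LTO).
TIMINGS = {
    "capacita": (81.54, 87.41),
    "mdbx": (19.49, 20.07),
    "rnflow": (34.91, 36.04),
    "test_fpu": (21.66, 22.40),
    "aermod": (35.14, 31.46),
    "induct": (36.51, 35.18),
}


def relative_percent(baseline: float, value: float) -> int:
    """Return `value` as a percentage of `baseline`, truncated (floored).

    Assumption: truncation rather than rounding, which is what the
    bracketed numbers in the report are consistent with.
    """
    return math.floor(100.0 * value / baseline)


if __name__ == "__main__":
    # Print each benchmark with noLTO as the [100] baseline.
    for name, (nolto, lto) in TIMINGS.items():
        pct = relative_percent(nolto, lto)
        print(f"{name:10s} {nolto:6.2f} [100] {lto:6.2f} [{pct}]")
```

Under this reading, capacita's 107 is the "7% slower" in the bug summary, and aermod's 89 is the largest gain.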