I've extracted the computational kernel of CP2K (see PR 29975) for easier benchmarking. Together with required utility routines to turn it into a self-contained program and data to test it, I have made it available here:
http://www.pci.unizh.ch/vandevondele/tmp/extracted_collocate.tgz

In summary, gfortran (yesterday's trunk) is about 20% slower than ifort (ifort (IFORT) 9.1 20060707) on my machine.

To reproduce, untar the above tarball, specify the relevant FC in the Makefile, and run:

    make
    make run

A run takes a few seconds and yields:

gfortran '-O3 -march=native -ffast-math -ffree-form -ftree-vectorize':

    # of primitives 154502
    # computational kernel timings 5
    Kernel time 4.612288
    Kernel time 4.616289
    [...]

ifort -xP -O3 -free:

    # of primitives 154502
    # computational kernel timings 5
    Kernel time 3.796237
    Kernel time 3.800237
    [...]

so gfortran is 21.5% slower in this case. I haven't found any options that make gfortran much faster (in fact, the timings are very insensitive to the options used), and the difference is unrelated to any IPO (I actually notice now that ifort is slightly faster at -O2).

Since this might be relevant, the timings are on:

    vendor_id  : GenuineIntel
    cpu family : 6
    model      : 15
    model name : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
    stepping   : 6

About 80% of the computational time is spent in a single routine (collocate_core in grid_fast.F), which in turn is dominated by the inner loops in the select case statement. Of those, the loop over ig is (or should be) dominant; an example is the loop starting at line 216 of grid_fast.F.

Looking at the asm for this loop (this is my best guess at which code corresponds to that loop, as I have little experience reading asm), my main observation is that it contains 36 mov* instructions with ifort and 51 mov* instructions with gfortran, while the number of mulsd and addsd instructions is the same. This could explain the slowdown. I'll attach the respective asm.

I'm of course happy to try other compile flags for gfortran, and hints on how to rewrite the kernels to get better performance with gfortran would also be much appreciated.
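For anyone wanting to repeat the two comparisons made above (the relative slowdown from the kernel timings, and the count of mov* instructions in an asm listing), here is a minimal Python sketch. It is not part of the tarball: the function names slowdown_percent and count_mnemonics are made up for illustration, the parser assumes AT&T-style listings with '#' comments and one instruction per line, and the sample asm in it is invented, not taken from the attached files.

```python
import re


def slowdown_percent(slow_seconds, fast_seconds):
    """Return how much slower the first timing is than the second, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


def count_mnemonics(asm_text, prefix):
    """Count instructions whose mnemonic starts with `prefix` (e.g. "mov").

    Assumes one instruction per line, '#' comments, optional leading
    labels ("name:") and assembler directives starting with '.'.
    """
    count = 0
    for line in asm_text.splitlines():
        code = line.split("#", 1)[0].strip()       # drop trailing comment
        code = re.sub(r"^[.\w$]+:\s*", "", code)   # drop a leading label
        if not code or code.startswith("."):       # blank line or directive
            continue
        if code.split()[0].startswith(prefix):
            count += 1
    return count


if __name__ == "__main__":
    # First kernel timings reported above: gfortran vs. ifort.
    print(f"{slowdown_percent(4.612288, 3.796237):.1f}% slower")  # prints "21.5% slower"

    # Invented sample, only to show the counting; not the real loop.
    sample = """
.L5:
        movsd   (%rax), %xmm0
        mulsd   %xmm1, %xmm0
        addsd   %xmm0, %xmm2
        movapd  %xmm2, %xmm3   # copy
        .align 4
"""
    print(count_mnemonics(sample, "mov"), "mov* instructions")
```

To compare the two compilers, one would cut out the lines of the inner loop from each listing by hand and pass that text to count_mnemonics with the prefixes "mov", "mulsd" and "addsd".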
--
Summary: gfortran 20% slower than ifort on CP2K computational kernel
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jv244 at cam dot ac dot uk

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31021