I've extracted the computational kernel of CP2K (see PR 29975) for easier
benchmarking. Together with required utility routines to turn it into a
self-contained program and data to test it, I have made it available here:

http://www.pci.unizh.ch/vandevondele/tmp/extracted_collocate.tgz

the summary is that (yesterday's trunk) gfortran is about 20% slower than ifort
(ifort (IFORT) 9.1 20060707) on my machine. To reproduce, untar the above link,
and use (after specifying the relevant FC in the Makefile)
make
make run

a run takes a few seconds, and yields 
gfortran '-O3 -march=native -ffast-math -ffree-form -ftree-vectorize':
 # of primitives       154502
 # computational kernel timings            5
 Kernel time   4.612288
 Kernel time   4.616289
 [...]
ifort  -xP -O3 -free
 # of primitives       154502
 # computational kernel timings            5
 Kernel time   3.796237
 Kernel time   3.800237
[...]

which is in this case 21.5% slower. I haven't found any options that made
gfortran much faster (in fact timings are very unsensitive to the options
used), and it is unrelated to any IPO (I actually notice ifort now that is
slightly faster at -O2). Since this might be relevant, timings are on:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
stepping        : 6

The computational time is ~80% due to a single routine (collocate_core in
grid_fast.F), which in turn is dominated by the inner loops in the select case
statement, and of those, the one over ig is (should be) dominant. For example,
the loop starting at line 216 of grid_fast.F. If I look at the asm for this
loop (with my best guess of what that loop might be, I have little experience),
my main observation is that it contains 36 mov* instructions with intel and 51
mov* instructions with gfortran (and the same number of mulsd and addsd), which
could explain the slowdown. I'll attach the respective asm.

I'm of course happy to try other compile flags for gfortran, and also hints on
how to rewrite the kernels in order to get better performance with  gfortran
would be much appreciated.


-- 
           Summary: gfortran 20% slower than ifort on CP2K computational
                    kernel
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jv244 at cam dot ac dot uk


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31021

Reply via email to