Dear Fabricio,

I reckon there is some inconsistency in the results you are obtaining. The AMD Opteron 6380 is a 16-core model, so I wonder how you map process affinity when running on 8 cores. Without controlling that mapping, you can get substantial performance variation from one run to the next. On this processor, each pair of cores (a "module") shares one FPU and one L2 cache, and the L3 cache is shared by all cores on the same die. Increasing contention on these shared resources degrades performance.

The AMD 6380 also supports the AVX instruction set extension for 256-bit vector operations. Does your OS support it too? Compile a simple source with -mavx and see whether it runs, or check whether the "avx" flag is present in /proc/cpuinfo.
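The two checks above can be sketched in a few lines. This is a minimal sketch, assuming Linux and Python 3, and the helper names are made up for illustration. It reads the feature flags from /proc/cpuinfo and prints the CPUs the current process may run on, using os.sched_getaffinity. It also proposes a hypothetical placement of one rank per module, so that no two ranks share an FPU or L2 cache. That placement assumes cores 2k and 2k+1 form one module. This numbering is an assumption and should be checked against the real topology, for example with hwloc's lstopo.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def one_core_per_module(n_cores, n_ranks, cores_per_module=2):
    """Hypothetical placement: put each rank on the first core of a distinct module.

    ASSUMPTION: cores m*cores_per_module .. m*cores_per_module+1 form module m.
    """
    modules = n_cores // cores_per_module
    if n_ranks > modules:
        raise ValueError("more ranks than modules: some ranks must share an FPU")
    return [m * cores_per_module for m in range(n_ranks)]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("avx flag present:", "avx" in cpu_flags(f.read()))
    except OSError:
        print("/proc/cpuinfo not readable (not Linux?)")
    if hasattr(os, "sched_getaffinity"):
        print("CPUs this process may use:", sorted(os.sched_getaffinity(0)))
    print("one core per module for 8 ranks:", one_core_per_module(16, 8))
```

The resulting core list would then be passed to whatever binding mechanism your MPI launcher provides.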
In my experience benchmarking QE on the same CPU, the combination of the Intel compiler + MKL has always turned out to be the best option.

Regards,
Ivan

On 11/12/2013 19:52, Fabricio Cannini wrote:
> On 11-12-2013 15:51, Paolo Giannozzi wrote:
>> On Tue, 2013-12-10 at 19:49 -0200, Fabricio Cannini wrote:
>>> On 10-12-2013 18:34, Paolo Giannozzi wrote:
>>>> First of all you should verify if multi-threading libraries
>>>> are conflicting with MPI parallelization.
>>>
>>> Yes, i did look into it already.
>
> Hi there
>
> So, what else can I look into ?
> I did more tests, on the same Opteron 6380 machine, using the same
> binaries, but now using the "DEISA medium benchmark" and the results
> were interesting.
>
> http://qe-forge.org/gf/project/q-e/frs/?action=FrsReleaseView&release_id=45
>
> ifort 13.2 + mkl 11.0 / 8 cores = 1h8m
> gfortran 4.6 + openblas 0.2.8 / 8 cores = 46m57.62s
>
> This is making me even more suspicious of intel compiler being the problem.
>
> TIA,
> Fabricio
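Paolo's first suggestion (checking that threaded libraries do not fight with MPI) and the timings quoted above can both be checked with a little arithmetic. This is a minimal sketch, assuming Python 3 and that the OpenMP runtime, MKL and OpenBLAS take their thread counts from OMP_NUM_THREADS, MKL_NUM_THREADS and OPENBLAS_NUM_THREADS. The helper names are hypothetical. The sketch treats an unset set of variables as "one thread per core", which is a deliberately pessimistic simplification.

```python
import os

# Thread-count variables read by the OpenMP runtime, MKL and OpenBLAS.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def threads_per_rank(env, default):
    """Largest thread count set by any of THREAD_VARS.

    If none is set, return `default` (pass the core count for a pessimistic
    guess, since threaded libraries often use every core in that case).
    """
    counts = [int(env[v]) for v in THREAD_VARS if env.get(v, "").strip().isdigit()]
    return max(counts, default=default)


def oversubscribed(n_ranks, env, n_cores):
    """True if MPI ranks x threads per rank exceeds the number of cores."""
    return n_ranks * threads_per_rank(env, default=n_cores) > n_cores


def to_seconds(h=0, m=0, s=0.0):
    """Convert an h/m/s wall time to seconds."""
    return 3600 * h + 60 * m + s


if __name__ == "__main__":
    print("8 ranks on 16 cores oversubscribed?", oversubscribed(8, os.environ, 16))
    intel = to_seconds(h=1, m=8)       # ifort 13.2 + MKL 11.0, 8 cores
    gnu = to_seconds(m=46, s=57.62)    # gfortran 4.6 + OpenBLAS 0.2.8, 8 cores
    print(f"Intel / GNU wall-time ratio: {intel / gnu:.2f}")
```

With the quoted figures (4080 s vs 2817.62 s), the Intel build takes about 1.45 times as long as the GNU build.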