Hi,

I have noticed recently that on Intel CPUs I obtain faster pw.x binaries using the OpenMP parallelism implemented in the Intel MKL libraries, version 10.xxx, than using MPICH. Previously I had always gotten better performance with MPI. I would like to hear about other people's experience with making runs faster on these machines. Let me explain in more detail.
The first choice is compiling with MPI: using mpif90 as compiler and linker, linking against mkl_ia32 or mkl_em64t, and using the link flags -i-static -openmp. This is just what appears in make.sys after running configure in version 4cvs. At runtime, I set

    export OMP_NUM_THREADS=1
    export MKL_NUM_THREADS=1

and run using

    mpiexec -n $NCPU pw.x <input >output

where NCPU is the number of cores available in the system.

The second choice is

    ./configure --disable-parallel

and at runtime

    export OMP_NUM_THREADS=$NCPU
    export MKL_NUM_THREADS=$NCPU

and run using

    pw.x <input >output

I have tested this on quad-core machines (NCPU=4) and on an old Dual Xeon B.C. (before cores) (NCPU=2). Before April 2007, the first choice had always been faster. Since then, when I started using MKL 10.xxx, the second choice has been faster. I have found no significant difference between versions 3.2.3 and 4cvs.

A special comment concerns the FFT library. MKL has a wrapper for FFTW that must be compiled after installation (it is very easy). This creates additional libraries with names like libfftw3xf_intel.a and libfftw2xf_intel.a. These improve the performance of the second choice, especially libfftw3xf_intel.a. With MPI, libfftw2xf_intel.a is as fast as the FFTW source distributed with espresso, i.e., there is no gain in using libfftw2xf_intel.a. With libfftw3xf_intel.a and MPI, I have never been able to run pw.x successfully; it just aborts.

I would like to hear of your experiences.

Best regards,
Eduardo Menendez
University of Chile
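For anyone who wants to repeat the comparison, the two runtime setups described in the post can be sketched as a small shell script. This is only an illustration: the helper names mpi_run_cmd and omp_run_cmd, the default NCPU=4, and the file names input and output are assumptions, and the functions only print the command lines, so nothing is launched until you choose to run them.

```shell
#!/bin/sh
# Sketch of the two run modes from the post. The function names, the default
# NCPU value, and the input/output file names are illustrative assumptions;
# the environment variables and commands are the ones quoted in the post.
NCPU=${NCPU:-4}

# Choice 1: MPI build, one MPI process per core, MKL/OpenMP threading off.
mpi_run_cmd() {
    echo "OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 mpiexec -n $NCPU pw.x < input > output"
}

# Choice 2: serial build (./configure --disable-parallel), one process,
# with MKL/OpenMP threads spread over all cores.
omp_run_cmd() {
    echo "OMP_NUM_THREADS=$NCPU MKL_NUM_THREADS=$NCPU pw.x < input > output"
}

# Print both command lines so they can be inspected or timed by hand.
mpi_run_cmd
omp_run_cmd
```

Each printed line can be pasted into a shell, prefixed with the time command if desired, once the matching pw.x binary (MPI build or serial build) is on the PATH.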