Re: [Wien] MPI parallelization failure for lapw1
When using the srun setup of WIEN2k, you are tightly integrated into your batch system and have to follow all of your system's default settings. For instance, you configured CORES_PER_NODE:1, but I very much doubt that your cluster has only one core per node, and srun will probably make certain assumptions based on that value.

Two suggestions for tests:

a) Run it on only ONE node, but on all cores of this node. The corresponding .machines file should have 1:machine1:YY, where YY is the number of cores (16 or 24, ...).

b) If your queuing system setup allows the use of mpirun, reconfigure WIEN2k (siteconfig) with the default intel+mkl option (not the srun option). It will then suggest mpirun ... for starting jobs. Make sure that in your batch job (I assume you are using one) the proper modules are loaded (intel, mkl, intel-mpi).
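As an illustration of a), a minimal .machines sketch for a single node, assuming it is named machine1 and has 16 cores (both the hostname and the core count are placeholders for your actual hardware):

    granularity:1
    1:machine1:16
    extrafine:1

If you instead follow b) and switch to mpirun, siteconfig's default intel+mkl choice typically produces a WIEN_MPIRUN line in parallel_options along these lines (the exact form depends on your siteconfig answers):

    setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"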
--
P. Blaha
Peter BLAHA, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300   FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at
WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/TC_Blaha
Re: [Wien] MPI parallelization failure for lapw1
A guess: your srun is set up to use OpenMPI or something else, not Intel MPI, which is what you compiled for. Check what you have loaded, e.g. use "which mpirun". N.B.: testing with lapw0 is simpler.
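A few quick checks along these lines can show which MPI stack the job environment actually sees (the module names are site-specific and only an assumption here):

    which mpirun
    mpirun --version      # should identify Intel MPI if that is what WIEN2k was linked against
    module list           # verify that the intel, mkl and intel-mpi modules are loaded
    x lapw0 -p            # simpler MPI test than lapw1; MPI lapw0 also needs a "lapw0:" line in .machines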
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: www.numis.northwestern.edu/MURI
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody else has thought" Albert Szent-Gyorgi
[Wien] MPI parallelization failure for lapw1
Dear WIEN2K community,

I am a new user of WIEN2K, and just compiled it using the following options:

    current:FOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
    current:FPOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
    current:OMP_SWITCH:-qopenmp
    current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -lpthread -lm -ldl -liomp5
    current:DPARALLEL:'-DParallel'
    current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
    current:FFTWROOT:/home/ec2-user/FFTW338/
    current:FFTW_VERSION:FFTW3
    current:FFTW_LIB:lib
    current:FFTW_LIBNAME:fftw3
    current:LIBXCROOT:
    current:LIBXC_FORTRAN:
    current:LIBXC_LIBNAME:
    current:LIBXC_LIBDNAME:
    current:SCALAPACKROOT:$(MKLROOT)/lib/
    current:SCALAPACK_LIBNAME:mkl_scalapack_lp64
    current:BLACSROOT:$(MKLROOT)/lib/
    current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64
    current:ELPAROOT:
    current:ELPA_VERSION:
    current:ELPA_LIB:
    current:ELPA_LIBNAME:
    current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_
    current:CORES_PER_NODE:1
    current:MKL_TARGET_ARCH:intel64

    setenv TASKSET "no"
    if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
    if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
    setenv WIEN_GRANULARITY 1
    setenv DELAY 0.1
    setenv SLEEPY 1
    setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"
    if ( ! $?CORES_PER_NODE) setenv CORES_PER_NODE 1
    # if ( ! $?PINNING_COMMAND) setenv PINNING_COMMAND "--cpu_bind=map_cpu:"
    # if ( ! $?PINNING_LIST ) setenv PINNING_LIST "0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15"

Then I ran a k-point parallelization with the .machines file below, and it worked perfectly:

    granularity:1
    1:machine1
    2:machine2
    extrafine:1

But when I tried to parallelize over MPI with the new .machines file:

    granularity:1
    1:machine1 machine2
    extrafine:1

lapw1 crashed with the following error message:

    **  Error in Parallel LAPW1
    **  LAPW1 STOPPED
    **  check ERROR FILES!
    SEP INFO = -21
    'SECLR4' - SYEVX (Scalapack/LAPACK) failed

Although I understand that the 21st parameter of the SYEVX subroutine is incorrect, I am not sure how to fix the problem. I have also tried linking WIEN2K with NETLIB's SCALAPACK/LAPACK/BLAS instead of MKL, but the same error appeared again.

Please help me out. Thanks.

Hanning Chen, Ph.D.
Department of Chemistry
American University
Washington, DC 20016
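For context, a rough sketch of the kind of SLURM batch script that would drive the two-node MPI run described above; the module names and the way the node list is turned into a .machines file are assumptions, not details from the original setup:

    #!/bin/csh -f
    #SBATCH --job-name=wien2k_mpi_test
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1      # mirrors CORES_PER_NODE:1 from the configuration above

    # load the same toolchain WIEN2k was compiled with (names are site-specific)
    module load intel mkl intel-mpi

    # build the MPI .machines file from the nodes SLURM assigned to this job
    set nodes = `scontrol show hostnames $SLURM_JOB_NODELIST`
    cat > .machines <<EOF
    granularity:1
    1:$nodes
    extrafine:1
    EOF

    # run the parallel SCF cycle; lapw1 is then started through srun as set in WIEN_MPIRUN
    run_lapw -p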