Re: [Wien] MPI parallelization failure for lapw1

2019-11-27 Thread Peter Blaha
When using the srun setup of WIEN2k, you are tightly integrated into 
your batch system and have to follow all of your system's default 
settings.


For instance, you configured CORES_PER_NODE=1, but I very much doubt 
that your cluster has only one core per node, and srun will probably 
make certain assumptions based on that setting.
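
For example, on nodes with 16 cores (an assumed count; use your actual 
one) this should rather read:

current:CORES_PER_NODE:16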


Two suggestions for tests:

a) Run it on only ONE node, but on all cores of this node. The 
corresponding .machines file should have

1:machine1:YY   (where YY is the number of cores, e.g. 16 or 24)
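
For example, for a single node named machine1 with 16 cores (adjust the 
name and the core count to your system), the whole .machines file would be:

granularity:1
1:machine1:16
extrafine:1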

b) If your queuing-system setup allows you to use mpirun, reconfigure 
WIEN2k (siteconfig) with the default intel+mkl option (not the srun 
option). It will then suggest using mpirun ... for starting jobs.
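
With that option the parallel_options line typically looks something like 
this (only a sketch; the exact line depends on your WIEN2k version and 
MPI installation):

current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_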


Make sure that in your batch job (I assume you are using one) the proper 
modules are loaded (intel, mkl, intel-mpi).
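
A minimal sketch of such a batch script for a SLURM system (the module 
names and the core count are assumptions; check "module avail" on your 
cluster):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
module load intel
module load mkl
module load intel-mpi
run_lapw -p     # start the parallel SCF cycle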






--

  P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--


Re: [Wien] MPI parallelization failure for lapw1

2019-11-26 Thread Laurence Marks
A guess: your srun is set up to use Open MPI or something else, not the
Intel MPI (impi) that you compiled for. Check what you have loaded, e.g. use
"which mpirun".

N.B. Testing with lapw0 is simpler.
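
For such a test you would give lapw0 its own line in .machines and run it 
alone in parallel; a sketch, assuming a 2+2 split over your two machines 
(please check the userguide for the exact lapw0 line syntax):

lapw0:machine1:2 machine2:2

x lapw0 -p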



-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: www.numis.northwestern.edu/MURI
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi


[Wien] MPI parallelization failure for lapw1

2019-11-26 Thread Hanning Chen
Dear WIEN2K community,

  I am a new user of WIEN2K, and just compiled it using the following options:

current:FOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include

current:FPOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include

current:OMP_SWITCH:-qopenmp

current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -lpthread -lm -ldl -liomp5

current:DPARALLEL:'-DParallel'

current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core

current:FFTWROOT:/home/ec2-user/FFTW338/

current:FFTW_VERSION:FFTW3

current:FFTW_LIB:lib

current:FFTW_LIBNAME:fftw3

current:LIBXCROOT:

current:LIBXC_FORTRAN:

current:LIBXC_LIBNAME:

current:LIBXC_LIBDNAME:

current:SCALAPACKROOT:$(MKLROOT)/lib/

current:SCALAPACK_LIBNAME:mkl_scalapack_lp64

current:BLACSROOT:$(MKLROOT)/lib/

current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64

current:ELPAROOT:

current:ELPA_VERSION:

current:ELPA_LIB:

current:ELPA_LIBNAME:

current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_

current:CORES_PER_NODE:1

current:MKL_TARGET_ARCH:intel64


setenv TASKSET "no"

if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1

if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0

setenv WIEN_GRANULARITY 1

setenv DELAY 0.1

setenv SLEEPY 1

setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"

if ( ! $?CORES_PER_NODE) setenv CORES_PER_NODE  1

# if ( ! $?PINNING_COMMAND) setenv PINNING_COMMAND "--cpu_bind=map_cpu:"

# if ( ! $?PINNING_LIST ) setenv PINNING_LIST "0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15"


  Then, I ran a k-point parallelization with the .machines file below, and it 
worked perfectly:

granularity:1
1:machine1
2:machine2
extrafine:1

  But, when I tried to parallelize it over MPI with the new .machines file:

granularity:1
1:machine1 machine2
extrafine:1

lapw1 crashed with the following error message:

**   Error in Parallel LAPW1
**.  LAPW1 STOPPED
** check ERROR FILES!
  SEP INFO = -21
‘SECLR4’. -SYEVX (Scalapack/LAPACK) failed

Although I understand that the 21st parameter of the SYEVX subroutine is 
incorrect, I am not sure how to fix the problem. I have also tried linking 
WIEN2K with NETLIB's ScaLAPACK/LAPACK/BLAS instead of MKL, but the same 
error appeared again.

Please help me out. Thanks.

Hanning Chen, Ph.D.
Department of Chemistry
American University
Washington, DC 20016

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html