Re: [Wien] MPI parallelization failure for lapw1
When using the srun setup of WIEN2k, you are tightly integrated into your system and have to follow all of your system's default settings. For instance, you configured CORES_PER_NODE = 1, but I very much doubt that your cluster has only one core per node, and srun will probably make certain assumptions about that.

Two suggestions for tests:

a) Run it on only ONE node, but on all cores of this node. The corresponding .machines file should have
   1:machine1:YY
   where YY is the number of cores (16 or 24, ...).

b) If your queuing system setup allows the use of mpirun, reconfigure WIEN2k (siteconfig) with the default intel+mkl option (not the srun option). It will then suggest using mpirun ... for starting jobs. Make sure that the proper modules (intel, mkl, intel-mpi) are loaded in your batch job (I assume you are using one).

On 11/26/19 7:07 PM, Hanning Chen wrote:

> Dear WIEN2K community,
>
> I am a new user of WIEN2K, and just compiled it using the following options:
>
> current:FOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
> current:FPOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
> current:OMP_SWITCH:-qopenmp
> current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -lpthread -lm -ldl -liomp5
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
> current:FFTWROOT:/home/ec2-user/FFTW338/
> current:FFTW_VERSION:FFTW3
> current:FFTW_LIB:lib
> current:FFTW_LIBNAME:fftw3
> current:LIBXCROOT:
> current:LIBXC_FORTRAN:
> current:LIBXC_LIBNAME:
> current:LIBXC_LIBDNAME:
> current:SCALAPACKROOT:$(MKLROOT)/lib/
> current:SCALAPACK_LIBNAME:mkl_scalapack_lp64
> current:BLACSROOT:$(MKLROOT)/lib/
> current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64
> current:ELPAROOT:
> current:ELPA_VERSION:
> current:ELPA_LIB:
> current:ELPA_LIBNAME:
> current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_
> current:CORES_PER_NODE:1
> current:MKL_TARGET_ARCH:intel64
>
> setenv TASKSET "no"
> if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
> if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
> setenv WIEN_GRANULARITY 1
> setenv DELAY 0.1
> setenv SLEEPY 1
> setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"
> if ( ! $?CORES_PER_NODE) setenv CORES_PER_NODE1
> # if ( ! $?PINNING_COMMAND) setenv PINNING_COMMAND "--cpu_bind=map_cpu:"
> # if ( ! $?PINNING_LIST ) setenv PINNING_LIST "0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15"
>
> Then I ran a k-point parallelization with the .machines file below, and it worked perfectly:
>
> granularity:1
> 1:machine1
> 2:machine2
> extrafine:1
>
> But when I tried to parallelize it over MPI with the new .machines file:
>
> granularity:1
> 1:machine1 machine2
> extrafine:1
>
> lapw1 crashed with the error message:
>
> ** Error in Parallel LAPW1 **. LAPW1 STOPPED ** check ERROR FILES!
> SEP INFO = -21
> ‘SECLR4’. -SYEVX (Scalapack/LAPACK) failed
>
> Although I understand that the 21st parameter of the SYEVX subroutine is incorrect, I am not sure how to fix the problem. I have actually tried linking WIEN2K with NETLIB’s SCALAPACK/LAPACK/BLAS instead of MKL, but the same error appeared again.
>
> Please help me out. Thanks.
>
> Hanning Chen, Ph.D.
> Department of Chemistry
> American University
> Washington, DC 20016
> ___
> Wien mailing list
> Wien@zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

--
P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/TC_Blaha
--
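Test (a) from the reply in this thread can be sketched as a small helper that writes a single-node MPI .machines file. This is only an illustration: the function name single_node_mpi_machines is hypothetical, the 1:host:cores line follows the form given in the reply, and the granularity and extrafine lines are simply copied from the poster's own file. The host name and core count must be adapted to the actual cluster.

```python
def single_node_mpi_machines(host: str, cores: int) -> str:
    """Return the text of a .machines file for one MPI job on one node.

    Hypothetical helper: the "1:host:cores" form is taken from the reply
    in this thread; granularity/extrafine mirror the poster's file.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    lines = [
        "granularity:1",
        f"1:{host}:{cores}",  # one job, run with `cores` MPI processes on `host`
        "extrafine:1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; replace with the real node name and core count.
    print(single_node_mpi_machines("machine1", 16), end="")
```

Redirecting the printed text into .machines in the case directory gives a file for the single-node test; whether srun then starts the job correctly still depends on the site settings discussed in the reply.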
Re: [Wien] lapw2 crashed error
Thank you all! The problem was solved by running the same command with the additional flag "-NI".

Dear Gavin Sir,
'sed' is already installed, yet it still showed "sed: Command not found".

On Tue, Nov 26, 2019 at 5:53 AM Peeyush kumar kamlesh <peeyush.physik@gmail.com> wrote:

> Sir,
> I am using a single node with four cores. My machine file is below:
> __
> 100:localhost
> 100:localhost
> 100:localhost
> 100:localhost
> granularity:1
> extrafine:1
> omp_global:4
>
> On Mon, Nov 25, 2019 at 10:06 PM Peeyush kumar kamlesh <peeyush.physik@gmail.com> wrote:
>
>> Hello Wien2k user,
>> Greetings!
>> I am running an scf cycle with the hf potential. When I run the command "run_lapw -hf -p", then after successful completion of 7 cycles, I get an error in cycle 8. In the terminal it appears as follows:
>>
>> in cycle 8    ETEST: .000491915000 CTEST: .0035867
>> hup: Command not found.
>> LAPW0 END
>> LAPW0 END
>> LAPW1 END
>> LAPW1 END
>> LAPW1 END
>> LAPW1 END
>> sed: Command not found.
>> LAPW2 - Error. Check file lapw2.error
>> cp: cannot stat '.in.tmp': No such file or directory
>>
>> > stop error
>>
>> When I checked the lapw2.error file I found the following details:
>>
>> 'LAPW2' - can't open unit: 10
>> 'LAPW2' - filename: /case.vector
>> 'LAPW2' - status: unknown form: unformatted
>> ** testerror: Error in Parallel LAPW2
>>
>> I also tried to search and understand the previous threads, but I was unable to do so. Kindly suggest why this error is appearing and how it can be resolved.
>>
>> Thanks and Regards
>> Peeyush Kumar Kamlesh
Re: [Wien] Spin-orbit coupling SCF not converging
As Peter has already responded, TETRA is not appropriate for 2D structures. You got away with it without SOC, as that exactly solves the eigenproblem for RKMAX etc. However, SOC uses the finite set of lapw1 eigenvectors. There will therefore be something similar to telegraph noise, as the set of eigenvectors used changes between iterations. This leads to ill-conditioning, which is amplified by the use of TETRA, which I suspect can also introduce something similar to telegraph noise.

The total conditioning of a particular case is a product of the condition numbers of different parts. As nband -> infinity the SOC calculation should be better conditioned.

Small values of Beta and the total step are clear indicators that the problem is ill-conditioned. I would guesstimate that values smaller than 0.5 are an indicator of ill-conditioning if they persist; an occasional value is OK. Small GREED may also be an indicator of problems, particularly near convergence. However, GREED is more complicated and can legitimately be small far from the fixed-point solution.

Wikipedia seems to have a reasonable page, https://en.wikipedia.org/wiki/Condition_number . I have never tried to calculate these for different parts of Wien2k; my understanding comes largely from empirical experience. Peter has more experience and so has a better feel, although I am not sure he has ever tried to actually calculate the numbers. (A good project for someone.)

On Wed, Nov 27, 2019 at 9:54 AM Luigi Maduro - TNW wrote:

> I am using WIEN2k_19.1, and using grep MULTISECANT *.scfm I get the following:
>
> * MULTISECANT MIXING VER9 RELEASE 10.4.0
>
> For the input of the SCF calculation with SOC, the output of an SCF calculation without SOC was used (with TETRA). For the SCF calculation with SOC the following criteria were used: RKmax = 7.0, 21 k-points in IBZ, charge convergence of 0.001e and energy convergence of 0.0001 Ry. These are the same criteria as the original SCF calculation without SOC.
>
> If I understood correctly, then looking at the size of GREED and Beta should be sufficient for determining whether the calculation has converged. If so, how small is too small for these parameters?
>
> From: Laurence Marks [mailto:laurence.ma...@gmail.com]
> Sent: Tuesday, 26 November 2019 18:19
> To: A Mailing list for WIEN2k users
> Cc: Luigi Maduro - TNW
> Subject: Re: [Wien] Spin-orbit coupling SCF not converging
>
> What version of Wien2k are you using, particularly the mixer (grep MULTISECANT *.scfm)?
>
> Your calculations are "starving to death". The step size is so small (both the GREED and Beta) that it is bouncing around on numerical noise. It may well have already converged to the limits of the noise/conditioning in your calculation, which is linked to RKMAX and the k-mesh and also (per Peter's response) to TETRA. The iterative diagonalizations also introduce some noise.
>
> For the specific case I would remove the prior history (rm *.broyd*) and continue it.
>
> On Tue, Nov 26, 2019 at 11:01 AM Luigi Maduro - TNW wrote:
>
> Hello Laurence,
>
> This is the result I get when using Check-mixing (this is with the thinnest slab model, and using SCALA with Emax = 10.0 Ry)
>
> :DIRQ : |MSR1|= 1.472E-06 |PRATT|= 3.852E-03 ANGLE= 79.0 DEGREES
> :DIRT : |MSR1|= 1.516E-06 |PRATT|= 4.100E-03 ANGLE= 79.2 DEGREES
> :MIX : MSE1 REGULARIZATION: 9.15E-04 GREED: 0.00200 Newton 1.00 0.0004
> :ENE : ** TOTAL ENERGY IN Ry = -58196.30065156
> :DIS : CHARGE DISTANCE ( 0.0046033 for atom7 spin 1) 0.0009683
> :PLANE: PW TOTAL 6.0026 DISTAN 3.20E-03 5.33E-02 %
> :CHARG: CLM/ATOM 74.0417 DISTAN 5.58E-04 7.54E-04 %
> :RANK : ACTIVE 14.44/16 = 90.26 % ; YY RANK 14.44/16 = 90.25 %
> :DIRM : MEMORY 16/12 SCALE 1.000 RED 2.57 PRED 0.95 NEXT 0.95 BETA 0.05
> :DIRP : |MSR1|= 1.024E-06 |PRATT|= 3.198E-03 ANGLE= 102.9 DEGREES
> :DIRQ : |MSR1|= 4.046E-06 |PRATT|= 1.005E-02 ANGLE= 82.5 DEGREES
> :DIRT : |MSR1|= 4.174E-06 |PRATT|= 1.054E-02 ANGLE= 84.0 DEGREES
> :MIX : MSE1 REGULARIZATION: 1.33E-03 GREED: 0.00500 Newton 1.00 0.0004
> :ENE : ** TOTAL ENERGY IN Ry = -58196.30346394
> :DIS : CHARGE DISTANCE ( 0.0012891 for atom8 spin 1) 0.0002073
> :PLANE: PW TOTAL 6.0026 DISTAN 1.51E-03 2.51E-02 %
> :CHARG: CLM/ATOM 74.0417 DISTAN 2.13E-04 2.88E-04 %
> :RANK : ACTIVE 15.31/16 = 95.68 % ; YY RANK 15.31/16 = 95.72 %
> :DIRM : MEMORY 16/12 SCALE 1.000 RED 0.39 PRED 0.95 NEXT 0.95 BETA 0.05
> :DIRP : |MSR1|= 3.562E-07 |PRATT|= 1.508E-03 ANGLE= 72.0 DEGREES
> :DIRQ : |MSR1|= 1.479E-06 |PRATT|= 3.835E-03 ANGLE= 80.1 DEGREES
> :DIRT : |MSR1|= 1.522E-06 |PRATT|= 4.121E-03 ANGLE= 79.5 DEGREES
> :MIX : MSE1 REGULARIZATION: 9.97E-04 GREED: 0.00200 Newton 1.00 0.0004
> :ENE : ** TOTAL ENERGY IN Ry = -58196.2990
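The rule of thumb discussed in this thread (persistently small GREED and Beta suggest an ill-conditioned or noise-limited calculation) can be sketched as a small log scan. This is a rough illustration, not part of WIEN2k: the function names are hypothetical, the "GREED:" and "BETA" labels are taken from the output quoted in this thread, the 0.5 threshold is the guesstimate from the reply, and the requirement of three consecutive small values is an assumption made here to stand in for "if they persist".

```python
import re

# Pattern for a plain or exponent-form number such as 0.00200 or 9.15E-04.
FLOAT = r"[-+]?\d*\.?\d+(?:[Ee][-+]?\d+)?"
GREED_RE = re.compile(r"GREED:\s*(" + FLOAT + ")")
BETA_RE = re.compile(r"\bBETA\s+(" + FLOAT + ")")


def mixer_steps(text):
    """Return (greed_values, beta_values) found in mixer output text."""
    greed = [float(v) for v in GREED_RE.findall(text)]
    beta = [float(v) for v in BETA_RE.findall(text)]
    return greed, beta


def persistently_small(values, threshold=0.5, run=3):
    """True if the last `run` values are all below `threshold`.

    `threshold` follows the guesstimate in the thread; `run` is an
    assumption for what counts as persistent.
    """
    return len(values) >= run and all(v < threshold for v in values[-run:])


if __name__ == "__main__":
    sample = """
:MIX : MSE1 REGULARIZATION: 9.15E-04 GREED: 0.00200 Newton 1.00 0.0004
:DIRM : MEMORY 16/12 SCALE 1.000 RED 2.57 PRED 0.95 NEXT 0.95 BETA 0.05
:MIX : MSE1 REGULARIZATION: 1.33E-03 GREED: 0.00500 Newton 1.00 0.0004
:DIRM : MEMORY 16/12 SCALE 1.000 RED 0.39 PRED 0.95 NEXT 0.95 BETA 0.05
:MIX : MSE1 REGULARIZATION: 9.97E-04 GREED: 0.00200 Newton 1.00 0.0004
"""
    greed, beta = mixer_steps(sample)
    print("GREED:", greed, "persistently small:", persistently_small(greed))
    print("BETA:", beta, "persistently small:", persistently_small(beta, run=2))
```

The scan expects unquoted log text; lines copied from an email with ">" quote markers inside them would need those markers removed first. As noted in the reply, a small GREED far from the fixed point can be legitimate, so a flag from this sketch is a prompt to look closer rather than a verdict.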
Re: [Wien] lapw2 crashed error
If the "Command not found" is not from the sed command itself, it might be caused by the arguments passed to sed. However, the cause and solution cannot be determined unless the script and the line containing the sed command are identified.

If you can provide information on the computing system, perhaps someone will have encountered the same problem or be able to reproduce it:

Linux version (example: Ubuntu 18.04.3):
sed version (example: output of command 'sed --version' [1]):
tcsh version (example: output of command 'dpkg -l tcsh' [2]):
csh version (example: output of command 'dpkg -l csh' [3]):
WIEN2k version (example: output of command 'cat WIEN2k_VERSION' [4]):

Did you check 'sed' only in your user account? Since you are running in parallel, did you also check whether sed works when you log in to your nodes (which appears to be 'ssh localhost' in your case)?

If you use root, su, or sudo for running WIEN2k, I suggest using the user account instead, unless you're experienced with them, as those administrative environments can sometimes behave differently with unexpected consequences [5].

[1] http://manpages.ubuntu.com/manpages/trusty/man1/sed.1.html
[2] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18018.html
[3] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16030.html
[4] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18956.html
[5] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg10594.html

On 11/27/2019 4:32 AM, Peeyush kumar kamlesh wrote:

> Thank you all! The problem was solved by running the same command with the additional flag "-NI".
>
> Dear Gavin Sir,
> 'sed' is already installed, yet it still showed "sed: Command not found".
>
> On Tue, Nov 26, 2019 at 5:53 AM Peeyush kumar kamlesh <peeyush.physik@gmail.com> wrote:
>
>> Sir,
>> I am using a single node with four cores. My machine file is below:
>> __
>> 100:localhost
>> 100:localhost
>> 100:localhost
>> 100:localhost
>> granularity:1
>> extrafine:1
>> omp_global:4
>>
>> On Mon, Nov 25, 2019 at 10:06 PM Peeyush kumar kamlesh <peeyush.physik@gmail.com> wrote:
>>
>>> Hello Wien2k user,
>>> Greetings!
>>> I am running an scf cycle with the hf potential. When I run the command "run_lapw -hf -p", then after successful completion of 7 cycles, I get an error in cycle 8. In the terminal it appears as follows:
>>>
>>> in cycle 8    ETEST: .000491915000 CTEST: .0035867
>>> hup: Command not found.
>>> LAPW0 END
>>> LAPW0 END
>>> LAPW1 END
>>> LAPW1 END
>>> LAPW1 END
>>> LAPW1 END
>>> sed: Command not found.
>>> LAPW2 - Error. Check file lapw2.error
>>> cp: cannot stat '.in.tmp': No such file or directory
>>>
>>> > stop error
>>>
>>> When I checked the lapw2.error file I found the following details:
>>>
>>> 'LAPW2' - can't open unit: 10
>>> 'LAPW2' - filename: /case.vector
>>> 'LAPW2' - status: unknown form: unformatted
>>> ** testerror: Error in Parallel LAPW2
>>>
>>> I also tried to search and understand the previous threads, but I was unable to do so. Kindly suggest why this error is appearing and how it can be resolved.
>>>
>>> Thanks and Regards
>>> Peeyush Kumar Kamlesh
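The check suggested in this thread (is sed actually reachable from the account that runs the job?) can be sketched with a short PATH lookup. This is a minimal illustration with a hypothetical function name, missing_commands. It only inspects the PATH of the Python process it runs in, so it is an assumption that this matches the environment the WIEN2k csh scripts see; running it both locally and after 'ssh localhost' would cover the two cases raised in the reply.

```python
import shutil


def missing_commands(names, path=None):
    """Return the commands from `names` that cannot be found on PATH.

    `path` may be a PATH-style string; None means use the current
    process environment. Hypothetical helper for illustration.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # `sed` is the command reported as not found in this thread.
    wanted = ["sed"]
    missing = missing_commands(wanted)
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all commands found:", ", ".join(wanted))
```

A command that is found here but still reported as missing by the scripts would point toward a difference between shells or login environments rather than a missing installation.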