date:20191127

Re: [Wien] MPI parallelization failure for lapw1

2019-11-27 Thread Peter Blaha

When using the srun setup of WIEN2k, you are tightly integrated into 
your system and have to follow all of your system's default settings.


For instance, you configured CORES_PER_NODE = 1, but I very much doubt 
that your cluster has only one core per node, and srun will probably 
make certain assumptions about that.


Two suggestions for tests:

a) run it on only ONE node, but on all cores of this node. The 
corresponding .machines-file should have

1:machine1:YY

where YY is the number of cores (16 or 24, ..)

b) If your queuing system setup allows the use of mpirun, reconfigure 
WIEN2k (siteconfig) with the default intel+mkl option (not the srun 
option). It will then suggest using mpirun ... for starting jobs.
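
As a minimal sketch of suggestion a), the following Python snippet builds a single-node .machines file in the 1:machine1:YY form. The function name, the host name machine1 and the core count 16 are placeholders for illustration, not anything WIEN2k provides; the granularity and extrafine lines are simply copied from the .machines files shown in the original question.

```python
def single_node_machines(host: str, cores: int) -> str:
    """Return .machines content for one MPI job using `cores` cores on `host`."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    lines = [
        "granularity:1",
        f"1:{host}:{cores}",  # one MPI job spread over all cores of the node
        "extrafine:1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: replace with the real node name and core count.
    print(single_node_machines("machine1", 16), end="")
```

Redirecting the printed text into a file named .machines in the case directory would give the test setup described in a).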


Make sure that in your batch job (I assume you are using it) the proper 
modules are loaded (intel, mkl, intel-mpi).



On 11/26/19 7:07 PM, Hanning Chen wrote:

Dear WIEN2K community,

   I am a new user of WIEN2K, and just compiled it using the following 
options:


current:FOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML 
-traceback -assume buffered_io -I$(MKLROOT)/include


current:FPOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML 
-traceback -assume buffered_io -I$(MKLROOT)/include


current:OMP_SWITCH:-qopenmp

current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -lpthread 
-lm -ldl -liomp5


current:DPARALLEL:'-DParallel'

current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core

current:FFTWROOT:/home/ec2-user/FFTW338/

current:FFTW_VERSION:FFTW3

current:FFTW_LIB:lib

current:FFTW_LIBNAME:fftw3

current:LIBXCROOT:

current:LIBXC_FORTRAN:

current:LIBXC_LIBNAME:

current:LIBXC_LIBDNAME:

current:SCALAPACKROOT:$(MKLROOT)/lib/

current:SCALAPACK_LIBNAME:mkl_scalapack_lp64

current:BLACSROOT:$(MKLROOT)/lib/

current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64

current:ELPAROOT:

current:ELPA_VERSION:

current:ELPA_LIB:

current:ELPA_LIBNAME:

current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_

current:CORES_PER_NODE:1

current:MKL_TARGET_ARCH:intel64

setenv TASKSET "no"

if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1

if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0

setenv WIEN_GRANULARITY 1

setenv DELAY 0.1

setenv SLEEPY 1

setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"

if ( ! $?CORES_PER_NODE) setenv CORES_PER_NODE 1

# if ( ! $?PINNING_COMMAND) setenv PINNING_COMMAND "--cpu_bind=map_cpu:"

# if ( ! $?PINNING_LIST ) setenv PINNING_LIST "0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15"


   Then, I ran a k-point parallelization with the .machines file below, 
and it worked perfectly:


     granularity:1

1:machine1

2:machine2

extrafine:1

   But, when I tried to parallelize it over MPI with the new .machines file:

   granularity:1

   1:machine1 machine2

extrafine:1

lapw1 crashed with the error message as

**   Error in Parallel LAPW1

**.  LAPW1 STOPPED

** check ERROR FILES!

   SEP INFO = -21

‘SECLR4’. -SYEVX (Scalapack/LAPACK) failed

Although I understand that this means the 21st parameter of the SYEVX 
subroutine is incorrect, I am not sure how to fix the problem. I also 
tried linking WIEN2K with NETLIB’s SCALAPACK/LAPACK/BLAS instead of MKL, 
but the same error appeared.
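
For readers unfamiliar with the INFO convention: LAPACK routines report INFO = -i when the i-th argument had an illegal value, and ScaLAPACK routines additionally encode an illegal entry j of an array argument i as INFO = -(i*100 + j). A small Python sketch of that decoding follows. The function name is made up for illustration, and which argument of the SYEVX call is actually number 21 has to be checked against the documentation of the routine being called.

```python
def decode_info(info: int) -> str:
    """Describe a LAPACK/ScaLAPACK INFO return value in words."""
    if info == 0:
        return "successful exit"
    if info > 0:
        return f"routine-specific failure (INFO = {info}); see the routine's documentation"
    code = -info
    if code >= 100:
        # ScaLAPACK convention for array arguments: INFO = -(i*100 + j)
        arg, entry = divmod(code, 100)
        return f"entry {entry} of array argument {arg} had an illegal value"
    # Plain LAPACK convention (and ScaLAPACK scalar arguments): INFO = -i
    return f"argument {code} had an illegal value"


if __name__ == "__main__":
    print(decode_info(-21))
```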


Please help me out. Thanks.

Hanning Chen, Ph.D.

Department of Chemistry

American University

Washington, DC 20016


___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html



--

  P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--

Re: [Wien] lapw2 crashed error

2019-11-27 Thread Peeyush kumar kamlesh

Thank you all!
The problem was solved by running the same command with the additional
flag "-NI".
Dear Gavin Sir, 'sed' is already installed, yet it still showed
*"sed: Command not found".*

On Tue, Nov 26, 2019 at 5:53 AM Peeyush kumar kamlesh <
peeyush.physik@gmail.com> wrote:

> Sir,
> I am using a single node with four cores. My machine file is below:
> __
> 100:localhost
> 100:localhost
> 100:localhost
> 100:localhost
> granularity:1
> extrafine:1
> omp_global:4
> 
>
> On Mon, Nov 25, 2019 at 10:06 PM Peeyush kumar kamlesh <
> peeyush.physik@gmail.com> wrote:
>
>> Hello Wien2k user,
>> Greetings!
>> I am running scf cycle with hf potential. When I run the command
>> "run_lapw -hf -p", then after successful completion of 7 cycles, I found
>> error in cycle 8. In terminal it is represented as follows:
>> 
>> in cycle 8    ETEST: .000491915000   CTEST: .0035867
>> hup: Command not found.
>>  LAPW0 END
>>  LAPW0 END
>>  LAPW1 END
>>  LAPW1 END
>>  LAPW1 END
>>  LAPW1 END
>> sed: Command not found.
>> LAPW2 - Error. Check file lapw2.error
>> cp: cannot stat '.in.tmp': No such file or directory
>>
>> >   stop error
>>
>> --
>>
>> When I checked lapw2.error file I found following details:
>> _
>> 'LAPW2' - can't open unit: 10
>>
>>  'LAPW2' -filename: /case.vector
>>
>>  'LAPW2' -  status: unknown  form: unformatted
>>
>> **  testerror: Error in Parallel LAPW2
>>
>> ---
>>
>> I also tried to search and understand the previous threads, but I was
>> unable to do so. Kindly suggest me why this error is appearing and how can
>> it be resolved?
>>
>> Thanks and Regards
>> Peeyush Kumar Kamlesh
>>
>

Re: [Wien] Spin-orbit coupling SCF not converging

2019-11-27 Thread Laurence Marks

As Peter has already responded, TETRA is not appropriate for 2D structures.
You got away with it without SOC, as that exactly solves the eigenproblem
for RKMAX etc. However, SOC uses the finite set of lapw1 eigenvectors.
There will therefore be something similar to telegraph noise, because the
set of eigenvectors that is used changes between iterations. This leads to
ill-conditioning, which is amplified by the use of TETRA, which I suspect
can also introduce something similar to telegraph noise. The total
conditioning of a particular case is a product of the condition numbers of
its different parts. As nband -> infinity the SOC calculation should be
better conditioned.

Small values of Beta and the total step are clear indicators that the
problem is ill-conditioned. I would guesstimate that values smaller than
0.5 are an indicator of ill-conditioning if they persist; an occasional
value is OK. Small GREED may also be an indicator of problems, particularly
near convergence. However, GREED is more complicated and can legitimately
be small far from the fixed-point solution.
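
As a rough illustration of how one might monitor these indicators, the Python sketch below pulls GREED and BETA out of the :MIX and :DIRM lines of a case.scfm file and flags values that stay small. The line formats are taken from the mixer output quoted later in this thread; the function names, the choice of three consecutive iterations as the meaning of "persist", and applying the 0.5 guesstimate to BETA only are assumptions made for this example.

```python
import re

# Patterns follow the ":MIX" and ":DIRM" lines quoted in this thread.
GREED_RE = re.compile(r"^:MIX\s*:.*?GREED:\s*([0-9.]+)", re.MULTILINE)
BETA_RE = re.compile(r"^:DIRM\s*:.*?BETA\s+([0-9.]+)", re.MULTILINE)


def mixer_history(scfm_text: str):
    """Return (greed_values, beta_values) in the order they appear."""
    greed = [float(x) for x in GREED_RE.findall(scfm_text)]
    beta = [float(x) for x in BETA_RE.findall(scfm_text)]
    return greed, beta


def persistently_small(values, threshold=0.5, run=3):
    """True if the last `run` values are all below `threshold` (assumed criterion)."""
    return len(values) >= run and all(v < threshold for v in values[-run:])


if __name__ == "__main__":
    sample = (
        ":MIX  :   MSE1   REGULARIZATION:  9.15E-04 GREED: 0.00200  Newton 1.00 0.0004\n"
        ":DIRM :  MEMORY 16/12 SCALE   1.000 RED  2.57 PRED  0.95 NEXT  0.95 BETA 0.05\n"
    )
    greed, beta = mixer_history(sample)
    print("GREED:", greed, "BETA:", beta)
    print("BETA persistently small:", persistently_small(beta, run=1))
```

In practice the text would be read from the case.scfm file (or the :MIX and :DIRM lines grepped from case.scf); an occasional small value is not a problem, which is why only a run of consecutive values is flagged.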

Wikipedia seems to have a reasonable page,
https://en.wikipedia.org/wiki/Condition_number . I have never tried to
calculate these numbers for different parts of Wien2k; my understanding
comes mostly from empirical experience. Peter has more experience and so
has a better feel, although I am not sure he has ever tried to actually
calculate the numbers. (A good project for someone.)

On Wed, Nov 27, 2019 at 9:54 AM Luigi Maduro - TNW 
wrote:

> I am using WIEN2k_19.1, and using grep MULTISECANT *.scfm I get the
> following:
>
>
>
> * MULTISECANT MIXING VER9 RELEASE 10.4.0
>
>
>
> For the input of the SCF calculation with SOC the output of a SCF
> calculation without SOC was used (with TETRA). For the SCF calculation with
> SOC the following criteria were used: RKmax = 7.0, 21 k-points in IBZ,
> charge convergence of 0.001e and energy convergence of 0.0001 Ry. These are
> the same criteria as the original SCF calculation without SOC.
>
>
> If I understood correctly, looking at the size of GREED and Beta should
> be sufficient for determining whether the calculation has converged. If
> so, how small is too small for these parameters?
>
>
>
>
>
> *From:* Laurence Marks [mailto:laurence.ma...@gmail.com]
> *Sent:* dinsdag 26 november 2019 18:19
> *To:* A Mailing list for WIEN2k users
> *Cc:* Luigi Maduro - TNW
> *Subject:* Re: [Wien] Spin-orbit coupling SCF not converging
>
>
>
> What version of Wien2k are you using, particularly the mixer (grep
> MULTISECANT *.scfm)?
>
>
>
> Your calculations are "starving to death". The step size is so small (both
> the GREED and Beta) that it is bouncing around on numerical noise. It may
> well have already converged to the limits of the noise/conditioning in your
> calculation, which is linked to RKMAX and the k-mesh and also (Peter's
> response) from TETRA. The iterative diagonalizations also introduce some
> noise.
>
>
>
> For the specific case I would remove the prior history (rm *.broyd*) and
> continue it.
>
>
>
> On Tue, Nov 26, 2019 at 11:01 AM Luigi Maduro - TNW 
> wrote:
>
> Hello Laurence,
>
>
>
> This is the result I get when using Check-mixing (this is with the
> thinnest slab model, and using SCALA with Emax = 10.0 Ry)
>
>
>
>
>
> :DIRQ :  |MSR1|= 1.472E-06 |PRATT|= 3.852E-03 ANGLE=  79.0 DEGREES
> :DIRT :  |MSR1|= 1.516E-06 |PRATT|= 4.100E-03 ANGLE=  79.2 DEGREES
> :MIX  :   MSE1   REGULARIZATION:  9.15E-04 GREED: 0.00200  Newton 1.00 0.0004
> :ENE  : ** TOTAL ENERGY IN Ry =   -58196.30065156
> :DIS  :  CHARGE DISTANCE   ( 0.0046033 for atom7 spin 1) 0.0009683
> :PLANE:  PW TOTAL  6.0026 DISTAN   3.20E-03  5.33E-02 %
> :CHARG:  CLM/ATOM 74.0417 DISTAN   5.58E-04  7.54E-04 %
> :RANK :  ACTIVE  14.44/16 =  90.26 % ; YY RANK  14.44/16 =  90.25 %
> :DIRM :  MEMORY 16/12 SCALE   1.000 RED  2.57 PRED  0.95 NEXT  0.95 BETA 0.05
> :DIRP :  |MSR1|= 1.024E-06 |PRATT|= 3.198E-03 ANGLE= 102.9 DEGREES
> :DIRQ :  |MSR1|= 4.046E-06 |PRATT|= 1.005E-02 ANGLE=  82.5 DEGREES
> :DIRT :  |MSR1|= 4.174E-06 |PRATT|= 1.054E-02 ANGLE=  84.0 DEGREES
> :MIX  :   MSE1   REGULARIZATION:  1.33E-03 GREED: 0.00500  Newton 1.00 0.0004
> :ENE  : ** TOTAL ENERGY IN Ry =   -58196.30346394
> :DIS  :  CHARGE DISTANCE   ( 0.0012891 for atom8 spin 1) 0.0002073
> :PLANE:  PW TOTAL  6.0026 DISTAN   1.51E-03  2.51E-02 %
> :CHARG:  CLM/ATOM 74.0417 DISTAN   2.13E-04  2.88E-04 %
> :RANK :  ACTIVE  15.31/16 =  95.68 % ; YY RANK  15.31/16 =  95.72 %
> :DIRM :  MEMORY 16/12 SCALE   1.000 RED  0.39 PRED  0.95 NEXT  0.95 BETA 0.05
> :DIRP :  |MSR1|= 3.562E-07 |PRATT|= 1.508E-03 ANGLE=  72.0 DEGREES
> :DIRQ :  |MSR1|= 1.479E-06 |PRATT|= 3.835E-03 ANGLE=  80.1 DEGREES
> :DIRT :  |MSR1|= 1.522E-06 |PRATT|= 4.121E-03 ANGLE=  79.5 DEGREES
> :MIX  :   MSE1   REGULARIZATION:  9.97E-04 GREED: 0.00200  Newton 1.00 0.0004
> :ENE  : ** TOTAL ENERGY IN Ry =   -58196.2990

Re: [Wien] lapw2 crashed error

2019-11-27 Thread Gavin Abo

If the "Command not found" is not from the sed command itself, it might 
be caused by the arguments to the sed command.  However, the cause and 
solution cannot be determined unless the script and the line containing 
the sed command are identified.


If you can provide information on the computing system, perhaps someone 
has encountered the same problem or will be able to reproduce it.


Linux version (example, Ubuntu 18.04.3):

sed version (example, output of command 'sed --version' [1]):

tcsh version (example, output of command 'dpkg -l tcsh' [2]):

csh version (example, output of command 'dpkg -l csh' [3]):

WIEN2k version (example, output of command 'cat WIEN2k_VERSION' [4]):

Did you only check 'sed' in your user account?  Since you are running in 
parallel, did you also check whether sed works when you log in to your 
nodes (which appears to be 'ssh localhost' in your case)?  If you use 
root, su, or sudo for running WIEN2k, I suggest using the normal user 
account instead unless you're experienced with those, as administrative 
environments can sometimes behave differently, with unexpected 
consequences [5].
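
A small Python sketch of such a check is given below. It only reports which commands cannot be found on the PATH of the environment it runs in; the function name and the list of commands are choices made for this example (sed and cp appear in the error output, csh and tcsh are the shells asked about above).

```python
import shutil


def missing_commands(names):
    """Return the commands from `names` that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    needed = ["sed", "cp", "csh", "tcsh"]
    missing = missing_commands(needed)
    print("missing:", ", ".join(missing) if missing else "none")
```

Running the script once in a normal terminal and once through 'ssh localhost' (with a script file name of your choosing) would show whether the PATH differs between the interactive login and the shell used for the parallel jobs.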


[1] http://manpages.ubuntu.com/manpages/trusty/man1/sed.1.html
[2] 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18018.html
[3] 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16030.html
[4] 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18956.html
[5] 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg10594.html


On 11/27/2019 4:32 AM, Peeyush kumar kamlesh wrote:

Thank you all!
The problem was solved by running the same command with the 
additional flag "-NI".
Dear Gavin Sir, 'sed' is already installed, yet it still showed 
*"sed: Command not found".*

On Tue, Nov 26, 2019 at 5:53 AM Peeyush kumar kamlesh 
<peeyush.physik@gmail.com> wrote:


Sir,
I am using a single node with four cores. My machine file is below:
__
100:localhost
100:localhost
100:localhost
100:localhost
granularity:1
extrafine:1
omp_global:4


On Mon, Nov 25, 2019 at 10:06 PM Peeyush kumar kamlesh
<peeyush.physik@gmail.com> wrote:

Hello Wien2k user,
Greetings!
I am running scf cycle with hf potential. When I run the
command "run_lapw -hf -p", then after successful completion of
7 cycles, I found error in cycle 8. In terminal it is
represented as follows:

in cycle 8    ETEST: .000491915000   CTEST: .0035867
hup: Command not found.
 LAPW0 END
 LAPW0 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
sed: Command not found.
LAPW2 - Error. Check file lapw2.error
cp: cannot stat '.in.tmp': No such file or directory

>   stop error

--

When I checked lapw2.error file I found following details:
_
'LAPW2' - can't open unit: 10
 'LAPW2' -        filename: /case.vector
 'LAPW2' -          status: unknown      form: unformatted
**  testerror: Error in Parallel LAPW2

---

I also tried to search and understand the previous threads,
but I was unable to do so. Kindly suggest me why this error is
appearing and how can it be resolved?

Thanks and Regards
Peeyush Kumar Kamlesh
