The different runtime fractions for small and large systems are due to
how the individual programs scale.
lapw0 scales basically linearly with the number of atoms, but lapw1 scales
cubically with the size of the basis set.
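As a back-of-the-envelope illustration of what cubic scaling means (a sketch; the 136632 is the nanowire matrix dimension quoted later in this mail, while the 30000 is a hypothetical basis size for a well-optimized cell):

```python
# Rough cost model: lapw1 diagonalization time grows as O(N^3)
# in the matrix (basis-set) dimension N.

def relative_lapw1_cost(n_small: int, n_large: int) -> float:
    """Ratio of O(N^3) diagonalization cost between two basis sizes."""
    return (n_large / n_small) ** 3

# Nanowire matrix dimension vs. a hypothetical well-optimized basis:
print(relative_lapw1_cost(30_000, 136_632))  # roughly 94x more expensive
```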
And here is the second problem: for your nanowire you get a matrix size
of about 130000x130000, and this for just 97 atoms.
It is not the number of atoms that determines the memory, but the size
of the plane-wave basis set. This info is printed in the :RKM line of the scf
file, and you can even get it using
x lapw1 -nmat_only
So your cell dimensions / RMT settings must be very bad. Remember: also
"vacuum" costs a lot in plane-wave methods. You have to optimize your
RMTs and reduce the cell parameters (vacuum).
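For example, from inside the case directory (these are the standard WIEN2k commands mentioned above; the exact output format may differ between versions, and "case.scf" stands for your actual scf file name):

```shell
# Report the matrix size without running the full diagonalization
x lapw1 -nmat_only

# Inspect the basis-set size per iteration in an existing scf file
grep :RKM case.scf
```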
lapw2: you can add a line to .machines:
lapw2_vector_split:4 (or 8 or 16)
which will reduce the memory consumption of lapw2.
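A minimal .machines sketch with that line added (the hostnames node1/node2 and the 20 cores per node are placeholders matching the setup described below; the remaining lines follow the usual .machines layout):

```text
# hypothetical .machines for two 20-core nodes
lapw0: node1:20 node2:20
1:node1:20
1:node2:20
granularity:1
lapw2_vector_split:4
```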
On 11/07/2017 04:09 PM, Luigi Maduro - TNW wrote:
There are 2 different things:
lapw0para executes:
$remote $machine "cd $PWD;$t $exe $def.def"
where $remote is either ssh or rsh (depending on your configuration setup).
Once this is defined, it goes to the remote node and executes
$exe, which usually refers to mpirun.
mpirun is a script on your system, and it may honor this
I_MPI_HYDRA_BOOTSTRAP=rsh variable, while by default it seems to use ssh (even if your
system does not support this). WIEN2k does not know about such a variable and assumes
that a plain mpirun will do the correct thing. The sysadmin should set up the
system such that rsh is used by default with mpirun, or should tell
people which mpi commands/variables they should set.
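If the defaults cannot be changed system-wide, one workaround (this is Intel MPI's Hydra launcher variable mentioned above; export it in the job script or shell startup file, and adjust to your MPI flavor):

```shell
# Make Intel MPI's Hydra process manager launch remote processes
# with rsh instead of the default ssh
export I_MPI_HYDRA_BOOTSTRAP=rsh
```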
PS: I do not quite understand how it can happen that you get rsh in lapw1para,
but ssh in lapw0para??
I do not understand either, because when I check the lapw2para script I
see that “set remote = rsh”
I have a couple of questions concerning the parallel version of WIEN2k,
one concerning insufficient virtual memory and the other concerning lapw1.
I’ve been trying to do simulations of MoS2 in two types of
configurations. One is a monolayer calculation (4x4x1 unit cells) with
48 atoms,
and another calculation deals with a “nanowire” (13x2x1 unit cells) with
97 atoms.
For the 4x4x1 unit cell I have an rkmax of 6.0 and a 10 k-point mesh.
For the calculation I used 2 nodes and 20 processors per node (so 40 in
total).
The command run is: run_lapw -p -nlvdw -ec 0.0001.
What I noticed is that both lapw1 and nlvdw take a long time to run.
Lapw0 takes about a minute, as does lapw2. Lapw1 and nlvdw take about
16-19 minutes to run.
When I log into the nodes and use the ‘top’ command to check the CPU% I
see that all processors are at 100%; however, I’ve been notified that
only 2% of the requested CPU time is actually used.
I don’t really understand why there is such a big discrepancy in
computation time between lapw1 and lapw2. In smaller calculations lapw1
and lapw2 take computation times of the same order of magnitude.
For the nanowire calculation I chose an rkmax of 6.0 and a single
k-point and only used LDA because I want to compare LDA with NLVDW later
on. I always get a “forrtl: severe (41): insufficient virtual memory”
error at lapw1 or lapw2 in the first SCF cycle, no matter how many
nodes I request, from 1 node to 20 nodes.
Each time I requested 20 processors per node. Only with 20 nodes and
20 processors per node did the SCF cycle make it to lapw2, but it crashed not
long after reaching lapw2. Each node is equipped with 128 GB of memory,
and the end of output1_1 looks like this:
MPI-parallel calculation using 400 processors
Scalapack processors array (row,col): 20 20
Matrix size 136632
Nice Optimum Blocksize 112 Excess % 0.000D+00
allocate H 712.2 MB dimensions 6832 6832
allocate S 712.2 MB dimensions 6832 6832
allocate spanel 11.7 MB dimensions 6832 112
allocate hpanel 11.7 MB dimensions 6832 112
allocate spanelus 11.7 MB dimensions 6832 112
allocate slen 5.8 MB dimensions 6832 112
allocate x2 5.8 MB dimensions 6832 112
allocate legendre 75.9 MB dimensions 6832 13 112
allocate al,bl (row) 2.3 MB dimensions 6832 11
allocate al,bl (col) 0.0 MB dimensions 112 11
allocate YL 1.7 MB dimensions 15 6832 1
Time for al,bl (hamilt, cpu/wall) : 14.7 14.7
Time for legendre (hamilt, cpu/wall) : 4.1 4.1
Time for phase (hamilt, cpu/wall) : 29.7 30.2
Time for us (hamilt, cpu/wall) : 38.8 39.2
Time for overlaps (hamilt, cpu/wall) : 115.6 116.3
Time for distrib (hamilt, cpu/wall) : 0.3 0.3
Time sum iouter (hamilt, cpu/wall) : 203.5 205.7
number of local orbitals, nlo (hamilt) 749
allocate YL 33.4 MB dimensions 15 136632 1
allocate phsc 2.1 MB dimensions 136632
Time for los (hamilt, cpu/wall) : 0.4 0.4
Time for alm (hns) : 1.0
Time for vector (hns) : 7.2
Time for vector2 (hns) : 6.8
Time for VxV (hns) : 114.8
Wall Time for VxV (hns) : 1.2
Scalapack Workspace size 100.38 and 804.35 Mb
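The 712.2 MB figures for H and S above are consistent with simple arithmetic (a sketch; it assumes double-precision complex storage at 16 bytes per element and the 20x20 ScaLAPACK grid reported in the output):

```python
# Reproduce the per-process allocation figures from the lapw1 output.
nmat = 136632          # "Matrix size" line
grid = 20              # 20x20 ScaLAPACK processor grid
bytes_per_elem = 16    # double-precision complex

local_dim = -(-nmat // grid)                       # ceil division -> 6832
local_mb = local_dim ** 2 * bytes_per_elem / 1024 ** 2

print(f"local block {local_dim}x{local_dim}: {local_mb:.1f} MB")  # 712.2 MB
print(f"H + S per process: {2 * local_mb / 1024:.2f} GB")
```

With 20 such processes per node, H and S alone already occupy roughly 28 GB per node, before the workspace, eigenvectors, and the other arrays listed above, which is why even 128 GB nodes run out of memory at this basis-set size.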
Any help is appreciated.
Kind regards,
Luigi
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html