It shows EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2
-mode current -green -scratch /scratch/WIEN2k/ -noco
in all cases, and htop shows the values I provided below.
Best regards,
Michael
On 12.05.2024 at 16:01, Peter Blaha wrote:
This makes sense.
Please let me know if it shows
EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
current -green -scratch /scratch/WIEN2k/ -noco
or only nmr -case ...
In any case, it is running correctly.
PS: I know that the current step also needs a lot of memory; after all,
it has to read the eigenvectors of all eigenvalues, ...
PPS: -quota 8 (or 24) might help while still utilizing all cores, but
I'm not sure whether it would save enough memory in the current steps.
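A rough, hedged way to see why reading the eigenvectors of all eigenvalues is memory-hungry: a dense eigenvector matrix for a single k-point stores one coefficient per basis function per band. The sketch below assumes double precision (8 bytes per real and 16 bytes per complex coefficient); the function name and the basis and band sizes in the example are hypothetical, not taken from this calculation, and the sketch does not model how the WIEN2k `nmr` program actually stores its data.

```python
def eigenvector_bytes(n_basis: int, n_bands: int, is_complex: bool = True) -> int:
    """Bytes needed for a dense n_basis x n_bands eigenvector matrix of one k-point.

    Assumes double precision: 8 bytes per real coefficient,
    16 bytes per complex coefficient.
    """
    bytes_per_coeff = 16 if is_complex else 8
    return n_basis * n_bands * bytes_per_coeff


# Hypothetical sizes, for illustration only.
size = eigenvector_bytes(n_basis=10_000, n_bands=5_000, is_complex=True)
print(f"{size / 1e9:.1f} GB per k-point")  # 0.8 GB per k-point
```

Because the k-parallel jobs are independent processes, each one presumably holds its own copy of such data, so the total grows with the number of concurrent jobs.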
On 12.05.2024 at 10:09, Michael Fechtelkord via Wien wrote:
Hello all, hello Peter,
This is what is really running in the background (from htop; this is
a new job with 4 nodes, but it was the same with 8 nodes, -p 1 - 8), so
there is no nmr_mpi.
CPU%  MEM%  TIME+     Command
96.0  14.9  19h06:05  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 3
95.8  14.9  19h05:10  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 1
95.1  14.9  19h06:00  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 2
95.5  15.4  19h08:10  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 4
94.6  14.9  18h35:33  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 3
93.3  15.4  18h36:24  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 4
93.3  14.9  18h33:02  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 2
94.0  14.9  18h38:44  /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco -p 1
Regards,
Michael
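Adding up the memory column of the htop listing above shows why the machine starts swapping. This sketch assumes the second column is htop's MEM% (resident memory as a share of physical RAM) and takes the 130 GB from the original report as 100%. Shared pages are counted in every process's resident size, so the sum can overstate the true total; treat it as a rough estimate.

```python
# MEM% values copied from the htop listing above, one per nmr process.
mem_percent = [14.9, 14.9, 14.9, 15.4, 14.9, 15.4, 14.9, 14.9]

RAM_GB = 130  # installed RAM as stated in the original report (assumed to be htop's 100%)

total_percent = sum(mem_percent)
total_gb = total_percent / 100 * RAM_GB

print(f"sum of MEM%: {total_percent:.1f}%")  # sum of MEM%: 120.2%
print(f"approx. {total_gb:.0f} GB resident vs. {RAM_GB} GB RAM")
```

A sum above 100% means the processes together want more memory than is physically installed, which matches the growing swap usage described in the first message of the thread.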
On 11.05.2024 at 20:10, Michael Fechtelkord via Wien wrote:
Hello Peter,
I just use "x_nmr_lapw -p", and the rest is initiated by the nmr
script. The line "/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
current -green -scratch /scratch/WIEN2k/ -noco" is just
part of the whole procedure and was not started manually by me (I
only copied the last lines of the calculation).
Best regards,
Michael
On 11.05.2024 at 18:08, Peter Blaha wrote:
Hello Michael,
I don't understand the line:
/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current
-green -scratch /scratch/WIEN2k/ -noco
Mode current should run only k-parallel, not with MPI?
PS: Repeating
nmr_integ:localhost is useless.
nmr in mode integ runs only once (not k-parallel; sumpara has already
summed up the currents).
But one can use nmr_integ:localhost:8
Best regards
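Following this advice, a .machines file for the run would keep the eight k-parallel lines but list nmr_integ only once, with a core count. The helper below is a hypothetical Python sketch that just builds that text; it is not a WIEN2k tool. The `omp_*` lines are copied from the file Michael posted in this thread, and `granularity:1` is written with the standard keyword spelling.

```python
def machines_file(n_kpar: int, integ_cores: int, host: str = "localhost") -> str:
    """Return .machines text with n_kpar k-parallel lines and one nmr_integ line."""
    lines = ["granularity:1", "omp_lapw0:8", "omp_global:2"]
    # One "1:host" line per k-parallel job, as in the original file.
    lines += [f"1:{host}"] * n_kpar
    # nmr -mode integ runs only once, so a single line with a core count suffices.
    lines.append(f"nmr_integ:{host}:{integ_cores}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(machines_file(n_kpar=8, integ_cores=8), end="")
```

Redirecting the script's output to `.machines` in the case directory would replace the eight repeated nmr_integ lines with the single `nmr_integ:localhost:8` line.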
On 11.05.2024 at 16:19, Michael Fechtelkord via Wien wrote:
Hello Peter,
this is the .machines file content:
granularity:1
omp_lapw0:8
omp_global:2
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
Best regards,
Michael
On 11.05.2024 at 14:58, Peter Blaha wrote:
Hmm?
Are you using k-parallel AND MPI-parallel? This could
overload the machine.
What does the .machines file look like?
On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
Dear all,
the following problem occurs when I use the NMR part of WIEN2k
(23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k was
compiled using oneAPI 2024.1 ifort and gcc 13.2.1. I am using
ELPA 2024.03.01, Libxc 6.22, FFTW 3.3.10, MPICH 4.2.1 and the
oneAPI 2024.1 MKL libraries. The CPU is an i9-14900K with 24
cores, of which I use eight for the calculations. The RAM is 130 GB,
plus a 16 GB swap file on a Samsung PCIe 4.0 NVMe SSD. The memory
transfer rate is 5600 MT/s.
The structure is a layer silicate, and to simulate the ratio
Si:Al = 3:1 I currently use a 1:1:2 supercell. The monoclinic
symmetry of the new structure (the original is C 2/c) is P 2/c, and it
contains 40 atoms (K, Al, Si, O, and F).
I use 3 NMR LOs for K and O, and 10 for Si, Al, and F (where I
need the chemical shifts). The k-mesh has 40 k-points.
Interestingly, the RAM is sufficient during the NMR
vector calculations (always under 100 GB occupied) and at
the beginning of the electron current calculation. However, the
RAM use then rises to a critical level during the calculation, and
more and more data is swapped out to the swap file, which is
sometimes 80% full.
As you can see, this time only one of the parallel jobs failed because
of memory overflow. With 48 k-points, however, 3 jobs crashed, and with
them the whole current calculation. The reason for the crash is clear
to me. But I do not understand why the current calculation reacts so
sensitively with so few atoms and a small k-mesh. I have run
calculations with more atoms and a 1000 k-point mesh on 4 cores,
and they worked fine. Could the Intel MKL library be the source of
the failure? Or should I rather go back to 4 cores, even
with longer calculation times?
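The 4-versus-8-cores question is essentially a memory budget: the number of k-parallel jobs that fit in RAM is the usable RAM divided by the peak memory of one job. The helper below is a hypothetical sketch of that arithmetic. The 19.4 GB corresponds to the 14.9% MEM snapshot of 130 GB in the htop listing quoted earlier in the thread, which is not necessarily the peak, and the 10 GB reserve for the operating system is an assumed value.

```python
import math


def max_parallel_jobs(ram_gb: float, peak_per_job_gb: float, reserve_gb: float = 0.0) -> int:
    """Largest number of equally sized jobs whose peak memory fits in RAM.

    reserve_gb is memory kept free for the operating system and other programs.
    """
    usable = ram_gb - reserve_gb
    return max(0, math.floor(usable / peak_per_job_gb))


print(max_parallel_jobs(130, 19.4))                 # 6
print(max_parallel_jobs(130, 19.4, reserve_gb=10))  # 6
```

Since the memory per job keeps rising during the current step, the peak value is what matters; 8 jobs at about 19.4 GB each already need roughly 155 GB, more than the 130 GB installed.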
Have a nice weekend, all!
Best wishes from
Michael Fechtelkord
-----------------------------------------------
cd ./ ... x lcore -f MS_2M1_Al2
CORE END
0.685u 0.028s 0:00.71 98.5% 0+0k 2336+16168io 5pf+0w
lcore .... ready
EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
current -green -scratch /scratch/WIEN2k/ -noco
[1] 20253
[2] 20257
[3] 20261
[4] 20265
[5] 20269
[6] 20273
[7] 20277
[8] 20281
[8] + Aborted ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[7] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[6] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[5] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[4] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[3] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[2] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
[1] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
EXECUTING: /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
sumpara -p 8 -green -scratch /scratch/WIEN2k/
current .... ready
EXECUTING: mpirun -np 1 -machinefile .machine_nmrinteg
/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
nmr: integration ... done in 4032.3s
stop
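To see which k-parallel jobs failed, one can look at the per-job error files. In the job lines of the log above, stderr of job N is redirected to nmr.err.N; because that redirection sits outside the `( cd $dir; ... )` subshell, the files presumably end up in the directory where x_nmr_lapw was started, which is an assumption. The helper below is a hypothetical sketch; a non-empty file only shows that something was written to stderr, not necessarily that the job crashed.

```python
from pathlib import Path


def nonempty_err_files(workdir: str = ".") -> list[str]:
    """Names of non-empty nmr.err.<N> files in workdir, ordered by job number N."""
    found = []
    for path in Path(workdir).glob("nmr.err.*"):
        number = path.name.rsplit(".", 1)[-1]
        # Skip files whose suffix is not a job number.
        if number.isdigit() and path.stat().st_size > 0:
            found.append((int(number), path.name))
    return [name for _, name in sorted(found)]


if __name__ == "__main__":
    for name in nonempty_err_files("."):
        print(name)
```

Sorting by the integer job number keeps nmr.err.10 after nmr.err.8 when more than nine jobs are used.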
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html