Dear Laurence,

I used 40 k-points.


The integration part causes no problems (-mode integ); the memory-consuming part is the current part (-mode current).

Your hint about lapw1 makes it even clearer that it would be safer to use 4 parallel calculations instead of eight without losing much performance (the 14900K has only 8 performance cores; the other 16 are efficiency cores and slower).
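
For reference, a 4-job version of the .machines file quoted further down would then look roughly like this (a minimal sketch combining that file with Peter's nmr_integ:localhost:8 hint; the :4 count for nmr_integ is my guess, untested):

    granularity:1
    omp_lapw0:8
    omp_global:2
    1:localhost
    1:localhost
    1:localhost
    1:localhost
    nmr_integ:localhost:4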


Best regards,

Michael


On 13.05.2024 at 10:14, Laurence Marks wrote:
For my own curiosity, is it 40,000 k-points or 40 k-points?

N.B., as Peter suggested, did you try using mpi, which would be four of nmr_integ:localhost:2? I suspect (but might be wrong) that this will reduce your memory usage by a factor of 2, and will only be slightly slower than what you have. If needed you can also go to 4 mpi. Of course you have to have compiled it...
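
In .machines terms, the eight single nmr_integ lines would then become four mpi entries, roughly like this (a sketch of that reading of the suggestion; as said, it might be wrong):

    nmr_integ:localhost:2
    nmr_integ:localhost:2
    nmr_integ:localhost:2
    nmr_integ:localhost:2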

N.N.B., you presumably realise that you are using 16 cores for lapw1, as each k-point has 2 cores (8 k-parallel jobs × omp_global:2).



On Mon, May 13, 2024 at 4:00 PM Michael Fechtelkord via Wien <wien@zeus.theochem.tuwien.ac.at> wrote:

    Hello all,


    As far as I can see, a job with 8 cores may be faster, but it uses
    double the scratch space (8 partial NMR vectors per direction, e.g.
    nmr_mqx, with a size depending on the k-mesh, instead of 4 partial
    vectors), and it also doubles the RAM usage of the NMR current
    calculation, because 8 partial vectors per direction are used.

    I will try the -quota 8 option, but currently it seems that
    calculations on eight cores are at high risk of crashing because of
    the memory and scratch space they need, and that already with 40
    k-points. I never had problems with calculations on 4 cores, even
    with only 64 GB RAM and 1000 k-points.
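
    (Based only on Peter's mention of the -quota flag, my assumption is
    that it goes on the command line when starting the run, e.g.

        x_nmr_lapw -p -quota 8

    but I have not verified that this is the right place for it.)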


    Best regards,

    Michael


    On 12.05.2024 at 18:02, Michael Fechtelkord via Wien wrote:
    > It shows  EXECUTING:     /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2
    > -mode current    -green         -scratch /scratch/WIEN2k/ -noco
    >
    > in all cases and in htop the values I provided below.
    >
    >
    > Best regards,
    >
    > Michael
    >
    >
    > On 12.05.2024 at 16:01, Peter Blaha wrote:
    >> This makes sense.
    >> Please let me know if it shows
    >>
    >>  EXECUTING:     /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
    >> current    -green         -scratch /scratch/WIEN2k/ -noco
    >>
    >> or only    nmr -case ...
    >>
    >> In any case, it is running correctly.
    >>
    >> PS: I know that the current step also needs a lot of memory; after
    >> all, it needs to read the eigenvectors of all eigenvalues, ...
    >>
    >> PPS: -quota 8 (or 24) might help while still utilizing all cores,
    >> but I'm not sure whether it would save enough memory in the current
    >> steps.
    >>
    >>
    >>
    >> On 12.05.2024 at 10:09, Michael Fechtelkord via Wien wrote:
    >>> Hello all, hello Peter,
    >>>
    >>>
    >>> That is what is really running in the background (from htop; this
    >>> is a new job with 4 nodes, but it was the same with 8 nodes, -p 1
    >>> to -p 8), so no nmr_mpi.
    >>>
    >>>
    >>> CPU% MEM% TIME+ Command
    >>>
    >>> 96.0 14.9 19h06:05 /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>> current -green -scratch /scratch/WIEN2k/ -noco -p 3
    >>>
    >>> 95.8 14.9 19h05:10 /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>> current -green -scratch /scratch/WIEN2k/ -noco -p 1
    >>>
    >>> 95.1 14.9 19h06:00 /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>> current -green -scratch /scratch/WIEN2k/ -noco -p 2
    >>>
    >>> 95.5 15.4 19h08:10 /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>> current -green -scratch /scratch/WIEN2k/ -noco -p 4
    >>>
    >>> 94.6 14.9 18h35:33 /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>> current -green -scratch /scratch/WIEN2k/ -noco -p 3
    >>>
    >>> 93.3 15.4 18h36:24 /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>> current -green -scratch /scratch/WIEN2k/ -noco -p 4
    >>>
    >>> 93.3 14.9 18h33:02 /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>> current -green -scratch /scratch/WIEN2k/ -noco -p 2
    >>>
    >>> 94.0 14.9 18h38:44 /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>> current -green -scratch /scratch/WIEN2k/ -noco -p 1
    >>>
    >>>
    >>> Regards,
    >>>
    >>> Michael
    >>>
    >>>
    >>> On 11.05.2024 at 20:10, Michael Fechtelkord via Wien wrote:
    >>>> Hello Peter,
    >>>>
    >>>>
    >>>> I just use "x_nmr_lapw -p" and the rest is initiated by the nmr
    >>>> script. The line "/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
    >>>> current -green -scratch /scratch/WIEN2k/ -noco" is just part of
    >>>> the whole procedure and not initiated by me manually. (I only
    >>>> copied the last lines of the calculation.)
    >>>>
    >>>>
    >>>> Best regards,
    >>>>
    >>>> Michael
    >>>>
    >>>>
    >>>> On 11.05.2024 at 18:08, Peter Blaha wrote:
    >>>>> Hello Michael,
    >>>>>
    >>>>> I don't understand the line:
    >>>>>
    >>>>> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current
    >>>>> -green         -scratch /scratch/WIEN2k/ -noco
    >>>>>
    >>>>> The mode current should run only k-parallel, not in mpi ??
    >>>>>
    >>>>> PS: The repetition of
    >>>>>
    >>>>> nmr_integ:localhost    is useless.
    >>>>>
    >>>>> nmr mode integ runs only once (not k-parallel; sumpara has
    >>>>> already summed up the currents).
    >>>>>
    >>>>> But one can use nmr_integ:localhost:8
    >>>>>
    >>>>>
    >>>>> Best regards
    >>>>>
    >>>>> On 11.05.2024 at 16:19, Michael Fechtelkord via Wien wrote:
    >>>>>> Hello Peter,
    >>>>>>
    >>>>>> this is the .machines file content:
    >>>>>>
    >>>>>> granularity:1
    >>>>>> omp_lapw0:8
    >>>>>> omp_global:2
    >>>>>> 1:localhost
    >>>>>> 1:localhost
    >>>>>> 1:localhost
    >>>>>> 1:localhost
    >>>>>> 1:localhost
    >>>>>> 1:localhost
    >>>>>> 1:localhost
    >>>>>> 1:localhost
    >>>>>> nmr_integ:localhost
    >>>>>> nmr_integ:localhost
    >>>>>> nmr_integ:localhost
    >>>>>> nmr_integ:localhost
    >>>>>> nmr_integ:localhost
    >>>>>> nmr_integ:localhost
    >>>>>> nmr_integ:localhost
    >>>>>> nmr_integ:localhost
    >>>>>>
    >>>>>>
    >>>>>> Best regards,
    >>>>>>
    >>>>>> Michael
    >>>>>>
    >>>>>>
    >>>>>> On 11.05.2024 at 14:58, Peter Blaha wrote:
    >>>>>>> Hmm. ?
    >>>>>>>
    >>>>>>> Are you using k-parallel AND mpi-parallel?? This could
    >>>>>>> overload the machine.
    >>>>>>>
    >>>>>>> What does the .machines file look like?
    >>>>>>>
    >>>>>>>
    >>>>>>> On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
    >>>>>>>> Dear all,
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> The following problem occurs when I use the NMR part of
    >>>>>>>> WIEN2k (23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k
    >>>>>>>> was compiled using oneAPI 2024.1 ifort and gcc 13.2.1. I am
    >>>>>>>> using ELPA 2024.03.01, Libxc 6.22, fftw 3.3.10, MPICH 4.2.1,
    >>>>>>>> and the oneAPI 2024.1 MKL libraries. The CPU is an i9-14900K
    >>>>>>>> with 24 cores, of which I use eight for the calculations. The
    >>>>>>>> RAM is 130 GB, plus a 16 GB swap file on a Samsung PCIe 4.0
    >>>>>>>> NVMe SSD. The memory speed is 5600 MT/s.
    >>>>>>>>
    >>>>>>>> The structure is a layer silicate, and to simulate the ratio
    >>>>>>>> Si:Al = 3:1 I currently use a 1:1:2 supercell. The new
    >>>>>>>> structure has monoclinic symmetry P2/c (the original is C2/c)
    >>>>>>>> and contains 40 atoms (K, Al, Si, O, and F).
    >>>>>>>>
    >>>>>>>> I use 3 NMR LOs for K and O, and 10 for Si, Al, and F (where
    >>>>>>>> I need the chemical shifts). The k-mesh has 40 k-points.
    >>>>>>>>
    >>>>>>>> The interesting thing is that the RAM is sufficient during
    >>>>>>>> the NMR vector calculations (always under 100 GB RAM occupied)
    >>>>>>>> and at the beginning of the electron current calculation.
    >>>>>>>> However, the RAM usage increases to a critical point during
    >>>>>>>> the calculation, and more and more data is swapped out to the
    >>>>>>>> swap file, which is sometimes 80% occupied.
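    >>>>>>>>
    >>>>>>>> (To watch this happening live, a second terminal running the
    >>>>>>>> standard Linux command
    >>>>>>>>
    >>>>>>>>     watch -n 60 free -h
    >>>>>>>>
    >>>>>>>> shows RAM and swap filling up over time; htop shows the same
    >>>>>>>> information.)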
    >>>>>>>>
    >>>>>>>> As you can see, this time only one core failed because of
    >>>>>>>> memory overflow. But using 48 k-points, 3 cores crashed, and
    >>>>>>>> with them the whole current calculation. The reason for the
    >>>>>>>> crash is clear to me, but I do not understand why the current
    >>>>>>>> calculation reacts so sensitively with so few atoms and such a
    >>>>>>>> small k-mesh. I have made calculations with more atoms and a
    >>>>>>>> 1000 k-point mesh on 4 cores; they worked fine. So can it be
    >>>>>>>> that the Intel MKL library is the source of the failure? Am I
    >>>>>>>> better off going back to 4 cores, even with longer calculation
    >>>>>>>> times?
    >>>>>>>>
    >>>>>>>> Have all a nice weekend!
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Best wishes from
    >>>>>>>>
    >>>>>>>> Michael Fechtelkord
    >>>>>>>>
    >>>>>>>> -----------------------------------------------
    >>>>>>>>
    >>>>>>>> cd ./  ...  x lcore  -f MS_2M1_Al2
    >>>>>>>>  CORE  END
    >>>>>>>> 0.685u 0.028s 0:00.71 98.5%     0+0k 2336+16168io 5pf+0w
    >>>>>>>>
    >>>>>>>> lcore        ....  ready
    >>>>>>>>
    >>>>>>>>
    >>>>>>>>  EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2
    >>>>>>>> -mode current -green         -scratch /scratch/WIEN2k/ -noco
    >>>>>>>>
    >>>>>>>> [1] 20253
    >>>>>>>> [2] 20257
    >>>>>>>> [3] 20261
    >>>>>>>> [4] 20265
    >>>>>>>> [5] 20269
    >>>>>>>> [6] 20273
    >>>>>>>> [7] 20277
    >>>>>>>> [8] 20281
    >>>>>>>> [8]  + Aborted                       ( cd $dir; $exec2 >>
    >>>>>>>> nmr.out.${loop} ) >& nmr.err.$loop
    >>>>>>>> [7]  + Done                          ( cd $dir; $exec2 >>
    >>>>>>>> nmr.out.${loop} ) >& nmr.err.$loop
    >>>>>>>> [6]  + Done                          ( cd $dir; $exec2 >>
    >>>>>>>> nmr.out.${loop} ) >& nmr.err.$loop
    >>>>>>>> [5]  + Done                          ( cd $dir; $exec2 >>
    >>>>>>>> nmr.out.${loop} ) >& nmr.err.$loop
    >>>>>>>> [4]  + Done                          ( cd $dir; $exec2 >>
    >>>>>>>> nmr.out.${loop} ) >& nmr.err.$loop
    >>>>>>>> [3]  + Done                          ( cd $dir; $exec2 >>
    >>>>>>>> nmr.out.${loop} ) >& nmr.err.$loop
    >>>>>>>> [2]  + Done                          ( cd $dir; $exec2 >>
    >>>>>>>> nmr.out.${loop} ) >& nmr.err.$loop
    >>>>>>>> [1]  + Done                          ( cd $dir; $exec2 >>
    >>>>>>>> nmr.out.${loop} ) >& nmr.err.$loop
    >>>>>>>>
    >>>>>>>>  EXECUTING: /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
    >>>>>>>> sumpara  -p 8    -green -scratch /scratch/WIEN2k/
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> current        ....  ready
    >>>>>>>>
    >>>>>>>>
    >>>>>>>>  EXECUTING:     mpirun -np 1 -machinefile .machine_nmrinteg
    >>>>>>>> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> nmr:  integration  ... done in   4032.3s
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> stop
    >>>>>>>>
    >>

    --
    Dr. Michael Fechtelkord

    Institut für Geologie, Mineralogie und Geophysik
    Ruhr-Universität Bochum
    Universitätsstr. 150
    D-44780 Bochum

    Phone: +49 (234) 32-24380
    Fax:  +49 (234) 32-04380
    Email: michael.fechtelk...@ruhr-uni-bochum.de
    Web Page:
    https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/




--
Professor Laurence Marks (Laurie)
Northwestern University
Webpage <http://www.numis.northwestern.edu> and Google Scholar link <http://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en> "Research is to see what everybody else has seen, and to think what nobody else has thought", Albert Szent-Györgyi


--
Dr. Michael Fechtelkord

Institut für Geologie, Mineralogie und Geophysik
Ruhr-Universität Bochum
Universitätsstr. 150
D-44780 Bochum

Phone: +49 (234) 32-24380
Fax:  +49 (234) 32-04380
Email: michael.fechtelk...@ruhr-uni-bochum.de
Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/