Dear Prof. Giannozzi,

Ah, I understand, that makes sense. Do you have any advice on how best to track down such a memory leak in this case? The behavior is reproducible with my setup.
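
One thing I could try is to log the resident memory of the pw.x ranks over the course of the run and check whether it keeps growing with the MD steps. A minimal Python sketch, assuming a Linux compute node with /proc available and MPI ranks that show up as processes named pw.x:

#!/usr/bin/env python3
"""Log the resident memory of all pw.x ranks on this node over time.

Minimal sketch: assumes a Linux node (/proc available) and MPI ranks
that appear as processes whose command name is "pw.x".
"""
import os
import time


def rss_kb(pid):
    """Return VmRSS in kB for a PID, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        return None
    return None


def pwx_pids():
    """Return the PIDs of all processes named pw.x."""
    pids = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/comm") as f:
                    if f.read().strip() == "pw.x":
                        pids.append(int(entry))
            except OSError:
                pass
    return pids


if __name__ == "__main__":
    # Poll once per minute: a flat total after the first MD step suggests
    # the per-process estimate is simply too large for the node, while a
    # steady increase over thousands of steps points to a leak.
    while True:
        sizes = [s for s in (rss_kb(p) for p in pwx_pids()) if s is not None]
        print(f"{time.strftime('%H:%M:%S')}  ranks={len(sizes)}  "
              f"total_rss={sum(sizes) / 1024:.1f} MB", flush=True)
        time.sleep(60)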

Kind regards
Lenz

On Wed, 23 Jun 2021 at 14:02, Paolo Giannozzi <p.gianno...@gmail.com> wrote:

> On Wed, Jun 23, 2021 at 1:31 PM Lenz Fiedler <fiedler.l...@gmail.com> wrote:
>
>> (and the increase in number of processes is most likely the reason for
>> the error, if I understand you correctly?)
>
> Not exactly. Too many processes may result in too much global memory
> usage, because some arrays are replicated on each process. If you exceed
> the globally available memory, the code will crash. BUT: it will do so during
> the first MD step, not after 2000 MD steps. The memory usage should not
> increase with the number of MD time steps. If it does, there is a memory
> leak, either in the code or somewhere else (libraries etc.).
>
> Paolo
>
>> This is not a problem. For my beryllium calculation it is more problematic,
>> since the 144-processor case really gives the best performance (I have
>> uploaded a file called performance_Be128.png to show my timing results),
>> but I still run out of memory after 2700 time steps. Although this is also
>> manageable, since I can always restart the calculation and perform another
>> 2700 time steps. With this I was able to perform 10,000 time steps in just
>> over a day. I am running more calculations on larger Be and Fe cells and I
>> will investigate this behavior there.
>>
>> I have also used the "gamma" option for the k-points to get the
>> performance benefits you outlined. For the Fe128 cell, I achieved optimal
>> performance with 144 processors and the "gamma" option (resulting in
>> about 90 s per SCF cycle). I am still not within my personal target of ~30 s
>> per SCF cycle, but I will start looking into the choice of my PSP and cutoff
>> (along with considering OpenMP and task-group parallelization) rather than
>> blindly throwing more and more processors at the problem.
>>
>> Kind regards
>> Lenz
>>
>> PhD Student (HZDR / CASUS)
>>
>> On Sat, 19 Jun 2021 at 09:25, Paolo Giannozzi <p.gianno...@gmail.com> wrote:
>>
>>> I tried your Fe job on a 36-core machine (with Gamma point to save time
>>> and memory) and found no evidence of memory leaks after more than 100 steps.
>>>
>>>> The best performance I was able to achieve so far was with 144 cores
>>>> defaulting to -nb 144, so am I correct in assuming that I should try e.g.
>>>> -nb 144 -ntg 2 for 288 cores?
>>>
>>> You should not use option -nb except in some rather special cases.
>>>
>>> Paolo
>>>
>>>> PhD Student (HZDR / CASUS)
>>>>
>>>> On Wed, 16 Jun 2021 at 07:33, Paolo Giannozzi <p.gianno...@gmail.com> wrote:
>>>>
>>>>> Hard to say without knowing exactly what goes out of which memory
>>>>> limits. Note that not all arrays are distributed across processors, so a
>>>>> considerable number of arrays are replicated on all processes. As a
>>>>> consequence, the total amount of required memory will increase with the
>>>>> number of MPI processes. Also note that a 128-atom cell is not "large" and
>>>>> 144 cores are not "a small number of processors". You will not get any
>>>>> further advantage by just increasing the number of processors, quite the
>>>>> opposite. If you have too many idle cores, you should consider
>>>>> - "task group" parallelization (option -ntg)
>>>>> - MPI+OpenMP parallelization (configure --enable-openmp)
>>>>> Please also note that ecutwfc=80 Ry is a rather large cutoff for a
>>>>> USPP (while ecutrho=320 is fine) and that running with K_POINTS Gamma
>>>>> instead of 1 1 1 0 0 0 will be faster and take less memory.
>>>>>
>>>>> Paolo
>>>>>
>>>>> On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <fiedler.l...@gmail.com> wrote:
>>>>>
>>>>>> Dear users,
>>>>>>
>>>>>> I am trying to perform an MD simulation for a large cell (128 Fe
>>>>>> atoms, gamma point) using pw.x and I get a strange scaling behavior. To
>>>>>> test the performance I ran the same MD simulation with an increasing
>>>>>> number of nodes (2, 4, 6, 8, etc.) using 24 cores per node. The simulation
>>>>>> is successful when using 2, 4, and 6 nodes, i.e. 48, 96, and 144 cores,
>>>>>> respectively (albeit slow, which is within my expectations for such a
>>>>>> small number of processors).
>>>>>> Going to 8 and more nodes, I run into an out-of-memory error after
>>>>>> about two time steps.
>>>>>> I am a little bit confused as to what could be the reason. Since a
>>>>>> smaller number of cores works, I would not expect a higher number of
>>>>>> cores to run into an out-of-memory error.
>>>>>> The 8-node run explicitly outputs at the beginning:
>>>>>> "     Estimated max dynamical RAM per process >     140.54 MB
>>>>>>       Estimated total dynamical RAM >      26.35 GB
>>>>>> "
>>>>>> which is well within the 2.5 GB I have allocated for each core.
>>>>>> I am obviously doing something wrong; could anyone point to what it is?
>>>>>> The input files for a 6- and 8-node run can be found here:
>>>>>> https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing
>>>>>> I am using QE 6.6.
>>>>>>
>>>>>> Kind regards
>>>>>> Lenz
>>>>>>
>>>>>> PhD Student (HZDR / CASUS)
>>>>>
>>>>> --
>>>>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>>>>> Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
>>>>> Phone +39-0432-558216, fax +39-0432-558222
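
To make the point about replicated arrays concrete: if each rank holds a fixed amount of replicated data plus its share of the distributed data, the per-rank footprint stays modest while the global footprint grows with the number of MPI ranks. A toy Python sketch, with made-up numbers that are not taken from the actual pw.x estimates:

#!/usr/bin/env python3
"""Toy model: global memory when part of each rank's data is replicated.

The numbers are made up for illustration and are NOT the actual pw.x
estimates: each rank is assumed to hold `replicated_mb` of replicated
arrays plus an equal share of `distributed_mb` of distributed arrays.
"""


def total_memory_gb(n_ranks, replicated_mb=100.0, distributed_mb=15000.0):
    per_rank_mb = replicated_mb + distributed_mb / n_ranks
    return n_ranks * per_rank_mb / 1024.0


if __name__ == "__main__":
    for n in (48, 96, 144, 192, 288):
        print(f"{n:4d} ranks -> ~{total_memory_gb(n):5.1f} GB total")

With these assumed numbers the per-rank memory drops as ranks are added, but the global total climbs from roughly 19 GB at 48 ranks to over 40 GB at 288 ranks, which is the kind of growth that can exhaust the globally available memory already at the first MD step.
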
_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list users@lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users