It's not easy. There is a trick (see dev-tools/mem_counter) to track the allocated memory, but it requires some recompilation (possibly after some tweaking) and only reports memory allocated with the Fortran "allocate" statement. And of course, I have tried it and it does not show anything suspicious. Otherwise: monitor memory usage with "memstat".
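(Editorial sketch, not part of the original message: one minimal way to do the kind of external monitoring suggested above on a Linux compute node is to poll the resident set size (VmRSS) of the local pw.x ranks from /proc. The process name "pw.x", the 30-second interval, and the script itself are illustrative assumptions, not anything prescribed in the thread.)

import os
import time

def rss_kb(pid):
    # Return VmRSS in kB for one PID, or None if the process has gone away.
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        return None

def find_pids(name="pw.x"):
    # Collect the PIDs of all local MPI ranks whose command name matches.
    pids = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/comm") as f:
                    if f.read().strip() == name:
                        pids.append(int(entry))
            except OSError:
                pass
    return pids

if __name__ == "__main__":
    while True:
        total = sum(filter(None, (rss_kb(p) for p in find_pids())))
        print(time.strftime("%H:%M:%S"),
              f"total VmRSS of pw.x ranks: {total / 1024:.1f} MB", flush=True)
        time.sleep(30)

If the printed total keeps growing from one MD step to the next while the system size stays fixed, that growth, rather than the absolute value, is the signature of a leak.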
Paolo

On Wed, Jun 23, 2021 at 9:23 PM Lenz Fiedler <fiedler.l...@gmail.com> wrote:

> Dear Prof. Giannozzi,
>
> ah, I understand, that makes sense. Do you have any advice on how best to
> track such a memory leak down in this case? The behavior is reproducible
> with my setup.
>
> Kind regards
> Lenz
>
> On Wed, Jun 23, 2021 at 2:02 PM Paolo Giannozzi <p.gianno...@gmail.com> wrote:
>
>> On Wed, Jun 23, 2021 at 1:31 PM Lenz Fiedler <fiedler.l...@gmail.com> wrote:
>>
>>> (and the increase in number of processes is most likely the reason for
>>> the error, if I understand you correctly?)
>>
>> Not exactly. Too many processes may result in too much global memory
>> usage, because some arrays are replicated on each process. If you exceed
>> the globally available memory, the code will crash. BUT: it will do so
>> during the first MD step, not after 2000 MD steps. The memory usage
>> should not increase with the number of MD time steps. If it does, there
>> is a memory leak, either in the code or somewhere else (libraries etc.).
>>
>> Paolo
>>
>>> This is not a problem. For my beryllium calculation it is more
>>> problematic, since the 144-processor case really gives the best
>>> performance (I have uploaded a file called performance_Be128.png to show
>>> my timing results), but I still run out of memory after 2700 time steps.
>>> This is also manageable, since I can always restart the calculation and
>>> perform another 2700 time steps. With this I was able to perform 10,000
>>> time steps in just over a day. I am running more calculations on larger
>>> Be and Fe cells and I will investigate this behavior there.
>>>
>>> I have also used the "gamma" option for the K-points to get the
>>> performance benefits you outlined. For the Fe128 cell, I achieved
>>> optimal performance with 144 processors and the "gamma" option
>>> (resulting in about 90 s per SCF cycle). I am still not within my
>>> personal target of ~30 s per SCF cycle, but I will start looking into
>>> the choice of my PSP and cutoff (along with considering OpenMP and
>>> task-group parallelization) rather than blindly throwing more and more
>>> processors at the problem.
>>>
>>> Kind regards
>>> Lenz
>>>
>>> PhD Student (HZDR / CASUS)
>>>
>>> On Sat, Jun 19, 2021 at 9:25 AM Paolo Giannozzi <p.gianno...@gmail.com> wrote:
>>>
>>>> I tried your Fe job on a 36-core machine (with Gamma point to save time
>>>> and memory) and found no evidence of memory leaks after more than 100
>>>> steps.
>>>>
>>>>> The best performance I was able to achieve so far was with 144 cores
>>>>> defaulting to -nb 144, so am I correct to assume that I should try
>>>>> e.g. -nb 144 -ntg 2 for 288 cores?
>>>>
>>>> You should not use option -nb except in some rather special cases.
>>>>
>>>> Paolo
>>>>
>>>>> PhD Student (HZDR / CASUS)
>>>>>
>>>>> On Wed, Jun 16, 2021 at 7:33 AM Paolo Giannozzi <p.gianno...@gmail.com> wrote:
>>>>>
>>>>>> Hard to say without knowing exactly what goes out of which memory
>>>>>> limits. Note that not all arrays are distributed across processors,
>>>>>> so a considerable number of arrays are replicated on all processes.
>>>>>> As a consequence, the total amount of required memory will increase
>>>>>> with the number of MPI processes. Also note that a 128-atom cell is
>>>>>> not "large" and 144 cores are not "a small number of processors". You
>>>>>> will not get any advantage by just increasing the number of
>>>>>> processors any more, quite the opposite.
>>>>>> If you have too many idle cores, you should consider
>>>>>> - "task group" parallelization (option -ntg)
>>>>>> - MPI+OpenMP parallelization (configure --enable-openmp)
>>>>>> Please also note that ecutwfc=80 Ry is a rather large cutoff for a
>>>>>> USPP (while ecutrho=320 is fine) and that running with K_POINTS Gamma
>>>>>> instead of 1 1 1 0 0 0 will be faster and take less memory.
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>>> On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <fiedler.l...@gmail.com> wrote:
>>>>>>
>>>>>>> Dear users,
>>>>>>>
>>>>>>> I am trying to perform an MD simulation for a large cell (128 Fe
>>>>>>> atoms, gamma point) using pw.x and I get a strange scaling behavior.
>>>>>>> To test the performance I ran the same MD simulation with an
>>>>>>> increasing number of nodes (2, 4, 6, 8, etc.) using 24 cores per
>>>>>>> node. The simulation is successful when using 2, 4, and 6 nodes,
>>>>>>> i.e. 48, 96 and 144 cores respectively (albeit slow, which is within
>>>>>>> my expectations for such a small number of processors). Going to 8
>>>>>>> and more nodes, I run into an out-of-memory error after about two
>>>>>>> time steps.
>>>>>>>
>>>>>>> I am a little bit confused as to what could be the reason. Since a
>>>>>>> smaller number of cores works, I would expect a higher number of
>>>>>>> cores to run without an OOM error as well. The 8-node run explicitly
>>>>>>> outputs at the beginning:
>>>>>>>
>>>>>>>      Estimated max dynamical RAM per process >     140.54 MB
>>>>>>>      Estimated total dynamical RAM >      26.35 GB
>>>>>>>
>>>>>>> which is well within the 2.5 GB I have allocated for each core. I am
>>>>>>> obviously doing something wrong, could anyone point to what it is?
>>>>>>> The input files for a 6- and 8-node run can be found here:
>>>>>>> https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing
>>>>>>> I am using QE 6.6.
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Lenz
>>>>>>>
>>>>>>> PhD Student (HZDR / CASUS)
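(Editorial sketch, not from the thread: the two "Estimated ... dynamical RAM" lines quoted above can be pulled out of a pw.x output file and compared with the per-core budget mentioned in the question, 2.5 GB. The script name, regular expressions, and command-line interface are illustrative assumptions that cover only the two quoted output lines; keep in mind, as discussed earlier in the thread, that the estimate covers only part of the allocations and that replicated arrays make the total grow with the number of MPI processes.)

import re
import sys

def parse_estimates(path):
    # Return (per_process_MB, total_GB) as printed near the top of a pw.x
    # output file; either value is None if the corresponding line is absent.
    per_proc = total = None
    with open(path) as f:
        for line in f:
            m = re.search(r"Estimated max dynamical RAM per process\s*>\s*([\d.]+)\s*MB", line)
            if m:
                per_proc = float(m.group(1))
            m = re.search(r"Estimated total dynamical RAM\s*>\s*([\d.]+)\s*GB", line)
            if m:
                total = float(m.group(1))
    return per_proc, total

if __name__ == "__main__":
    # usage: python check_ram.py <pw.x output file> <GB available per core>
    out_file, budget_gb = sys.argv[1], float(sys.argv[2])
    per_proc_mb, total_gb = parse_estimates(out_file)
    print(f"per process: {per_proc_mb} MB   total: {total_gb} GB   "
          f"budget per core: {budget_gb * 1024:.0f} MB")
    if per_proc_mb is not None and per_proc_mb > budget_gb * 1024:
        print("warning: even the per-process estimate exceeds the per-core budget")

Invoked as, say, "python check_ram.py pw.out 2.5", this only flags the obvious case in which the printed per-process estimate already exceeds the budget; a run can still go out of memory well below that threshold.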
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list users@lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users