Dear Prof. Giannozzi,

Thanks so much for the insight! I realize I might have left out a crucial piece of information: the OOM error does not appear right away, but only after a certain number of time steps (and, as far as I can tell, fairly reproducibly). For the 192-core example I sent, that number was 4 time steps. In parallel to these iron calculations I am also investigating Be, and I see similar behavior there. I have uploaded the input files for a beryllium run to the Google Drive folder from my first message. For that calculation I can run ~2700 time steps just fine (which took about 8 hours) and only then get the OOM error. Is there some option I am forgetting to set, such that some arrays keep being accumulated and eventually exhaust the memory?
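In case it helps with the diagnosis: to check whether the memory footprint really grows from time step to time step, I could log the free memory on each node during the run with something simple like

    while true; do date; free -m; sleep 60; done > mem_$(hostname).log &

(just a generic Linux sketch on my part, nothing QE-specific; the log file name and the 60 s interval are arbitrary) and compare the values around the first and the last completed time steps.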
I understand that just using more and more processors will not necessarily give me better performance, but in the performance test I did, going from 48 to 144 processors reduced the average time per time step from over 1000 s to about 200 s (a plot of this is in the Google Drive folder as well). I am aiming for ~30 s per time step, since I want to perform 10000 time steps to get a 10 ps trajectory, so I was trying to investigate how performance would be affected by using somewhat more processors.

I will try the -ntg option. The best performance I was able to achieve so far was with 144 cores defaulting to -nb 144, so am I correct to assume that I should try e.g. -nb 144 -ntg 2 for 288 cores? (To be concrete, I have put the exact command line I have in mind in a P.S. at the end of this message.)

The 80 Ry cutoff was the result of a convergence analysis I did for this system, although I could perhaps lower it, since I am more interested in sampling configurations for a machine learning application than in macroscopic properties derived directly from the MD calculation.

Kind regards
Lenz

PhD Student (HZDR / CASUS)

On Wed, 16 Jun 2021 at 07:33, Paolo Giannozzi <p.gianno...@gmail.com> wrote:

> Hard to say without knowing exactly what goes out of which memory limits.
> Note that not all arrays are distributed across processors, so a
> considerable number of arrays are replicated on all processes. As a
> consequence the total amount of required memory will increase with the
> number of MPI processes. Also note that a 128-atom cell is not "large" and
> 144 cores are not "a small number of processors". You will not get any
> advantage by just increasing the number of processors any more, quite the
> opposite. If you have too many idle cores, you should consider
> - "task group" parallelization (option -ntg)
> - MPI+OpenMP parallelization (configure --enable-openmp)
> Please also note that ecutwfc=80 Ry is a rather large cutoff for a USPP
> (while ecutrho=320 is fine) and that running with K_POINTS Gamma instead of
> 1 1 1 0 0 0 will be faster and take less memory.
>
> Paolo
>
> On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <fiedler.l...@gmail.com> wrote:
>
>> Dear users,
>>
>> I am trying to perform an MD simulation for a large cell (128 Fe atoms,
>> gamma point) using pw.x and I get strange scaling behavior. To test the
>> performance I ran the same MD simulation with an increasing number of
>> nodes (2, 4, 6, 8, etc.) using 24 cores per node. The simulation is
>> successful with 2, 4, and 6 nodes, i.e. 48, 96, and 144 cores respectively
>> (albeit slow, which is within my expectations for such a small number of
>> processors). Going to 8 and more nodes, I run into an out-of-memory error
>> after about two time steps.
>> I am a little bit confused as to what the reason could be. Since a
>> smaller number of cores works, I would expect a higher number of cores to
>> run without an OOM error as well.
>> The 8-node run explicitly outputs at the beginning:
>> "     Estimated max dynamical RAM per process >     140.54 MB
>>       Estimated total dynamical RAM >      26.35 GB
>> "
>> which is well within the 2.5 GB I have allocated for each core.
>> I am obviously doing something wrong, could anyone point me to what it is?
>> The input files for a 6- and 8-node run can be found here:
>> https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing
>> I am using QE 6.6.
>>
>> Kind regards
>> Lenz
>>
>> PhD Student (HZDR / CASUS)
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
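P.S. To make sure I test exactly what was suggested: for the 288-core run, the invocation I have in mind would be something along the lines of

    srun -n 288 pw.x -nb 144 -ntg 2 -in Fe128_md.in > Fe128_md.out

where the input and output file names are just placeholders for my actual files, and srun could equally be mpirun -np 288, depending on the cluster. And regarding the k-point sampling, if I understand correctly, I would simply replace

    K_POINTS automatic
    1 1 1 0 0 0

in my input file with

    K_POINTS gamma

so that the Gamma-only optimizations (which, as far as I understand, use real wavefunctions and store only half of the plane waves) are enabled.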
_______________________________________________ Quantum ESPRESSO is supported by MaX (www.max-centre.eu) users mailing list users@lists.quantum-espresso.org https://lists.quantum-espresso.org/mailman/listinfo/users