Spin-orbit calculations should take approximately 4 times the memory of unpolarized Gamma-point calculations: a factor of 2 from the two-component spinor wave functions, and another factor of 2 because the Gamma-point trick (real wave functions) no longer applies. Without the input file it is hard to say more.
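The factor of 4 can be sketched with a back-of-envelope estimate (all sizes below are made-up illustrations, not taken from the actual input file): the spinor wave functions double the storage per band, and giving up the Gamma-point trick (which stores only half of the plane-wave coefficients, since c(-G) = c(G)*) doubles it again.

```python
# Hedged back-of-envelope estimate of why a spin-orbit (noncolinear) run
# needs ~4x the wavefunction memory of a Gamma-only collinear run.
# npw and nbnd below are hypothetical sizes, not the actual system.

COMPLEX_BYTES = 16          # double-precision complex

def wfc_bytes(npw, nbnd, spinor_components, gamma_trick):
    # With the Gamma-point trick only half of the plane-wave
    # coefficients are stored (c(-G) = c(G)*).
    eff_npw = npw // 2 if gamma_trick else npw
    return eff_npw * nbnd * spinor_components * COMPLEX_BYTES

npw, nbnd = 1_000_000, 6000   # hypothetical sizes
collinear = wfc_bytes(npw, nbnd, spinor_components=1, gamma_trick=True)
soc       = wfc_bytes(npw, nbnd, spinor_components=2, gamma_trick=False)
print(soc / collinear)        # -> 4.0
```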

Paolo

On 9/10/2025 10:11 AM, Christian Kern wrote:

Dear all!

I am running a fairly large system (11796 electrons, 784 atoms including
Au, W, S, and unit cell volume ~24200 Angstrom^3) with QE 7.4 on the GPU
partition of the Leonardo cluster (4 A100 GPUs coupled with 32 cores per
node). This calculation runs without problems (SCF convergence in ~4 hours
on 12 nodes), even for larger cells (I tried up to ~1200 atoms), with
pseudodojo PAW pseudopotentials and 30 Ry ecutwfc at the Gamma point.

However, now I want to run the same calculation with spin-orbit coupling
(noncolin=.true., lspinorb=.true., i.e. nspin=4), and QE always fails
with memory-allocation errors.
This happens regardless of the number of nodes that I use, and the
estimated max dynamical RAM reported in the output file is below the
available memory per MPI process/GPU (64 GB). Running on 64 nodes (= 256
MPI processes/GPUs), I can get this estimated RAM requirement down to
~12 GB per GPU. Despite that, I get an error that ~16 GB cannot be
allocated in the first step. Surprisingly, those ~16 GB are always the
same number, regardless of the number of nodes that I use and regardless
of the estimated memory consumption. I am using the Davidson algorithm
and NC pseudopotentials with a 60 Ry cutoff here, but I have also tested
relativistic PAW pseudopotentials, lower cutoffs, less vacuum, and the
conjugate-gradient algorithm. None of this helped. What could be causing
this memory-allocation problem? For smaller systems I have no problems
with SOC calculations on GPUs...
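For reference, a sketch of the SOC-related part of my &SYSTEM namelist (only the relevant keywords, with the values described above; everything else omitted):

```
&SYSTEM
   ...
   noncolin = .true.   ! two-component (spinor) wave functions
   lspinorb = .true.   ! spin-orbit coupling, needs fully relativistic PPs
   ecutwfc  = 60.0     ! Ry, for the NC pseudopotentials
/
```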

As far as I understand, the maximum number of MPI processes/GPUs for a
Gamma-point SCF calculation is given by the third dimension of the FFT
mesh, and task-group parallelization is not available on GPUs? Then the
only way to further reduce the memory demand is via "-ndiag". In my
case, this prolongs the execution time before the job dies, but I am
still running out of memory, although now over ridiculously small
amounts (~100 MB). I tried up to "-ndiag 64"...
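For completeness, the kind of launch line I have been using (a sketch; the flags follow the pw.x command line, the counts are just the 64-node example from above, and the file names are placeholders):

```
# 64 nodes x 4 A100 GPUs = 256 MPI ranks, one rank per GPU (sketch)
mpirun -np 256 pw.x -ndiag 64 -input scf_soc.in > scf_soc.out
```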

Looking forward to your suggestions,
Christian Kern
_______________________________________________________________________________
The Quantum ESPRESSO Foundation stands in solidarity with all civilians worldwide who are victims of terrorism, military aggression, and indiscriminate warfare.
--------------------------------------------------------------------------------
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list [email protected]
https://lists.quantum-espresso.org/mailman/listinfo/users

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine Italy, +39-0432-558216

