[QE-users] Poor GPU scaling for Gamma-point-only calculation on multiple GPUs

Wang Xing Mon, 12 May 2025 23:48:13 -0700

Hi all,
I’m running a Gamma-point-only SCF calculation for a nanoparticle system using 
Quantum ESPRESSO with GPU support. The HPC node has 4 NVIDIA GH200 GPUs, and I 
observe very poor scaling behavior when increasing the number of GPUs.
Setup
I use 1 MPI task per GPU, with 72 OpenMP threads per task:


  * OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
  * QE version: 7.4 (GPU build with OpenMP)
  * MPI: OpenMPI (GPU-aware)

Here is the relevant SLURM configuration:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=72
#SBATCH --gpus=4
#SBATCH --gpu-bind=map_gpu:0,1,2,3
#SBATCH --time=00:30:00

module use Spack
module load nvhpc/24.11 openmpi/main-7zgw-GH200-gpu quantum-espresso/7.4-gpu-omp

srun pw.x -pd .true. -npool 1 -in aiida.in > aiida.out


Performance Results (time per SCF iteration)

Configuration      Time (sec)
1 task, 1 GPU      11.9
2 tasks, 2 GPUs    17.0
3 tasks, 3 GPUs    15.3
4 tasks, 4 GPUs     9.7

As you can see, runs on 2 and 3 GPUs are both slower than the single-GPU run 
(17.0 s and 15.3 s vs. 11.9 s per iteration), and only with 4 GPUs does the 
time improve slightly (9.7 s, roughly a 1.2x speedup over 1 GPU).
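
To put numbers on the scaling, here is a minimal sketch that computes speedup and parallel efficiency relative to the 1-GPU run. The function name scaling_report is hypothetical, and the input rows are just the timings from the table above.

```sh
#!/bin/sh
# scaling_report is a hypothetical helper name, not part of QE or Slurm.
# Input rows: "<number of GPUs> <seconds per SCF iteration>".
# The first row is taken as the single-GPU baseline.
scaling_report() {
    awk 'NR == 1 { t1 = $2 }
         {
             s = t1 / $2    # speedup relative to the baseline
             printf "%d GPU(s): speedup %.2f, efficiency %.0f%%\n", $1, s, 100 * s / $1
         }'
}

scaling_report <<'EOF'
1 11.9
2 17.0
3 15.3
4 9.7
EOF
```

With these timings the speedup is about 0.70 on 2 GPUs, 0.78 on 3, and 1.23 on 4, which corresponds to roughly 31% parallel efficiency on 4 GPUs.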

From the output:

GPU acceleration is ACTIVE. 1 visible GPUs per MPI rank
GPU-aware MPI enabled
Message from routine print_cuda_info:
  High GPU oversubscription detected. Are you sure this is what you want?

I tried --gpu-bind=map_gpu:0,1,2,3 to explicitly bind GPUs to ranks, but the 
warning still appears, and performance doesn’t change. I’ve also experimented 
with -nb and -ndiag parameters, but they either don’t help or make things worse.
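
One way to check what each rank actually sees is to launch a small per-rank report with the same srun options as pw.x. This is only a sketch: report_binding is a hypothetical helper name, SLURM_PROCID and SLURM_LOCALID are the task IDs Slurm exports to each task, and I am assuming that Slurm (or the site configuration) sets CUDA_VISIBLE_DEVICES for the task, which may not hold on every cluster.

```sh
#!/bin/sh
# report_binding is a hypothetical helper, not part of QE or Slurm.
# It prints the task IDs, the GPU mask, and the OpenMP thread count
# visible to the current task; unset variables are shown as "unset".
report_binding() {
    echo "rank=${SLURM_PROCID:-unset} local=${SLURM_LOCALID:-unset} gpus=${CUDA_VISIBLE_DEVICES:-unset} omp=${OMP_NUM_THREADS:-unset}"
}

report_binding
```

Saved as, say, check_binding.sh (a placeholder name) and run as "srun sh check_binding.sh" inside the same job, this prints one line per rank. Device indices can be renumbered when GPUs are confined per task, so identical values across ranks are a hint to investigate further rather than proof that ranks share a GPU.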
Question:
Does anyone have experience optimizing Gamma-point-only calculations on 
multiple GPUs? Is there a known bottleneck or best practice for using multiple 
GPUs efficiently in such a case?
Any insights would be greatly appreciated.
Best,
Xing

_______________________________________________________________________________
The Quantum ESPRESSO Foundation stands in solidarity with all civilians 
worldwide who are victims of terrorism, military aggression, and indiscriminate 
warfare.
--------------------------------------------------------------------------------
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list [email protected]
https://lists.quantum-espresso.org/mailman/listinfo/users
