Re: [QE-users] Poor GPU scaling for Gamma-point-only calculation on multiple GPUs

Wang Xing Tue, 13 May 2025 12:06:06 -0700

Dear Paolo,

Thank you very much for your kind reply.

> 72 OpenMP threads are too many, and in any case: is your compute node a 
> 4*72=288 CPU one?

Yes, my compute node has 288 CPUs.

>  1. how to convince your machine to run the code the way it should (and not 
> the way it shouldn't)

I tested the same Slurm script on Si with 125 atoms and a 5x5x5 k-point grid. 
With "-npool 4" on 4 GPUs, it is ~15x faster than an MPI-only run on 128 CPUs. 
Very nice! I still saw the same warning message, "High GPU oversubscription 
detected. Are you sure this is what you want?", but the significant speedup 
suggests that the GPU setup is correct.
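
A more direct check than the speedup would be to print what each MPI rank 
actually sees. The following is only a minimal sketch: the file name 
show_gpus.py and the function describe_binding are hypothetical, and it 
assumes that Slurm exports SLURM_PROCID, SLURM_LOCALID and (depending on how 
the site configures GPUs) CUDA_VISIBLE_DEVICES to every task.

```python
import os
import socket


def describe_binding(env):
    """Summarise which GPUs one MPI rank can see, based on its environment."""
    rank = env.get("SLURM_PROCID", "?")    # global task index set by Slurm
    local = env.get("SLURM_LOCALID", "?")  # task index within the node
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Unset means CUDA does not restrict the devices for this process.
        return f"rank {rank} (local {local}): CUDA_VISIBLE_DEVICES unset"
    devices = [d for d in visible.split(",") if d]
    return (f"rank {rank} (local {local}): "
            f"{len(devices)} visible GPU(s) [{','.join(devices)}]")


if __name__ == "__main__":
    # Assumed usage: srun python3 show_gpus.py, with the same #SBATCH options
    # as the pw.x job, so that the binding is the one pw.x would get.
    print(socket.gethostname(), describe_binding(os.environ))
```

If every rank reports the same single index, that does not necessarily mean 
they share one physical GPU, since Slurm may renumber devices per task; 
comparing the UUIDs printed by "nvidia-smi -L" in each task should settle it.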

However, no such speedup is observed for the gamma-point-only calculation, and 
performance gets worse with 2 GPUs than with 1 GPU.

> Once you are sure that the code runs the way it should, have a look at the 
> time reports for 1, 2, 4 GPUs, with no OpenMP threads.

I ran the tests without OpenMP threads, and they show trends similar to those 
with 72 OpenMP threads. Indeed, the OpenMP threads do not help much.

GPUs | 72 OpenMP threads | no OpenMP threads
1    | 11.9 secs         | 13.8 secs
2    |  7.0 secs         | 18.0 secs
4    |  9.7 secs         | 10.9 secs
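
To make the scaling explicit, the speedup and parallel efficiency relative to 
the 1-GPU run can be worked out from these timings. This is a small sketch; 
the function name scaling is my own, and the numbers are copied from the 
table above.

```python
# Wall times in seconds, copied from the table above: {number of GPUs: time}.
TIMES_72_THREADS = {1: 11.9, 2: 7.0, 4: 9.7}
TIMES_NO_THREADS = {1: 13.8, 2: 18.0, 4: 10.9}


def scaling(times):
    """Return {n_gpus: (speedup, efficiency)} relative to the 1-GPU run."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for label, times in (("72 OpenMP threads", TIMES_72_THREADS),
                         ("no OpenMP threads", TIMES_NO_THREADS)):
        print(label)
        for n, (speedup, eff) in scaling(times).items():
            print(f"  {n} GPU(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

With these numbers the best case is about 1.7x on 2 GPUs with 72 threads, the 
2-GPU run without threads is slower than 1 GPU, and neither 4-GPU run exceeds 
about 1.3x.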

> Note that at Gamma point you have no "easy" parallelization levels to exploit.

Yes. I use multiple GPUs mainly to avoid the "out of memory" issue, but it turns 
out that this can harm the performance in the gamma-point-only case.

Best,
Xing

Scientist, Paul Scherrer Institute (PSI)

________________________________
From: Paolo Giannozzi <[email protected]>
Sent: Tuesday, May 13, 2025 7:58 PM
To: Quantum ESPRESSO users Forum <[email protected]>; Wang Xing 
<[email protected]>
Subject: Re: [QE-users] Poor GPU scaling for Gamma-point-only calculation on 
multiple GPUs

On 13/05/2025 08:47, Wang Xing wrote:

> #SBATCH --ntasks-per-node=4
> #SBATCH --cpus-per-task=72

72 OpenMP threads are too many, and in any case: is your compute node a
4*72=288 CPU one?

> srun pw.x -pd .true. -npool 1 -in aiida.in > aiida.out

-pd has no effect for GPUs, I think

> GPU acceleration is ACTIVE. 1 visible GPUs per MPI rank

this doesn't look right to me: it should say 4 (but I don't know how
reliable this message is)

> GPU-aware MPI enabled
> Message from routine print_cuda_info:
>    High GPU oversubscription detected. Are you sure this is what you want?

this also doesn't look right: it seems to indicate that all four MPI
processes access a single GPU (but I don't know how reliable this
message is)

> Does anyone have experience optimizing Gamma-point-only calculations on
> multiple GPUs? Is there a known bottleneck or best practice for using
> multiple GPUs efficiently in such a case?

there are two distinct aspects here:
1. how to convince your machine to run the code the way it should (and
not the way it shouldn't) and
2. how to optimize the parallelization over GPUs.
I can't say anything about the former point: it is a task for system
administrators. Once you are sure that the code runs the way it should,
have a look at the time reports for 1, 2, 4 GPUs, with no OpenMP
threads. You should easily spot anomalies or bottlenecks. Note that at
Gamma point you have no "easy" parallelization levels to exploit.

Paolo

> Any insights would be greatly appreciated.
> Best,
> Xing
>
>
> _______________________________________________________________________________
> The Quantum ESPRESSO Foundation stands in solidarity with all civilians 
> worldwide who are victims of terrorism, military aggression, and 
> indiscriminate warfare.
> --------------------------------------------------------------------------------
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list [email protected]
> https://lists.quantum-espresso.org/mailman/listinfo/users

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine Italy, +39-0432-558216

