Please ignore my prior answer; I just noticed you are running single-node. In addition to Howard's suggestions, check whether you have NVLink between the GPUs.
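For a quick check (assuming nvidia-smi is available on the node), the topology matrix shows how each GPU pair is connected; NV# entries mean NVLink, while PHB/SYS mean the traffic goes through PCIe and/or the CPU:

    nvidia-smi topo -m          # GPU-to-GPU link matrix (NV# = NVLink)
    nvidia-smi nvlink --status  # per-link NVLink status and speed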
George.

On Wed, Jun 4, 2025 at 10:11 AM George Bosilca <bosi...@icl.utk.edu> wrote:

> What's the network on your cluster? Without a very good network you
> cannot get per-GPU throughput anywhere close to the single-GPU case,
> because the data exchanged between the two GPUs becomes the bottleneck.
>
> George.
>
> On Wed, Jun 4, 2025 at 5:56 AM Shruti Sharma <shrutic...@gmail.com> wrote:
>
>> Hi,
>> I am currently running the Horovod benchmarks in an intra-node setup.
>> However, I have observed that increasing the number of GPUs does not
>> result in a proportional increase in total throughput. Specifically, the
>> throughput with a single GPU is approximately 842.6 ± 2.4 img/sec, whereas
>> with two GPUs the total throughput is around 485.7 ± 44.8 img/sec, which
>> translates to approximately 242.8 ± 22.4 img/sec per GPU.
>>
>> The configuration for the test is:
>> MPI     : Open MPI 5.0.6
>> Horovod : 0.28.1
>> PyTorch : 1.12.1
>> GPU     : NVIDIA A100
>> CUDA    : 11.8
>> Python  : 3.10
>> GCC     : 8.5.0
>>
>> Command: mpirun -n 1 --report-bindings python \
>>     pytorch_synthetic_benchmark.py --batch-size=64 --model=resnet50
>> [gpu39:59123] Rank 0 bound package[0][core:0]
>>
>> Model: resnet50
>> Batch size: 64
>> Number of GPUs: 1
>> Running warmup...
>> Running benchmark...
>> Iter #0: 844.3 img/sec per GPU
>> Iter #1: 844.0 img/sec per GPU
>> Iter #2: 843.6 img/sec per GPU
>> Iter #3: 843.5 img/sec per GPU
>> Iter #4: 843.5 img/sec per GPU
>> Iter #5: 842.0 img/sec per GPU
>> Iter #6: 841.3 img/sec per GPU
>> Iter #7: 841.8 img/sec per GPU
>> Iter #8: 841.1 img/sec per GPU
>> Iter #9: 841.1 img/sec per GPU
>> Img/sec per GPU: 842.6 +- 2.4
>> Total img/sec on 1 GPU(s): 842.6 +- 2.4
>>
>> Run with two GPUs on the same node:
>> Command: mpirun -n 2 --report-bindings python \
>>     pytorch_synthetic_benchmark.py --batch-size=64 --model=resnet50
>> [gpu39:59166] Rank 0 bound package[0][core:0]
>> [gpu39:59166] Rank 1 bound package[0][core:1]
>>
>> Model: resnet50
>> Batch size: 64
>> Number of GPUs: 2
>> Running warmup...
>> Running benchmark...
>> Iter #0: 235.7 img/sec per GPU
>> Iter #1: 251.5 img/sec per GPU
>> Iter #2: 217.0 img/sec per GPU
>> Iter #3: 239.4 img/sec per GPU
>> Iter #4: 257.2 img/sec per GPU
>> Iter #5: 258.3 img/sec per GPU
>> Iter #6: 248.4 img/sec per GPU
>> Iter #7: 242.6 img/sec per GPU
>> Iter #8: 238.0 img/sec per GPU
>> Iter #9: 240.3 img/sec per GPU
>> Img/sec per GPU: 242.8 +- 22.4
>> Total img/sec on 2 GPU(s): 485.7 +- 44.8
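Regarding the two-GPU run quoted above, one way to see which allreduce path is actually being used (a sketch, assuming Horovod was built with NCCL): check the Horovod build, then re-run with NCCL_DEBUG=INFO so NCCL logs whether it set up P2P/NVLink channels between the two GPUs or fell back to a slower path.

    horovodrun --check-build    # lists which controllers/operations Horovod was built with (MPI, Gloo, NCCL, ...)
    mpirun -x NCCL_DEBUG=INFO -n 2 --report-bindings python \
        pytorch_synthetic_benchmark.py --batch-size=64 --model=resnet50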