Please ignore my prior answer; I just noticed you are running single-node. In addition to Howard's suggestions, check whether you have NVLink between the GPUs.
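For a quick check (assuming nvidia-smi is available on the node), the topology matrix shows how each GPU pair is connected; NV# entries mean NVLink, while PHB/SYS mean the traffic goes through PCIe and/or the CPU:

    nvidia-smi topo -m          # GPU-to-GPU link matrix (NV# = NVLink)
    nvidia-smi nvlink --status  # per-link NVLink status and speed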
George.

On Wed, Jun 4, 2025 at 10:11 AM George Bosilca <bosi...@icl.utk.edu> wrote:

> What's the network on your cluster? Without a very good network you
> cannot get per-GPU throughput anywhere close to the single-GPU case,
> because the data exchanged between the two GPUs becomes the bottleneck.
>
> George.
>
> On Wed, Jun 4, 2025 at 5:56 AM Shruti Sharma <shrutic...@gmail.com> wrote:
>
>> Hi,
>> I am currently running the Horovod benchmarks in an intra-node setup.
>> However, I have observed that increasing the number of GPUs does not
>> result in a proportional increase in total throughput. Specifically, the
>> throughput with a single GPU is approximately 842.6 ± 2.4 img/sec, whereas
>> with two GPUs the total throughput is around 485.7 ± 44.8 img/sec, which
>> translates to approximately 242.8 ± 22.4 img/sec per GPU.
>>
>> The configuration for the test is:
>> MPI     : Open MPI 5.0.6
>> Horovod : 0.28.1
>> PyTorch : 1.12.1
>> GPU     : NVIDIA A100
>> CUDA    : 11.8
>> Python  : 3.10
>> GCC     : 8.5.0
>>
>> Command: mpirun -n 1 --report-bindings python \
>>     pytorch_synthetic_benchmark.py --batch-size=64 --model=resnet50
>> [gpu39:59123] Rank 0 bound package[0][core:0]
>>
>> Model: resnet50
>> Batch size: 64
>> Number of GPUs: 1
>> Running warmup...
>> Running benchmark...
>> Iter #0: 844.3 img/sec per GPU
>> Iter #1: 844.0 img/sec per GPU
>> Iter #2: 843.6 img/sec per GPU
>> Iter #3: 843.5 img/sec per GPU
>> Iter #4: 843.5 img/sec per GPU
>> Iter #5: 842.0 img/sec per GPU
>> Iter #6: 841.3 img/sec per GPU
>> Iter #7: 841.8 img/sec per GPU
>> Iter #8: 841.1 img/sec per GPU
>> Iter #9: 841.1 img/sec per GPU
>> Img/sec per GPU: 842.6 +- 2.4
>> Total img/sec on 1 GPU(s): 842.6 +- 2.4
>>
>> Run with two GPUs on the same node:
>> Command: mpirun -n 2 --report-bindings python \
>>     pytorch_synthetic_benchmark.py --batch-size=64 --model=resnet50
>> [gpu39:59166] Rank 0 bound package[0][core:0]
>> [gpu39:59166] Rank 1 bound package[0][core:1]
>>
>> Model: resnet50
>> Batch size: 64
>> Number of GPUs: 2
>> Running warmup...
>> Running benchmark...
>> Iter #0: 235.7 img/sec per GPU
>> Iter #1: 251.5 img/sec per GPU
>> Iter #2: 217.0 img/sec per GPU
>> Iter #3: 239.4 img/sec per GPU
>> Iter #4: 257.2 img/sec per GPU
>> Iter #5: 258.3 img/sec per GPU
>> Iter #6: 248.4 img/sec per GPU
>> Iter #7: 242.6 img/sec per GPU
>> Iter #8: 238.0 img/sec per GPU
>> Iter #9: 240.3 img/sec per GPU
>> Img/sec per GPU: 242.8 +- 22.4
>> Total img/sec on 2 GPU(s): 485.7 +- 44.8
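Regarding the two-GPU run quoted above, one way to see which allreduce path is actually being used (a sketch, assuming Horovod was built with NCCL): check the Horovod build, then re-run with NCCL_DEBUG=INFO so NCCL logs whether it set up P2P/NVLink channels between the two GPUs or fell back to a slower path.

    horovodrun --check-build    # lists which controllers/operations Horovod was built with (MPI, Gloo, NCCL, ...)
    mpirun -x NCCL_DEBUG=INFO -n 2 --report-bindings python \
        pytorch_synthetic_benchmark.py --batch-size=64 --model=resnet50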