Re: [OMPI users] Re: CUDA-Aware on OpenMPI v4 with CUDA IPC buffers

2025-06-03 Thread 'Tomislav Janjusic US' via Open MPI users
Add --mca pml_base_verbose 90 and you should see something like this:
[rock18:3045236] select: component ucx selected
[rock18:3045236] select: component ob1 not selected / finalized
or whatever your OMPI instance selected. -Tommy On Tuesday, June 3, 2025 at 12:44:00 PM UTC-5 Mike Adams wrote: > mpiru...
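
A minimal sketch of that suggestion applied to the command posted later in the thread (binary name taken from Mike's command; host names and PIDs in the output will differ):

  mpirun --mca pml_base_verbose 90 \
         --mca btl_smcuda_use_cuda_ipc_same_gpu 0 --mca btl_smcuda_use_cuda_ipc 0 \
         --map-by ppr:2:numa --bind-to core ./multilane_ring_allreduce 2>&1 | grep "select:"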

Re: [OMPI users] Re: CUDA-Aware on OpenMPI v4 with CUDA IPC buffers

2025-06-03 Thread Mike Adams
mpirun --mca btl_smcuda_use_cuda_ipc_same_gpu 0 --mca btl_smcuda_use_cuda_ipc 0 \
       --map-by ppr:2:numa --bind-to core --rank-by slot \
       --display-map --display-allocation --report-bindings ./multilane_ring_allreduce
where there is 1 GPU per NUMA region. I am not sure which PML I'm using, but since th...
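
One way to remove the ambiguity about the PML (a sketch, not from the thread; the "..." stands for the rest of the flags above) is to force the selection explicitly and see which run still works:

  # force the ob1 PML, which moves data through the BTLs (including smcuda)
  mpirun --mca pml ob1 ... ./multilane_ring_allreduce
  # or force the UCX PML, if the build includes it
  mpirun --mca pml ucx ... ./multilane_ring_allreduce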

Re: [OMPI users] Re: CUDA-Aware on OpenMPI v4 with CUDA IPC buffers

2025-06-03 Thread 'Tomislav Janjusic US' via Open MPI users
Can you post the full mpirun command? Or at least the relevant MPI MCA params? "I'm still curious about your input on whether or not those mca parameters I mentioned yesterday are disabling GPUDirect RDMA as well?" Even if you disable sm_cuda_ipc, it's possible you're still using CUDA IPC via...
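
If the UCX PML is the one selected, CUDA IPC is handled inside UCX rather than the smcuda BTL, so the btl_smcuda_* parameters would not affect it. A sketch of excluding UCX's cuda_ipc transport via its transport list (UCX_TLS and the transport name are UCX settings, not Open MPI MCA params; this assumes a UCX-enabled build):

  # tell UCX not to use its cuda_ipc transport for this run
  export UCX_TLS=^cuda_ipc
  mpirun --mca pml ucx ... ./multilane_ring_allreduce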

Re: [OMPI users] Re: CUDA-Aware on OpenMPI v4 with CUDA IPC buffers

2025-05-31 Thread Mike Adams
Interestingly, I made an error - Delta on 4.1.5 did fail like some of the cases on Bridges2 on 4.0.5, but at 16 ranks per GPU, which with 4 GPUs matches the core count of the AMD processor on Delta. So it looks like Bridges2 needs an OpenMPI upgrade. Tommy, I'm still curious about your input on wh...

Re: [OMPI users] Re: CUDA-Aware on OpenMPI v4 with CUDA IPC buffers

2025-05-30 Thread Mike Adams
Dmitry, I'm not too familiar with the internals of OpenMPI, but I just tried 4.1.5 on NCSA Delta and received the same IPC errors (no mca flags switched). This time, though, the calls did not actually fail to perform the operation, so maybe that's an improvement from v4.0.x to v4.1.x? Thanks, Mi...

Re: [OMPI users] Re: CUDA-Aware on OpenMPI v4 with CUDA IPC buffers

2025-05-30 Thread Dmitry N. Mikushin
There is a relevant explanation of the same issue reported for Julia: https://github.com/JuliaGPU/CUDA.jl/issues/1053 Fri, 30 May 2025 at 19:05, Mike Adams: > Hi Tommy, > > I'm setting btl_smcuda_use_cuda_ipc_same_gpu 0 and btl_smcuda_use_cuda_ipc 0. > > So, are you saying that with these param...

[OMPI users] Re: CUDA-Aware on OpenMPI v4 with CUDA IPC buffers

2025-05-30 Thread Mike Adams
Hi Tommy, I'm setting btl_smcuda_use_cuda_ipc_same_gpu 0 and btl_smcuda_use_cuda_ipc 0. So, are you saying that with these params, it is also not using GPUDirect RDMA? PSC Bridges 2 only has v4 OpenMPI, but they may be working on installing v5 now. Everything works on v5 on NCSA Delta - I'll...

[OMPI users] Re: CUDA-Aware on OpenMPI v4 with CUDA IPC buffers

2025-05-30 Thread 'Tomislav Janjusic US' via Open MPI users
Hi, I'm not sure if it's a known issue - possibly in v4.0, not sure about v4.1 or v5.0 - can you try? As far as CUDA IPC goes, how are you disabling it? I don't remember the mca params in v4.0. If it's disabled either through pml ucx or smcuda, then no, it won't use it. -Tommy On Saturday, May 24, 2025 at 8:...
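
To find the exact CUDA-IPC-related MCA parameters that a given 4.0.x install actually exposes, ompi_info can list them (a sketch; parameter names differ between releases):

  # list smcuda BTL parameters, filtering for IPC-related ones
  ompi_info --param btl smcuda --level 9 | grep -i ipc
  # list the UCX PML parameters for the same build
  ompi_info --param pml ucx --level 9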