No RoCE, just native IB, with TCP (IPoIB) over the top.
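A quick way to confirm that, assuming the standard OFED userspace tools are installed and the build is recent enough to know about RoCE, is to ask the verbs layer for the port's link layer:

  $ ibv_devinfo | grep -i link_layer

which should report InfiniBand for a native IB port and Ethernet for a RoCE-configured one.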


No, I haven't used 1.6; I was trying to stick with the standard versions on the Mellanox disk.
Is there a known problem with 1.4.3?


________________________________
 From: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
To: Randolph Pullen <randolph_pul...@yahoo.com.au>; Open MPI Users 
<us...@open-mpi.org> 
Sent: Sunday, 2 September 2012 10:54 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
 
Randolph,

Some clarification on the setup:

"Melanox III HCA 10G
 cards" - are those ConnectX 3 cards configured to Ethernet?
That is, when you're using openib BTL, you mean RoCE, right?

Also, have you had a chance to try some newer OMPI release?
Any 1.6.x would do.


-- YK

On 8/31/2012 10:53 AM, Randolph Pullen wrote:
> (reposted with consolidated information)
> I have a test rig comprising two i7 systems with 8 GB RAM and Mellanox III HCA 10G cards
> running CentOS 5.7, kernel 2.6.18-274
> Open MPI 1.4.3
> MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2):
> On a Cisco 24-port switch
> Normal performance is:
> $ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts PingPong
> results in:
> Max rate = 958.388867 MB/sec Min latency = 4.529953 usec
> and:
> $ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts PingPong
> Max rate = 653.547293 MB/sec Min latency = 19.550323 usec
> NetPipeMPI results show a max of 7.4 Gb/s at 8388605 bytes, which seems fine.
> log_num_mtt = 20 and log_mtts_per_seg = 2 (module params)
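For reference, those two mlx4 parameters bound how much memory the HCA can register: roughly max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size. With the values above and 4 KiB pages that is 2^20 * 2^2 * 4 KiB = 16 GiB, i.e. about twice the 8 GB of RAM, which matches the usual guideline. On a CentOS 5 / OFED 1.5.x box they are normally set as module options, something like the following sketch (the exact file path may differ per distro):

  # /etc/modprobe.conf (or a file under /etc/modprobe.d/)
  options mlx4_core log_num_mtt=20 log_mtts_per_seg=2

and take effect once the mlx4_core module (or the machine) is reloaded.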
> My application exchanges about a gig of data between the processes, with 2 
> sender and 2 consumer processes on each node plus 1 additional controller 
> process on the starting node.
> The program splits the data into 64K blocks and uses non-blocking sends and 
> receives with busy/sleep loops to monitor progress until completion.
> Each process owns a single buffer for these 64K blocks.
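The pattern described above (one 64K staging buffer per process, non-blocking sends/receives, and a busy/sleep loop polling for completion) is roughly the following minimal two-rank sketch; the block count, tag and 100 us back-off are illustrative, not taken from the original program:

  /* Sketch: stream a large payload as 64 KiB blocks with non-blocking MPI calls,
   * polling each request with MPI_Test and sleeping briefly between polls. */
  #include <mpi.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define BLOCK   65536     /* 64 KiB block size, as in the description above */
  #define NBLOCKS 16384     /* ~1 GiB total; illustrative */

  int main(int argc, char **argv)
  {
      int rank, done, i;
      char *buf;
      MPI_Request req;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      buf = malloc(BLOCK);                /* single buffer per process */

      for (i = 0; rank < 2 && i < NBLOCKS; i++) {
          if (rank == 0) {
              memset(buf, i & 0xff, BLOCK);
              MPI_Isend(buf, BLOCK, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &req);
          } else {
              MPI_Irecv(buf, BLOCK, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &req);
          }
          /* busy/sleep progress loop: test for completion, back off briefly */
          do {
              MPI_Test(&req, &done, MPI_STATUS_IGNORE);
              if (!done)
                  usleep(100);
          } while (!done);
      }

      free(buf);
      MPI_Finalize();
      return 0;
  }

One consequence of this pattern is that, because the single buffer is reused, each block must complete before the next is posted, so the transfer is never more than one message deep.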
> My problem is that I see better performance under IPoIB than I do on native IB 
> (RDMA_CM).
> My understanding is that IPoIB is limited to about 1G/s so I am at a loss to 
> know why it is faster.
> These 2 configurations are equivalent (about 8-10 seconds per cycle):
> mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
> mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
> And this one produces similar run times but seems to degrade with repeated 
> cycles:
> mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl 
> openib,self -H vh2,vh1 -np 9 --bycore prog
> Other btl_openib_flags settings result in much lower performance.
> Changing the first of the above configs to use openIB results in a 21 second 
> run time at best. Sometimes it takes up to 5 minutes.
> In all cases, OpenIB runs in twice the time it takes TCP, except if I push the 
> small message max to 64K and force short messages. Then the openib times are 
> the same as TCP and no faster.
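"Pushing the small message max to 64K" presumably means raising the openib eager limit, along the lines of (the exact value is illustrative):

  mpirun --mca btl_openib_eager_limit 65536 --mca btl openib,self -H vh2,vh1 -np 9 --bycore prog

Above that limit the openib BTL switches from eager copy-in/copy-out sends to its RDMA rendezvous protocol, so keeping 64K blocks under the limit keeps them on the eager path.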
> With openib:
> - Repeated cycles during a single run seem to slow down with each cycle
> (usually by about 10 seconds).
> - On occasion it seems to stall indefinitely, waiting on a single receive.
> I'm still at a loss as to why. I can't find any errors logged during the runs.
> Any ideas appreciated.
> Thanks in advance,
> Randolph
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
