Did "--mca mpi_preconnect_all 1" work?


I also hit this "readv failed: Connection timed out" problem in our production 
environment. One of our engineers reproduced the scenario on 20 nodes with 
gigabit Ethernet, with one NIC rate-limited to 2 MB/s, running an MPI_Isend / 
MPI_Recv ring: each node calls MPI_Isend to send a large message to the next 
node and then calls MPI_Recv to receive from the previous node, for many 
cycles (a minimal sketch of the pattern follows the log below). We then get 
the following error log:

[btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection timed out (110)
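For reference, here is a minimal sketch of the ring exchange we used to 
reproduce it; the buffer size and cycle count are illustrative, not the exact 
values from our test:

/* Minimal sketch of the reproduction: each rank posts an MPI_Isend to the
 * next rank, then a blocking MPI_Recv from the previous rank, repeated for
 * many cycles with a large message. Sizes below are illustrative only. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int count  = 1 << 22;   /* "large size": ~16 MB of ints, illustrative */
    const int cycles = 1000;      /* "many cycles", illustrative */
    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    int *sendbuf = malloc(count * sizeof(int));
    int *recvbuf = malloc(count * sizeof(int));

    for (int i = 0; i < cycles; i++) {
        MPI_Request req;
        /* send to the next node ... */
        MPI_Isend(sendbuf, count, MPI_INT, next, 0, MPI_COMM_WORLD, &req);
        /* ... and receive from the prior node */
        MPI_Recv(recvbuf, count, MPI_INT, prev, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}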



Our environment:

Open MPI version 1.3.1, using the TCP BTL component.



My guess is that, because the network fd is set to nonblocking, the 
nonblocking connect() can fail asynchronously; epoll_wait() is woken up by the 
error event but treats it as a successful connection and calls 
mca_btl_tcp_endpoint_recv_handler(). The nonblocking readv() is then issued on 
an fd whose connect never completed, so it returns -1 with errno set to 110, 
which means the connection timed out.
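To make that hypothesis concrete, here is a small sketch (not Open MPI's 
actual code) of how completion of a nonblocking connect() is normally 
verified with getsockopt(SO_ERROR) before the fd is treated as connected; if 
that check is skipped or mishandled, the later readv() surfaces the connect 
failure, e.g. errno 110:

/* Sketch only -- not Open MPI's code. Called when the event loop
 * (epoll/poll/select) reports the fd ready while we are still waiting
 * for a nonblocking connect() to finish. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int check_connect_complete(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;                  /* getsockopt itself failed */

    if (err != 0) {
        /* connect() failed asynchronously; err holds the real reason,
         * e.g. ETIMEDOUT (110) or ECONNREFUSED. Do not start readv()
         * on this fd -- close it and retry the connection instead. */
        fprintf(stderr, "connect failed: %s (%d)\n", strerror(err), err);
        errno = err;
        return -1;
    }
    return 0;                       /* connection is established */
}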


> From: ljdu...@scinet.utoronto.ca
> Date: Tue, 20 Apr 2010 09:24:17 -0400
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] 'readv failed: Connection timed out' issue
> 
> On 2010-04-20, at 9:18AM, Terry Dontje wrote:
> 
> > Hi Jonathan,
> > 
> > Do you know what the top-level function or communication pattern is? Is it 
> > some type of collective, or a pattern that has a many-to-one?
> 
> Ah, should have mentioned. The best-characterized code that we're seeing this 
> with is an absolutely standard (logically) regular grid hydrodynamics code, 
> only does nearest neighbour communication for exchanging guardcells; the Wait 
> in this case is, I think, just a matter of overlapping communication with 
> computation of the inner zones. There are things like allreduces in there, as 
> well, for setting timesteps, but the communication pattern is overall 
> extremely regular and well-behaved.
> 
> > What might be happening is that since OMPI uses lazy connections by 
> > default, if all processes are trying to establish connections to the same 
> > process you might run into the below.
> > 
> > You might want to see if setting "--mca mpi_preconnect_all 1" helps any. 
> > But beware this will cause your startup time to increase. However, this might 
> > give us insight as to whether the problem is flooding a single rank with 
> > connect requests.
> 
> I'm certainly willing to try it.
> 
> - Jonathan
> 
> -- 
> Jonathan Dursi <ljdu...@scinet.utoronto.ca>
> 
> 
> 
> 
> 
