Sorry for the delay in replying; this mail slipped by me in my inbox.



On Apr 26, 2009, at 11:50 PM, Rangesh Gupta wrote:

Hi all,

I'm facing a problem while running OpenFOAM 1.5 (the executable is sonicTurbFoam) with Open MPI: it hangs after some time, and every time it hangs at a different place. The MPI command is:

mpirun --mca btl_openib_if_include ib0 -mca btl_tcp_if_exclude lo,eth0,eth1 --mca btl_openib_ib_timeout 40 -n $NO_OF_PROCESS -machinefile $MYHOSTS $1 -parallel

FWIW, if you're submitting via slurm, the -machinefile and -n options shouldn't be necessary -- it should get those directly from SLURM.

We are using 64 processors on 8 nodes.

I'm submitting it through the LSF scheduler, which internally uses SLURM as its resource manager.

Error :
[n112][0,1,41][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=110
[n112][0,1,43][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=110

errno=110 is ETIMEDOUT (connection timed out) on Linux. Do you happen to have firewalling enabled on your compute nodes? OMPI needs to be able to use random TCP ports to connect between all of the processes in an MPI job.
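
If it helps, here is a minimal sketch (in Python, not part of Open MPI) of how one might check both points by hand: what errno 110 means on a given node, and whether a TCP connection between two nodes gets through at all. The helper names `describe_errno` and `can_connect` are made up for illustration, and the host and port in the comment at the bottom are placeholders. Note that a plain connect only succeeds if something is listening on that port; a quick "connection refused" still suggests packets reach the host, whereas a timeout is more consistent with a firewall silently dropping them.

```python
import errno
import os
import socket


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def can_connect(host, port, timeout=5.0):
    """Attempt a TCP connection to host:port.

    Returns (True, "connected") on success, otherwise (False, reason).
    A timeout here hints at dropped packets (e.g. a firewall); a refusal
    only means nothing is listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # covers timeouts, refusals, and name lookup errors
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Show what errno 110 maps to on the machine running this script.
    print(describe_errno(110))
    # Hypothetical usage from one compute node to another
    # (host name and port are placeholders, not taken from the job):
    #   print(can_connect("some-other-node", 12345))
```

Running it on a pair of compute nodes, with a simple listener started on the far side, should give a rough idea of whether arbitrary TCP ports are reachable between them.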

--
Jeff Squyres
Cisco Systems
