Resuscitating this thread...

Well, we spent some time testing the various options, and Leonardo's
suggestion seems to work!

We disabled TCP Segmentation Offload (TSO) on the e1000 NICs using
"ethtool -K eth<X> tso off", and this type of crash no longer happens.
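
In case it helps anyone else, the check/disable sequence looks roughly like
this (a sketch only: eth0 is just a placeholder for the real interface name,
and ethtool needs root):

ethtool -k eth0            # check offload settings; look for the TCP segmentation offload line
ethtool -K eth0 tso off    # disable TSO
ethtool -k eth0            # verify that it now reports "off"

Note that ethtool settings don't persist across reboots, so the -K command
has to go into whatever your distribution uses for network startup scripts.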

I hope this message can help anyone else experiencing the same issues. 
Thanks Leonardo!

OMPI devs: does this imply bug(s) in the e1000 driver/chip?  Should I
contact the driver authors?


On Fri, 10 Oct 2008 12:42:19 -0400, "V. Ram" <v_r_...@fastmail.fm> said:
> Leonardo,
> 
> These nodes all use Intel e1000 chips.  Since the nodes are AMD
> K7-based, these are the older chips, not the newer ones with all the
> EEPROM issues under recent kernels.
> 
> The kernel in use is from the 2.6.22 family, and the e1000 driver is the
> one shipped with the kernel.  I am running it compiled into the kernel,
> not as a module.
> 
> When testing with the Intel MPI Benchmarks, I found that increasing the
> receive ring buffer size to the maximum (4096) helped performance, so I
> run ethtool -G at startup.
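> In case the exact commands are useful to anyone, they look roughly like
> this (eth0 is only a placeholder for the real interface name):
> 
> ethtool -g eth0           # show current and maximum ring sizes
> ethtool -G eth0 rx 4096   # raise the RX ring to its maximum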
> 
> Checking ethtool -k, I see that TCP segmentation offload is on.  I can
> try turning that off to see what happens.
> 
> Oddly, on 64-bit nodes using the tg3 driver, this code doesn't crash or
> show these same issues, and I haven't had to turn off TSO there.
> 
> Can anyone else suggest why the code might be crashing when running over
> Ethernet but not over shared memory?  Any suggestions on how to debug
> this or interpret the error message issued from btl_tcp_frag.c?
> 
> Thanks.
> 
> 
> On Wed, 01 Oct 2008 18:11:34 +0200, "Leonardo Fialho"
> <lfia...@aomail.uab.es> said:
> > Ram,
> > 
> > What is the name and version of the kernel module for your NIC? I have
> > experienced something similar with my tg3 module. The error that
> > appeared for me was different:
> > 
> > [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv 
> > failed: No route to host (113)
> > 
> > I solved it by changing the following setting in the Linux kernel:
> > 
> > /sbin/ethtool -K eth0 tso off
> > 
> > Leonardo
> > 
> > 
> > Aurélien Bouteiller wrote:
> > > If you have several network cards in your system, it can sometimes get 
> > > the endpoints confused, especially if you don't have the same number 
> > > of cards or don't use the same subnet for all of "eth0, eth1". You 
> > > should try to restrict Open MPI to use only one of the available 
> > > networks by passing the --mca btl_tcp_if_include ethX parameter to 
> > > mpirun, where X is the network interface that is always connected to 
> > > the same logical and physical network on your machine.
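> > > For example, something like the following (hypothetical host names, app
> > > name, and interface, just to illustrate the syntax; use the interface
> > > that is actually wired to your cluster network on every node):
> > >
> > > mpirun --mca btl_tcp_if_include eth0 -np 2 --host node3,node4 ./your_app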
> > >
> > > Aurelien
> > >
> > > On 1 Oct 2008, at 11:47, V. Ram wrote:
> > >
> > >> I wrote earlier about one of my users running a third-party Fortran
> > >> code on 32-bit x86 machines, using OMPI 1.2.7, that exhibits some odd
> > >> crash behavior.
> > >>
> > >> Our cluster's nodes all have 2 single-core processors.  If this code is
> > >> run on 2 processors on 1 node, it runs seemingly fine.  However, if the
> > >> job runs on 1 processor on each of 2 nodes (e.g., mpirun --bynode), then
> > >> it crashes and gives messages like:
> > >>
> > >> [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > >> [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > >> mca_btl_tcp_frag_recv: readv failed with errno=110
> > >> mca_btl_tcp_frag_recv: readv failed with errno=104
> > >>
> > >> Essentially, if any network communication is involved, the job crashes
> > >> in this form.
> > >>
> > >> I do have another user that runs his own MPI code on 10+ of these
> > >> processors for days at a time without issue, so I don't think it's
> > >> hardware.
> > >>
> > >> The original code also runs fine across many networked nodes if the
> > >> architecture is x86-64 (also running OMPI 1.2.7).
> > >>
> > >> We have also tried different Fortran compilers (both PathScale and
> > >> gfortran) and keep getting these crashes.
> > >>
> > >> Are there any suggestions on how to figure out if it's a problem with
> > >> the code or the OMPI installation/software on the system? We have tried
> > >> "--debug-daemons" with no new/interesting information being revealed.
> > >> Is there a way to trap segfault messages, capture more detailed MPI
> > >> transaction information, or gather anything else that could help
> > >> diagnose this?
> > >>
> > >> Thanks.
> > >> -- 
> > >>  V. Ram
> > >>  v_r_...@fastmail.fm
> > >>
> > >
> > 
> > 
> > -- 
> > Leonardo Fialho
> > Computer Architecture and Operating Systems Department - CAOS
> > Universidad Autonoma de Barcelona - UAB
> > ETSE, Edificio Q, QC/3088
> > http://www.caos.uab.es
> > Phone: +34-93-581-2888
> > Fax: +34-93-581-2478
> > 
> -- 
>   V. Ram
>   v_r_...@fastmail.fm
> 
-- 
  V. Ram
  v_r_...@fastmail.fm
