Resuscitating this thread... Well, we spent some time testing the
various options, and Leonardo's suggestion seems to work!

We disabled TCP Segmentation Offload (TSO) on the e1000 NICs using
"ethtool -K eth<X> tso off", and this type of crash no longer happens. I
hope this message helps anyone else experiencing the same issues.
Thanks, Leonardo!

OMPI devs: does this imply a bug (or bugs) in the e1000 driver/chip?
Should I contact the driver authors?
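For the archives, this is roughly what we now run on each node at boot
(a sketch rather than our exact script; it assumes the interface is
eth0, so substitute your own, and note that ethtool needs root):

    # Show the current offload settings; TSO appears as
    # "tcp segmentation offload: on/off"
    ethtool -k eth0

    # Disable TCP Segmentation Offload on this interface
    ethtool -K eth0 tso off

    # Verify the change took effect
    ethtool -k eth0 | grep -i "tcp segmentation"

The change does not survive a reboot, so it has to go into an init
script or equivalent on every node.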
On Fri, 10 Oct 2008 12:42:19 -0400, "V. Ram" <v_r_...@fastmail.fm> said:
> Leonardo,
>
> These nodes are all using Intel e1000 chips. As the nodes are AMD
> K7-based, these are the older chips, not the newer ones with all the
> EEPROM issues under recent kernels.
>
> The kernel in use is from the 2.6.22 family, and the e1000 driver is
> the one shipped with the kernel. I am running it compiled into the
> kernel, not as a module.
>
> When testing with the Intel MPI Benchmarks, I found that increasing
> the receive ring buffer size to the maximum (4096) helped performance,
> so I use ethtool -G on startup.
>
> Checking ethtool -k, I see that TCP segmentation offload is on. I can
> try turning that off to see what happens.
>
> Oddly, on 64-bit nodes using the tg3 driver, this code doesn't crash
> or have these same issues, and I haven't had to turn TSO off there.
>
> Can anyone else suggest why the code might be crashing when running
> over Ethernet but not over shared memory? Any suggestions on how to
> debug this or interpret the error message issued from btl_tcp_frag.c?
>
> Thanks.
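(An aside for anyone copying our setup: the ring-buffer tuning I
mentioned above uses ethtool's -g/-G options. Again a sketch assuming
eth0; check what maximum your NIC actually reports before asking for
4096:)

    # Show the maximum and current RX/TX ring sizes for the NIC
    ethtool -g eth0

    # Raise the receive ring to its maximum (4096 on our e1000s)
    ethtool -G eth0 rx 4096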
> On Wed, 01 Oct 2008 18:11:34 +0200, "Leonardo Fialho"
> <lfia...@aomail.uab.es> said:
> > Ram,
> >
> > What is the name and version of the kernel module for your NIC? I
> > have experienced something similar with my tg3 module. The error that
> > appeared for me was different:
> >
> > [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv:
> > readv failed: No route to host (113)
> >
> > I solved it by changing the following setting:
> >
> > /sbin/ethtool -K eth0 tso off
> >
> > Leonardo
> >
> >
> > Aurélien Bouteiller wrote:
> > > If you have several network cards in your system, it can sometimes
> > > get the endpoints confused, especially if the nodes don't have the
> > > same number of cards or don't use the same subnet for all of
> > > "eth0, eth1". You should try to restrict Open MPI to only one of
> > > the available networks by passing the --mca btl_tcp_if_include ethx
> > > parameter to mpirun, where x is the network interface that is
> > > always connected to the same logical and physical network on your
> > > machines.
> > >
> > > Aurelien
> > >
> > > On 1 Oct 2008, at 11:47, V. Ram wrote:
> > >
> > >> I wrote earlier about one of my users running a third-party
> > >> Fortran code on 32-bit x86 machines, using OMPI 1.2.7, that is
> > >> exhibiting some odd crash behavior.
> > >>
> > >> Our cluster's nodes all have 2 single-core processors. If this
> > >> code is run on 2 processors on 1 node, it runs seemingly fine.
> > >> However, if the job runs on 1 processor on each of 2 nodes (e.g.,
> > >> mpirun --bynode), then it crashes and gives messages like:
> > >>
> > >> [node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > >> [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> > >> mca_btl_tcp_frag_recv: readv failed with errno=110
> > >> mca_btl_tcp_frag_recv: readv failed with errno=104
> > >>
> > >> Essentially, if any network communication is involved, the job
> > >> crashes in this form.
> > >>
> > >> I do have another user who runs his own MPI code on 10+ of these
> > >> processors for days at a time without issue, so I don't think
> > >> it's hardware.
> > >>
> > >> The original code also runs fine across many networked nodes if
> > >> the architecture is x86-64 (also running OMPI 1.2.7).
> > >>
> > >> We have also tried different Fortran compilers (both PathScale
> > >> and gfortran) and keep getting these crashes.
> > >>
> > >> Are there any suggestions on how to figure out whether it's a
> > >> problem with the code or with the OMPI installation/software on
> > >> the system? We have tried "--debug-daemons" with no
> > >> new/interesting information being revealed. Is there a way to
> > >> trap segfault messages or more detailed MPI transaction
> > >> information or anything else that could help diagnose this?
> > >>
> > >> Thanks.
> > >> --
> > >> V. Ram
> > >> v_r_...@fastmail.fm
> > >>
> > >> --
> > >> http://www.fastmail.fm - Same, same, but different...
> >
> > --
> > Leonardo Fialho
> > Computer Architecture and Operating Systems Department - CAOS
> > Universidad Autonoma de Barcelona - UAB
> > ETSE, Edificio Q, QC/3088
> > http://www.caos.uab.es
> > Phone: +34-93-581-2888
> > Fax: +34-93-581-2478
> --
> V. Ram
> v_r_...@fastmail.fm
>
> --
> http://www.fastmail.fm - Faster than the air-speed velocity of an
> unladen european swallow
--
V. Ram
v_r_...@fastmail.fm

--
http://www.fastmail.fm - The way an email service should be