Well, it looks like comm_spawn is working on 1.6. Afraid I don't know enough about Rmpi/snow to advise on what changed, but you could add some debug params to get an idea of where the problem is occurring:
-mca plm_base_verbose 5 -mca dpm_base_verbose 5

should tell you from an OMPI perspective. I can try to help debug that end, at least.

On Jul 26, 2012, at 3:02 PM, Ralph Castain wrote:

> Weird - looks like it has done a comm_spawn and having trouble connecting
> between the jobs. I can check the basic code and make sure it is working - I
> seem to recall someone else recently talking about Rmpi changes causing
> problems (different ones than this, IIRC), so you might want to search our
> user archives for rmpi to see what they ran into. Not sure what rmpi changed,
> or why.
>
> On Jul 26, 2012, at 2:41 PM, Brock Palen wrote:
>
>> I have ran into a problem using Rmpi with OpenMPI (trying to get snow
>> running).
>>
>> I built OpenMPI following another post where I built static:
>>
>> ./configure --prefix=$INSTALL/gcc-4.4.6-static
>> --mandir=$INSTALL/gcc-4.4.6-static/man --with-tm=/usr/local/torque/
>> --with-openib --with-psm --enable-static CC=gcc CXX=g++ FC=gfortran
>> F77=gfortran
>>
>> Rmpi/snow work fine when I run on a single node. When I span more than one
>> node I get nasty errors (pasted below).
>>
>> I tested this mpi install with a simple hello world and that works. Any
>> thoughts what is different about Rmpi/snow that could cause this?
>>
>> [nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing message from [[48116,2],16] to [[48116,1],0]:16, can't find route
>> [nyx0405.engin.umich.edu:07707] [[48116,0],8] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>> [nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried routing message from [[48116,2],32] to [[48116,1],0]:16, can't find route
>> [0] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f) [0x2b7e9209e0df]
>> [1] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x9f77a) [0x2b7e9206577a]
>> [2] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(mca_oob_tcp_msg_recv_complete+0x27f) [0x2b7e920404af]
>> [3] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x7bed2) [0x2b7e92041ed2]
>> [4] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_event_base_loop+0x238) [0x2b7e92087e38]
>> [5] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(orte_daemon+0x8d8) [0x2b7e92016768]
>> [6] func:orted(main+0x66) [0x400966]
>> [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d39c1ecdd]
>> [8] func:orted() [0x400839]
>> [nyx0397.engin.umich.edu:06959] [[48116,0],1] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>> [nyx0397.engin.umich.edu:06959] [[48116,0],1]:route_callback tried routing message from [[48116,2],7] to [[48116,1],0]:16, can't find route
>> [nyx0401.engin.umich.edu:07782] [[48116,0],5] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>> [nyx0401.engin.umich.edu:07782] [[48116,0],5]:route_callback tried routing message from [[48116,2],23] to [[48116,1],0]:16, can't find route
>> [nyx0406.engin.umich.edu:07743] [[48116,0],9] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
>> [nyx0406.engin.umich.edu:07743] [[48116,0],9]:route_callback tried routing message from [[48116,2],39] to [[48116,1],0]:16, can't find route
>> [0] func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f) [0x2ae2ad17d0df]
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
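
If it helps with triage, here is a rough Python sketch that pulls the daemon, source, and destination process names out of the "can't find route" lines in the log quoted above. It is not part of Open MPI or Rmpi; the function name parse_route_failures and the regular expression are made up for illustration and are only fitted to the message shape seen in this one log, so other Open MPI versions may print something different. It makes no attempt to interpret the trailing ":16" on the destination name.

```python
import re

# One ORTE-style process name as printed in the log, e.g. [[48116,0],4].
NAME = r"\[\[\d+,\d+\],\d+\]"

# Assumed message shape, taken only from the log in this thread:
# [host:pid] <daemon>:route_callback tried routing message from <source> to <target>
ROUTE_FAILURE = re.compile(
    r"\[(?P<host>[^\s:\[\]]+):(?P<pid>\d+)\]\s+"
    r"(?P<daemon>" + NAME + r"):route_callback tried routing\s+"
    r"message from\s+(?P<source>" + NAME + r")\s+"
    r"to\s+(?P<target>" + NAME + r")"
)


def parse_route_failures(log_text):
    """Return one dict per route_callback failure found in log_text."""
    # Drop leading email quote markers ("> ", ">> ") from every line.
    unquoted = re.sub(r"^[ \t]*(?:>[ \t]?)+", "", log_text, flags=re.MULTILINE)
    # Rejoin messages that the mail client wrapped across lines.
    flat = " ".join(unquoted.split())
    return [m.groupdict() for m in ROUTE_FAILURE.finditer(flat)]


if __name__ == "__main__":
    sample = """\
>> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing
>> message from [[48116,2],16] to [[48116,1],0]:16, can't find route
"""
    for f in parse_route_failures(sample):
        print(f"{f['host']}: {f['daemon']} could not route "
              f"{f['source']} -> {f['target']}")
```

Running it over the full log should make it easy to see whether every failure has the same destination (here each one targets [[48116,1],0]) and which daemons are reporting it, which may be useful context next to the verbose plm/dpm output.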