Yeah, it's failing when trying to unpack the topology obtained from hwloc. My guess is that one of the following calls changed in hwloc-1.4.3:

    if (0 != hwloc_topology_set_xmlbuffer(t, xmlbuffer, strlen(xmlbuffer))) {
        rc = OPAL_ERROR;
        free(xmlbuffer);
        hwloc_topology_destroy(t);
        goto cleanup;
    }
    /* since we are loading this from an external source, we have to
     * explicitly set a flag so hwloc sets things up correctly
     */
    if (0 != hwloc_topology_set_flags(t, HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM)) {
        free(xmlbuffer);
        rc = OPAL_ERROR;
        goto cleanup;
    }

The only other things in that routine are hwloc_topology_init and hwloc_topology_load, and those haven't changed in a while.

On Jul 23, 2013, at 11:12 AM, Kevin H. Hobbs <hob...@ohio.edu> wrote:

> On 07/23/2013 09:54 AM, Jeff Squyres (jsquyres) wrote:
>>
>> I don't know if Fedora RPMs include -g in their builds, or if Fedora
>> includes a debuginfo RPM that you could install such that you can attach
>> a debugger and be able to dig into OMPI's internals yourself.
>>
>
> There is a debuginfo package.
>
> Since I removed all of fedora's openmpi packages and installed from
> source into /opt/openmpi-1.6.5 and /opt/openmpi-1.6.5_hwloc-1.4.3 to
> narrow down on this problem, I now have to re-install the rpms with yum.
>
> sudo yum install openmpi openmpi-devel openmpi-debuginfo
>
> These don't put anything into my PATH or LD_LIBRARY_PATH so I have to :
>
> module load mpi/openmpi-x86_64
>
> I compiled my simple program with :
>
> mpicc -g -o mpi_simple mpi_simple.c
>
> The program links to fedora's copies of the libraries of interest :
>
> mpirun -n 1 ldd mpi_simple | grep hwloc
>     libhwloc.so.5 => /lib64/libhwloc.so.5 (0x0000003c57600000)
> mpirun -n 1 ldd mpi_simple | grep mpi
>     libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00007f7207e29000)
>
> I started the debugger with :
>
> mpirun -n 1 gdb mpi_simple
>
> When run in the debugger I got the error I described.
>
> I reran and in gdb did :
>
> set breakpoint pending on
> break util/nidmap.c:146
> run
> step
>
> That took me into 'opal_dss_unpack'. Then I did 'next' until I got past
> 'opal_dss_unpack_buffer', which returned the -1 we see outside.