Brian,

As Ralph already stated, this is likely a hwloc API issue. From debian9, you can run

    lstopo --of xml | ssh debian8 lstopo --if xml -i -

which will likely confirm the API error: it exports the debian9 topology as XML and asks the older hwloc on debian8 to load it.

If you want more detail, you can add some printf calls to opal_hwloc_unpack (in opal/mca/hwloc/base/hwloc_base_dt.c) to find exactly where the failure occurs.

Meanwhile, you can move forward by using the embedded hwloc on both distros (--with-hwloc=internal, or no --with-hwloc option at all).

Note that we strongly discourage configuring with --with-FOO=/usr: it explicitly adds /usr/include and /usr/lib[64] to the search path, and might hide other external libraries installed in a non-standard location. To force the use of the external hwloc library installed in the default location, --with-hwloc=external is what you need (the same applies to libevent and pmix).

Cheers,

Gilles

On Sun, Jul 22, 2018 at 7:52 AM r...@open-mpi.org <r...@open-mpi.org> wrote:
>
> More than likely the problem is the difference in hwloc versions - sounds
> like the topology to/from xml is different between the two versions, and the
> older one doesn’t understand the new one.
>
>
> > On Jul 21, 2018, at 12:04 PM, Brian Smith <bsm...@systemfabricworks.com>
> > wrote:
> >
> > Greetings,
> >
> > I'm having trouble getting openmpi 2.1.2 to work when launching a
> > process from debian 8 on a remote debian 9 host. To keep things simple
> > in this example, I'm just launching date on the remote host.
> >
> > deb8host$ mpirun -H deb9host date
> > [deb8host:01552] [[32763,0],0] ORTE_ERROR_LOG: Error in file
> > base/plm_base_launch_support.c at line 954
> >
> > It works fine when executed from debian 9:
> > deb9host$ mpirun -H deb8host date
> > Sat Jul 21 13:40:43 CDT 2018
> >
> > Also works when executed from debian 8 against debian 8:
> > deb8host:~$ mpirun -H deb8host2 date
> > Sat Jul 21 13:55:57 CDT 2018
> >
> > The failure results from an error code returned by:
> > opal_dss.unpack(buffer, &topo, &idx, OPAL_HWLOC_TOPO)
> >
> > openmpi was built with the same configure flags on both hosts.
> >
> > --prefix=$(PREFIX) \
> > --with-verbs \
> > --with-libfabric \
> > --disable-silent-rules \
> > --with-hwloc=/usr \
> > --with-libltdl=/usr \
> > --with-devel-headers \
> > --with-slurm \
> > --with-sge \
> > --without-tm \
> > --disable-heterogeneous \
> > --with-contrib-vt-flags=--disable-iotrace \
> > --sysconfdir=$(PREFIX)/etc \
> > --libdir=$(PREFIX)/lib \
> > --includedir=$(PREFIX)/include
> >
> >
> > deb9host libhwloc and libhwloc-plugins is 1.11.5-1
> > deb8host libhwloc and libhwloc-plugins is 1.10.0-3
> >
> > I've been trying to debug this for the past few days and would
> > appreciate any help on determining why this failure is occurring
> > and/or resolving the problem.
> >
> > --
> > Brian T. Smith
> > System Fabric Works
> > Senior Technical Staff
> > bsm...@systemfabricworks.com
> > GPG Key: B3C2C7B73BA3CD7F
> > _______________________________________________
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users