Re: [OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Gilles Gouaillardet
Piotr and all, I issued PR #313 (vs master) based on your patch: https://github.com/open-mpi/ompi/pull/313 Could you please have a look at it? Cheers, Gilles On 2014/12/09 22:07, Gilles Gouaillardet wrote: > Thanks Piotr, > > Based on the ompi community rules, a pr should be made vs the maste

Re: [OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Piotr Lesnicki
Hi, Gilles, the patch I sent is vs v1.6, so I will prepare a patch vs master. The v1.6 patch cannot be applied to master because of changes in the btl openib: connecting XRC queues has changed from XOOB to UDCM. Piotr De : devel [devel-boun...@open-mpi.org] de

Re: [OMPI devel] openmpi and XRC API from ofed-3.12

2014-12-10 Thread Gilles Gouaillardet
Hi, I already figured this out and did the port :-) Cheers, Gilles Piotr Lesnicki wrote: >Hi, > >Gilles, the patch I sent is vs v1.6, so I prepare a patch vs master. > >The patch on v1.6 can not apply on master because of changes in the >btl openib: connecting XRC queues has changed from XOOB

Re: [OMPI devel] [OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-10 Thread Joshua Ladd
Thanks, Gilles. We're back to looking at this (yet again). It's a false positive, yes; however, it's not completely benign. The max_reg that's calculated is much smaller than it should be. In OFED 3.12, max_reg should be 2*TOTAL_RAM. We should have a fix for 1.8.4. Josh On Mon, Dec 8, 2014 at 9:2
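To make the 2*TOTAL_RAM expectation concrete, here is a minimal Python sketch. It is an illustration only: the message does not say how max_reg is actually calculated, so the formula below is an assumption borrowed from the usual mlx4 sizing rule (log_num_mtt and log_mtts_per_seg module parameters), and all function names are hypothetical.

```python
import os


def mlx4_max_reg(log_num_mtt, log_mtts_per_seg, page_size):
    """Registerable memory in bytes under the assumed mlx4 sizing rule:
    (2^log_num_mtt) * (2^log_mtts_per_seg) * page_size."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def total_ram_bytes():
    """Physical memory of the local node in bytes (Linux sysconf names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def covers_twice_ram(max_reg, total_ram):
    """True if max_reg reaches the 2*TOTAL_RAM figure quoted in the thread."""
    return max_reg >= 2 * total_ram
```

For example, covers_twice_ram(mlx4_max_reg(20, 3, 4096), total_ram_bytes()) reports whether those (made-up) parameter values would be enough on the local node.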

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Pim Schellart
Dear Gilles et al., we tested with openmpi compiled from source (version 1.8.3) both with: ./configure --prefix=/usr/local/openmpi --disable-silent-rules --with-libltdl=external --with-devel-headers --with-slurm --enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi and ./configure --p

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Brice Goglin
The warning does not exist in the hwloc code inside OMPI 1.8, so there's something strange happening in your first test. I would assume it's using the external hwloc in both cases for some reason. Running ldd on libopen-pal.so could be a way to check whether it depends on an external libhwloc.so or
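The ldd check suggested above can be scripted. The following is a rough Python sketch, not something posted in the thread: it runs ldd on a library and keeps the lines that mention libhwloc. The function names are hypothetical, and the library path in the usage note is only a guess based on the --prefix given in the configure lines.

```python
import subprocess


def hwloc_dependencies(ldd_output):
    """Return the lines of ldd output that mention libhwloc."""
    return [line.strip() for line in ldd_output.splitlines()
            if "libhwloc" in line]


def check_library(path):
    """Run ldd on `path` and return its libhwloc dependency lines.

    An empty list suggests the library does not link against an
    external libhwloc.so (i.e. it uses the embedded copy).
    """
    result = subprocess.run(["ldd", path], capture_output=True,
                            text=True, check=True)
    return hwloc_dependencies(result.stdout)
```

Usage would be something like check_library("/usr/local/openmpi/lib/libopen-pal.so"), with the path adjusted to wherever libopen-pal.so is actually installed.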

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Ralph Castain
Brice: is there any way to tell if these are coming from Slurm vs OMPI? Given this data, I’m suspicious that this might have something to do with Slurm and not us. > On Dec 10, 2014, at 9:45 AM, Pim Schellart wrote: > > Dear Gilles et al., > > we tested with openmpi compiled from source (ver

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Brice Goglin
Unfortunately I don't think we have any way to know which process and hwloc version generates an XML so far. I am currently looking at adding this to hwloc 1.10.1 because of this thread. One thing that could help would be to dump the XML file that OMPI receives. Just write the entire buffer to a fi

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Ralph Castain
I think you actually already answered this - if that warning message isn’t in OMPI’s internal code, and the user gets it when building with either internal or external hwloc support, then it must be coming from Slurm. This assumes that ldd libopen-pal.so doesn’t show OMPI to actually be linked

Re: [OMPI devel] opal_lifo/opal_fifo fail with make distcheck

2014-12-10 Thread Nathan Hjelm
The failure was due to the use of opal_init() in the tests. I thought it was ok to use because it is used by other tests (which turned out to be disabled) but that isn't the case. opal_init_util() has to be used instead. I pushed a fix to master last night. -Nathan On Tue, Dec 09, 2014 at 03:35:

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Gilles Gouaillardet
Pim, at this stage, all I can do is acknowledge that your slurm is configured to use cgroups. Based on your previous comment (e.g. the problem only occurs with several jobs on the same node), that *could* be a bug in OpenMPI (or hwloc). By the way, how do you start your MPI application? - do you use

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Ralph Castain
Per his prior notes, he is using mpirun to launch his jobs. Brice has confirmed that OMPI doesn’t have that hwloc warning in it. So either he has inadvertently linked against the Ubuntu system version of hwloc, or the message must be coming from Slurm. > On Dec 10, 2014, at 6:14 PM, Gilles Gou

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Gilles Gouaillardet
Ralph, you are right, please disregard my previous post, it was irrelevant. I just noticed that unlike ompi v1.8 (hwloc 1.7.2 based => no warning), master has this warning (hwloc 1.9.1). I will build slurm vs a recent hwloc and see what happens (FWIW RHEL6 comes with hwloc 1.5, RHEL7 comes with h