Ralph,
You are right;
please disregard my previous post, it was irrelevant.
I just noticed that, unlike OMPI v1.8 (based on hwloc 1.7.2 => no warning),
master has this warning (hwloc 1.9.1).
I will build Slurm against a recent hwloc and see what happens.
(FWIW, RHEL6 comes with hwloc 1.5, RHEL7 comes with h
Per his prior notes, he is using mpirun to launch his jobs. Brice has confirmed
that OMPI doesn’t have that hwloc warning in it. So either he has inadvertently
linked against the Ubuntu system version of hwloc, or the message must be
coming from Slurm.
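As a complement to the ldd check discussed in this thread, a small standalone program (a sketch, not part of OMPI) can compare the hwloc version the code was compiled against with the libhwloc.so actually loaded at run time; a mismatch suggests another library is being picked up. It typically builds with something like gcc check_hwloc.c -o check_hwloc -lhwloc.

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    /* HWLOC_API_VERSION is the hwloc API the program was compiled against;
     * hwloc_get_api_version() reports the library actually loaded at run time.
     * If they differ, a different libhwloc.so is being resolved. */
    printf("compile-time hwloc API: 0x%x, run-time hwloc API: 0x%x\n",
           (unsigned) HWLOC_API_VERSION, hwloc_get_api_version());
    return 0;
}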
> On Dec 10, 2014, at 6:14 PM, Gilles Gou
Pim,
at this stage, all I can do is confirm that your Slurm is configured to
use cgroups.
Based on your previous comment (the problem only occurs when
several jobs run on the same node),
this *could* be a bug in Open MPI (or hwloc).
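To see what each job is actually allowed to use under the cgroup, a small standalone hwloc program (a sketch, not part of OMPI or Slurm) can print the allowed cpuset; running one copy inside each job sharing the node makes any cgroup restriction visible.

#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    char *s;

    /* Build the topology as this process sees it (cgroup limits already applied). */
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* The allowed cpuset reflects any cgroup/cpuset restriction on this job. */
    hwloc_bitmap_asprintf(&s, hwloc_topology_get_allowed_cpuset(topo));
    printf("allowed cpuset: %s\n", s);

    free(s);
    hwloc_topology_destroy(topo);
    return 0;
}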
By the way, how do you start your MPI application?
- do you use
The failure was due to the use of opal_init() in the tests. I thought it
was OK to use because other tests use it (those turned out to be
disabled), but that isn't the case. opal_init_util() has to be used
instead. I pushed a fix to master last night.
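For reference, a minimal sketch of what such a test's setup/teardown looks like after the fix, assuming the usual OPAL headers and OPAL_SUCCESS constant (the test body itself is omitted):

#include "opal_config.h"
#include "opal/runtime/opal.h"
#include "opal/constants.h"
#include <stdio.h>

int main(int argc, char **argv)
{
    /* Initialize only the OPAL utility layer, not the full opal_init(). */
    if (OPAL_SUCCESS != opal_init_util(&argc, &argv)) {
        fprintf(stderr, "opal_init_util failed\n");
        return 1;
    }

    /* ... actual test body would go here ... */

    opal_finalize_util();
    return 0;
}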
-Nathan
On Tue, Dec 09, 2014 at 03:35:
I think you actually already answered this - if that warning message isn’t in
OMPI’s internal code, and the user gets it when building with either internal
or external hwloc support, then it must be coming from Slurm.
This assumes that ldd libopen-pal.so doesn’t show OMPI to actually be linked
Unfortunately, I don't think we have any way to know which process and
hwloc version generated an XML so far. I am currently looking at adding
this to hwloc 1.10.1 because of this thread.
One thing that could help would be to dump the XML file that OMPI
receives. Just write the entire buffer to a fi
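A sketch of what that dump could look like (the buffer and length names are made up for illustration, this is not existing OMPI code); the resulting file can then be inspected, for example loaded with lstopo --input file.xml.

#include <stdio.h>

/* Hypothetical helper: write the received topology XML buffer, byte for byte,
 * to a file so it can be examined later. */
static void dump_topo_xml(const char *xmlbuf, size_t len, const char *path)
{
    FILE *f = fopen(path, "w");
    if (NULL == f) {
        perror("fopen");
        return;
    }
    if (fwrite(xmlbuf, 1, len, f) != len) {
        perror("fwrite");
    }
    fclose(f);
}

int main(void)
{
    const char xml[] = "<topology/>";  /* placeholder content for the demo */
    dump_topo_xml(xml, sizeof(xml) - 1, "/tmp/ompi-recv-topo.xml");
    return 0;
}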
Brice: is there any way to tell if these are coming from Slurm vs OMPI? Given
this data, I’m suspicious that this might have something to do with Slurm and
not us.
> On Dec 10, 2014, at 9:45 AM, Pim Schellart wrote:
>
> Dear Gilles et al.,
>
> we tested with openmpi compiled from source (ver
The warning does not exist in the hwloc code inside OMPI 1.8, so there's
something strange happening in your first test. I would assume it's
using the external hwloc in both cases for some reason. Running ldd on
libopen-pal.so could be a way to check whether it depends on an external
libhwloc.so or
Dear Gilles et al.,
We tested with Open MPI compiled from source (version 1.8.3), both with:
./configure --prefix=/usr/local/openmpi --disable-silent-rules
--with-libltdl=external --with-devel-headers --with-slurm
--enable-heterogeneous --disable-vt --sysconfdir=/etc/openmpi
and
./configure --p
Thanks, Gilles
We're back to looking at this (yet again). It's a false positive, yes;
however, it's not completely benign. The max_reg that's calculated is much
smaller than it should be. In OFED 3.12, max_reg should be 2*TOTAL_RAM. We
should have a fix for 1.8.4.
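For concreteness, a sketch (not the actual OMPI code) of the value max_reg should come out to under OFED 3.12, i.e. twice the physical RAM:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Expected registration limit under OFED 3.12: 2 * total physical RAM. */
    unsigned long long total_ram =
        (unsigned long long) sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE);
    unsigned long long expected_max_reg = 2ULL * total_ram;

    printf("total RAM: %llu bytes, expected max_reg: %llu bytes\n",
           total_ram, expected_max_reg);
    return 0;
}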
Josh
On Mon, Dec 8, 2014 at 9:2
Hi,
I already figured this out and did the port :-)
Cheers,
Gilles
Piotr Lesnicki wrote:
>Hi,
>
>Gilles, the patch I sent is against v1.6, so I am preparing a patch against master.
>
>The v1.6 patch cannot be applied to master because of changes in the
>openib BTL: connecting XRC queues has changed from XOOB
Hi,
Gilles, the patch I sent is against v1.6, so I am preparing a patch against master.
The v1.6 patch cannot be applied to master because of changes in the
openib BTL: connecting XRC queues has changed from XOOB to UDCM.
Piotr
From: devel [devel-boun...@open-mpi.org] on behalf of
Piotr and all,
I issued PR #313 (against master) based on your patch:
https://github.com/open-mpi/ompi/pull/313
Could you please have a look at it?
Cheers,
Gilles
On 2014/12/09 22:07, Gilles Gouaillardet wrote:
> Thanks Piotr,
>
> Based on the OMPI community rules, a PR should be made against the master