Mark,
I think mca_pmix_native.so no longer exists.
You can confirm that by checking the timestamps of the libs after
running make install.
Just remove your install dir and run make install again; that
should solve your issue.
Cheers,
Gilles
On Tue, Sep 22, 2015 at 5:26 PM, Mark Santcroos wrote:
x/pmix1xx includes "-D_REENTRANT".
> However, they don't include "-mt".
> I believe we concluded (when we had problems previously) that "-mt" was
> the proper flag (at compile and link) for multi-threaded with the Studio
> compilers.
>
> -
>> On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain wrote:
>>
>>> Paul, can you clarify something for me? The error in this case indicates
>>> that the client wasn’t able to reach the daemon - this should have resulted
>>> in termination of the job. Did the j
Paul,
IIRC, the "Permission denied" message comes from hwloc, which cannot collect
all the info it would like.
Cheers,
Gilles
On 9/18/2015 2:34 PM, Paul Hargrove wrote:
Tried tonight's master tarball on Solaris 11.2 on x86-64 with the
Studio Compilers (default ILP32 output) and saw the following
George,
I will revisit this.
If I added a const modifier where the standard does not require it, that was not
intentional; it was a mistake.
Thanks for the report.
Gilles
On Wednesday, September 16, 2015, George Bosilca
wrote:
> Gilles,
>
> Your commit 6e6a3e96 is only partially correct. There is n
Ralph,
The collective/op, collective/op_mpifh, collective/op_usempi,
group/group, onesided/c_lock_illegal and random/attr-error-code tests fail
because your contrib/platform/intel/bend/linux.conf contains the
following line:
mpi_param_check = 0
and the IBM test suite does not handle this correctly.
Ralph,
According to the MTT logs (click on the "MPI Install" button in the top left
corner), ompi was built in zero seconds...
IIRC, you do not build ompi under MTT, but use the MTT "installed"
module instead,
so my best bet is that MTT logged some garbage since it has no way to figure out
how ompi was configure
Ralph,
At first glance, these errors look unrelated to PMIx.
I noticed a bunch of bind() failures.
Based on your command line, I guess you are not running your job via a
batch manager,
and I would guess that not all Unix sockets are always cleaned up.
(or this is an old bug and you did not manually clea
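If stale Unix-domain socket files are the cause, the failure mode is simple: bind() on a path where a file already exists fails with EADDRINUSE. A minimal sketch of the usual POSIX workaround, removing a leftover socket file before binding, is below. The function name and path are hypothetical, this is not how PMIx itself manages its rendezvous files, and unlinking unconditionally could remove a socket still used by a live job, so real code would need an ownership or liveness check first.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a Unix-domain stream socket bound to path,
 * first removing a stale socket file left behind by an earlier run.
 * Returns the socket descriptor, or -1 with errno set. */
static int bind_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int saved;

    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path); /* length checked above */

    /* bind() fails with EADDRINUSE if a file already exists at path,
     * so remove any leftover; a missing file (ENOENT) is fine. */
    if (unlink(path) != 0 && errno != ENOENT) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Calling bind_unix_socket() twice on the same path succeeds both times: closing the first descriptor leaves the socket file behind, and the second call removes it before binding.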
Nathan,
I am experiencing some issues, and I found that commit
0bf06de3f1444f469303e47752430ec9b423b33f
https://github.com/open-mpi/ompi/commit/0bf06de3f1444f469303e47752430ec9b423b33f
and the following are very likely the root cause.
I experience this on a Linux SPARC system only.
Per the commit
Ralph,
this is fixed in
https://github.com/open-mpi/ompi/commit/a1627feaf74d8562146a1afbfabec60651496c06
Cheers,
Gilles
On 9/11/2015 1:02 PM, Gilles Gouaillardet wrote:
Ralph,
Will do.
I think these new warnings are a consequence of the changes I pushed
recently
(e.g. add the const
bled by default]
coll_base->coll_ireduce = mca_coll_ml_reduce_nb;
On Sep 10, 2015, at 7:02 PM, Shamis, Pavel <sham...@ornl.gov> wrote:
Awesome, thanks for fixing this.
*From:*devel <mailto:devel-boun...@ope
Pasha,
I fixed that in
https://github.com/open-mpi/ompi/commit/c404e98dced4104cd3abe7485846368325c3d150
but forgot to post it to the ML...
Cheers,
Gilles
On 9/11/2015 7:31 AM, Shamis, Pavel wrote:
Ralph,
I don't see these warnings on my Fedora box with gcc 5.1.1.
I will try to fix it, but
regarding the FX10 specific issue
>
> Cheers,
>
> Gilles
>
> On 9/4/2015 2:31 PM, Brice Goglin wrote:
>
> On 04/09/2015 00:36, Gilles Gouaillardet wrote:
>
> Ralph,
>
> just to be clear, your proposal is to abort if openmpi is configured
> with --without-hw
Folks,
A bunch of C bindings have comments such as
/* XXX -- CONST -- do not cast away const -- update mca/coll */
and these have been there for a long time.
I made PR #839 https://github.com/open-mpi/ompi/pull/839 to fix this.
The change is quite massive (270 files) because:
- the C bindings had t
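The comment quoted above asks callers not to cast away const; the cure is to carry const through the callee's signature. A minimal sketch with hypothetical function names (neither is an Open MPI or mca/coll API) showing a binding that forwards a const send buffer without a cast once the component entry point is declared with const:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical component entry point (not a real mca/coll signature):
 * the send buffer is declared const because it is only read. */
static int sketch_coll_copy(const void *sbuf, void *rbuf, size_t nbytes)
{
    memcpy(rbuf, sbuf, nbytes);
    return 0;
}

/* Hypothetical C binding: since the callee already takes const void *,
 * sendbuf is forwarded as is, with no (void *) cast discarding const. */
static int sketch_binding(const void *sendbuf, void *recvbuf, size_t nbytes)
{
    return sketch_coll_copy(sendbuf, recvbuf, nbytes);
}
```

The trade-off is the one described in the message: every layer between the binding and the component has to change its prototype together, which is why such a fix touches many files at once.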
Folks,
Jeff and I have been discussing the possibility of removing the
--enable-mpi-profile option from ompi
(see https://github.com/open-mpi/ompi/pull/845 for the details).
Removing this option would simplify the build process and make it
crystal clear that the Fortran bindings call
the C PM
Ralph,
Just to be clear, your proposal is to abort if openmpi is configured with
--without-hwloc, right?
(The --with-hwloc option is not removed because we want to keep the option
of using an external hwloc library.)
If I understand correctly, Paul's point is that if openmpi is ported to a
new
Ralph,
If I am reading between the lines of your second point correctly, Omni-Path (PSM2)
works out of the box. I am not sure this is the case, or my
extrapolation might be incorrect.
If I understood correctly, PSM2 is a new feature.
From a distro point of view, that could be a new package (kno
Jeff,
On second thought, wouldn't it be better to simply disable both PSM and
PSM2 in openmpi
and let libfabric handle these conflicts?
Does that make any sense?
Cheers,
Gilles
On Thursday, September 3, 2015, Jeff Squyres (jsquyres)
wrote:
> I agree with what George says.
>
> AFAIK, Red Ha
Michael,
If a solution with two packages is acceptable,
then another, simpler option is to configure
one openmpi for PSM with --without-psm2,
and another openmpi for PSM2 with --without-psm.
This is safe with --disable-dlopen or --enable-static, and you do not need
to tweak the conf files.
Cheers,
Gilles
afraid that won’t solve the problem - the distro will still
feel the need to release -two- versions of OMPI, one with PSM and
one with PSM2. Ordinarily, I wouldn’t care - but this creates user
confusion and reflects on us as a community.
> On Sep 2, 2015, at 6:50 PM, Gill
Ralph,
What about automatically *not* building PSM2 if PSM is built and PSM2 is
not explicitly requested?
/* in order to be future proof, we could even do that only if we detect
a symbol conflict */
We could abort if ompi is configure'd with both --with-psm and
--with-psm2, or simply do nothin
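The proposed default can be written as a small decision rule. The sketch below is a hypothetical C rendering of it (the real logic would live in configure, and the struct and field names are invented for illustration); the future_proof flag models the variant that only skips PSM2 when a symbol conflict is actually detected. The "abort if both are requested" alternative is not modeled.

```c
#include <stdbool.h>

/* Hypothetical inputs a configure-time check might compute. */
struct psm_config {
    bool psm_built;       /* PSM support is being built */
    bool psm2_requested;  /* user passed --with-psm2 explicitly */
    bool symbol_conflict; /* a PSM/PSM2 symbol clash was detected */
};

/* Build PSM2 when it was explicitly requested or PSM is not built.
 * Otherwise skip it: always (simple variant), or only when a symbol
 * conflict was detected (future_proof variant). */
static bool should_build_psm2(const struct psm_config *cfg, bool future_proof)
{
    if (cfg->psm2_requested || !cfg->psm_built) {
        return true;
    }
    return future_proof ? !cfg->symbol_conflict : false;
}
```

With PSM built and no explicit request, the simple variant skips PSM2 while the future-proof variant still builds it unless a conflict shows up.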
Hi,
This part has been revamped recently.
First, I would recommend a fresh install:
remove the install directory (and the build directory if you use VPATH),
then re-run configure && make && make install.
That should hopefully fix the issue.
Cheers,
Gilles
On 9/1/2015 9:35 AM, Cabral, Mat
Brice,
As a side note, what is the rationale for defining the distance as a
floating point number?
I remember I had to fix a bug in ompi a while ago:
/* e.g. replace if (d1 == d2) with if (fabs(d1 - d2) < epsilon) */
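A minimal sketch of a tolerance-based comparison for such distances. The tolerance value and function name are hypothetical; the point is that the difference needs fabs(), otherwise the test is also true whenever d1 is smaller than d2.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical absolute tolerance; a suitable value depends on the
 * magnitude and origin of the distances being compared. */
#define DISTANCE_EPSILON 1e-9

/* Treat two floating-point distances as equal when they differ by less
 * than the tolerance. Without fabs(), (d1 - d2) < epsilon would also be
 * true for any d1 much smaller than d2. */
static bool distances_equal(double d1, double d2)
{
    return fabs(d1 - d2) < DISTANCE_EPSILON;
}
```

For example, distances_equal(0.1 + 0.2, 0.3) is true even though 0.1 + 0.2 == 0.3 is false in IEEE double arithmetic.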
Cheers,
Gilles
On 9/1/2015 5:28 AM, Brice Goglin wrote:
The locality is mlx4_0 as
Jeff,
I filed PR #845 https://github.com/open-mpi/ompi/pull/845;
could you please have a look?
Cheers,
Gilles
On 8/30/2015 9:20 PM, Gilles Gouaillardet wrote:
OK, will do.
Basically, I simply have to
#include "ompi/mpi/c/profile/defines.h"
if configure set the WANT_MPI_PROFI
*_f files are impacted, and for
mpif-h only,
so I'd rather ask before I file the PR, even if a sed command will do
most of the job */
Cheers,
Gilles
On Saturday, August 29, 2015, Jeff Squyres (jsquyres)
wrote:
> On Aug 27, 2015, at 3:25 AM, Gilles Gouaillardet > wro
Thanks Michael and Kawashima-san,
I made PR #838 to fix this;
it is currently available at https://github.com/open-mpi/ompi/pull/838
Cheers,
Gilles
On 8/27/2015 6:29 PM, Michael Knobloch wrote:
Dear OpenMPI developers,
I noticed a bug in the definition of the 3 MPI-3 RMA functions
MPI_Compare
Ralph,
What about:
- if only one interface is specified (e.g. *_if_include eth0), then bind
to that interface
- otherwise, bind to all interfaces
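A minimal sketch of the proposed rule, assuming the *_if_include value is a comma-separated list of interface names; the function name is hypothetical, and entries given as subnets (which may match several interfaces) are not considered here:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if if_include names exactly one interface, copy it
 * into out and return true (bind to that interface). Return false to mean
 * "bind to all interfaces" (also used if the name does not fit in out). */
static bool single_bind_interface(const char *if_include, char *out, size_t outlen)
{
    if (if_include == NULL || if_include[0] == '\0' ||
        strchr(if_include, ',') != NULL) {
        return false; /* nothing or several interfaces specified */
    }
    size_t len = strlen(if_include);
    if (len >= outlen) {
        return false;
    }
    memcpy(out, if_include, len + 1); /* include the terminating NUL */
    return true;
}
```

With "eth0" the caller would bind to eth0; with "eth0,ib0" or no setting at all it would fall back to binding to every interface.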
Mark, would that solve your issue?
Cheers,
Gilles
On 8/28/2015 9:50 AM, Ralph Castain wrote:
I committed the change that prevents orte-submit
Kawashima-san,
You are right, I mixed up MPI_Buffer_detach and MPI_Win_detach.
Sorry for the confusion.
Cheers,
Gilles
On Thursday, August 27, 2015, Kawashima, Takahiro <
t-kawash...@jp.fujitsu.com> wrote:
> Gilles,
>
> > there is a comment in the source code to explain this.
>
> Could you point wh
IIRC, the MPI_Win_detach discrepancy with the standard is intentional in
Fortran 2008;
there is a comment in the source code that explains it.
On Thursday, August 27, 2015, Kawashima, Takahiro <
t-kawash...@jp.fujitsu.com> wrote:
> Oh, I also noticed it yesterday and was about to report it.
>
> An
I am lost ...
from ompi/mpi/fortran/mpif-h/profile/palltoall_f.c
void ompi_alltoall_f(char *sendbuf, MPI_Fint *sendcount, MPI_Fint *sendtype,
char *recvbuf, MPI_Fint *recvcount, MPI_Fint *recvtype,
MPI_Fint *comm, MPI_Fint *ierr)
{
[...]
c_ierr = M
Paul,
I tried PSM_RCVTHREAD=0, but it did not help.
Jeff,
You did not read too much into it... but my words were not quite accurate.
Yes, the signal handlers are set in the library constructor.
By reading the source code, I found that this can be avoided by setting
the yet undocumented IPATH_NO_BACKTRACE en
> -Paul
>
> On Tue, Aug 25, 2015 at 6:02 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com
> > wrote:
>
>> I run on a CentOS 7 VM, with the OFED that comes with CentOS
>> (I will send full details tomorrow).
>> There is no PSM hardware, just inf
Would it be easier if the option were --host instead of -host?
I guess changing the CLI is not an option for the v1.x series, so what
about adding a -hosts option (an alias for the -host option)?
I have made the same mistake a few times before; adding an s to host looks more
intuitive to me.
My 0.02 US$
Gi
EADME and the FAQ.
>>>
>>> I'd be against adding user documentation to the wiki -- this would be a
>>> 3rd place for users to look for information.
>>>
>>> > - file a bug against intel psm
>>>
>>> I'd like to hear what
Folks,
I ran some basic tests with profilers such as IPM
https://github.com/nerscadmin/IPM and found that when Fortran calls an
MPI subroutine, the call is accounted for twice.
IPM defines both the MPI_* subroutines and their Fortran mpi_*_ counterparts.
Since the ompi Fortran bindings call the MPI_* symbol (and not the P
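The double counting follows from symbol interposition. The self-contained model below uses hypothetical functions in place of real MPI symbols and involves no MPI library. A profiler wraps both the Fortran entry point and the C MPI_ entry point, so if the library's Fortran binding calls the MPI_ symbol, one user call is counted twice; calling the PMPI_ symbol instead (which the truncated sentence appears to point at) counts it once.

```c
/* Hypothetical model of profiler interposition; not real MPI symbols. */
static int profiler_calls = 0;

/* The library's real implementation (plays the PMPI_ entry point). */
static int sketch_PMPI_Barrier(void) { return 0; }

/* Profiler wrapper for the C symbol: count, then forward to PMPI. */
static int sketch_MPI_Barrier(void)
{
    profiler_calls++;
    return sketch_PMPI_Barrier();
}

/* The library's Fortran binding; calls_pmpi selects which C symbol it uses. */
static int sketch_library_barrier_f(int calls_pmpi)
{
    return calls_pmpi ? sketch_PMPI_Barrier() : sketch_MPI_Barrier();
}

/* Profiler wrapper for the Fortran symbol: count, then forward. */
static int sketch_mpi_barrier_(int calls_pmpi)
{
    profiler_calls++;
    return sketch_library_barrier_f(calls_pmpi);
}
```

One Fortran call through sketch_mpi_barrier_(0) bumps the counter twice, while sketch_mpi_barrier_(1) bumps it once.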
ipath change actually change its signal handler
> behavior?
>
>
> > On Aug 25, 2015, at 4:27 AM, Gilles Gouaillardet > wrote:
> >
> > Folks,
> >
> > some time ago, some crashes were reported when using java bindings.
> > one of them was caused was cause
Folks,
Some time ago, some crashes were reported when using the Java bindings.
One of them was caused by mca_mtl_psm.so.
The root cause is that the libinfinipath.so initializer sets its own signal
handler, which conflicts with the signal handler set by the JVM.
The only workaround is to disable
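One way a library initializer could avoid clobbering a handler installed earlier (for instance by the JVM) is to query the current disposition with sigaction() before installing its own. This is a sketch of that idea with hypothetical function names, not what libinfinipath actually does, and it only helps when the library is loaded after the other party has installed its handler:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Return true if something other than the default or "ignore"
 * disposition is installed for sig. */
static bool has_custom_handler(int sig)
{
    struct sigaction current;

    if (sigaction(sig, NULL, &current) != 0) { /* NULL act: query only */
        return false;
    }
    if (current.sa_flags & SA_SIGINFO) {
        return true; /* an sa_sigaction-style handler is installed */
    }
    return current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN;
}

static void sketch_handler(int sig)
{
    (void)sig; /* placeholder: a real handler must be async-signal-safe */
}

/* Install sketch_handler only if no one (e.g. a JVM) installed a handler
 * first. Returns true if this call installed the handler. */
static bool install_if_unclaimed(int sig)
{
    struct sigaction sa;

    if (has_custom_handler(sig)) {
        return false;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sketch_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL) == 0;
}
```

The second call to install_if_unclaimed() for the same signal returns false because the first call's handler is then detected.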
Thanks Adrian,
I fixed this in PR #831 https://github.com/open-mpi/ompi/pull/831 and
will push it to master shortly.
Best regards,
Gilles
On 8/25/2015 4:47 PM, Adrian Reber wrote:
On Mon, Aug 24, 2015 at 09:47:22PM +, Jeff Squyres (jsquyres) wrote:
Who runs the esslingen MTT?
You're getting
A first step could be adding a --disable-libnl3 option to configure, which
means components should not even try to use libnl3.
Does that make sense?
On Monday, August 24, 2015, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:
> iirc, librdmacm uses libnl
>
> I am not sure if h
f both libnl and libnl3 are present
> in the same process (e.g., if some of OMPI's dependent libraries pull them
> both in). We could try to opal_dl_open() NULL and then look for symbols
> that are unique to libnl and libnl3, but a) when to do that, and b) it's
> not guaranteed to work in all cases.
>
>
>
>
> > On Aug 24, 2015, at 7:36 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com > w
Folks,
I recently installed the libnl3-devel rpm on my CentOS 7 box, reconfigured and
recompiled ompi, and ompi_info now crashes.
It seems the root cause is an obscure conflict between libnl and libnl3.
libnl is indirectly required by the common_verbs mca (the OFED libraries do
need it) and libnl3 is req
same issue, but suspect that it is
> not.
>
> -Paul
>
> On Sat, Aug 22, 2015 at 6:00 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com
> > wrote:
>
>> Paul,
>>
>> Isn't this an issue that was already discussed?
>> mellanox proprietary hcoll
Paul,
Isn't this an issue that was already discussed?
The Mellanox proprietary hcoll library includes its own coll ml module that
conflicts with the ompi one.
The Mellanox folks fixed this internally, but I am not sure the fix has been
released.
You can run
nm libhcoll.so
If there are some symbols starting w
urgent, I assigned them to you.
This simply removes a bogus test (OFED version used at runtime
vs. compile time).
Note that I made a PR for master but I did not push my changes.
Cheers,
Gilles
On 8/14/2015 8:44 AM, Gilles Gouaillardet wrote:
Paul,
I tried to fix this test, and at this stage, I no longer understand
the logic of this test.
Right now, my best bet is to simply remove it.
The worst case scenario would be a potentially obscure error message if
ompi was built with OFED X and run with OFED Y.
I will make the
Hi Howard,
It looks like I pushed my branch to the ompi repo instead of my clone...
That was clearly a mistake, and I deleted the branch.
Cheers,
Gilles
On 8/6/2015 12:14 AM, Howard Pritchard wrote:
Hi Folks,
There's a new branch on open-mpi/ompi repo.
Is this intentional?
Howard
Hartmut,
Yes, this is a bug...
We are still working on a proper fix.
In the meantime, you can comment out the dlsym test in the openib btl
(otherwise, openmpi falls back to tcp...).
Cheers,
Gilles
On Tuesday, August 4, 2015, Hartmut Häfner (SCC)
wrote:
> Dear developers,
>
> we have installed Op
Christoph,
That is correct:
stdout is a tty and stderr is not;
stderr is a pipe to orted.
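A quick way to check this from inside an application is isatty(3) on the two standard descriptors. The small helper below (its name is hypothetical) returns a bit mask. Going by the explanation here, one would expect the stdout bit set and the stderr bit clear when the job is launched by mpirun from a terminal, while running the program directly from a terminal sets both:

```c
#include <unistd.h>

/* Hypothetical helper. Bit 0: stdout is a terminal; bit 1: stderr is a
 * terminal. */
static int tty_mask(void)
{
    int mask = 0;

    if (isatty(STDOUT_FILENO)) {
        mask |= 1;
    }
    if (isatty(STDERR_FILENO)) {
        mask |= 2;
    }
    return mask;
}
```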
I do not think that would be hard to change.
Is this a source of problems for your applications?
Note that this kind of behavior can be caused by the batch manager.
If you use slurm and srun instead of mpirun, I am no
Lisandro,
I fixed it on master at
https://github.com/open-mpi/ompi/commit/318a1a40a4ab345f417b8932326d4dd2e68d82bc
Could you give it a try?
Cheers,
Gilles
On 7/26/2015 9:26 AM, Gilles Gouaillardet wrote:
Lisandro,
I think I see what is going wrong and will fix it
Thanks for the report,
Gilles
On Saturday, July 25, 2015, Lisandro Dalcin wrote:
> Using a debug build of 1.8.7, I'm still getting this malloc(0) warning:
>
> malloc debug: Request for 0 bytes (coll_libnbc_ireduce_scatter_block.c, 67
Paul,
Where do you run mpirun?
On a compute node?
On a login node with no InfiniBand interface?
If on a login node, are the InfiniBand libraries at least available?
Cheers,
Gilles
On Saturday, July 25, 2015, Paul Hargrove wrote:
> I know Gilles and I went to a fair amount of effort to get
using the
> dlsym() check if not using shared libs, including in the --disable-dlopen
> and --disable-shared cases.
>
> Also, I noticed that you don't have a dlclose(lib) call
> in mca_btl_openib_xrc_check_api().
>
> -Paul
>
> On Fri, Jul 24, 2015 at 11:55 PM, Gilles Go
Paul,
This test is here to gracefully disable the openib btl if ompi was built
with a recent OFED but is running with an old version (or the other way
around).
I recently got a similar false positive when ompi was configure'd with
static libraries only.
In this case, a workaround was to dlsym the
ain, I was unable to reproduce any crash.
Cheers,
Gilles
On 7/22/2015 12:48 AM, Ralph Castain wrote:
I believe I have this fixed - please see if this solves the problem:
https://github.com/open-mpi/ompi/pull/730
On Jul 21, 2015, at 12:22 AM, Gilles Gouaillardet <gil...@rist.or.jp>
t, next, &orte_rml_base.posted_recvs,
orte_rml_posted_recv_t) {
/* since names could include wildcards, must use
* the more generalized comparison function
*/
I hope this helps,
Gilles
On 7/17/2015 11:04 PM, Ralph Castain wrote:
It’s probably a race condition ca
it leaves something to be desired.
Sigh. Sorry for the “false” alarm.
On Jul 17, 2015, at 8:54 PM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
Ralph,
Based on the source code (ompi_mpi_params.c:91), I was expecting a Boolean
ompi_mpi_param_check.
Cheers,
Gilles
On Saturday, July 18, 2015, Ralph Castain wrote:
> Yep, I checked:
>
> MPI parameter check: runtime
>
>
>
> On Jul 17, 2015, at 8:00 PM
Ralph,
I will try to reproduce this.
I guess you already checked the output of ompi_info to confirm params are
checked at runtime.
Cheers,
Gilles
On Saturday, July 18, 2015, Ralph Castain wrote:
> Hi folks
>
> I keep getting segfault errors when testing 1.10, while others say the
> tests are
Folks,
I noticed several errors such as
http://mtt.open-mpi.org/index.php?do_redir=2244
that did not make any sense to me (at first glance).
I was able to attach to one process when the issue occurred.
The SIGSEGV occurs in thread 2, while thread 1 is invoking
ompi_rte_finalize.
All I can think is a s
7/13/2015 11:42 PM, Ralph Castain wrote:
Yes, I’ll release a new rc once I get it all merged.
Are the linker warnings a change in behavior from 1.8.6? I confess
I’ve been seeing them in the master for so long that I’ve been
“inoculated” to ignore them.
On Jul 13, 2015, at 7:34 AM, Gilles Gouaill
?
Cheers,
Gilles
On Monday, July 13, 2015, Ralph Castain wrote:
> Gilles - just to confirm, the patch you provided here is the one in the
> updated PRs, yes? If so, I’ll consider those PRs as confirmed and commit
> them
>
>
> On Jul 13, 2015, at 7:20 AM, Gilles Gouaillardet
On Monday, July 13, 2015, Chris Samuel wrote:
> On Mon, 13 Jul 2015 05:17:29 PM Gilles Gouaillardet wrote:
>
> > Hi Chris,
>
> Hi Gilles,
>
> > i pushed my tarball into a gist :
>
> Thanks for that, I can confirm on our two x86-64 RHEL 6.6 boxes (one circa
Hi Chris,
I pushed my tarball into a gist:
git clone https://gist.github.com/ec20f77ec35533fa575a.git
The tarball is then in ec20f77ec35533fa575a/openmpi-gitclone.tar.bz2
Cheers,
Gilles
On 7/13/2015 4:59 PM, Chris Samuel wrote:
Hi Gilles,
On Mon, 13 Jul 2015 03:16:57 PM Gilles
either system.
-Paul
On Sun, Jul 12, 2015 at 7:48 PM, Gilles Gouaillardet
<gil...@rist.or.jp> wrote:
Paul,
Here is a revised patch to be applied against the 1.8.7-rc1 tarball.
Could you please give it a try?
Cheers,
Gilles
On 7/11/2015 4:22 AM, Paul Hargro
the XRC constants that we need to compile XRC code before
ruling that we can actually build XRC support.
> On Jul 10, 2015, at 10:33 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
>
> Sorry about that, and thanks for reverting the comm
was working correctly, why don’t we just revert the config
> in question back to the 1.8.4 version? Why was it changed in the first
> place? Does anyone know what problem someone was trying to solve?
>
>
> On Jul 10, 2015, at 7:33 AM, Gilles Gouaillardet <
> gilles.gouailla
Sorry about that, and thanks for reverting the commit.
Paul mentioned a patch I sent to the ML that worked for him.
The commit was supposed to be a more robust version of it.
For example, in RHEL 7 the deprecated functions have been removed, but the
XRC domains are fine.
Currently, xrc is not support
entifier is related to
"ConnectIB XRC support" (not ConnectX).
If you look back at the 1.8.4 release you will find only a check
for ibv_create_xrc_rcv_qp.
-Paul
On Thu, Jul 9, 2015 at 6:17 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Thanks Paul,
I just found another bug...
(and I should be blamed for it)
here is att
On Thu, Jul 9, 2015 at 5:17 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Paul,
Can you please compress and post your config.log?
Which OFED version are you running?
On master, that fix did the trick on the Mellanox test cluster (recent OFED
version), but it did not
enable XRC on the LANL test clusters (my best bet is an old OFED library).
Thanks
Gilles
On 7/10/2015 9
Ben and Paul,
Thanks for the report!
It looks like a simple typo (i.e. ')' instead of ',').
The attached patch is for v1.8.
In order to use it, you need recent autotools (see
http://www.open-mpi.org/source/building.php).
Apply the patch, run autogen.pl, and then configure, make, make install.
i
Nathan,
The root cause is that your fixes were not backported to the v1.8 (nor the
v1.10) branch.
I made PR https://github.com/open-mpi/ompi-release/pull/357 to fix this;
could you please review it?
Since there are quite a lot of differences between v1.8 and master, the
backport was not trivial.
In other places, initialization looks like
opal_mutex_t mutex = {{0}};
BTW, opal_condition is a standalone binary (i.e. not part of the ompi library),
so I do not think an uninitialized common symbol hurts here.
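For reference, a minimal sketch of what the {{0}} form does, using hypothetical stand-in types that assume a struct whose first member is itself a struct (the real opal_mutex_t layout is not reproduced here). The outer braces belong to the mutex, the inner braces to its first member, and every member not explicitly listed is zero-initialized; the inner braces also avoid a -Wmissing-braces warning that a bare {0} can trigger. Giving a file-scope variable an explicit initializer makes it a definition rather than a tentative one, so it cannot end up as a common symbol.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a base object struct and a mutex type whose
 * first member is that base struct (not the real opal_mutex_t layout). */
typedef struct {
    void *cls;
    int refcount;
} sketch_object_t;

typedef struct {
    sketch_object_t super;
    int locked;
} sketch_mutex_t;

/* Outer braces: sketch_mutex_t; inner braces: its first member, super.
 * Members not listed (super.refcount, locked) are zero-initialized. */
sketch_mutex_t sketch_mutex = {{0}};
```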
Cheers,
Gilles
On Wednesday, July 1, 2015, Nathan Hjelm wrote:
>
> PGI no longer surprises me w
I think Paul's concern was about cross compilation
(i.e. no AC_TRY_RUN...).
FWIW, the Fortran bindings cannot be built "as is" when cross compiling ompi.
Cheers,
Gilles
On Wednesday, July 1, 2015, Ralph Castain wrote:
> Given the description, I suspect that any MPI application should be
> sufficient
Jeff,
The first argument of MPI_Buffer_detach is
OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buffer_addr in use-mpi-f08;
however, the standard states it is TYPE(C_PTR), INTENT(OUT)
(and yes, this is very counterintuitive... at first glance only).
Can you please confirm this is an Open MPI bu
Ralph and all,
Master is now fixed.
Cheers,
Gilles
On 6/29/2015 7:07 AM, Gilles Gouaillardet wrote:
Ralph,
My bad, I will fix this today.
Sorry for the inconvenience.
Gilles
On Monday, June 29, 2015, Ralph Castain wrote:
> Hey folks
>
> I don’t know who has been working on the Java bindings, but they are
> totally broken in the master repo - cannot compile. I tried fixing a few of
> the obviou
issues ?
Cheers,
Gilles
On 6/26/2015 12:31 PM, Paul Hargrove wrote:
On Thu, Jun 25, 2015 at 5:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
On Thu, Jun 25, 2015 at 4:59 PM, Gilles Gouaillardet
<gil...@rist.or.jp> wrote:
In this case, mca_co
anox is going to talk about this internally and get
back to us.
> On Jun 25, 2015, at 2:59 PM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
>
> Jeff,
>
> this is exactly what happens.
>
> I will send a stack trace l
as already resolved that coll_ml_* symbol in the
> coll ml DSO. So the execution transfers back up into the coll ml DSO, and
> ... kaboom.
>
> A simple stack trace will confirm this -- it should show execution going
> down into libhcoll and then back up into coll ml.
>
>
>
>
a coll ^ml
Otherwise, it might crash (if coll_ml is loaded before coll_hcoll, which
is really system dependent).
Cheers,
Gilles
On 6/25/2015 10:46 AM, Gilles Gouaillardet wrote:
Daniel,
Thanks for the logs.
Another workaround is to run
mpirun --mca coll ^hcoll ...
I was able to reproduce the iss
file to blacklist the coll_ml module to
ensure this is working.
Mike and Mellanox folks, could you please comment on that?
Cheers,
Gilles
On 6/24/2015 5:23 PM, Daniel Letai wrote:
Gilles,
Attached the two output logs.
Thanks,
Daniel
On 06/22/2015 08:08 AM, Gilles Gouaillardet wrote
Lisandro,
This is related to your previous report:
some bugs were introduced when silencing zero-size mallocs.
Attached is a patch (to be applied in addition to the previous one).
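The bug class described here is easy to reintroduce: once zero-size allocations are skipped, a NULL buffer is a legitimate result for a zero-size message and must not be treated as an out-of-memory error. A minimal sketch with hypothetical names and error codes (not the libnbc code itself):

```c
#include <stdlib.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERR_OUT_OF_RESOURCE (-2)

/* Hypothetical allocator: skip the call entirely when count is zero, so a
 * malloc debugger never sees a malloc(0) request. */
static int alloc_scratch(size_t count, void **buf)
{
    *buf = NULL;
    if (count == 0) {
        return SKETCH_SUCCESS; /* NULL is a valid result here, not an error */
    }
    *buf = malloc(count);
    return (*buf == NULL) ? SKETCH_ERR_OUT_OF_RESOURCE : SKETCH_SUCCESS;
}
```

Callers then test the return code rather than the pointer; a check like "if (NULL == buf) return out-of-resource" would turn every zero-size message into a spurious failure.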
Cheers,
Gilles
On 6/23/2015 12:23 AM, Lisandro Dalcin wrote:
The attached test code used to work in 1.8.5 and be
On the other hand, I do not think that, as a community, we are interested in
mpi4py bugs.
I will let other folks comment on that.
Cheers,
Gilles
On 6/23/2015 9:49 AM, Lisandro Dalcin wrote:
On 22 June 2015 at 18:26, Gilles Gouaillardet wrote:
if you still have the test program that can do that
OK, I'l
> It's called 2.x because all the 2.x releases will come from there - not
> just the 2.0.x releases.
>
> Sent from my phone. No type good.
>
> > On Jun 21, 2015, at 7:49 PM, Gilles Gouaillardet > wrote:
> >
> > Jeff,
> >
> > currently, the g
Lisandro,
There was a regression in 1.8.6 with NBC and zero-size messages
(ironically, the bug was introduced when silencing the zero-size mallocs you
reported
in http://www.open-mpi.org/community/lists/devel/2015/05/17388.php).
The attached patch fixes the issue.
In your initial report, you mention
Jeff,
Currently, the GitHub "v2.0" branch is called "v2.x";
was this intended?
Cheers,
Gilles
On 6/21/2015 2:00 AM, Jeff Squyres (jsquyres) wrote:
The v2.0 branch has been created on the github ompi-release repo. Let the pull
requests commence.
Just so that we developers are on the same s
Ralph and all,
this is fixed at
https://github.com/open-mpi/ompi/commit/ee3a1da28a3c018115bad82e0a9e7d1e04d35148
Cheers,
Gilles
On 6/14/2015 10:43 AM, Gilles Gouaillardet wrote:
Will do tomorrow.
proc is only used in heterogeneous mode, hence the warning
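A common way to silence such a warning is to declare the variable only in the configuration that uses it. A sketch with a hypothetical SKETCH_HETEROGENEOUS_SUPPORT switch standing in for Open MPI's configure-time heterogeneous-support macro, and invented types:

```c
/* Hypothetical stand-in for the configure-time heterogeneous switch. */
#ifndef SKETCH_HETEROGENEOUS_SUPPORT
#define SKETCH_HETEROGENEOUS_SUPPORT 0
#endif

struct sketch_proc {
    int arch; /* hypothetical architecture tag */
};

static int sketch_put_frag(const struct sketch_proc *peer)
{
#if SKETCH_HETEROGENEOUS_SUPPORT
    /* Declared only where it is used, so homogeneous builds never see
     * an unused local. */
    const struct sketch_proc *proc = peer;
    if (proc->arch != 0) {
        return 1; /* a real implementation would convert data here */
    }
#else
    (void)peer; /* parameter unused in homogeneous builds */
#endif
    return 0;
}
```

The alternative of keeping the declaration unconditional and adding (void)proc also works, but it hides the fact that the variable only matters in heterogeneous builds.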
On Sunday, June 14, 2015, Ralph Castain wrote:
> *pml_ob1_recvreq.c:* In function '*mca_pml_ob1_recv_request_put_frag*':
> *pml_ob1_recvreq.c:397:18:* *warning: *unused variable '*proc*'
> [-Wunused-variable]
> omp
> I’d also like to see us apply the same logic to the MCA param system.
> Let’s just define ~4 named levels and get rid of the fine grained numbering.
>
>
> On Jun 8, 2015, at 2:04 AM, Gilles Gouaillardet > wrote:
>
> Nathan,
>
> i think it is a good idea to use
Nathan,
I think it is a good idea to use names instead of numeric values for verbosity.
What about using verbosity names à la log4c?
http://sourceforge.net/projects/log4c/
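A sketch of how log4c-style names could be mapped back to ordered levels, so that a component prints a message only when its severity is within the configured level (one plausible convention, not a settled design). The table uses only the priority names quoted in this message, which stop at "INFO", rather than guessing the remaining log4c entries; the function name is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Names copied from the quoted log4c list, which is cut off after "INFO";
 * the table deliberately stops there. Lower index = more severe. */
static const char *const sketch_verbosity_names[] = {
    "FATAL", "ALERT", "CRIT", "ERROR", "WARN", "NOTICE", "INFO",
};

/* Hypothetical helper: map a verbosity name to its index, or -1 if unknown. */
static int verbosity_from_name(const char *name)
{
    size_t n = sizeof sketch_verbosity_names / sizeof sketch_verbosity_names[0];

    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, sketch_verbosity_names[i]) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```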
static const char* const priorities[] = {
"FATAL",
"ALERT",
"CRIT",
"ERROR",
"WARN",
"NOTICE",
"INFO
> On May 29, 2015, at 8:11 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com
> > wrote:
>
> Ralph,
>
> this is being discussed at https://github.com/open-mpi/ompi/pull/605
>
> my latest solution is available at
> https://github.com/ggouaillardet/ompi/comm
Ralph,
this is being discussed at https://github.com/open-mpi/ompi/pull/605
my latest solution is available at
https://github.com/ggouaillardet/ompi/commit/2a8ef01bad02b6c833c642d17d9a1140ea9292a4
The PR is a simple but temporary solution in which I introduced a new MCA
param,
so if we decide th
Edgar,
I am sorry about that.
I fixed some memory leaks (some memory was leaking in some error cases).
I also moved some mallocs up in order to group them and simplify the
handling of error cases.
Per your comment, one of those moves was indeed incorrect :-(
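The grouping described above is the usual single-error-path pattern. A minimal sketch with hypothetical buffer names and error codes, relying on free(NULL) being a no-op so that a partial failure releases exactly what was allocated:

```c
#include <stdlib.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERR_OUT_OF_RESOURCE (-2)

/* Hypothetical setup routine: allocate every buffer first, then check
 * them together, so one error path releases whatever was obtained.
 * Assumes n > 0, since malloc(0) may legitimately return NULL. */
static int setup_buffers(size_t n, int **counts, int **displs, char **scratch)
{
    *counts = malloc(n * sizeof **counts);
    *displs = malloc(n * sizeof **displs);
    *scratch = malloc(n);

    if (*counts == NULL || *displs == NULL || *scratch == NULL) {
        /* free(NULL) is a no-op, so this is safe after a partial failure */
        free(*counts);
        free(*displs);
        free(*scratch);
        *counts = NULL;
        *displs = NULL;
        *scratch = NULL;
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    return SKETCH_SUCCESS;
}
```

The catch, and a likely source of an "incorrect move", is that hoisting an allocation only works if its size is already known at the new position.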
Cheers,
Gilles
On 5/28/2015 12:14 PM, Ed
Mike,
Most Coverity links reported by Jenkins are invalid.
For example, https://github.com/open-mpi/ompi/pull/593 points to
http://bgate.mellanox.com:/jenkins/job/gh-ompi-master-pr//ws/cov_build/all_535/output/errors/index.html
which does not exist (anymore).
Only the link of the most recen