Re: [OMPI devel] pmix warnings on cray with HEAD

2015-09-22 Thread Gilles Gouaillardet
Mark, i think there is no more mca_pmix_native.so. you can confirm that by checking the timestamps of the libs after running make install. just remove your install dir, and run make install again, and that will solve your issue. Cheers, Gilles On Tue, Sep 22, 2015 at 5:26 PM, Mark Santcroos w

Re: [OMPI devel] PMIX vs Solaris

2015-09-20 Thread Gilles Gouaillardet
x/pmix1xx includes "-D_REENTRANT". > However, they don't include "-mt". > I believe we concluded (when we had problems previously) that "-mt" was > the proper flag (at compile and link) for multi-threaded with the Studio > compilers. > > -

Re: [OMPI devel] PMIX vs Solaris

2015-09-20 Thread Gilles Gouaillardet
t, Sep 19, 2015 at 8:50 PM, Ralph Castain > > wrote: >> >>> Paul, can you clarify something for me? The error in this case indicates >>> that the client wasn’t able to reach the daemon - this should have resulted >>> in termination of the job. Did the j

Re: [OMPI devel] PMIX vs Solaris

2015-09-18 Thread Gilles Gouaillardet
Paul, IIRC, the "Permission denied" is coming from hwloc that cannot collect all the info it would like. Cheers, Gilles On 9/18/2015 2:34 PM, Paul Hargrove wrote: Tried tonight's master tarball on Solaris 11.2 on x86-64 with the Studio Compilers (default ILP32 output) and saw the following

Re: [OMPI devel] Commit 6e6a3e96

2015-09-16 Thread Gilles Gouaillardet
George, I will revisit this. if I added const modifier when not required by the standard, this was not intentional, this was a mistake. thanks for the report Gilles On Wednesday, September 16, 2015, George Bosilca wrote: > Gilles, > > Your commit 6e6a3e96 is only partially correct. There is n

Re: [OMPI devel] Remaining MTT errors

2015-09-14 Thread Gilles Gouaillardet
Ralph, the collective/op, collective/op_mpifh, collective/op_usempi, group/group, onesided/c_lock_illegal and random/attr-error-code tests fail because your contrib/platform/intel/bend/linux.conf contains the following line: mpi_param_check = 0 and this is not handled correctly by the ibm test suite.

Re: [OMPI devel] Remaining MTT errors

2015-09-13 Thread Gilles Gouaillardet
Ralph, according to mtt logs (click on the MPI Install button at the top left corner), ompi was built in zero seconds ... iirc, you do not build ompi under mtt, but you use the mtt "installed" module so my best bet is mtt logged some garbage since it has no way to figure out how ompi was configure

Re: [OMPI devel] Remaining MTT errors

2015-09-12 Thread Gilles Gouaillardet
Ralph, at first glance, these errors look unrelated to PMIx. I noticed a bunch of bind() failures. based on your command line, I guess you are not running your job via a batch manager, and I would guess not all unix sockets are always cleaned up. (or this is an old bug and you did not manually clea

[OMPI devel] issue with group sentinel values

2015-09-11 Thread Gilles Gouaillardet
Nathan, i am experiencing some issues and i found commit 0bf06de3f1444f469303e47752430ec9b423b33f https://github.com/open-mpi/ompi/commit/0bf06de3f1444f469303e47752430ec9b423b33f and the following are very likely the root cause. i experience this on a linux sparc system only. Per the commit

Re: [OMPI devel] New master warnings

2015-09-11 Thread Gilles Gouaillardet
Ralph, this is fixed in https://github.com/open-mpi/ompi/commit/a1627feaf74d8562146a1afbfabec60651496c06 Cheers, Gilles On 9/11/2015 1:02 PM, Gilles Gouaillardet wrote: Ralph, will do i think these new warnings are a consequence of the changes i pushed recently (e.g. add the const

Re: [OMPI devel] New master warnings

2015-09-11 Thread Gilles Gouaillardet
bled by default] coll_base->coll_ireduce = mca_coll_ml_reduce_nb; On Sep 10, 2015, at 7:02 PM, Shamis, Pavel <mailto:sham...@ornl.gov>> wrote: Awesome, thanks for fixing this. *From:*devel <mailto:devel-boun...@ope

Re: [OMPI devel] New master warnings

2015-09-10 Thread Gilles Gouaillardet
Pasha, i fixed that in https://github.com/open-mpi/ompi/commit/c404e98dced4104cd3abe7485846368325c3d150 but forgot to post it to the ML ... Cheers, Gilles On 9/11/2015 7:31 AM, Shamis, Pavel wrote: Ralph, I don't see these warnings on my fedora box with gcc 5.1.1. I will try to fix it, but

Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-04 Thread Gilles Gouaillardet
regarding the FX10 specific issue > > Cheers, > > Gilles > > On 9/4/2015 2:31 PM, Brice Goglin wrote: > > Le 04/09/2015 00:36, Gilles Gouaillardet a écrit : > > Ralph, > > just to be clear, your proposal is to abort if openmpi is configured > with --without-hw

[OMPI devel] no more cast away const

2015-09-04 Thread Gilles Gouaillardet
Folks, a bunch of C bindings have comments such as /* XXX -- CONST -- do not cast away const -- update mca/coll */ and that has been there for a long time. i made PR #839 https://github.com/open-mpi/ompi/pull/839 to fix this. the change is quite massive (270 files) since : - the C bindings had t

[OMPI devel] RFC: Remove the --enable-mpi-profile option

2015-09-04 Thread Gilles Gouaillardet
Folks, Jeff and i have been discussing the possibility of removing the --enable-mpi-profile option from ompi. (see https://github.com/open-mpi/ompi/pull/845 for the details) Removing this option would simplify the building process, and make it crystal clear that Fortran bindings call the C PM

Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-04 Thread Gilles Gouaillardet
:31 PM, Brice Goglin wrote: Le 04/09/2015 00:36, Gilles Gouaillardet a écrit : Ralph, just to be clear, your proposal is to abort if openmpi is configured with --without-hwloc, right ? ( the --with-hwloc option is not removed because we want to keep the option of using an external hwloc library

Re: [OMPI devel] RFC: Remove --without-hwloc configure option

2015-09-03 Thread Gilles Gouaillardet
Ralph, just to be clear, your proposal is to abort if openmpi is configured with --without-hwloc, right ? ( the --with-hwloc option is not removed because we want to keep the option of using an external hwloc library ) if I understand correctly, Paul's point is that if openmpi is ported to a new

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
Ralph, if I correctly read between the lines of your second point, omnipath (PSM2) is working out of the box. I am not sure this is the case, and/or my extrapolation might be incorrect. if I understood correctly, psm2 is a new feature. from a distro point of view, that could be a new package (kno

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
Jeff, on second thought, wouldn't it be better to simply disable both PSM and PSM2 in openmpi, and let libfabric handle these conflicts ? does that make any sense ? Cheers, Gilles On Thursday, September 3, 2015, Jeff Squyres (jsquyres) wrote: > I agree with what George says. > > AFAIK, Red Ha

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
Michael, if a solution with two packages is acceptable, then another and simpler option is to configure openmpi for PSM with --without-psm2, and openmpi for PSM2 with --without-psm. this is safe for --disable-dlopen or --enable-static, and you do not need to tweak the conf files Cheers, Gilles

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
afraid that won’t solve the problem - the distro will still feel the need to release -two- versions of OMPI, one with PSM and one with PSM2. Ordinarily, I wouldn’t care - but this creates user confusion and reflects on us as a community. > On Sep 2, 2015, at 6:50 PM, Gill

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
care - but this creates user confusion and reflects on us as a community. On Sep 2, 2015, at 6:50 PM, Gilles Gouaillardet wrote: Ralph, what about automatically *not* building PSM2 if PSM is built and PSM2 is not explicitly required ? /* in order to be future proof, we could even do that only

Re: [OMPI devel] 1.10.0 issue

2015-09-02 Thread Gilles Gouaillardet
Ralph, what about automatically *not* building PSM2 if PSM is built and PSM2 is not explicitly required ? /* in order to be future proof, we could even do that only if we detect a symbol conflict */ we could abort if ompi is configure'd with both --with-psm and --with-psm2, or simply do nothin

Re: [OMPI devel] Problem running from ompi master

2015-08-31 Thread Gilles Gouaillardet
Hi, this part has been revamped recently. at first, i would recommend you make a fresh install remove the install directory, and the build directory if you use VPATH, re-run configure && make && make install that should hopefully fix the issue Cheers, Gilles On 9/1/2015 9:35 AM, Cabral, Mat

Re: [OMPI devel] Dual rail IB card problem

2015-08-31 Thread Gilles Gouaillardet
Brice, as a side note, what is the rationale for defining the distance as a floating point number ? i remember i had to fix a bug in ompi a while ago /* e.g. replace if (d1 == d2) with if((d1-d2) < epsilon) */ Cheers, Gilles On 9/1/2015 5:28 AM, Brice Goglin wrote: The locality is mlx4_0 as
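
The exact-equality pitfall mentioned in this message can be sketched in C as follows. This is only an illustration, not the fix that went into ompi: `distances_equal` and `DISTANCE_EPSILON` are hypothetical names, the tolerance value is an assumption, and the sketch compares the absolute difference so that the result does not depend on the order of d1 and d2.

```c
#include <stdbool.h>

/* Hypothetical tolerance: the value used in the actual ompi fix is not
 * stated in this thread. */
#define DISTANCE_EPSILON 1e-6

/* Return true when two floating point distances are close enough to be
 * treated as equal, instead of testing d1 == d2 exactly. */
static bool distances_equal(double d1, double d2)
{
    double diff = d1 - d2;
    if (diff < 0.0) {
        diff = -diff;   /* absolute value, so argument order does not matter */
    }
    return diff < DISTANCE_EPSILON;
}
```

With such a helper, two distances that differ only by rounding noise (for instance 0.1 + 0.2 versus 0.3) compare as equal, while clearly different distances do not.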

Re: [OMPI devel] fortran calling MPI_* instead of PMPI_*

2015-08-31 Thread Gilles Gouaillardet
Jeff, i filed PR #845 https://github.com/open-mpi/ompi/pull/845 could you please have a look ? Cheers, Gilles On 8/30/2015 9:20 PM, Gilles Gouaillardet wrote: ok, will do basically, I simply have to #include "ompi/mpi/c/profile/defines.h" if configure set the WANT_MPI_PROFI

Re: [OMPI devel] fortran calling MPI_* instead of PMPI_*

2015-08-30 Thread Gilles Gouaillardet
*_f files are impacted, and for mpif-h only, so i'd rather ask before I fill the pr, and even if a sed command will do most of the job */ Cheers, Gilles On Saturday, August 29, 2015, Jeff Squyres (jsquyres) wrote: > On Aug 27, 2015, at 3:25 AM, Gilles Gouaillardet > wro

Re: [OMPI devel] OpenMPI 1.8 Bug Report

2015-08-27 Thread Gilles Gouaillardet
Thanks Michael and Kawashima-san, i made PR #838 to fix this it is currently available at https://github.com/open-mpi/ompi/pull/838 Cheers, Gilles On 8/27/2015 6:29 PM, Michael Knobloch wrote: Dear OpenMPI developers, I noticed a bug in the definition of the 3 MPI-3 RMA functions MPI_Compare

Re: [OMPI devel] bind to interface / address oob_tcp_listener.c:create_listen()

2015-08-27 Thread Gilles Gouaillardet
Ralph, what about : - if only one interface is specified (e.g. *_if_include eth0), then bind to that interface - otherwise, bind to all interfaces Mark, would that solve your issue ? Cheers, Gilles On 8/28/2015 9:50 AM, Ralph Castain wrote: I committed the change that prevents orte-submit

Re: [OMPI devel] OpenMPI 1.8 Bug Report

2015-08-27 Thread Gilles Gouaillardet
Kawashima-san, you are right, I mixed MPI_Buffer_detach and MPI_Win_detach sorry for the confusion Cheers, Gilles On Thursday, August 27, 2015, Kawashima, Takahiro < t-kawash...@jp.fujitsu.com> wrote: > Gilles, > > > there is a comment in the source code to explain this. > > Could you point wh

Re: [OMPI devel] OpenMPI 1.8 Bug Report

2015-08-27 Thread Gilles Gouaillardet
iirc, the MPI_Win_detach discrepancy with the standard is intentional in fortran 2008, there is a comment in the source code to explain this. On Thursday, August 27, 2015, Kawashima, Takahiro < t-kawash...@jp.fujitsu.com> wrote: > Oh, I also noticed it yesterday and was about to report it. > > An

Re: [OMPI devel] fortran calling MPI_* instead of PMPI_*

2015-08-27 Thread Gilles Gouaillardet
I am lost ... from ompi/mpi/fortran/mpif-h/profile/palltoall_f.c void ompi_alltoall_f(char *sendbuf, MPI_Fint *sendcount, MPI_Fint *sendtype, char *recvbuf, MPI_Fint *recvcount, MPI_Fint *recvtype, MPI_Fint *comm, MPI_Fint *ierr) { [...] c_ierr = M

Re: [OMPI devel] mca_mtl_psm and java

2015-08-26 Thread Gilles Gouaillardet
Paul, i tried PSM_RCVTHREAD=0 but it did not help Jeff, you did not read too much ... but my words were not quite accurate. yes, the signal handlers are set in the library constructor. by reading the source code, i found that can be avoided by setting the yet undocumented IPATH_NO_BACKTRACE en

Re: [OMPI devel] mca_mtl_psm and java

2015-08-25 Thread Gilles Gouaillardet
t; -Paul > > On Tue, Aug 25, 2015 at 6:02 PM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > > wrote: > >> i run on a centos 7 vm, and with the OFED that comes with centos >> (I will send full details tomorrow) >> there is no psm hardware, just inf

Re: [OMPI devel] cosmetic misleading mpirun error message

2015-08-25 Thread Gilles Gouaillardet
would it be easier if the option was --host instead of -host ? I guess changing the cli is not an option for the v1.x series, so what about adding the -hosts option (alias to -host option) ? I made the same mistake a few times before, adding a s to hosts looks more intuitive for me. my 0.02 US$ Gi

Re: [OMPI devel] mca_mtl_psm and java

2015-08-25 Thread Gilles Gouaillardet
EADME and the FAQ. >>> >>> I'd be against adding user documentation to the wiki -- this would be a >>> 3rd place for users to look for information. >>> >>> > - file a bug against intel psm >>> >>> I'd like to hear what

[OMPI devel] fortran calling MPI_* instead of PMPI_*

2015-08-25 Thread Gilles Gouaillardet
Folks, I ran some basic tests with IPM profiler-like https://github.com/nerscadmin/IPM and found that when fortran calls an mpi subroutine, this is accounted twice. IPM defines both MPI_* subroutines and their fortran mpi_*_ counterpart. since the ompi fortran calls the MPI_* symbol (and not the P
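
The double accounting described in this message can be modelled with a toy C program. Nothing here is a real MPI or IPM symbol: every `toy_` name is a hypothetical stand-in, and the sketch only assumes that the tool wraps both the C entry point and the Fortran entry point, as the message says.

```c
/* Toy model of a profiling tool that intercepts both the C and the
 * Fortran entry points. All names are hypothetical stand-ins. */

static int profiled_calls = 0;

/* Stand-in for the PMPI_ C entry point: does the work, never intercepted. */
static int toy_pmpi_barrier(void)
{
    return 0;
}

/* Stand-in for the tool's wrapper around the MPI_ C entry point. */
static int toy_tool_c_barrier(void)
{
    profiled_calls++;
    return toy_pmpi_barrier();
}

/* Library Fortran binding that calls the MPI_ C symbol, which the tool
 * has intercepted. */
static int toy_lib_fortran_barrier_via_mpi(void)
{
    return toy_tool_c_barrier();
}

/* Library Fortran binding that calls the PMPI_ C symbol directly. */
static int toy_lib_fortran_barrier_via_pmpi(void)
{
    return toy_pmpi_barrier();
}

/* Stand-in for the tool's wrapper around the Fortran entry point: it
 * counts the call, then hands over to the library's Fortran binding. */
static int toy_tool_fortran_barrier(int binding_uses_pmpi)
{
    profiled_calls++;
    return binding_uses_pmpi ? toy_lib_fortran_barrier_via_pmpi()
                             : toy_lib_fortran_barrier_via_mpi();
}
```

In this model, one Fortran call is counted twice when the binding goes through the intercepted MPI_ symbol, and once when it goes through PMPI_.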

Re: [OMPI devel] mca_mtl_psm and java

2015-08-25 Thread Gilles Gouaillardet
ipath change actually change its signal handler > behavior? > > > > On Aug 25, 2015, at 4:27 AM, Gilles Gouaillardet > wrote: > > > > Folks, > > > > some time ago, some crashes were reported when using java bindings. > > one of them was caused was cause

[OMPI devel] mca_mtl_psm and java

2015-08-25 Thread Gilles Gouaillardet
Folks, some time ago, some crashes were reported when using java bindings. one of them was caused by mca_mtl_psm.so. the root cause is that the libinfinipath.so initializer sets its own signal handler, which conflicts with the signal handler set by the jvm. the only workaround is to disable

Re: [OMPI devel] esslingen MTT?

2015-08-25 Thread Gilles Gouaillardet
Thanks Adrian, i fixed this in PR #831 https://github.com/open-mpi/ompi/pull/831 and will push it shortly to master Best regards, Gilles On 8/25/2015 4:47 PM, Adrian Reber wrote: On Mon, Aug 24, 2015 at 09:47:22PM +, Jeff Squyres (jsquyres) wrote: Who runs the esslingen MTT? You're getting

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Gilles Gouaillardet
a first step could be adding a --disable-libnl3 option to configure, which means components should not even try to use libnl3 makes sense ? On Monday, August 24, 2015, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > iirc, librdmacm uses libnl > > I am not sure if h

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Gilles Gouaillardet
f both libnl and libnl3 are present > in the same process (e.g., if some of OMPI's dependent libraries pull them > both in). We could try to opal_dl_open() NULL and then look for symbols > that are unique to libnl and libnl3, but a) when to do that, and b) it's > not guaranteed

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Gilles Gouaillardet
al_dl_open() NULL and then look for symbols > that are unique to libnl and libnl3, but a) when to do that, and b) it's > not guaranteed to work in all cases. > > > > > > On Aug 24, 2015, at 7:36 AM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > w

[OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Gilles Gouaillardet
Folks, I recently installed libnl3-devel rpm on my centos 7 box, reconfigured and recompiled ompi, and ompi_info now crashes. it seems the root cause is an obscure conflict between libnl and libnl3. libnl is indirectly required by the common_verbs mca (OFED libraries do need it) and libnl3 is req

Re: [OMPI devel] 1.10.0rc4 hcoll problem compiled statically

2015-08-23 Thread Gilles Gouaillardet
same issue, but suspect that it is > not. > > -Paul > > On Sat, Aug 22, 2015 at 6:00 PM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > > wrote: > >> Paul, >> >> isn t this an issue that was already discussed ? >> mellanox proprietary hcoll

Re: [OMPI devel] 1.10.0rc4 hcoll problem compiled statically

2015-08-22 Thread Gilles Gouaillardet
Paul, isn't this an issue that was already discussed ? mellanox proprietary hcoll library includes its own coll ml module that conflicts with the ompi one. mellanox folks fixed this internally but I am not sure this has been released. you can run nm libhcoll.so if there are some symbols starting w

Re: [OMPI devel] v1.10.0 release

2015-08-14 Thread Gilles Gouaillardet
urgent, i assigned them to you. this simply remove a bogus test (OFED version used at runtime vs compile time) note i made a PR for master but i did not push my changes Cheers, Gilles On 8/14/2015 8:44 AM, Gilles Gouaillardet wrote: Paul, i tried to fix this test, and at this stage, i do not

Re: [OMPI devel] v1.10.0 release

2015-08-13 Thread Gilles Gouaillardet
Paul, i tried to fix this test, and at this stage, i do not understand any more the logic of this test. right now, my best bet is to simply remove this test. the worst case scenario will be a potentially obscure error message if ompi was built with OFED X and ran with OFED Y. I will make the

Re: [OMPI devel] new branch on open-mpi/ompi?

2015-08-06 Thread Gilles Gouaillardet
Hi Howard, it looks like i pushed my branch to the ompi repo instead of my clone ... that was clearly a mistake and i deleted the branch Cheers, Gilles On 8/6/2015 12:14 AM, Howard Pritchard wrote: HI Folks, There's a new branch on open-mpi/ompi repo. Is this intentional? Howard ___

Re: [OMPI devel] Error in version 1.8.7?

2015-08-04 Thread Gilles Gouaillardet
Hartmut, yes this is a bug ... we are still working on a proper fix. in the meantime, you can comment out the dlsym test in the openib btl (otherwise, openmpi falls back to tcp ...) Cheers, Gilles On Tuesday, August 4, 2015, Hartmut Häfner (SCC) wrote: > Dear developers, > > we have installed Op

Re: [OMPI devel] stdout, stderr reporting different values for isatty

2015-07-27 Thread Gilles Gouaillardet
Christoph, that is correct stdout is a tty and stderr is not. it is a pipe to orted. I do not think that would be hard to change. is this a source of problem for your applications ? note this kind of behavior can be caused by the batch manager. if you use slurm and srun instead of mpirun, I am no
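
The difference between the two streams can be checked from inside an application with the POSIX isatty() call. The sketch below is a minimal illustration; `fd_is_tty` and `describe_fd` are hypothetical helper names, not part of Open MPI.

```c
#include <stdbool.h>
#include <unistd.h>

/* Return true when the given file descriptor refers to a terminal. */
static bool fd_is_tty(int fd)
{
    return isatty(fd) == 1;
}

/* Short label suitable for logging, e.g. describe_fd(STDERR_FILENO). */
static const char *describe_fd(int fd)
{
    return fd_is_tty(fd) ? "tty" : "not a tty (pipe, file, ...)";
}
```

Calling describe_fd(STDOUT_FILENO) and describe_fd(STDERR_FILENO) in a process started by mpirun is one way to observe the behavior discussed in this thread; what each call reports depends on how the launcher wires up the streams.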

Re: [OMPI devel] malloc(0) warning with 1.8.7

2015-07-27 Thread Gilles Gouaillardet
Lisandro, i fixed it on master at https://github.com/open-mpi/ompi/commit/318a1a40a4ab345f417b8932326d4dd2e68d82bc could you give it a try ? Cheers, Gilles On 7/26/2015 9:26 AM, Gilles Gouaillardet wrote: Lisandro, I think I see what is going wrong and will fix it Thanks for the report

Re: [OMPI devel] malloc(0) warning with 1.8.7

2015-07-25 Thread Gilles Gouaillardet
Lisandro, I think I see what is going wrong and will fix it Thanks for the report, Gilles On Saturday, July 25, 2015, Lisandro Dalcin wrote: > Using a debug build of 1.8.7, I'm still getting this malloc(0) warning: > > malloc debug: Request for 0 bytes (coll_libnbc_ireduce_scatter_block.c, 67

Re: [OMPI devel] MAYBE problem w/ XRC with OFED pre-3.12

2015-07-25 Thread Gilles Gouaillardet
Paul, where do you run mpirun ? on a compute node ? on a login node with no infiniband interface ? if on a login node, are the infiniband libraries at least available ? Cheers, Gilles On Saturday, July 25, 2015, Paul Hargrove wrote: > I know Gilles and I went to a fair amount of effort to get

Re: [OMPI devel] MAYBE problem w/ XRC with OFED pre-3.12

2015-07-25 Thread Gilles Gouaillardet
using the > dlsym() check if not using shared libs, including in the --disable-dlopen > and --disable-shared cases. > > Also, I noticed that you don't have a dlclose(lib) call > in mca_btl_openib_xrc_check_api(). > > -Paul > > On Fri, Jul 24, 2015 at 11:55 PM, Gilles Go

Re: [OMPI devel] MAYBE problem w/ XRC with OFED pre-3.12

2015-07-25 Thread Gilles Gouaillardet
Paul, this test is here to gracefully disable the openib btl if ompi was built with recent ofed, but is running with an old version (or the other way around) I recently got a similar false positive when ompi was configure'd with static libraries only. in this case, a workaround was to dlsym the

Re: [OMPI devel] race condition in finalize

2015-07-22 Thread Gilles Gouaillardet
ain, i was unable to reproduce any crash. Cheers, Gilles On 7/22/2015 12:48 AM, Ralph Castain wrote: I believe I have this fixed - please see if this solves the problem: https://github.com/open-mpi/ompi/pull/730 On Jul 21, 2015, at 12:22 AM, Gilles Gouaillardet <mailto:gil...@rist.or.jp>

Re: [OMPI devel] race condition in finalize

2015-07-21 Thread Gilles Gouaillardet
t, next, &orte_rml_base.posted_recvs, orte_rml_posted_recv_t) { /* since names could include wildcards, must use * the more generalized comparison function */ i hope this helps, Gilles On 7/17/2015 11:04 PM, Ralph Castain wrote: It’s probably a race condition ca

Re: [OMPI devel] Erroneous test?

2015-07-20 Thread Gilles Gouaillardet
it leaves something to be desired. Sigh. Sorry for the “false” alarm. On Jul 17, 2015, at 8:54 PM, Gilles Gouaillardet <mailto:gilles.gouaillar...@gmail.com>> wrote: Ralph, based on the source code (ompi_mpi_params.c:91) I was expecting a Boolean ompi_mpi_param_check Cheers, G

Re: [OMPI devel] Erroneous test?

2015-07-17 Thread Gilles Gouaillardet
Ralph, based on the source code (ompi_mpi_params.c:91) I was expecting a Boolean ompi_mpi_param_check Cheers, Gilles On Saturday, July 18, 2015, Ralph Castain wrote: > Yep, I checked: > > MPI parameter check: runtime > > > > On Jul 17, 2015, at 8:00 PM

Re: [OMPI devel] Erroneous test?

2015-07-17 Thread Gilles Gouaillardet
Ralph, I will try to reproduce this. I guess you already checked the output of ompi_info to confirm params are checked at runtime. Cheers, Gilles On Saturday, July 18, 2015, Ralph Castain wrote: > Hi folks > > I keep getting segfault errors when testing 1.10, while others say the > tests are

[OMPI devel] race condition in finalize

2015-07-17 Thread Gilles Gouaillardet
Folks, I noticed several errors such as http://mtt.open-mpi.org/index.php?do_redir=2244 that did not make any sense to me (at first glance) I was able to attach one process when the issue occurs. the sigsegv occurs in thread 2, while thread 1 is invoking ompi_rte_finalize. All I can think is a s

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-14 Thread Gilles Gouaillardet
7/13/2015 11:42 PM, Ralph Castain wrote: Yes, I’ll release a new rc once I get it all merged. Are the linker warnings a change in behavior from 1.8.6? I confess I’ve been seeing them in the master for so long that I’ve been “inoculated” to ignore them. On Jul 13, 2015, at 7:34 AM, Gilles Gouaill

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-13 Thread Gilles Gouaillardet
? Cheers, Gilles On Monday, July 13, 2015, Ralph Castain wrote: > Gilles - just to confirm, the patch you provided here is the one in the > updated PRs, yes? If so, I’ll consider those PRs as confirmed and commit > them > > > On Jul 13, 2015, at 7:20 AM, Gilles Gouaillardet &l

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-13 Thread Gilles Gouaillardet
On Monday, July 13, 2015, Chris Samuel wrote: > On Mon, 13 Jul 2015 05:17:29 PM Gilles Gouaillardet wrote: > > > Hi Chris, > > Hi Gilles, > > > i pushed my tarball into a gist : > > Thanks for that, I can confirm on our two x86-64 RHEL 6.6 boxes (one circa &

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-13 Thread Gilles Gouaillardet
Hi Chris, i pushed my tarball into a gist : git clone https://gist.github.com/ec20f77ec35533fa575a.git and then the tarball is in ec20f77ec35533fa575a/openmpi-gitclone.tar.bz2 Cheers, Gilles On 7/13/2015 4:59 PM, Chris Samuel wrote: Hi Gilles, On Mon, 13 Jul 2015 03:16:57 PM Gilles

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-13 Thread Gilles Gouaillardet
either system. -Paul On Sun, Jul 12, 2015 at 7:48 PM, Gilles Gouaillardet mailto:gil...@rist.or.jp>> wrote: Paul, Here is a revised patch to be applied vs the 1.8.7-rc1 tarball Could you please give it a try ? Cheers, Gilles On 7/11/2015 4:22 AM, Paul Hargro

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-12 Thread Gilles Gouaillardet
the XRC constants that we need to compile XRC code before ruling that we can actually build XRC support. > On Jul 10, 2015, at 10:33 AM, Gilles Gouaillardet mailto:gilles.gouaillar...@gmail.com>> wrote: > > Sorry about that, and thanks for reverting the comm

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Gilles Gouaillardet
was working correctly, why don’t we just revert the config > in question back to the 1.8.4 version? Why was it changed in the first > place? Does anyone know what problem someone was trying to solve? > > > On Jul 10, 2015, at 7:33 AM, Gilles Gouaillardet < > gilles.gouailla

Re: [OMPI devel] 1.8.7rc1 testing results

2015-07-10 Thread Gilles Gouaillardet
Sorry about that, and thanks for reverting the commit. Paul mentioned a patch I sent to the ml, and that worked for him. The commit was supposed to be a more robust version. For example, in rhel7, the deprecated functions have been removed, but the xrc domains is fine. Currently, xrc is not support

Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-10 Thread Gilles Gouaillardet
entifier is related to "ConnectIB XRC support" (not ConnectX). If you look back at the 1.8.4 release you will find only a check for ibv_create_xrc_rcv_qp. -Paul On Thu, Jul 9, 2015 at 6:17 PM, Gilles Gouaillardet mailto:gil...@rist.or.jp>> wrote: Thank

Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-10 Thread Gilles Gouaillardet
u look back at the 1.8.4 release you will find only a check for ibv_create_xrc_rcv_qp. -Paul On Thu, Jul 9, 2015 at 6:17 PM, Gilles Gouaillardet <mailto:gil...@rist.or.jp>> wrote: Thanks Paul, i just found an other bug ... (and i should be blamed for it) here is att

Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-09 Thread Gilles Gouaillardet
Thu, Jul 9, 2015 at 5:17 PM, Gilles Gouaillardet <mailto:gil...@rist.or.jp>> wrote: Paul, can you please compress and post your config.log ? what is the OFED version you are running ? on master, that fix did the trick on mellanox test cluster (recent OFED version) but

Re: [OMPI devel] 1.8.7 rc1 out for review

2015-07-09 Thread Gilles Gouaillardet
Paul, can you please compress and post your config.log ? what is the OFED version you are running ? on master, that fix did the trick on mellanox test cluster (recent OFED version) but did not enable XRC on lanl test clusters (my best bet is an old OFED library) Thanks Gilles On 7/10/2015 9

Re: [OMPI devel] XRC Support

2015-07-09 Thread Gilles Gouaillardet
Ben and Paul, Thanks for the report ! it looks like a simple typo (e.g. ')' instead of ',') the attached patch is for v1.8 in order to use it, you need recent autotools (see http://www.open-mpi.org/source/building.php) apply the patch, run autogen.pl, and then configure, make, make install i

Re: [OMPI devel] Open MPI 1.8.6 memory leak

2015-07-01 Thread Gilles Gouaillardet
Nathan, the root cause is your fixes were not backported to the v1.8 (nor the v1.10) branch i made PR https://github.com/open-mpi/ompi-release/pull/357 to fix this. could you please review it ? since there are quite a lot of differences between v1.8 and master, the backport was not trivial.

Re: [OMPI devel] error in test/threads/opal_condition.c

2015-07-01 Thread Gilles Gouaillardet
In other places, initialization looks like opal_mutex_t mutex = {{0}}; Btw, opal_condition is a standalone binary (e.g. Not part of ompi library), so I do not think uninitialized common hurts here. Cheers, Gilles On Wednesday, July 1, 2015, Nathan Hjelm wrote: > > PGI no longer suprises me w
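
The `{{0}}` form quoted in this message is ordinary C aggregate initialization. The sketch below uses hypothetical stand-in types (`toy_object_t`, `toy_mutex_t`), not the real opal definitions, and only assumes that the first member of the mutex is itself a struct, which is why two levels of braces are used.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a nested object layout; these are not the
 * real opal types. */
typedef struct {
    int refcount;
    const char *class_name;
} toy_object_t;

typedef struct {
    toy_object_t super;   /* first member is itself a struct */
    int locked;
} toy_mutex_t;

/* The inner braces initialise the first member; every member that is
 * not listed explicitly is zero-initialised as well. */
static toy_mutex_t toy_mutex = {{0}};

/* Check that every field of the mutex holds its zero value. */
static int toy_mutex_is_zeroed(const toy_mutex_t *m)
{
    return m->super.refcount == 0
        && m->super.class_name == NULL
        && m->locked == 0;
}
```

The same initializer also works for an object with automatic storage, where nothing would be zeroed without it.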

Re: [OMPI devel] Testing of "OMP_PROC_BIND value is invalid" errors

2015-07-01 Thread Gilles Gouaillardet
I think Paul concern was about cross compilation (e.g. no AC_TRY_RUN ...) fwiw, fortran bindings cannot be built "as is" when cross compiling ompi Cheers, Gilles On Wednesday, July 1, 2015, Ralph Castain wrote: > Given the description, I suspect that any MPI application should be > sufficient

[OMPI devel] MPI_Buffer_detach fortran binding

2015-06-29 Thread Gilles Gouaillardet
Jeff, the first argument of MPI_Buffer_detach is OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buffer_addr from use-mpi-f08 however, the standard states this is TYPE(C_PTR), INTENT(OUT) (and yes, this is very counter intuitive ... at first glance only) can you please confirm this is an Open MPI bu

Re: [OMPI devel] Java bindings are completely broken

2015-06-28 Thread Gilles Gouaillardet
Ralph and all, master is now fixed Cheers, Gilles On 6/29/2015 7:07 AM, Gilles Gouaillardet wrote: Ralph, my bad, I will fix this today sorry for the inconvenience Gilles On Monday, June 29, 2015, Ralph Castain <mailto:r...@open-mpi.org>> wrote: Hey folks I don’t kno

Re: [OMPI devel] Java bindings are completely broken

2015-06-28 Thread Gilles Gouaillardet
Ralph, my bad, I will fix this today sorry for the inconvenience Gilles On Monday, June 29, 2015, Ralph Castain wrote: > Hey folks > > I don’t know who has been working on the Java bindings, but they are > totally broken in the master repo - cannot compile. I tried fixing a few of > the obviou

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-26 Thread Gilles Gouaillardet
issues ? Cheers, Gilles On 6/26/2015 12:31 PM, Paul Hargrove wrote: On Thu, Jun 25, 2015 at 5:05 PM, Paul Hargrove <mailto:phhargr...@lbl.gov>> wrote: On Thu, Jun 25, 2015 at 4:59 PM, Gilles Gouaillardet mailto:gil...@rist.or.jp>> wrote: In this case, mca_co

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-25 Thread Gilles Gouaillardet
anox is going to talk about this internally and get back to us. > On Jun 25, 2015, at 2:59 PM, Gilles Gouaillardet mailto:gilles.gouaillar...@gmail.com>> wrote: > > Jeff, > > this is exactly what happens. > > I will send a stack trace l

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-25 Thread Gilles Gouaillardet
as already resolved that coll_ml_* symbol in the > coll ml DSO. So the execution transfers back up into the coll ml DSO, and > ... kaboom. > > A simple stack trace will confirm this -- it should show execution going > down into libhcol and then back up into coll ml. > > > > &

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-25 Thread Gilles Gouaillardet
a coll ^ml otherwise, it might crash (if coll_ml is loaded before coll_hcoll, which is really system dependent) Cheers, Gilles On 6/25/2015 10:46 AM, Gilles Gouaillardet wrote: Daniel, thanks for the logs. another workaround is to mpirun --mca coll ^hcoll ... i was able to reproduce the iss

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-24 Thread Gilles Gouaillardet
file to blacklist the coll_ml module to ensure this is working. Mike and Mellanox folks, could you please comment on that ? Cheers, Gilles On 6/24/2015 5:23 PM, Daniel Letai wrote: Gilles, Attached the two output logs. Thanks, Daniel On 06/22/2015 08:08 AM, Gilles Gouaillardet wrote

Re: [OMPI devel] Regressions: MPI_Win_{start|post}() with MPI_GROUP_EMPTY

2015-06-22 Thread Gilles Gouaillardet
Lisandro, this is related to your previous report : some bugs were introduced when silencing zero size mallocs here is attached a patch (to be applied as well as the previous one) Cheers, Gilles On 6/23/2015 12:23 AM, Lisandro Dalcin wrote: The attached test code used to work in 1.8.5 and be

Re: [OMPI devel] Bug

2015-06-22 Thread Gilles Gouaillardet
on the other hand, i do not think that, as a community, we are interested in mpi4py bugs. i will let other folks comment on that. Cheers, Gilles On 6/23/2015 9:49 AM, Lisandro Dalcin wrote: On 22 June 2015 at 18:26, Gilles Gouaillardet wrote: if you still have the test program that can do that

Re: [OMPI devel] Bug

2015-06-22 Thread Gilles Gouaillardet
, Gilles Gouaillardet wrote: Lisandro, there was a regression in 1.8.6 with NBC and zero size messages. (ironically, the bug was introduced when silencing the zero size malloc you reported in http://www.open-mpi.org/community/lists/devel/2015/05/17388.php) the attached patch fixes the issue OK, I'l

Re: [OMPI devel] v2.0 branch has been created

2015-06-22 Thread Gilles Gouaillardet
's called 2.x because all the 2.x releases will come from there - not > just the 2.0.x releases. > > Sent from my phone. No type good. > > > On Jun 21, 2015, at 7:49 PM, Gilles Gouaillardet > wrote: > > > > Jeff, > > > > currently, the g

Re: [OMPI devel] Bug

2015-06-21 Thread Gilles Gouaillardet
Lisandro, there was a regression in 1.8.6 with NBC and zero size messages. (ironically, the bug was introduced when silencing the zero size malloc you reported in http://www.open-mpi.org/community/lists/devel/2015/05/17388.php) the attached patch fixes the issue in your initial report, you mention

Re: [OMPI devel] v2.0 branch has been created

2015-06-21 Thread Gilles Gouaillardet
Jeff, currently, the github "v2.0" branch is called "v2.x" was this intended ? Cheers, Gilles On 6/21/2015 2:00 AM, Jeff Squyres (jsquyres) wrote: The v2.0 branch has been created on the github ompi-release repo. Let the pull requests commence. Just so that we developers are on the same s

Re: [OMPI devel] Unused var in OB1 on master

2015-06-14 Thread Gilles Gouaillardet
Ralph and all, this is fixed at https://github.com/open-mpi/ompi/commit/ee3a1da28a3c018115bad82e0a9e7d1e04d35148 Cheers, Gilles On 6/14/2015 10:43 AM, Gilles Gouaillardet wrote: Will do tomorrow. proc is only used in heterogeneous mode, hence the warning On Sunday, June 14, 2015, Ralph

Re: [OMPI devel] Unused var in OB1 on master

2015-06-13 Thread Gilles Gouaillardet
Will do tomorrow. proc is only used in heterogeneous mode, hence the warning On Sunday, June 14, 2015, Ralph Castain wrote: > *pml_ob1_recvreq.c:* In function '*mca_pml_ob1_recv_request_put_frag*': > *pml_ob1_recvreq.c:397:18:* *warning: *unused variable '*proc*' > [-Wunused-variable] > omp
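The kind of warning quoted above typically shows up when a local variable is declared unconditionally but only read inside a conditionally compiled branch. A minimal sketch of one common remedy, declaring the variable inside the guard, follows. It is not the actual ob1 fix: `HETEROGENEOUS_SUPPORT`, `struct peer`, `swap32` and `read_length` are hypothetical names chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical build switch; stands in for a real configure-time macro. */
#ifndef HETEROGENEOUS_SUPPORT
#define HETEROGENEOUS_SUPPORT 0
#endif

/* Hypothetical peer descriptor. */
struct peer {
    int needs_swap; /* non-zero if the peer uses the opposite byte order */
};

#if HETEROGENEOUS_SUPPORT
/* Byte-swap a 32-bit value; only compiled when it can actually be used. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
#endif

/* Return the length field in host byte order. */
uint32_t read_length(const struct peer *peer_info, uint32_t wire_value)
{
#if HETEROGENEOUS_SUPPORT
    /* 'proc' lives inside the guard, so homogeneous builds never
     * see a declared-but-unused variable. */
    const struct peer *proc = peer_info;
    if (proc->needs_swap) {
        return swap32(wire_value);
    }
#else
    (void)peer_info; /* silence the unused-parameter warning */
#endif
    return wire_value;
}
```

With the switch left at its default of 0, `read_length` simply returns the value it was given, and the compiler has nothing unused to warn about.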

Re: [OMPI devel] RFC: standardize verbosity values

2015-06-08 Thread Gilles Gouaillardet
> I’d also like to see us apply the same logic to the MCA param system. > Let’s just define ~4 named levels and get rid of the fine grained numbering. > > > On Jun 8, 2015, at 2:04 AM, Gilles Gouaillardet > wrote: > > Nathan, > > i think it is a good idea to use

Re: [OMPI devel] RFC: standardize verbosity values

2015-06-08 Thread Gilles Gouaillardet
Nathan, i think it is a good idea to use names vs numeric values for verbosity. what about using "a la" log4c verbosity names ? http://sourceforge.net/projects/log4c/ static const char* const priorities[] = { "FATAL", "ALERT", "CRIT", "ERROR", "WARN", "NOTICE", "INFO
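To illustrate the idea of named verbosity levels, here is a minimal sketch of a name-to-level lookup. It uses only the names visible in the truncated preview above, so the table is deliberately incomplete; `verbosity_from_name` and `verbosity_enabled` are hypothetical helpers, not part of Open MPI or log4c.

```c
#include <stddef.h>
#include <string.h>

/* Level names as they appear in the preview above, most severe first.
 * The original list is cut off, so nothing beyond "INFO" is assumed. */
static const char *const verbosity_names[] = {
    "FATAL", "ALERT", "CRIT", "ERROR", "WARN", "NOTICE", "INFO"
};

#define VERBOSITY_COUNT \
    (sizeof(verbosity_names) / sizeof(verbosity_names[0]))

/* Hypothetical helper: map a level name to its index, or -1 if unknown. */
int verbosity_from_name(const char *name)
{
    if (name == NULL) {
        return -1;
    }
    for (size_t i = 0; i < VERBOSITY_COUNT; i++) {
        if (strcmp(name, verbosity_names[i]) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Hypothetical helper: a message is emitted when its level is valid and
 * at least as severe as the configured threshold (lower index = more severe). */
int verbosity_enabled(int msg_level, int threshold)
{
    return msg_level >= 0 && msg_level <= threshold;
}
```

For example, with the threshold set to "WARN", a message tagged "ERROR" would be emitted while one tagged "INFO" would be suppressed. Whether lower or higher numbers should mean "more verbose" is a design choice the RFC would have to settle; this sketch simply orders by severity.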

Re: [OMPI devel] v1.8.6 release

2015-05-29 Thread Gilles Gouaillardet
> On May 29, 2015, at 8:11 AM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > > wrote: > > Ralph, > > this is being discussed at https://github.com/open-mpi/ompi/pull/605 > > my latest solution is available at > https://github.com/ggouaillardet/ompi/comm

Re: [OMPI devel] v1.8.6 release

2015-05-29 Thread Gilles Gouaillardet
Ralph, this is being discussed at https://github.com/open-mpi/ompi/pull/605 my latest solution is available at https://github.com/ggouaillardet/ompi/commit/2a8ef01bad02b6c833c642d17d9a1140ea9292a4 the pr is a simple but temporary solution in which I introduced a new mca param, so if we decide th

Re: [OMPI devel] change in io_ompio.c

2015-05-27 Thread Gilles Gouaillardet
Edgar, i am sorry about that. i fixed some memory leaks (some memory was leaking in some error cases). i also moved (up) some malloc in order to group them and simplify the handling of error cases. per your comment, one move was incorrect indeed :-( Cheers, Gilles On 5/28/2015 12:14 PM, Ed

[OMPI devel] Jenkins and coverity logs

2015-05-25 Thread Gilles Gouaillardet
Mike, most coverity links reported by Jenkins are invalid for example https://github.com/open-mpi/ompi/pull/593 points to http://bgate.mellanox.com:/jenkins/job/gh-ompi-master-pr//ws/cov_build/all_535/output/errors/index.html which does not exist (any more) only the link of the most recen
