Re: [OMPI devel] hcoll missing libsharp

2019-10-16 Thread Joshua Ladd via devel
Chris, the libudev.so dependency comes from the upgraded hwloc v1.11.12 in HCOLL v4.3. This library is part of the systemd-libs RPM package. Josh On Wed, Oct 16, 2019 at 8:32 AM Chris Ward via devel <devel@lists.open-mpi.org> wrote: > I set up MOFED 4.7.1, and now configure completes successfully >

Re: [OMPI devel] hcoll missing libsharp

2019-10-15 Thread Joshua Ladd via devel
That is a VERY old MOFED (a couple of years old). We just released version 4.7. Josh On Tue, Oct 15, 2019 at 11:50 AM Chris Ward wrote: > I'm using a MOFED from file MLNX_OFED_LINUX-4.0-0.0.8.2-rhel7.3-x86_64.tgz > , this on a machine running RHEL 7.6. Should I be using a newer MOFED? > > > >

Re: [OMPI devel] hcoll missing libsharp

2019-10-15 Thread Joshua Ladd via devel
Chris, HCOLL depends on libsharp. What MOFED version are you using? What HCOLL version are you building against? Josh On Tue, Oct 15, 2019 at 11:36 AM Chris Ward via devel < devel@lists.open-mpi.org> wrote: > Setting LD_LIBRARY_PATH didn't help; I got the same error. > > Is the problem because

Re: [OMPI devel] Memory performance with Bcast

2019-03-21 Thread Joshua Ladd
Marcin, HPC-X implements the MPI BCAST operation by leveraging hardware multicast capabilities. Starting with HPC-X v2.3 we introduced a new multicast based algorithm for large messages as well. Hardware multicast scales as O(1) modulo switch hops. It is the most efficient way to broadcast a

Re: [OMPI devel] Mellanox Jenkins

2017-06-22 Thread Joshua Ladd
t of >> ignoring CI results, so no results are better than failed results :/. >> >> Brian >> >> > On Jun 21, 2017, at 1:49 PM, Jeff Squyres (jsquyres) < >> jsquy...@cisco.com> wrote: >> > >> > Thanks Josh. >> > >> >>

[OMPI devel] Mellanox Jenkins

2017-06-21 Thread Joshua Ladd
OMPI Developers, We are aware of the issue currently affecting the Mellanox Jenkins servers. The issue is being addressed and we hope it will be resolved soon. We apologize for the inconvenience and thank you for your patience. Best, Josh Ladd ___

Re: [OMPI devel] Openmpi support for Mellanox CX4-LX

2017-06-13 Thread Joshua Ladd
Hi, Please include your full command line. Josh On Mon, Jun 12, 2017 at 6:17 PM, Chuanxiong Guo wrote: > Hi, > > I have two servers with Mellanox CX4-LX (50GbE Ethernet) back-to-back > connected. I am using Ubuntu 14-04. I have made mvapich2 work, and I can > confirm

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread Joshua Ladd
scination with PMIx. PMIx didn’t calculate this > jobid - OMPI did. Yes, it is in the opal/pmix layer, but it had -nothing- > to do with PMIx. > > So why do you want to continue to blame PMIx for this problem?? > > > On Sep 15, 2016, at 4:29 AM, Joshua Ladd <jladd.m...@gmail.co

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread Joshua Ladd
>>> TMP=/tmp/tmp.wOv5dkNaSI >>> >>> and into $TMP I have: >>> >>> openmpi-sessions-40031@lorien_0 >>> >>> and into this subdirectory I have a bunch of empty dirs: >>> >>> cmpbib@lorien:/tmp/tmp.wOv5dkNaSI/openmpi-sess

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-14 Thread Joshua Ladd
> > lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs > Output information may be incomplete. > lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing > Output information may be incomplete. > > nothing... > > W

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-14 Thread Joshua Ladd
Hi, Eric I **think** this might be related to the following: https://github.com/pmix/master/pull/145 I'm wondering if you can look into the /tmp directory and see if you have a bunch of stale usock files. Best, Josh On Wed, Sep 14, 2016 at 1:36 AM, Gilles Gouaillardet

Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to users?

2016-04-29 Thread Joshua Ladd
, Jeff Squyres (jsquyres) <jsquy...@cisco.com > wrote: > Do you guys want to add anything into NEWS about OSHMEM improvements in > 2.0.0 (even though it won't be 1.3)? > > Or were such improvements hidden down in UCX / MXM? > > > > On Apr 29, 2016, at 5:40 PM, Joshua Lad

[OMPI devel] 2.0.0 is coming: what do we need to communicate to users?

2016-04-29 Thread Joshua Ladd
ot remember if they were resolved. > >>>> > >>>> We may also want to clarify if any PML/MTLs are experimental in this > >>>> release. > >>>> > >>>> MPI_THREAD_MULTIPLE support. > >>>> > >>>> > >>>>

Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to users?

2016-04-29 Thread Joshua Ladd
Certainly we need to communicate / advertise / evangelize the improvements in job launch - the largest and most substantial change between the two branches - and provide some best practice guidelines for usage (use direct modex for applications with sparse communication patterns and full modex for

Re: [OMPI devel] seg fault when using yalla, XRC, and yalla

2016-04-20 Thread Joshua Ladd
Hi, David We are looking into your report. Best, Josh On Tue, Apr 19, 2016 at 4:41 PM, David Shrader wrote: > Hello, > > I have been investigating using XRC on a cluster with a mellanox > interconnect. I have found that in a certain situation I get a seg fault. I > am

Re: [OMPI devel] RFC: set MCA param mpi_add_procs_cutoff default to 32

2016-02-04 Thread Joshua Ladd
+1 On Wed, Feb 3, 2016 at 9:54 PM, Jeff Squyres (jsquyres) wrote: > WHAT: Decrease default value of mpi_add_procs_cutoff from 1024 to 32 > > WHY: The "partial add procs" behavior is supposed to be a key feature of > v2.0.0 > > WHERE: ompi/mpi/runtime/ompi_mpi_params.c > >

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Joshua Ladd
Thanks, Nysal!! Good catch! Josh On Mon, Nov 9, 2015 at 2:27 PM, Mark Santcroos wrote: > It seems the change suggested by Nysal also allows me to run into the next > problem ;-) > > Mark > > > On 09 Nov 2015, at 20:19 , George Bosilca wrote: >

Re: [OMPI devel] v1.10.1rc1 released

2015-10-03 Thread Joshua Ladd
This doesn't contain the three patches that we discussed on PR: https://github.com/open-mpi/ompi-release/pull/621 Josh On Sat, Oct 3, 2015 at 6:13 AM, Jeff Squyres (jsquyres) wrote: > v1.10.1 is primarily a bug-fix release. rc1 has been released; it's in > the usual place:

Re: [OMPI devel] Testing of "OMP_PROC_BIND value is invalid" errors

2015-07-01 Thread Joshua Ladd
oward's position regarding how/when/why the code had entered master. > > -Paul > > On Wed, Jul 1, 2015 at 3:10 AM, Joshua Ladd <jladd.m...@gmail.com> wrote: > >> Paul, >> >> I think your testing is extremely helpful. Even more so with this new >> ve

Re: [OMPI devel] Testing of "OMP_PROC_BIND value is invalid" errors

2015-07-01 Thread Joshua Ladd
Paul, I think your testing is extremely helpful. Even more so with this new versioning scheme. Setting OMP envars in ORTE should have been discussed. Considering both Paul and Howard (key members of our community) use OMP in production environments with Cray and PGI compilers, it seems a bit odd

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-25 Thread Joshua Ladd
Thanks, Gilles. We are addressing this. Josh Sent from my iPhone > On Jun 25, 2015, at 11:03 AM, Gilles Gouaillardet wrote: > > Folks, > > this is a followup on an issue reported by Daniel on the users mailing list : > OpenMPI is built with hcoll from Mellanox. > the coll

[OMPI devel] Fwd: job post

2015-05-22 Thread Joshua Ladd
Dear Open MPI Community, I'd like to advertise multiple positions of particular relevance to this community. Please feel free to contact me directly or our US Hiring Manager, Scott Chong sco...@mellanox.com, if you or someone you know may be a good fit. Two open positions reporting to me. Can

Re: [OMPI devel] Tues Mar 3rd telecon

2015-02-27 Thread Joshua Ladd
I'm available, but am OK to skip Tuesday's call too. Josh On Thu, Feb 26, 2015 at 10:04 AM, Howard Pritchard wrote: > I will also be available but suggest we skip next Tuesday. > On Feb 25, 2015 5:04 PM, "Ralph Castain" wrote: > >> Hey folks >> >>

Re: [OMPI devel] OMPI devel] RoCE plus QDR IB tunable parameters

2015-02-25 Thread Joshua Ladd
You need to configure OMPI --with-mxm=/path/to/mxm in order to use Yalla. In addition, Yalla is only available on Master as it is a new feature. If you want to play with other PMLs in the release branch, you may try the MXM MTL (again, you first need to configure your build to use the MXM library)

[OMPI devel] MCA Aliases

2015-02-19 Thread Joshua Ladd
Folks, Is it possible to define an alias for an MCA parameter? Grepping around the interwebs, it seems there was an RFC along these lines in 2008. http://www.open-mpi.org/community/lists/devel/2008/04/3613.php It doesn't appear that the functionality was added or, if it was, it has since been

Re: [OMPI devel] FT code (again)

2014-12-19 Thread Joshua Ladd
George is correct; opal_pmix.fence replaces the grpcomm barrier. Josh On Fri, Dec 19, 2014 at 10:47 AM, George Bosilca wrote: > > An opal_pmix.fence seems like a perfect replacement. > > George. > > > On Fri, Dec 19, 2014 at 10:26 AM, Adrian Reber wrote:

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Joshua Ladd
ic included on the configure > line and still failed with the same problem, mtl/ofi thinks its okay to > build... > > Howard > > > 2014-12-17 11:48 GMT-07:00 Joshua Ladd <jladd.m...@gmail.com>: >> >> Seem to me this should be disabled by default until fol

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Joshua Ladd
Seems to me this should be disabled by default until folks can quiet the noise. If memory serves me, that's the position the community took with OSHMEM. Josh On Wed, Dec 17, 2014 at 1:40 PM, Howard Pritchard wrote: > > Jeff, > > I think the problem is that the libfabric

Re: [OMPI devel] [OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-10 Thread Joshua Ladd
Window creation: MPI_Win_allocate > # Synchronization: MPI_Win_flush > # Size Bandwidth (MB/s) > 1 28.56 > 2 58.74 > > > So it wasn't fixed for RHEL 6.6. > > Regards, Götz > > On Mon, Dec 8, 2014 at 4:00 PM, Götz Waschk <

Re: [OMPI devel] MTT diligence

2014-11-10 Thread Joshua Ladd
anytime soon. In the meantime, Alina will continue to diligently monitor MTT and report issues along with offending commits. Best, Josh On Sat, Nov 8, 2014 at 11:53 AM, Joshua Ladd <jladd.m...@gmail.com> wrote: > Alina, > > Please take the lead on this and respond to Ralph's query. Wor

Re: [OMPI devel] MTT diligence

2014-11-08 Thread Joshua Ladd
, this is a good idea. Josh On Thu, Nov 6, 2014 at 5:00 PM, Ralph Castain <rhc.open...@gmail.com> wrote: > > > On Nov 6, 2014, at 1:51 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > > > On Nov 6, 2014, at 4:06 PM, Joshua Ladd <jladd.m...@gmail.com>

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Joshua Ladd
tup of Gemini they can not be > > mixed. If > > > it is possible to mix them with other networks I would be > happy > > to add > > > an atomic flag for that. > > > > > > -Nathan &

Re: [OMPI devel] osu_mbw_mr error

2014-11-06 Thread Joshua Ladd
On Thursday, November 6, 2014, Nathan Hjelm <hje...@lanl.gov> wrote: > On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote: > >Nathan, > >Has this bug always been present in OpenIB or is this a recent > addition? > >If this is regression, I

[OMPI devel] osu_mbw_mr error

2014-11-06 Thread Joshua Ladd
in production systems, this issue was never discovered. Once again, many thanks to Alina for discovering and reporting this. Keep up the MTT vigilance! Josh On Tuesday, November 4, 2014, Joshua Ladd <jladd.m...@gmail.com> wrote: >

Re: [OMPI devel] thread-tests hang

2014-11-06 Thread Joshua Ladd
Thank you for taking the time to investigate this, Jeff. SC is a hectic and stressful time for everyone on this list with many deadlines looming. This bug isn't a priority for us; however, it seems to me that your original revert, the one that simply wants to disable threading by default (and for

Re: [OMPI devel] Prepping for 1.8.4 release

2014-11-06 Thread Joshua Ladd
We filed an RFC for the trunk at Jeff's request. This is a new feature. Josh On Thu, Nov 6, 2014 at 12:13 PM, Joshua Ladd <jladd.m...@gmail.com> wrote: > Yalla is only in trunk. Unless you want us to push it to 1.8.4 - we won't > object :) > > Josh > > On Thu, Nov 6,

Re: [OMPI devel] Prepping for 1.8.4 release

2014-11-06 Thread Joshua Ladd
Yalla is only in trunk. Unless you want us to push it to 1.8.4 - we won't object :) Josh On Thu, Nov 6, 2014 at 11:46 AM, Ralph Castain wrote: > Hey folks > > Here is the NEWS I have for 1.8.4 so far - please respond with any > additions/mods you would like to suggest >

Re: [OMPI devel] thread-tests hang

2014-11-05 Thread Joshua Ladd
I think this is a pretty significant change in behavior for a minor release, Jeff. According to the interested parties: "I'm reporting a performance (message rate 16%, latency 3%) regression when using PSM that occurred between OMPI v1.6.5 and v1.8.1. I would guess it affects other networks too,

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-05 Thread Joshua Ladd
omics > have to be done through the same btl (including atomics on self). I did > this because with the default setup of Gemini they can not be mixed. If > it is possible to mix them with other networks I would be happy to add > an atomic flag for that. > > -Nathan > > On Wed, Nov

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-05 Thread Joshua Ladd
Quick question. Out of curiosity, how do you handle the (common) case of mixing network atomics with CPU atomics? Say for a single target with two initiators, one initiator is on host with the target, so goes through the SM BTL, and the other initiator is off host, so goes through the network BTL.

Re: [OMPI devel] osu_mbw_mr error

2014-11-04 Thread Joshua Ladd
Thanks, Nathan. After a bit more investigation yesterday, this was our conclusion too: it is a longstanding bug in the OpenIB BTL, and we just happened to start triggering the broken flow with some recent changes made to the default max_lmc parameter. Let us know if you need anything from our end.

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-22 Thread Joshua Ladd
Privet, Artem ML is the collective component that is invoking the calls into BCOL. The triplet basesmuma,basesmuma,ptpcoll, for example, means I want three levels of hierarchy - socket level, UMA level, and then network level. I am guessing (only a guess after a quick glance) that maybe srun is

Re: [OMPI devel] [OMPI bugs] [Open MPI] #4919: Fix the application abort routine so we actually abort

2014-09-25 Thread Joshua Ladd
@iivanov I am looking into a fix. On Thu, Sep 25, 2014 at 11:42 AM, Open MPI wrote: > #4919: Fix the application abort routine so we actually abort > ---+- > Reporter: rhc | Owner:

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-12 Thread Joshua Ladd
Let me know if Nadia can help here, Ralph. Josh On Fri, Sep 12, 2014 at 9:31 AM, Ralph Castain wrote: > > On Sep 12, 2014, at 5:45 AM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com> wrote: > > Ralph, > > On Fri, Sep 12, 2014 at 10:54 AM, Ralph Castain

Re: [OMPI devel] Need to know your Github ID

2014-09-10 Thread Joshua Ladd
jladd -> jladd-mlnx On Wed, Sep 10, 2014 at 8:45 AM, Shamis, Pavel wrote: > Jeff, > pasha -> shamisp > > > -Original Message- > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff > > Squyres (jsquyres) > > Sent: Wednesday, September 10, 2014 6:46 AM >

Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-14 Thread Joshua Ladd
We will update the README accordingly. Thank you, Paul. Josh On Thu, Aug 14, 2014 at 10:00 AM, Jeff Squyres (jsquyres) < jsquy...@cisco.com> wrote: > Good points. > > Mellanox -- can you update per Paul's suggestions? > > > On Aug 13, 2014, at 8:26 PM, Paul Hargrove wrote:

Re: [OMPI devel] [OMPI users] OpenMPI fails with np > 65

2014-08-13 Thread Joshua Ladd
> Thanks. > > *Lenny Verkhovsky* > > SW Engineer, Mellanox Technologies > > www.mellanox.com > > > > Office:+972 74 712 9244 > > Mobile: +972 54 554 0233 > > Fax:+972 72 257 9400 > > > > *From:* devel [mailto:devel-boun...@open-mpi.org] *On Beha

Re: [OMPI devel] [OMPI users] OpenMPI fails with np > 65

2014-08-13 Thread Joshua Ladd
Lenny, Is there any particular reason that you're using the trunk? The reason I ask is because the trunk is in an unusually high state of flux at the moment with a major move underway. If you're trying to use OMPI for production grade runs, I would strongly advise picking up one of the stable

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32346 - trunk/opal/mca/btl/openib

2014-07-29 Thread Joshua Ladd
Nathan, can you take a look at https://svn.open-mpi.org/trac/ompi/changeset/32350 when you get a chance. Thanks, Josh On Tue, Jul 29, 2014 at 6:14 PM, Nathan Hjelm wrote: > On Tue, Jul 29, 2014 at 04:12:18PM -0600, Nathan Hjelm wrote: > > > > Yeah. Though it would be best to

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32346 - trunk/opal/mca/btl/openib

2014-07-29 Thread Joshua Ladd
s"); > > mca_base_var_get_value (vari, NULL, , NULL); > > If the source is MCA_BASE_VAR_SOURCE_DEFAULT then the value was not > modified by a file, the environment, or MPI_T. > > -Nathan > > On Tue, Jul 29, 2014 at 05:42:20PM -0400, svn-commit-mai...@open-mpi.org > wrote:

Re: [OMPI devel] btl_openib_receive_queues mca param not always taken into account

2014-07-29 Thread Joshua Ladd
Hi, Nadia I CMRed your patch to 1.8.2 and applied the fix on the trunk in: https://svn.open-mpi.org/trac/ompi/changeset/32346 Thanks for reporting! Josh On Fri, Jul 11, 2014 at 6:04 AM, Nadia Derbey wrote: > Hi, > > I noticed that specifying the receive_queues

Re: [OMPI devel] SHMEM symmetric objects in shared libraries

2014-07-29 Thread Joshua Ladd
Pasha, Is v1.1 posted somewhere? I don't see it up on the LBNL site. Josh On Tue, Jul 29, 2014 at 2:05 PM, Shamis, Pavel wrote: > > Btw, I'm pretty confident, that this Open SHMEM implementation does > not > recognize global or static variables in shared

Re: [OMPI devel] SHMEM symmetric objects in shared libraries

2014-07-29 Thread Joshua Ladd
> > mechanism used for the data segment of the executable. > > > > Howard > > > > > > *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Joshua > Ladd > *Sent:* Tuesday, July 29, 2014 10:57 AM > *To:* Open MPI Developers > *Subject:* Re: [OMPI d

Re: [OMPI devel] SHMEM symmetric objects in shared libraries

2014-07-29 Thread Joshua Ladd
Are you claiming that in the following test, the static variable "val" will not be seen as a symmetric object? #include "shmem.h" int main( int argc, char **argv){ long my_pe, npes, master; start_pes(0); my_pe = shmem_my_pe(); npes = shmem_n_pes(); master = npes - 1; /*

Re: [OMPI devel] Annual SVN account maintenance

2014-07-23 Thread Joshua Ladd
lin, Mellanox > > alinas -> Alina Sklarevich, Mellanox > > amikheev -> Alex Mikheev, Mellanox > > bosilca -> George Bosilca, UTK > > brbarret -> Brian Barrett, IU, LANL, SNL > > devendar -> Devendar Bureddy, Mellanox > > dgoodell -> Dave Goodell, Cisco > >

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Joshua Ladd
*An enhancement to permit some form of delimiter escaping would probably still be nice, but is low priority.* [Josh] Not a problem, Dave. We will do this. On Wed, Jul 16, 2014 at 4:32 PM, Dave Goodell (dgoodell) <dgood...@cisco.com > wrote: > On Jul 16, 2014, at 3:08 PM, Joshua Ladd

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Joshua Ladd
Ralph warned me that no matter what decision we made, someone would probably violently object. So, with that in mind, let me put my diplomat hat on... Dave, I'm sorry you view this as a "crapification" of your mpirun user interface. Your lament is duly noted and we are happy to work with you to

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Joshua Ladd
Dave, Your example will error out. If someone tries to set envars with both mechanisms, the job fails. The decision to do so was also made at the Dev meeting and is so that we don't have to do this kind of checking. Josh On Wed, Jul 16, 2014 at 12:22 PM, Dave Goodell (dgoodell) <

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Joshua Ladd
According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html "constructor, destructor, constructor (priority), destructor (priority): The constructor attribute causes the function to be called automatically before execution enters main (). Similarly, the destructor attribute
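
The attribute behavior quoted from the GCC documentation can be illustrated with a minimal sketch. The names sketch_init, sketch_fini, and sketch_is_initialized are hypothetical and do not come from the OPAL code base; this is not the change the RFC proposes, only an illustration of the GCC/Clang function attributes it refers to.

```c
#include <assert.h>

/* Hypothetical library state; not an OPAL symbol. */
static int sketch_initialized = 0;

/* Runs automatically before execution enters main(). */
__attribute__((constructor))
static void sketch_init(void)
{
    sketch_initialized = 1;
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor))
static void sketch_fini(void)
{
    /* The constructor must have run before we get here. */
    assert(sketch_initialized);
    sketch_initialized = 0;
}

/* Lets callers check that the constructor has already run. */
int sketch_is_initialized(void)
{
    return sketch_initialized;
}
```

A program linking this code could call sketch_is_initialized() at the top of main() and would see a nonzero value, since the constructor has already executed by then.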

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-29 Thread Joshua Ladd
+1 I'm interested in hearing more. RTE is of interest. Josh On Thu, May 29, 2014 at 10:33 AM, Ralph Castain wrote: > +1 for me! > > On May 29, 2014, at 7:26 AM, Thomas Naughton wrote: > > > Hi, > > > > Thanks Jeff, I think that was a pretty good summary

Re: [OMPI devel] OMPI v1.8.x git tags?

2014-05-12 Thread Joshua Ladd
Yes. Will look into it. Josh On Mon, May 12, 2014 at 6:01 PM, Jeff Squyres (jsquyres) wrote: > Ah; I guess the tags aren't getting pulled over. > > Mellanox -- can you check into this? > > > > On May 12, 2014, at 5:52 PM, "Friedley, Andrew" >

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Joshua Ladd
Chris, The necessary packages will be supported and available in community OFED. Josh On Thu, May 8, 2014 at 9:23 AM, Chris Samuel <sam...@unimelb.edu.au> wrote: > On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote: > > > We (MLNX) are working on a new SLURM PMI2 p

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-08 Thread Joshua Ladd
stream? > -Adam > > > -- > *From:* devel [devel-boun...@open-mpi.org] on behalf of Joshua Ladd [ > jladd.m...@gmail.com] > *Sent:* Wednesday, May 07, 2014 7:56 AM > *To:* Open MPI Developers > > *Subject:* Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is >

Re: [OMPI devel] RFC: Remove autogen.sh sym link

2014-05-08 Thread Joshua Ladd
+1 On Thu, May 8, 2014 at 6:08 AM, Jeff Squyres (jsquyres) wrote: > WHAT: Remove the backwards-compatibility autogen.sh sym link > > WHY: Because it's time > > WHERE: svn rm autogen.sh > > TIMEOUT: Teleconf next Tuesday, 13 May 2014 > > MORE DETAIL: > > We converted from

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Joshua Ladd
MI-2 version and should try PMI-1. >> >> Make sense? >> Ralph >> >> On May 7, 2014, at 8:00 AM, Ralph Castain <r...@open-mpi.org> wrote: >> >> >> On May 7, 2014, at 7:56 AM, Joshua Ladd <jladd.m...@gmail.com> wrote: >> >> Ah, I see

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Joshua Ladd
given that both you and Chris appear to prefer to keep it > "on-by-default", we'll see if we can find a way to detect that PMI-2 is > broken and then fall back to PMI-1. > > > On May 7, 2014, at 7:39 AM, Joshua Ladd <jladd.m...@gmail.com> wrote: > > Just saw thi

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Joshua Ladd
Rolf, This was run on a Sandy Bridge system with ConnectX-3 cards. Josh On Wed, May 7, 2014 at 10:46 AM, Joshua Ladd <jladd.m...@gmail.com> wrote: > Elena, can you run your reproducer on the trunk, please, and see if the > problem persists? > > Josh > > > On Wed, Ma

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Joshua Ladd
Elena, can you run your reproducer on the trunk, please, and see if the problem persists? Josh On Wed, May 7, 2014 at 10:26 AM, Jeff Squyres (jsquyres) wrote: > On May 7, 2014, at 10:03 AM, Elena Elkina wrote: > > > Yes, this commit is also in

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Joshua Ladd
Just saw this thread, and I second Chris' observations: at scale we are seeing huge gains in jobstart performance with PMI2 over PMI1. We *CANNOT* lose this functionality. For competitive reasons, I cannot provide exact numbers, but let's say the difference is in the ballpark of a full

Re: [OMPI devel] [OMPI bugs] [Open MPI] #4582: Move r31564 to v1.8 branch (OSHMEM: Added missing API for)

2014-04-30 Thread Joshua Ladd
Wait, this can simply have the milestone changed then, right? On Wed, Apr 30, 2014 at 9:46 AM, Open MPI wrote: > #4582: Move r31564 to v1.8 branch (OSHMEM: Added missing API for) > ---+- > Reporter: miked

Re: [OMPI devel] RFC: Remove heterogeneous support

2014-04-30 Thread Joshua Ladd
Hi, OMPI Community On the call yesterday, Ralph and Jeff posed the question to the Community at large and to NVIDIA in particular if they/we/us have a vested interest in heterogeneous support. Mellanox and NVIDIA are partnering on systems that, on today's roadmap, could require heterogeneous

Re: [OMPI devel] Was "hcoll destruction via MPI attribute": undefined symbol hcoll_group_destroy_notify

2014-04-08 Thread Joshua Ladd
In order to run with OMPI 1.8, we need to get you the latest HCOLL drop. Mike Dubman can handle this for you. He will be back in the office Thursday sometime. Best, Josh From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Anthony Alba Sent: Tuesday, April 08, 2014 9:59 PM To:

Re: [OMPI devel] Seeking input for an RFC

2014-04-02 Thread Joshua Ladd
info. Best, Josh From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd Sent: Tuesday, April 01, 2014 11:15 AM To: Open MPI Developers (de...@open-mpi.org) Subject: [OMPI devel] Seeking input for an RFC Soliciting input from the community: WHAT: Modify PML cm component

[OMPI devel] Seeking input for an RFC

2014-04-01 Thread Joshua Ladd
Soliciting input from the community: WHAT: Modify PML cm component to remove unnecessary initializations, optimizing blocking operations WHY: Remove fast-path overhead, which increases single-packet latency, by allowing a "direct mode" HOW: In PML cm, even if the request starts and ends

Re: [OMPI devel] 答复: 答复: doubt on latency result with OpenMPI library

2014-03-28 Thread Joshua Ladd
I also believe that for iWARP and RoCE, the RDMA CM will be chosen automatically, and UD CM will be automatically chosen for IB. [Josh] If you want to run OMPI over RoCE on Mellanox hardware, you must explicitly choose rdmacm with -mca btl openib,sm,self -mca btl_openib_cpc_include rdmacm -

Re: [OMPI devel] [OMPI bugs] [Open MPI] #4354: Move r30966 to v1.7 branch (In mtl_mxm, don't disconnect from)

2014-03-11 Thread Joshua Ladd
Yossi, is it possible to handle this with OBJ_RELEASE? -Original Message- From: bugs [mailto:bugs-boun...@open-mpi.org] On Behalf Of Open MPI Sent: Monday, March 10, 2014 12:22 PM Cc: b...@open-mpi.org Subject: Re: [OMPI bugs] [Open MPI] #4354: Move r30966 to v1.7 branch (In mtl_mxm,

Re: [OMPI devel] Trunk is broken

2014-02-25 Thread Joshua Ladd
Fresh checkout did the trick. Sorry to bother. Josh From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd Sent: Tuesday, February 25, 2014 6:51 PM To: Open MPI Developers Subject: Re: [OMPI devel] Trunk is broken SVN up. Reran autogen. I'm trying with a fresh checkout now

Re: [OMPI devel] Trunk is broken

2014-02-25 Thread Joshua Ladd
.org] On Behalf Of Ralph Castain Sent: Tuesday, February 25, 2014 6:17 PM To: Open MPI Developers Subject: Re: [OMPI devel] Trunk is broken Odd - it is building fine for me on both Mac and Linux. Is this a git mirror or the actual svn checkout, or a tarball? On Feb 25, 2014, at 3:1

Re: [OMPI devel] Trunk is broken

2014-02-25 Thread Joshua Ladd
Developers Subject: Re: [OMPI devel] Trunk is broken Odd - it is building fine for me on both Mac and Linux. Is this a git mirror or the actual svn checkout, or a tarball? On Feb 25, 2014, at 3:11 PM, Joshua Ladd <josh...@mellanox.com> wrote: Ralph, may

[OMPI devel] Trunk is broken

2014-02-25 Thread Joshua Ladd
Ralph, maybe something didn't get pulled over in your OSC merge: Looks like a few routines were removed and were not replaced or were not removed from other parts of the code where they are invoked in ompi/mpi/c/profile/paccumulate.c The offending change set is:

Re: [OMPI devel] RFC: Add an OPAL rand and srand

2014-02-07 Thread Joshua Ladd
Fri, Feb 7, 2014 at 2:23 PM, Joshua Ladd <josh...@mellanox.com> wrote: What: Add an internal random number generator to OPAL. Why: OMPI uses rand and srand all over the place. Because the middleware is mucking with the RNG's global state, applications
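
The motivation in this RFC (a library should not disturb the global state behind rand and srand) can be sketched with a generator whose state is owned by the caller. This is only an illustration under stated assumptions: the names sketch_rng_t, sketch_srand, and sketch_rand are hypothetical, and the xorshift64 step is a stand-in technique, not necessarily the algorithm OPAL adopted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type; the state is private to whoever owns the struct,
 * so the C library's rand()/srand() state is never touched. */
typedef struct {
    uint64_t state;
} sketch_rng_t;

/* Seed a private generator. xorshift requires a nonzero state, so a
 * zero seed is remapped to an arbitrary nonzero constant. */
void sketch_srand(sketch_rng_t *rng, uint64_t seed)
{
    assert(rng != NULL);
    rng->state = (seed != 0) ? seed : 0x9E3779B97F4A7C15ULL;
}

/* One xorshift64 step (shift triple 13, 7, 17); returns the upper
 * 32 bits of the updated state. */
uint32_t sketch_rand(sketch_rng_t *rng)
{
    assert(rng != NULL);
    uint64_t x = rng->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    rng->state = x;
    return (uint32_t)(x >> 32);
}
```

Because each caller holds its own sketch_rng_t, two generators seeded identically produce identical sequences regardless of what the application does with srand in the meantime.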

Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 - trunk/ompi/runtime

2014-02-06 Thread Joshua Ladd
It's been CMRed, but scheduled for 1.7.5 https://svn.open-mpi.org/trac/ompi/ticket/4185 From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman Sent: Thursday, February 06, 2014 12:17 PM To: Open MPI Developers Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 -

Re: [OMPI devel] hcoll destruction via MPI attribute

2014-01-09 Thread Joshua Ladd
Subject: Re: [OMPI devel] hcoll destruction via MPI attribute On Jan 9, 2014, at 11:00 AM, Joshua Ladd <josh...@mellanox.com> wrote: > Hcoll uses the PML as an "OOB" to bootstrap itself. When a communicator is > destroyed, by the time we destroy the hcoll module, th

Re: [OMPI devel] hcoll destruction via MPI attribute

2014-01-09 Thread Joshua Ladd
+Valentine Jeff, Hcoll uses the PML as an "OOB" to bootstrap itself. When a communicator is destroyed, by the time we destroy the hcoll module, the communicator context is no longer valid and any pending operations that rely on its existence will fail. In particular, we have a non-blocking

Re: [OMPI devel] bug in mca framework?

2013-12-17 Thread Joshua Ladd
Hjelm Sent: Monday, December 16, 2013 12:44 PM To: Open MPI Developers Subject: Re: [OMPI devel] bug in mca framework? On Mon, Dec 16, 2013 at 05:21:05PM +, Joshua Ladd wrote: > After speaking with Igor Ivanov about this this morning, he summarized his > findings as follows: >

Re: [OMPI devel] bug in mca framework?

2013-12-16 Thread Joshua Ladd
After speaking with Igor Ivanov about this this morning, he summarized his findings as follows: 1. Valgrind comes up clean. 2. The issue is not reproduced with a static build. 3. A bisection study reveals that problems first appear after commit:

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
The proof of the pudding is that all of the MPI layer has been adapted to the new async behavior -except- for the openib cpc's. The issue of what to do with these has been raised several times, especially once the ofacm code was committed. Unfortunately, lack of time and priorities left this

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
wba...@sandia.gov> wrote: On 11/14/13 1:13 PM, "Joshua Ladd" <josh...@mellanox.com> wrote: Let me try to summarize my understanding of the situation: 1. Ralph made the OOB asynchronous. 2. OOB cpcs don't work

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
ad of OPENIB. This made openib initialization code a bit cleaner. Here is my old tree with openib btl changes https://bitbucket.org/pasha/ofacm I hope it helps, Best, Pasha On Nov 14, 2013, at 1:17 PM, Joshua Ladd <josh...@mellanox.com> wrote: > Unless someone went in and "fi

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
finally complete the switchover. >>> >>> Meantime, perhaps someone can CMR and review a copying of the udcm >>> cpc to the 1.7 branch? >>> >>> >>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd <josh...@mellanox.com> wrote: >>> >

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
code. Looking over > at that area, I see only oob and xoob - so if the users of the common ofacm > code are finding that it works, the simple answer may just be to finally > complete the switchover. > > Meantime, perhaps someone can CMR and review a copying of the udcm cpc to the

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
Um, no. It's supposed to work with UDCM which doesn't appear to be enabled in 1.7. Per Ralph's comment to me last night: "... you cannot use the oob connection manager. It doesn't work and was deprecated. You must use udcm, which is why things are supposed to be set to do so by default.

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29644 - trunk/orte/mca/rmaps/mindist

2013-11-08 Thread Joshua Ladd
can't do that, Josh. You are violating the abstraction break rather badly by searching for specific IB devices down in ORTE. Please revert this and let's talk about what you are actually trying to do. On Nov 7, 2013, at 8:28 PM, svn-commit-mai...@open-mpi.org wrote: > Author: jladd (Jos

Re: [OMPI devel] [EXTERNAL] Re: oshmem and CFLAGS removal

2013-10-31 Thread Joshua Ladd
AGS="$oshmem_CFLAGS" >> > >Nope, it was not that simple. With that change, the -pedantic and >-Wundef end up in the CFLAGS for oshmem and I see all the warnings. >I will submit a ticket and give it to Joshua Ladd. Yeah, that's not going to work. But a bigger question: why

Re: [OMPI devel] oshmem and CFLAGS removal

2013-10-31 Thread Joshua Ladd
| sed 's/-Wno-long- >double//g'`" > >I think the solution is simple -- delete this line: > >> CFLAGS="$oshmem_CFLAGS" > Nope, it was not that simple. With that change, the -pedanti

Re: [OMPI devel] CM PML / OpenSHMEM

2013-10-29 Thread Joshua Ladd
Sent: Tuesday, October 29, 2013 1:58 PM To: Open MPI Developers Subject: Re: [OMPI devel] CM PML / OpenSHMEM Did that time get finalized? I recall the doodle, but not seeing a final decision On Oct 29, 2013, at 10:53 AM, Joshua Ladd <josh...@mellanox.com> wrote: > These (and others) ar

Re: [OMPI devel] CM PML / OpenSHMEM

2013-10-29 Thread Joshua Ladd
These (and others) are exactly the issues we need to discuss with you guys next week. Josh -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Tuesday, October 29, 2013 1:29 PM To: Open MPI Developers Subject: Re: [OMPI devel] CM PML /

Re: [OMPI devel] [EXTERNAL] Re: SHMEM v1.7 merge proposal

2013-10-29 Thread Joshua Ladd
I think the community’s concerns are valid. What Mike is articulating is that we already maintain a “1.7 ready” OSHMEM branch internally. I think it should be a simple procedure to do as Brian and Ralph are suggesting and branch off of 1.7 in SVN and apply our patches. We can do this. Josh

Re: [OMPI devel] SHMEM v1.7 merge proposal

2013-10-29 Thread Joshua Ladd
I wondered where that was coming from. -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres) Sent: Tuesday, October 29, 2013 7:53 AM To: Open MPI Developers Subject: Re: [OMPI devel] SHMEM v1.7 merge proposal On Oct 29, 2013, at 7:16

[OMPI devel] Trunk is broken

2013-10-01 Thread Joshua Ladd
Also getting a compile failure in the trunk: ./autogen.pl && ./configure --prefix=/hpc/home/USERS/joshual/ompi_trunk/really-the-trunk/ompi-install --with-mxm=/hpc/local/src/mxm2_release --with-fca=/opt/mellanox/fca --with-pmi && make -j 9 && make install CC ess_slurm_module.lo CCLD
