Chris,
libudev.so dependency is coming from upgraded hwloc v1.11.12 in HCOLL v4.3.
This lib is part of systemd-libs rpm package.
Josh
On Wed, Oct 16, 2019 at 8:32 AM Chris Ward via devel <
devel@lists.open-mpi.org> wrote:
> I set up MOFED 4.7.1, and now the configure completes successfully
>
That is a VERY old MOFED (a couple of years old). We just released version
4.7.
Josh
On Tue, Oct 15, 2019 at 11:50 AM Chris Ward wrote:
> I'm using a MOFED from the file MLNX_OFED_LINUX-4.0-0.0.8.2-rhel7.3-x86_64.tgz
> on a machine running RHEL 7.6. Should I be using a newer MOFED?
>
>
>
>
Chris,
HCOLL depends on libsharp. What MOFED version are you using? What HCOLL
version are you building against?
Josh
On Tue, Oct 15, 2019 at 11:36 AM Chris Ward via devel <
devel@lists.open-mpi.org> wrote:
> Setting LD_LIBRARY_PATH didn't help; I got the same error.
>
> Is the problem because
Marcin,
HPC-X implements the MPI BCAST operation by leveraging hardware multicast
capabilities. Starting with HPC-X v2.3, we introduced a new multicast-based
algorithm for large messages as well. Hardware multicast scales as O(1)
modulo switch hops. It is the most efficient way to broadcast a
t of
>> ignoring CI results, so no results are better than failed results :/.
>>
>> Brian
>>
>> > On Jun 21, 2017, at 1:49 PM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>> >
>> > Thanks Josh.
>> >
>> >>
OMPI Developers,
We are aware of the issue currently affecting the Mellanox Jenkins servers.
The issue is being addressed and we hope it will be resolved soon. We
apologize for the inconvenience and thank you for your patience.
Best,
Josh Ladd
Hi,
Please include your full command line.
Josh
On Mon, Jun 12, 2017 at 6:17 PM, Chuanxiong Guo
wrote:
> Hi,
>
> I have two servers with Mellanox CX4-LX (50GbE Ethernet) back-to-back
> connected. I am using Ubuntu 14.04. I have made mvapich2 work, and I can
> confirm
scination with PMIx. PMIx didn’t calculate this
> jobid - OMPI did. Yes, it is in the opal/pmix layer, but it had -nothing-
> to do with PMIx.
>
> So why do you want to continue to blame PMIx for this problem??
>
>
> On Sep 15, 2016, at 4:29 AM, Joshua Ladd <jladd.m...@gmail.co
>>> TMP=/tmp/tmp.wOv5dkNaSI
>>>
>>> and into $TMP I have:
>>>
>>> openmpi-sessions-40031@lorien_0
>>>
>>> and into this subdirectory I have a bunch of empty dirs:
>>>
>>> cmpbib@lorien:/tmp/tmp.wOv5dkNaSI/openmpi-sess
;
> lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
> Output information may be incomplete.
> lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
> Output information may be incomplete.
>
> nothing...
>
> W
Hi, Eric
I **think** this might be related to the following:
https://github.com/pmix/master/pull/145
I'm wondering if you can look into the /tmp directory and see if you have a
bunch of stale usock files.
Best,
Josh
On Wed, Sep 14, 2016 at 1:36 AM, Gilles Gouaillardet
, Jeff Squyres (jsquyres) <jsquy...@cisco.com
> wrote:
> Do you guys want to add anything into NEWS about OSHMEM improvements in
> 2.0.0 (even though it won't be 1.3)?
>
> Or were such improvements hidden down in UCX / MXM?
>
>
> > On Apr 29, 2016, at 5:40 PM, Joshua Lad
ot remember if they were resolved.
> >>>>
> >>>> We may also want to clarify if any PML/MTLs are experimental in this
> >>>> release.
> >>>>
> >>>> MPI_THREAD_MULTIPLE support.
> >>>>
> >>>>
Certainly we need to communicate / advertise / evangelize the improvements
in job launch - the largest and most substantial change between the two
branches - and provide some best practice guidelines for usage (use direct
modex for applications with sparse communication patterns and full modex
for
Hi, David
We are looking into your report.
Best,
Josh
On Tue, Apr 19, 2016 at 4:41 PM, David Shrader wrote:
> Hello,
>
> I have been investigating using XRC on a cluster with a mellanox
> interconnect. I have found that in a certain situation I get a seg fault. I
> am
+1
On Wed, Feb 3, 2016 at 9:54 PM, Jeff Squyres (jsquyres)
wrote:
> WHAT: Decrease default value of mpi_add_procs_cutoff from 1024 to 32
>
> WHY: The "partial add procs" behavior is supposed to be a key feature of
> v2.0.0
>
> WHERE: ompi/mpi/runtime/ompi_mpi_params.c
>
>
Thanks, Nysal!! Good catch!
Josh
On Mon, Nov 9, 2015 at 2:27 PM, Mark Santcroos
wrote:
> It seems the change suggested by Nysal also allows me to run into the next
> problem ;-)
>
> Mark
>
> > On 09 Nov 2015, at 20:19 , George Bosilca wrote:
>
This doesn't contain the three patches that we discussed on PR:
https://github.com/open-mpi/ompi-release/pull/621
Josh
On Sat, Oct 3, 2015 at 6:13 AM, Jeff Squyres (jsquyres)
wrote:
> v1.10.1 is primarily a bug-fix release. rc1 has been released; it's in
> the usual place:
oward's position regarding how/when/why the code had entered master.
>
> -Paul
>
> On Wed, Jul 1, 2015 at 3:10 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
>> Paul,
>>
>> I think your testing is extremely helpful. Even more so with this new
>> ve
Paul,
I think your testing is extremely helpful. Even more so with this new
versioning scheme.
Setting OMP envars in ORTE should have been discussed. Considering both
Paul and Howard (key members of our community) use OMP in production
environments with Cray and PGI compilers, it seems a bit odd
Thanks, Gilles.
We are addressing this.
Josh
Sent from my iPhone
> On Jun 25, 2015, at 11:03 AM, Gilles Gouaillardet wrote:
>
> Folks,
>
> this is a followup on an issue reported by Daniel on the users mailing list :
> OpenMPI is built with hcoll from Mellanox.
> the coll
Dear Open MPI Community,
I'd like to advertise multiple positions of particular relevance to this
community. Please feel free to contact me directly or our US Hiring
Manager, Scott Chong sco...@mellanox.com, if you or someone you know may
be a good fit.
Two open positions reporting to me. Can
I'm available, but am OK to skip Tuesday's call too.
Josh
On Thu, Feb 26, 2015 at 10:04 AM, Howard Pritchard
wrote:
> I will also be available but suggest we skip next Tuesday.
> On Feb 25, 2015 5:04 PM, "Ralph Castain" wrote:
>
>> Hey folks
>>
>>
You need to configure OMPI --with-mxm=/path/to/mxm in order to use Yalla.
In addition, Yalla is only available on Master as it is a new feature. If
you want to play with other PMLs in the release branch, you may try the MXM
MTL (again, you first need to configure your build to use the MXM library)
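For concreteness, the build and launch steps could look like the sketch below. The MXM path and the benchmark binary are placeholders; adjust them to your installation:

```shell
# Build OMPI against MXM (adjust the path to your MXM install):
./configure --with-mxm=/opt/mellanox/mxm --prefix=$HOME/ompi-install
make -j8 && make install

# On master, select the yalla PML explicitly:
mpirun -np 2 --mca pml yalla ./osu_latency

# On the release branch, use the MXM MTL through the cm PML instead:
mpirun -np 2 --mca pml cm --mca mtl mxm ./osu_latency
```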
Folks,
Is it possible to define an alias for an MCA parameter? Grepping around the
interwebs, it seems there was an RFC along these lines in 2008.
http://www.open-mpi.org/community/lists/devel/2008/04/3613.php
It doesn't appear that the functionality was added or, if it was, it has
since been
George is correct; opal_pmix.fence replaces the grpcomm barrier.
Josh
On Fri, Dec 19, 2014 at 10:47 AM, George Bosilca
wrote:
>
> A opal_pmix.fence seems like a perfect replacement.
>
> George.
>
>
> On Fri, Dec 19, 2014 at 10:26 AM, Adrian Reber wrote:
ic included on the configure
> line and still failed with the same problem, mtl/ofi thinks its okay to
> build...
>
> Howard
>
>
> 2014-12-17 11:48 GMT-07:00 Joshua Ladd <jladd.m...@gmail.com>:
>>
>> Seem to me this should be disabled by default until fol
Seems to me this should be disabled by default until folks can quiet the
noise. If memory serves me, that's the position the community took with
OSHMEM.
Josh
On Wed, Dec 17, 2014 at 1:40 PM, Howard Pritchard
wrote:
>
> Jeff,
>
> I think the problem is that the libfabric
Window creation: MPI_Win_allocate
> # Synchronization: MPI_Win_flush
> # Size Bandwidth (MB/s)
> 1 28.56
> 2 58.74
>
>
> So it wasn't fixed for RHEL 6.6.
>
> Regards, Götz
>
> On Mon, Dec 8, 2014 at 4:00 PM, Götz Waschk
anytime soon. In the meantime, Alina will continue to diligently monitor
MTT and report issues along with offending commits.
Best,
Josh
On Sat, Nov 8, 2014 at 11:53 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> Alina,
>
> Please take the lead on this and respond to Ralph's query. Wor
, this is a good
idea.
Josh
On Thu, Nov 6, 2014 at 5:00 PM, Ralph Castain <rhc.open...@gmail.com> wrote:
>
> > On Nov 6, 2014, at 1:51 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> wrote:
> >
> > On Nov 6, 2014, at 4:06 PM, Joshua Ladd <jladd.m...@gmail.com>
tup of Gemini they can not be
> > mixed. If
> > > it is possible to mix them with other networks I would be
> happy
> > to add
> > > an atomic flag for that.
> > >
> > > -Nathan
On Thursday, November 6, 2014, Nathan Hjelm <hje...@lanl.gov> wrote:
> On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote:
> >Nathan,
> >Has this bug always been present in OpenIB or is this a recent
> addition?
> >If this is regression, I
in production systems, this issue was never discovered.
Once again, many thanks to Alina for discovering and reporting this. Keep
up the MTT vigilance!
Josh
On Tuesday, November 4, 2014, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
Thank you for taking the time to investigate this, Jeff. SC is a hectic and
stressful time for everyone on this list with many deadlines looming. This
bug isn't a priority for us, however, it seems to me that your original
revert, the one that simply wants to disable threading by default (and for
We filed an RFC for the trunk at Jeff's request. This is a new feature.
Josh
On Thu, Nov 6, 2014 at 12:13 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> Yalla is only in trunk. Unless you want us to push it to 1.8.4 - we won't
> object :)
>
> Josh
>
> On Thu, Nov 6,
Yalla is only in trunk. Unless you want us to push it to 1.8.4 - we won't
object :)
Josh
On Thu, Nov 6, 2014 at 11:46 AM, Ralph Castain
wrote:
> Hey folks
>
> Here is the NEWS I have for 1.8.4 so far - please respond with any
> additions/mods you would like to suggest
>
I think this is a pretty significant change in behavior for a minor
release, Jeff. According to the interested parties:
"I'm reporting a performance (message rate 16%, latency 3%) regression when
using PSM that occurred between OMPI v1.6.5 and v1.8.1. I would guess it
affects other networks too,
omics
> have to be done through the same btl (including atomics on self). I did
> this because with the default setup of Gemini they can not be mixed. If
> it is possible to mix them with other networks I would be happy to add
> an atomic flag for that.
>
> -Nathan
>
> On Wed, Nov
Quick question. Out of curiosity, how do you handle the (common) case of
mixing network atomics with CPU atomics? Say for a single target with two
initiators, one initiator is on host with the target, so goes through the
SM BTL, and the other initiator is off host, so goes through the network
BTL.
Thanks, Nathan. After a bit more investigation yesterday, this was our
conclusion too; that it is a longstanding bug in OpenIB BTL we just
happened to start triggering the broken flow with some recent changes made
to the default max_lmc parameter. Let us know if you need anything from our
end.
Privet, Artem
ML is the collective component that is invoking the calls into BCOL. The
triplet basesmuma,basesmuma,ptpcoll, for example, means I want three levels
of hierarchy - socket level, UMA level, and then network level. I am
guessing (only a guess after a quick glance) that maybe srun is
@iivanov I am looking into a fix.
On Thu, Sep 25, 2014 at 11:42 AM, Open MPI wrote:
> #4919: Fix the application abort routine so we actually abort
> ---+-
> Reporter: rhc | Owner:
Let me know if Nadia can help here, Ralph.
Josh
On Fri, Sep 12, 2014 at 9:31 AM, Ralph Castain wrote:
>
> On Sep 12, 2014, at 5:45 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
> Ralph,
>
> On Fri, Sep 12, 2014 at 10:54 AM, Ralph Castain
jladd -> jladd-mlnx
On Wed, Sep 10, 2014 at 8:45 AM, Shamis, Pavel wrote:
> Jeff,
> pasha -> shamisp
>
> > -Original Message-
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff
> > Squyres (jsquyres)
> > Sent: Wednesday, September 10, 2014 6:46 AM
>
We will update the README accordingly. Thank you, Paul.
Josh
On Thu, Aug 14, 2014 at 10:00 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:
> Good points.
>
> Mellanox -- can you update per Paul's suggestions?
>
>
> On Aug 13, 2014, at 8:26 PM, Paul Hargrove wrote:
> Thanks.
>
> *Lenny Verkhovsky*
>
> SW Engineer, Mellanox Technologies
>
> www.mellanox.com
>
>
>
> Office:+972 74 712 9244
>
> Mobile: +972 54 554 0233
>
> Fax:+972 72 257 9400
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Beha
Lenny,
Is there any particular reason that you're using the trunk? The reason I
ask is because the trunk is in an unusually high state of flux at the
moment with a major move underway. If you're trying to use OMPI for
production grade runs, I would strongly advise picking up one of the stable
Nathan, can you take a look at
https://svn.open-mpi.org/trac/ompi/changeset/32350 when you get a chance.
Thanks,
Josh
On Tue, Jul 29, 2014 at 6:14 PM, Nathan Hjelm wrote:
> On Tue, Jul 29, 2014 at 04:12:18PM -0600, Nathan Hjelm wrote:
> >
> > Yeah. Though it would be best to
s");
>
> mca_base_var_get_value (vari, NULL, &source, NULL);
>
> If the source is MCA_BASE_VAR_SOURCE_DEFAULT then the value was not
> modified by a file, the enviornment, or MPI_T.
>
> -Nathan
>
> On Tue, Jul 29, 2014 at 05:42:20PM -0400, svn-commit-mai...@open-mpi.org
> wrote:
Hi, Nadia
I CMRed your patch to 1.8.2 and applied the fix on the trunk in:
https://svn.open-mpi.org/trac/ompi/changeset/32346
Thanks for reporting!
Josh
On Fri, Jul 11, 2014 at 6:04 AM, Nadia Derbey wrote:
> Hi,
>
> I noticed that specifying the receive_queues
Pasha,
Is v1.1 posted somewhere? I don't see it up on the LBNL site.
Josh
On Tue, Jul 29, 2014 at 2:05 PM, Shamis, Pavel wrote:
>
> Btw, I'm pretty confident, that this Open SHMEM implementation does
> not
> recognize global or static variables in shared
> mechanism used for the data segment of the executable.
>
>
>
> Howard
>
>
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Joshua
> Ladd
> *Sent:* Tuesday, July 29, 2014 10:57 AM
> *To:* Open MPI Developers
> *Subject:* Re: [OMPI d
Are you claiming that in the following test, the static variable "val" will
not be seen as a symmetric object?
#include "shmem.h"

int main(int argc, char **argv) {
    long my_pe, npes, master;
    start_pes(0);
    my_pe = shmem_my_pe();
    npes = shmem_n_pes();
    master = npes - 1;
    /*
lin, Mellanox
> > alinas -> Alina Sklarevich, Mellanox
> > amikheev -> Alex Mikheev, Mellanox
> > bosilca -> George Bosilca, UTK
> > brbarret -> Brian Barrett, IU, LANL, SNL
> > devendar -> Devendar Bureddy, Mellanox
> > dgoodell -> Dave Goodell, Cisco
*An enhancement to permit some form of delimiter escaping would probably
still be nice, but is low priority.*
[Josh] Not a problem, Dave. We will do this.
On Wed, Jul 16, 2014 at 4:32 PM, Dave Goodell (dgoodell) <dgood...@cisco.com
> wrote:
> On Jul 16, 2014, at 3:08 PM, Joshua Ladd
Ralph warned me that no matter what decision we made, someone would
probably violently object. So, with that in mind, let me put my diplomat
hat on...
Dave, I'm sorry you view this as a "crapification" of your mpirun user
interface. Your lament is duly noted and we are happy to work with you to
Dave,
Your example will error out. If someone tries to set envars with both
mechanisms, the job fails. The decision to do so was also made at the Dev
meeting and is so that we don't have to do this kind of checking.
Josh
On Wed, Jul 16, 2014 at 12:22 PM, Dave Goodell (dgoodell) <
According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
"constructor
 destructor
 constructor (priority)
 destructor (priority)
The constructor attribute causes the function to be called automatically
before execution enters main (). Similarly, the destructor attribute
+1 I'm interested in hearing more. RTE is of interest.
Josh
On Thu, May 29, 2014 at 10:33 AM, Ralph Castain wrote:
> +1 for me!
>
> On May 29, 2014, at 7:26 AM, Thomas Naughton wrote:
>
> > Hi,
> >
> > Thanks Jeff, I think that was a pretty good summary
Yes. Will look into it.
Josh
On Mon, May 12, 2014 at 6:01 PM, Jeff Squyres (jsquyres) wrote:
> Ah; I guess the tags aren't getting pulled over.
>
> Mellanox -- can you check into this?
>
>
>
> On May 12, 2014, at 5:52 PM, "Friedley, Andrew"
>
Chris,
The necessary packages will be supported and available in community OFED.
Josh
On Thu, May 8, 2014 at 9:23 AM, Chris Samuel <sam...@unimelb.edu.au> wrote:
> On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:
>
> > We (MLNX) are working on a new SLURM PMI2 p
stream?
> -Adam
>
>
> --
> *From:* devel [devel-boun...@open-mpi.org] on behalf of Joshua Ladd [
> jladd.m...@gmail.com]
> *Sent:* Wednesday, May 07, 2014 7:56 AM
> *To:* Open MPI Developers
>
> *Subject:* Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is
+1
On Thu, May 8, 2014 at 6:08 AM, Jeff Squyres (jsquyres)
wrote:
> WHAT: Remove the backwards-compatibility autogen.sh sym link
>
> WHY: Because it's time
>
> WHERE: svn rm autogen.sh
>
> TIMEOUT: Teleconf next Tuesday, 13 May 2014
>
> MORE DETAIL:
>
> We converted from
MI-2 version and should try PMI-1.
>>
>> Make sense?
>> Ralph
>>
>> On May 7, 2014, at 8:00 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>
>> On May 7, 2014, at 7:56 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>
>> Ah, I see
given that both you and Chris appear to prefer to keep it
> "on-by-default", we'll see if we can find a way to detect that PMI-2 is
> broken and then fall back to PMI-1.
>
>
> On May 7, 2014, at 7:39 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
> Just saw thi
Rolf,
This was run on a Sandy Bridge system with ConnectX-3 cards.
Josh
On Wed, May 7, 2014 at 10:46 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> Elena, can you run your reproducer on the trunk, please, and see if the
> problem persists?
>
> Josh
>
>
> On Wed, Ma
Elena, can you run your reproducer on the trunk, please, and see if the
problem persists?
Josh
On Wed, May 7, 2014 at 10:26 AM, Jeff Squyres (jsquyres) wrote:
> On May 7, 2014, at 10:03 AM, Elena Elkina wrote:
>
> > Yes, this commit is also in
Just saw this thread, and I second Chris' observations: at scale we are
seeing huge gains in jobstart performance with PMI2 over PMI1. We
*CANNOT* lose this functionality. For competitive reasons, I cannot
provide exact
numbers, but let's say the difference is in the ballpark of a full
Wait, this can simply have the milestone changed then, right?
On Wed, Apr 30, 2014 at 9:46 AM, Open MPI wrote:
> #4582: Move r31564 to v1.8 branch (OSHMEM: Added missing API for)
> ---+-
> Reporter: miked
Hi, OMPI Community
On the call yesterday, Ralph and Jeff posed the question to the Community
at large and to NVIDIA in particular if they/we/us have a vested interest
in heterogeneous support. Mellanox and NVIDIA are partnering on systems
that, on today's roadmap, could require heterogeneous
In order to run with OMPI 1.8, we need to get you the latest HCOLL drop. Mike
Dubman can handle this for you. He will be back in the office Thursday sometime.
Best,
Josh
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Anthony Alba
Sent: Tuesday, April 08, 2014 9:59 PM
To:
info.
Best,
Josh
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd
Sent: Tuesday, April 01, 2014 11:15 AM
To: Open MPI Developers (de...@open-mpi.org)
Subject: [OMPI devel] Seeking input for an RFC
Soliciting input from the community:
WHAT: Modify PML cm component
Soliciting input from the community:
WHAT: Modify PML cm component to remove unnecessary initializations,
optimizing blocking operations
WHY: Remove overhead in the fast path; allowing a "direct mode" decreases
single-packet latency
HOW: In PML cm, even if the request starts and ends
I also believe that for iWARP and RoCE, the RDMA CM will be chosen
automatically, and UD CM will be automatically chosen for IB.
[Josh] If you want to run OMPI over RoCE on Mellanox hardware, you must
explicitly choose rdmacm with -mca btl openib,sm,self -mca
btl_openib_cpc_include rdmacm -
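Spelled out in full, that launch line would be roughly as follows (the application binary is a placeholder):

```shell
mpirun -np 2 -mca btl openib,sm,self \
       -mca btl_openib_cpc_include rdmacm \
       ./my_app
```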
Yossi, is it possible to handle this with OBJ_RELEASE?
-Original Message-
From: bugs [mailto:bugs-boun...@open-mpi.org] On Behalf Of Open MPI
Sent: Monday, March 10, 2014 12:22 PM
Cc: b...@open-mpi.org
Subject: Re: [OMPI bugs] [Open MPI] #4354: Move r30966 to v1.7 branch (In
mtl_mxm,
Fresh checkout did the trick. Sorry to bother.
Josh
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd
Sent: Tuesday, February 25, 2014 6:51 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Trunk is broken
SVN up. Reran autogen. I'm trying with a fresh checkout now
.org] On Behalf Of Ralph Castain
Sent: Tuesday, February 25, 2014 6:17 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Trunk is broken
Odd - it is building fine for me on both Mac and Linux. Is this a git mirror or
the actual svn checkout, or a tarball?
On Feb 25, 2014, at 3:1
Developers
Subject: Re: [OMPI devel] Trunk is broken
Odd - it is building fine for me on both Mac and Linux. Is this a git mirror or
the actual svn checkout, or a tarball?
On Feb 25, 2014, at 3:11 PM, Joshua Ladd
<josh...@mellanox.com<mailto:josh...@mellanox.com>> wrote:
Ralph, may
Ralph, maybe something didn't get pulled over in your OSC merge:
Looks like a few routines were removed and were not replaced or were not
removed from other parts of the code where they are invoked
in ompi/mpi/c/profile/paccumulate.c
The offending change set is:
Fri, Feb 7, 2014 at 2:23 PM, Joshua Ladd
<josh...@mellanox.com<mailto:josh...@mellanox.com>> wrote:
What: Add an internal random number generator to OPAL.
Why: OMPI uses rand and srand all over the place. Because the middleware is
mucking with the RNG's global state, applications
It's been CMRed, but scheduled for 1.7.5
https://svn.open-mpi.org/trac/ompi/ticket/4185
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
Sent: Thursday, February 06, 2014 12:17 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 -
Subject: Re: [OMPI devel] hcoll destruction via MPI attribute
On Jan 9, 2014, at 11:00 AM, Joshua Ladd <josh...@mellanox.com> wrote:
> Hcoll uses the PML as an "OOB" to bootstrap itself. When a communicator is
> destroyed, by the time we destroy the hcoll module, th
+Valentine
Jeff,
Hcoll uses the PML as an "OOB" to bootstrap itself. When a communicator is
destroyed, by the time we destroy the hcoll module, the communicator context is
no longer valid and any pending operations that rely on its existence will
fail. In particular, we have a non-blocking
Hjelm
Sent: Monday, December 16, 2013 12:44 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] bug in mca framework?
On Mon, Dec 16, 2013 at 05:21:05PM +, Joshua Ladd wrote:
> After speaking with Igor Ivanov about this this morning, he summarized his
> findings as follows:
>
After speaking with Igor Ivanov about this this morning, he summarized his
findings as follows:
1. Valgrind comes up clean.
2. The issue is not reproduced with a static build.
3. A bisection study reveals that problems first appear after commit:
The proof of the pudding is that all of the MPI layer has been adapted to the
new async behavior -except- for the openib cpc's. The issue of what to do with
these has been raised several times, especially once the ofacm code was
committed. Unfortunately, lack of time and priorities left this
wba...@sandia.gov<mailto:bwba...@sandia.gov>> wrote:
On 11/14/13 1:13 PM, "Joshua Ladd"
<josh...@mellanox.com<mailto:josh...@mellanox.com>> wrote:
Let me try to summarize my understanding of the situation:
1. Ralph made the OOB asynchronous.
2. OOB cpcs don't work
ad of OPENIB.
This made openib initialization code a bit cleaner. Here is my old tree with
openib btl changes https://bitbucket.org/pasha/ofacm
I hope it helps,
Best,
Pasha
On Nov 14, 2013, at 1:17 PM, Joshua Ladd <josh...@mellanox.com> wrote:
> Unless someone went in and "fi
finally complete the switchover.
>>>
>>> Meantime, perhaps someone can CMR and review a copying of the udcm
>>> cpc to the 1.7 branch?
>>>
>>>
>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd <josh...@mellanox.com> wrote:
>>>
>
code. Looking over
> at that area, I see only oob and xoob - so if the users of the common ofacm
> code are finding that it works, the simple answer may just be to finally
> complete the switchover.
>
> Meantime, perhaps someone can CMR and review a copying of the udcm cpc to the
Um, no. It's supposed to work with UDCM which doesn't appear to be enabled in
1.7.
Per Ralph's comment to me last night:
"... you cannot use the oob connection manager. It doesn't work and was
deprecated. You must use udcm, which is why things are supposed to be set to do
so by default.
can't do that, Josh. You are violating the abstraction break rather
badly by searching for specific IB devices down in ORTE.
Please revert this and let's talk about what you are actually trying to do.
On Nov 7, 2013, at 8:28 PM, svn-commit-mai...@open-mpi.org wrote:
> Author: jladd (Jos
AGS="$oshmem_CFLAGS"
>>
>
>Nope, it was not that simple. With that change, the -pedantic and
>-Wundef end up in the CFLAGS for oshmem and I see all the warnings.
>I will submit a ticket and give it to Joshua Ladd.
Yeah, that's not going to work. But a bigger question: why
| sed 's/-Wno-long-
>double//g'`"
>
>I think the solution is simple -- delete this line:
>
>> CFLAGS="$oshmem_CFLAGS"
>
Nope, it was not that simple. With that change, the -pedanti
Sent: Tuesday, October 29, 2013 1:58 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] CM PML / OpenSHMEM
Did that time get finalized? I recall the doodle, but not seeing a final
decision
On Oct 29, 2013, at 10:53 AM, Joshua Ladd <josh...@mellanox.com> wrote:
> These (and others) ar
These (and others) are exactly the issues we need to discuss with you guys next
week.
Josh
-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, October 29, 2013 1:29 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] CM PML /
I think the community’s concerns are valid. What Mike is articulating is that
we already maintain a “1.7 ready” OSHMEM branch internally. I think it should
be a simple procedure to do as Brian and Ralph are suggesting and branch off of
1.7 in SVN and apply our patches. We can do this.
Josh
I wondered where that was coming from.
-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Tuesday, October 29, 2013 7:53 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] SHMEM v1.7 merge proposal
On Oct 29, 2013, at 7:16
Also getting a compile failure in the trunk:
./autogen.pl && ./configure
--prefix=/hpc/home/USERS/joshual/ompi_trunk/really-the-trunk/ompi-install
--with-mxm=/hpc/local/src/mxm2_release --with-fca=/opt/mellanox/fca --with-pmi
&& make -j 9 && make install
CC ess_slurm_module.lo
CCLD