On 09/05/14 00:16, Joshua Ladd wrote:
> The necessary packages will be supported and available in community
> OFED.
We're constrained to what is in RHEL6, I'm afraid.
This is because we have to run GPFS over IB to BG/Q from the same NSDs
that talk GP
On 08/05/14 23:45, Ralph Castain wrote:
> Artem and I are working on a new PMIx plugin that will resolve it
> for non-Mellanox cases.
Ah yes of course, sorry my bad!
--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Chris,
The necessary packages will be supported and available in community OFED.
Josh
On Thu, May 8, 2014 at 9:23 AM, Chris Samuel wrote:
> On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:
>
> > We (MLNX) are working on a new SLURM PMI2 plugin that we plan to
> > eventually push upstream.
On May 8, 2014, at 6:23 AM, Chris Samuel wrote:
> On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:
>
>> We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually
>> push upstream. However, to use it, it will require linking in a proprietary
>> Mellanox library that accelerate
On Thu, 8 May 2014 09:10:00 AM Joshua Ladd wrote:
> We (MLNX) are working on a new SLURM PMI2 plugin that we plan to eventually
> push upstream. However, to use it, it will require linking in a proprietary
> Mellanox library that accelerates the collective operations (available in
> MOFED versions
---
> *From:* devel [devel-boun...@open-mpi.org] on behalf of Joshua Ladd [
> jladd.m...@gmail.com]
> *Sent:* Wednesday, May 07, 2014 7:56 AM
> *To:* Open MPI Developers
>
> *Subject:* Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is
> specifically requested
>
On 08/05/14 12:54, Ralph Castain wrote:
> I think there was one 2.6.x that was borked, and definitely
> problems in the 14.03.x line. Can't pinpoint it for you, though.
No worries, thanks.
> Sounds good. I'm going to have to dig deeper into those nu
2014-05-08 9:54 GMT+07:00 Ralph Castain :
>
> On May 7, 2014, at 6:15 PM, Christopher Samuel
> wrote:
>
> >
> > Hi all,
> >
> > Apologies for having dropped out of the thread, night intervened here. ;-)
> >
> > On 08/05/14 00:45, Ralph Casta
On May 7, 2014, at 6:15 PM, Christopher Samuel wrote:
>
> Hi all,
>
> Apologies for having dropped out of the thread, night intervened here. ;-)
>
> On 08/05/14 00:45, Ralph Castain wrote:
>
>> Okay, then we'll just have to develop a workarou
On May 7, 2014, at 6:51 PM, Christopher Samuel wrote:
>
> On 07/05/14 18:00, Ralph Castain wrote:
>
>> Interesting - how many nodes were involved? As I said, the bad
>> scaling becomes more evident at a fairly high node count.
>
> Our x86-64
That is interesting. I think I will reproduce your experiments on my
system when I test the PMI selection logic. Judging by your resource
counts, I should be able to do that. I will publish my results to the list.
2014-05-08 8:51 GMT+07:00 Christopher Samuel :
Hi Chris.
The current design is to provide a runtime parameter for PMI version
selection. That would be even more flexible than configure-time selection
and (to my current understanding) not very hard to achieve.
2014-05-08 8:15 GMT+07:00 Christopher Samuel :
On 07/05/14 18:00, Ralph Castain wrote:
> Interesting - how many nodes were involved? As I said, the bad
> scaling becomes more evident at a fairly high node count.
Our x86-64 systems have low node counts (we've got BG/Q for capacity),
the cluster th
Hi all,
Apologies for having dropped out of the thread, night intervened here. ;-)
On 08/05/14 00:45, Ralph Castain wrote:
> Okay, then we'll just have to develop a workaround for all those
> Slurm releases where PMI-2 is borked :-(
Do you know wh
2014-05-08 7:15 GMT+07:00 Ralph Castain :
> Take a look in opal/mca/common/pmi - we already do a bunch of #if PMI2
> stuff in there. All we are talking about doing here is:
>
> * making those selections be runtime based on an MCA param, compiling if
> PMI2 is available but selected at runtime
>
>
Take a look in opal/mca/common/pmi - we already do a bunch of #if PMI2 stuff in
there. All we are talking about doing here is:
* making those selections be runtime based on an MCA param, compiling if PMI2
is available but selected at runtime
* moving some additional functions into that code are
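A minimal sketch of that runtime switch, for illustration only: the parameter
name (read here straight from the environment as OMPI_MCA_common_pmi_version)
and the helper are hypothetical, and WANT_PMI2_SUPPORT stands in for whatever
configure defines when the PMI2 headers are found; the real common/pmi code
would register the parameter through the MCA variable system instead.

    /* Illustrative only: choose the PMI version at runtime from a single
     * parameter instead of hard-wiring it with #if at compile time. */
    #include <stdlib.h>
    #include <string.h>

    typedef enum { COMMON_PMI_1 = 1, COMMON_PMI_2 = 2 } common_pmi_version_t;

    static common_pmi_version_t choose_pmi_version(void)
    {
    #if WANT_PMI2_SUPPORT
        /* Hypothetical parameter name; default to PMI-2 when it was
         * compiled in, unless the user explicitly asked for PMI-1. */
        const char *req = getenv("OMPI_MCA_common_pmi_version");
        if (NULL == req || 0 != strcmp(req, "1")) {
            return COMMON_PMI_2;
        }
    #endif
        return COMMON_PMI_1;
    }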
I like #2 too.
But my question was slightly different. Can we encapsulate the PMI logic that
OMPI uses in common/pmi, as #2 suggests, but have two different implementations
of this component, say common/pmi and common/pmi2? I am asking because I
have concerns that this kind of component is not supposed to be
The desired solution is to have the ability to select pmi-1 vs pmi-2 at
runtime. This can be done in two ways:
1. you could have separate pmi1 and pmi2 components in each framework. You'd
want to define only one common MCA param to direct the selection, however.
2. you could have a single pmi c
I just reread your suggestions from our off-list discussion and found that I
had misunderstood them. So no parallel PMI! Move all possible code into
opal/mca/common/pmi.
To additionally clarify, which is the preferred way:
1. to create one joint PMI module with switches to decide what
functionality
2014-05-08 5:54 GMT+07:00 Ralph Castain :
> Ummm, no, I don't think that's right. I believe we decided to instead
> create the separate components, default to PMI-2 if available, print nice
> error message if not, otherwise use PMI-1.
>
> I don't want to initialize both PMIs in parallel as most
Ummm, no, I don't think that's right. I believe we decided to instead create
the separate components, default to PMI-2 if available, print nice error
message if not, otherwise use PMI-1.
I don't want to initialize both PMIs in parallel as most installations won't
support it.
On May 7, 2014,
Ralph and I discussed Joshua's concerns and decided to try automatic PMI2
correctness detection first, as initially intended. Here is my idea. The
universal way to decide whether PMI2 is correct is to compare PMI_Init(..,
&rank, &size, ...) and PMI2_Init(.., &rank, &size, ...). Size and rank
should be equa
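A rough sketch of the proposed check, assuming the standard pmi.h/pmi2.h
headers shipped with Slurm/MPICH; the helper name is made up, and (as Ralph
notes elsewhere in the thread) unconditionally initializing both PMIs in one
process may not work on every installation.

    #include <pmi.h>
    #include <pmi2.h>

    /* Returns 1 if PMI-2 reports the same rank/size as PMI-1, 0 otherwise. */
    static int pmi2_looks_sane(void)
    {
        int spawned1, rank1, size1;
        int spawned2, rank2, size2, appnum;

        if (PMI_SUCCESS != PMI_Init(&spawned1) ||
            PMI_SUCCESS != PMI_Get_rank(&rank1) ||
            PMI_SUCCESS != PMI_Get_size(&size1)) {
            return 0;              /* can't even establish a PMI-1 baseline */
        }
        if (PMI2_SUCCESS != PMI2_Init(&spawned2, &size2, &rank2, &appnum)) {
            return 0;              /* PMI-2 unavailable or broken */
        }
        /* If the two interfaces disagree about who we are, don't trust PMI-2. */
        return (rank1 == rank2 && size1 == size2);
    }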
That's a good point. There are actually a bunch of modules in ompi, opal and
orte that would have to be duplicated.
On Wednesday, May 7, 2014, Joshua Ladd wrote:
> +1 Sounds like a good idea - but decoupling the two and adding all the
> right selection mojo might be a bit of a pain. There are se
Yeah, we'll want to move some of it into common - but a lot of that was already
done, so I think it won't be that hard. Will explore
On May 7, 2014, at 9:00 AM, Joshua Ladd wrote:
> +1 Sounds like a good idea - but decoupling the two and adding all the right
> selection mojo might be a bit of
+1 Sounds like a good idea - but decoupling the two and adding all the
right selection mojo might be a bit of a pain. There are several places in
OMPI where the distinction between PMI1 and PMI2 is made, not only in
grpcomm. The DB and ESS frameworks, off the top of my head.
Josh
On Wed, May 7, 2014 a
Good idea :)!
On Wednesday, May 7, 2014, Ralph Castain wrote:
> Jeff actually had a useful suggestion (gasp!). He proposed that we separate
> the PMI-1 and PMI-2 codes into separate components so you could select them
> at runtime. Thus, we would build both (assuming both PMI-1 and 2 libs
Jeff actually had a useful suggestion (gasp!). He proposed that we separate the
PMI-1 and PMI-2 codes into separate components so you could select them at
runtime. Thus, we would build both (assuming both PMI-1 and 2 libs are found),
default to PMI-1, but users could select to try PMI-2. If the P
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is
specifically requested
Ah, I see. Sorry for the reactionary comment - but this feature falls squarely
within my "jurisdiction", and we've invested a lot in improving OMPI jobstart
under sr
Thanks, Chris.
-Adam
From: devel [devel-boun...@open-mpi.org] on behalf of Christopher Samuel
[sam...@unimelb.edu.au]
Sent: Wednesday, May 07, 2014 12:07 AM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is
On May 7, 2014, at 7:56 AM, Joshua Ladd wrote:
> Ah, I see. Sorry for the reactionary comment - but this feature falls
> squarely within my "jurisdiction", and we've invested a lot in improving OMPI
> jobstart under srun.
>
> That being said (now that I've taken some deep breaths and careful
Ah, I see. Sorry for the reactionary comment - but this feature falls
squarely within my "jurisdiction", and we've invested a lot in improving
OMPI jobstart under srun.
That being said (now that I've taken some deep breaths and carefully read
your original email :)), what you're proposing isn't a
Okay, then we'll just have to develop a workaround for all those Slurm releases
where PMI-2 is borked :-(
FWIW: I think people misunderstood my statement. I specifically did *not*
propose to *lose* PMI-2 support. I suggested that we change it to
"on-by-request" instead of the current "on-by-def
Just saw this thread, and I second Chris' observations: at scale we are
seeing huge gains in jobstart performance with PMI2 over PMI1. We
*CANNOT* lose this functionality. For competitive reasons, I cannot
provide exact
numbers, but let's say the difference is in the ballpark of a full
order-of-mag
Interesting - how many nodes were involved? As I said, the bad scaling becomes
more evident at a fairly high node count.
On May 7, 2014, at 12:07 AM, Christopher Samuel wrote:
>
> Hiya Ralph,
>
> On 07/05/14 14:49, Ralph Castain wrote:
>
>> I
Hiya Ralph,
On 07/05/14 14:49, Ralph Castain wrote:
> I should have looked closer to see the numbers you posted, Chris -
> those include time for MPI wireup. So what you are seeing is that
> mpirun is much more efficient at exchanging the MPI endpoin
I should have looked closer to see the numbers you posted, Chris - those
include time for MPI wireup. So what you are seeing is that mpirun is much more
efficient at exchanging the MPI endpoint info than PMI. I suspect that PMI2 is
not much better as the primary reason for the difference is that
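For context, a bare-bones sketch of what exchanging endpoint info over PMI-1
involves, using the standard pmi.h KVS calls (buffer sizes are hard-coded here
for brevity; real code queries PMI for the maximum name/key/value lengths).
Each process puts one value, all processes synchronize, then each process gets
N-1 values, which is why this phase dominates launch time at scale.

    #include <stdio.h>
    #include <pmi.h>

    #define VAL_MAX 256

    static int exchange_endpoints(const char *my_endpoint)
    {
        int spawned, rank, size;
        char kvsname[256], key[64], value[VAL_MAX];

        if (PMI_SUCCESS != PMI_Init(&spawned)) return -1;
        PMI_Get_rank(&rank);
        PMI_Get_size(&size);
        PMI_KVS_Get_my_name(kvsname, sizeof(kvsname));

        /* Publish our endpoint under a key derived from our rank. */
        snprintf(key, sizeof(key), "ep-%d", rank);
        PMI_KVS_Put(kvsname, key, my_endpoint);
        PMI_KVS_Commit(kvsname);
        PMI_Barrier();                 /* make all puts globally visible */

        /* Fetch everyone else's endpoint: O(nprocs) gets per process. */
        for (int peer = 0; peer < size; peer++) {
            snprintf(key, sizeof(key), "ep-%d", peer);
            PMI_KVS_Get(kvsname, key, value, sizeof(value));
            /* ... hand "value" to whatever will open the connection ... */
        }
        return 0;
    }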
Ah, interesting - my comments were in respect to startup time (specifically,
MPI wireup)
On May 6, 2014, at 8:49 PM, Christopher Samuel wrote:
>
> On 07/05/14 13:37, Moody, Adam T. wrote:
>
>> Hi Chris,
>
> Hi Adam,
>
>> I'm interested in SL
On 07/05/14 13:37, Moody, Adam T. wrote:
> Hi Chris,
Hi Adam,
> I'm interested in SLURM / OpenMPI startup numbers, but I haven't
> done this testing myself. We're stuck with an older version of
> SLURM for various internal reasons, and I'm wonderin
Samuel
> [sam...@unimelb.edu.au]
> Sent: Tuesday, May 06, 2014 8:32 PM
> To: de...@open-mpi.org
> Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is
> specifically requested
>
>
> On 07/05/
some of the differences in times at different
scales?
Thanks,
-Adam
From: devel [devel-boun...@open-mpi.org] on behalf of Christopher Samuel
[sam...@unimelb.edu.au]
Sent: Tuesday, May 06, 2014 8:32 PM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] RFC:
On 07/05/14 12:53, Ralph Castain wrote:
> We have been seeing a lot of problems with the Slurm PMI-2 support
> (not in OMPI - it's the code in Slurm that is having problems). At
> this time, I'm unaware of any advantage in using PMI-2 over PMI-1
> i
We have been seeing a lot of problems with the Slurm PMI-2 support (not in OMPI
- it's the code in Slurm that is having problems). At this time, I'm unaware of
any advantage in using PMI-2 over PMI-1 in Slurm - the scaling is equally poor,
and PMI-2 does not support any additional functionality