Hi Ralph

Thank you.
Your fixes covered much more than I could find.
The section about the three levels of process placement
("Mapping, Ranking, and Binding: Oh My!") really helps.
I would just add at the very beginning
short sentences quickly characterizing each of the three levels.
Kind of an "abstract".
Then explain each level in more detail.

**

Also, I found Jeff's 2013 presentation about the new style
of process placement.

http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation

The title calls it "LAMA".
(That is mud in Portuguese! But the presentation is clear.)
OK, the acronym means "Locality Aware Mapping Algorithm".
In any case, it sounds very similar to the current process placement
features of OMPI 1.8, although only Jeff and you can really tell if it
is exactly the same.

If it is the same, it may help to link it to the OMPI FAQ,
or somehow make it more visible, printable, etc.
If there are differences between OMPI 1.8 and the presentation,
it may be worth adjusting the presentation to the
current OMPI 1.8, and posting it as well.
That would be a good way to convey the OMPI 1.8
process placement conceptual model, along with its syntax
and examples.

Thank you,
Gus Correa


On 10/17/2014 12:10 AM, Ralph Castain wrote:
I know this commit could be a little hard to parse, but I have updated
the mpirun man page on the trunk and will port the change over to the
1.8 series tomorrow. FWIW, I’ve provided the link to the commit below so
you can “preview” it.

https://github.com/open-mpi/ompi/commit/f9d620e3a772cdeddd40b4f0789cf59c75b44868

HTH
Ralph


On Oct 16, 2014, at 9:43 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

Hi Ralph

Yes, I know the process placement features are powerful.
They were already very good in 1.6, even in 1.4,
and I just tried the new 1.8
"-map-by l2cache" (works nicely on Opteron 6300).

Unfortunately I couldn't keep track, test, and use the 1.7 series.
I did that in the previous "odd/new feature" series (1.3, 1.5).
However, my normal workload requires that
I focus my attention on the "even/stable" series
(less fun, more production).
Hence I hopped directly from 1.6 to 1.8,
although I read a number of mailing list postings about the new
style of process placement.

Pestering you again about documentation (last time for now):
The mpiexec man page also seems to need an update.
That is probably the first place people look for information
about runtime features.
For instance, the process placement examples still
use deprecated parameters and mpiexec options:
-bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc.

Thank you,
Gus Correa

On 10/15/2014 11:10 PM, Ralph Castain wrote:

On Oct 15, 2014, at 11:46 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

Thank you Ralph and Jeff for the help!

Glad to hear the segmentation fault is reproducible and will be fixed.

In any case, one can just avoid the old parameter name
(rmaps_base_schedule_policy),
and use instead the new parameter name
(rmaps_base_mapping_policy)
without any problem in OMPI 1.8.3.


Fix is in the nightly 1.8 tarball - I'll release a 1.8.4 soon to cover
the problem.

**

Thanks Ralph for sending the new (OMPI 1.8)
parameter names for process binding.

My recollection is that some time ago somebody (Jeff perhaps?)
posted here a link to a presentation (PDF or PPT) explaining the
new style of process binding, but I couldn't find it in the
list archives.
Maybe the link could be part of the FAQ (if not already there)?

I don't think it is, but I'll try to add it over the next day or so.


**

The Open MPI runtime environment is really great.
However, to take advantage of it one often has to guess at parameters
and run time-consuming trial-and-error tests,
because the main sources of documentation are
the terse output of ompi_info, and several sparse
references in the FAQ.
(Some of them outdated?)

In addition, the runtime environment has evolved over time,
which is certainly a good thing.
However, along with this evolution, several runtime parameters
changed both name and functionality, new ones were introduced,
old ones were deprecated, which can be somewhat confusing,
and can lead to an ineffective use of the runtime environment.
(In 1.8.3 I was using several deprecated parameters from 1.6.5
that seem to be silently ignored at runtime.
I only noticed the problem because that segmentation fault happened.)

I know asking for thorough documentation is foolish,

Not really - it is something we need to get better about :-(

but I guess a simple table of runtime parameter names and valid values
in the FAQ, maybe sorted by their purpose/function, along with a few
examples of use, could help a lot.
Some of this material is now spread across several FAQ entries, where it
is not easy to find or compare.
That doesn't need to be a comprehensive table, but commonly used
items like selecting the btl, selecting interfaces,
dealing with process binding,
modifying/enriching the stdout/stderr output
(tagging output, increasing verbosity, etc),
probably have their place there.

Yeah, we fell down on this one. The changes were announced with each
step in the 1.7 series, but if you step from 1.6 directly to 1.8, you'll
get caught flat-footed. We honestly didn't think of that case, and so we
mentally assumed that "of course people have been following the series -
they know what happened".

You know what they say about those who "assume" :-/

I'll try to get something into the FAQ about the entire new mapping,
ranking, and binding system. It is actually VERY powerful, allowing you
to specify pretty much any placement pattern you can imagine and bind it
to whatever level you desire. It was developed in response to requests
from researchers who wanted to explore application performance versus
placement strategies - but we provided some simplified options to
support more common usage patterns.
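As a rough illustration of how the three steps show up on the command line, here is a minimal Python sketch that assembles an mpirun invocation. It assumes the 1.8 option names --map-by, --rank-by, --bind-to and --report-bindings; check the mpirun man page of your installation before relying on them. The helper name placement_args and the example program name are hypothetical, and the values are passed through without validation.

```python
def placement_args(nprocs, program, map_by=None, rank_by=None, bind_to=None,
                   report=True):
    """Build an mpirun argument list covering mapping, ranking, and binding.

    Only the options that are given are added, so mpirun keeps its own
    defaults for the rest.
    """
    args = ["mpirun", "-np", str(nprocs)]
    if map_by:
        args += ["--map-by", map_by]      # mapping: where each process goes
    if rank_by:
        args += ["--rank-by", rank_by]    # ranking: how ranks are numbered
    if bind_to:
        args += ["--bind-to", bind_to]    # binding: what each process is pinned to
    if report:
        args.append("--report-bindings")  # ask mpirun to print the bindings
    args.append(program)
    return args


if __name__ == "__main__":
    # "l2cache" is the mapping value mentioned in this thread.
    cmd = placement_args(4, "./connectivity_c", map_by="l2cache", bind_to="core")
    print(" ".join(cmd))
```

Returning a list rather than a single string means the result can be handed directly to subprocess.run without shell quoting problems.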




Many thanks,
Gus Correa


On 10/15/2014 11:12 AM, Jeff Squyres (jsquyres) wrote:
We talked off-list -- fixed this on master and just filed
https://github.com/open-mpi/ompi-release/pull/33 to get this into the
v1.8 branch.


On Oct 14, 2014, at 7:39 PM, Ralph Castain <r...@open-mpi.org> wrote:


On Oct 14, 2014, at 5:32 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

Dear Open MPI fans and experts

This is just a note in case other people run into the same problem.

I just built Open MPI 1.8.3.
As usual I put my old settings on openmpi-mca-params.conf,
with no further thinking.
Then I compiled the connectivity_c.c program and tried
to run it with mpiexec.
That is a routine that never failed before.

Bummer!
I've got a segmentation fault right away.

Strange  - it works fine from the cmd line:

07:27:04  (v1.8) /home/common/openmpi/ompi-release$ mpirun -n 1 -mca rmaps_base_schedule_policy core hostname
--------------------------------------------------------------------------
A deprecated MCA variable value was specified in the environment or
on the command line.  Deprecated MCA variables should be avoided;
they may disappear in future releases.

Deprecated variable: rmaps_base_schedule_policy
New variable:        rmaps_base_mapping_policy
--------------------------------------------------------------------------
bend001

HOWEVER, I can replicate that behavior when it is in the default
params file! I don't see the immediate cause of the difference, but
will investigate.


After some head scratching, checking my environment, etc,
I thought I might have configured OMPI incorrectly.
Hence, I tried to get information from ompi_info.
Oh well, ompi_info also segfaulted!

It took me a while to realize that the runtime parameter
configuration file was the culprit.

When I inserted the runtime parameter settings one by one,
the segfault came with this one:

rmaps_base_schedule_policy = core
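The one-by-one procedure described above could be automated along these lines. This is only a sketch: the function name first_failing_setting is hypothetical, and the runs_ok callback is an assumption standing in for whatever check you use (for example, writing the lines to the params file and running ompi_info).

```python
def first_failing_setting(settings, runs_ok):
    """Add settings one at a time; return the first one that breaks the run.

    `settings` is a list of "name = value" lines. `runs_ok` is a
    caller-supplied function that receives the list of currently active
    lines and returns True if the command still runs cleanly with them.
    Returns None if every setting passes.
    """
    active = []
    for setting in settings:
        candidate = active + [setting]
        if runs_ok(candidate):
            active = candidate
        else:
            return setting
    return None


if __name__ == "__main__":
    settings = [
        "hwloc_base_report_bindings = 1",
        "rmaps_base_schedule_policy = core",
    ]

    # Stand-in check that mimics the failure reported in this thread.
    # A real check would rewrite the params file and run ompi_info.
    def fake_check(lines):
        return not any(line.startswith("rmaps_base_schedule_policy")
                       for line in lines)

    print(first_failing_setting(settings, fake_check))
```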

ompi_info (when I got it to work) told me that the parameter above
is now a deprecated synonym of:

rmaps_base_mapping_policy = core

In any case, the old synonym doesn't work and makes ompi_info and
mpiexec segfault (and I'd guess anything else that requires the
OMPI runtime components).
Only the new parameter name works.

That's because the segfault is happening in the printing of the
deprecation warning.


***

The following OMPI 1.6.5 parameters are also missing from the
ompi_info output (not reported by ompi_info --all --all):


1) orte_process_binding  ===> hwloc_base_binding_policy

2) orte_report_bindings   ===> hwloc_base_report_bindings

3) opal_paffinity_alone  ===> gone, use
hwloc_base_binding_policy=none if you don't want any binding
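The renames given in this thread can be collected into a small lookup table. The Python sketch below, which is not part of Open MPI, scans the text of an openmpi-mca-params.conf file and reports settings that still use the 1.6-era names. The function name find_deprecated and the simplified "name = value" parsing are assumptions made for illustration.

```python
# Old (1.6) -> new (1.8) MCA parameter names, as given in this thread.
# None marks a parameter that was removed without a direct rename.
RENAMED = {
    "rmaps_base_schedule_policy": "rmaps_base_mapping_policy",
    "orte_process_binding": "hwloc_base_binding_policy",
    "orte_report_bindings": "hwloc_base_report_bindings",
    "opal_paffinity_alone": None,
}


def find_deprecated(conf_text):
    """Return (line_number, old_name, new_name) for each deprecated setting.

    Assumes a simple "name = value" layout with "#" comments. This is a
    simplification, not a full parser for the MCA params file format.
    """
    hits = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name in RENAMED:
            hits.append((lineno, name, RENAMED[name]))
    return hits


if __name__ == "__main__":
    sample = (
        "# settings carried over from 1.6.5\n"
        "rmaps_base_schedule_policy = core\n"
        "orte_report_bindings = 1\n"
    )
    for lineno, old, new in find_deprecated(sample):
        print(f"line {lineno}: {old} -> {new or 'removed'}")
```

For opal_paffinity_alone the thread's advice is to set hwloc_base_binding_policy=none if no binding is wanted, so the table records it as removed rather than renamed.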


Are they gone forever?

Are there replacements for them, with approximately the same
functionality?

Is there a list comparing the new (1.8) vs. old (1.6)
OMPI runtime parameters, and/or any additional documentation
about the new style of OMPI 1.8 runtime parameters?

Will try to add this to the web site


Since there seems to have been a major revamping of the OMPI
runtime parameters, that would be a great help.

Thank you,
Gus Correa
_______________________________________________
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
<mailto:us...@open-mpi.org>
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/10/25497.php
