> On Oct 17, 2014, at 10:23 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
> Hi Ralph
>
> Thank you.
> Your fixes covered much more than I could find.
> The section about the three levels of process placement
> ("Mapping, Ranking, and Binding: Oh My!") really helps.
> I would just add at the very beginning
> short sentences quickly characterizing each of the three levels.
> Kind of an "abstract".
> Then explain each level in more detail.
Will do - thanks!

>
> **
>
> Also, I found Jeff's 2013 presentation about the new style
> of process placement.
>
> http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation
>
> The title calls it "LAMA".
> (That is mud in Portuguese! But the presentation is clear.)
> OK, the acronym means "Locality Aware Mapping Algorithm".
> In any case, it sounds very similar to the current process placement
> features of OMPI 1.8, although only Jeff and you can really tell if it
> is exactly the same.
>
> If it is the same, it may help to link it to the OMPI FAQ,
> or somehow make it more visible, printable, etc.
> If there are differences between OMPI 1.8 and the presentation,
> it may be worth adjusting the presentation to the
> current OMPI 1.8, and posting it as well.
> That would be a good way to convey the OMPI 1.8
> process placement conceptual model, along with its syntax
> and examples.

Yeah, I need to do that. LAMA was an alternative implementation of the current map/rank/bind system. It hasn’t been fully maintained since it was introduced, and so I’m not sure how much of it is functional. I need to create an equivalent for the current implementation.

>
> Thank you,
> Gus Correa
>
>
> On 10/17/2014 12:10 AM, Ralph Castain wrote:
>> I know this commit could be a little hard to parse, but I have updated
>> the mpirun man page on the trunk and will port the change over to the
>> 1.8 series tomorrow. FWIW, I’ve provided the link to the commit below so
>> you can “preview” it.
>>
>> https://github.com/open-mpi/ompi/commit/f9d620e3a772cdeddd40b4f0789cf59c75b44868
>>
>> HTH
>> Ralph
>>
>>
>>> On Oct 16, 2014, at 9:43 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>
>>> Hi Ralph
>>>
>>> Yes, I know the process placement features are powerful.
>>> They were already very good in 1.6, even in 1.4,
>>> and I just tried the new 1.8
>>> "-map-by l2cache" (works nicely on Opteron 6300).
>>>
>>> Unfortunately I couldn't keep track, test, and use the 1.7 series.
>>> I did that in the previous "odd/new feature" series (1.3, 1.5).
>>> However, my normal workload requires that
>>> I focus my attention on the "even/stable" series
>>> (less fun, more production).
>>> Hence I hopped directly from 1.6 to 1.8,
>>> although I read a number of mailing list postings about the new
>>> style of process placement.
>>>
>>> Pestering you again about documentation (last time for now):
>>> The mpiexec man page also seems to need an update.
>>> That is probably the first place people look for information
>>> about runtime features.
>>> For instance, the process placement examples still
>>> use deprecated parameters and mpiexec options:
>>> -bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc.
>>>
>>> Thank you,
>>> Gus Correa
>>>
>>> On 10/15/2014 11:10 PM, Ralph Castain wrote:
>>>>
>>>> On Oct 15, 2014, at 11:46 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>>
>>>>> Thank you Ralph and Jeff for the help!
>>>>>
>>>>> Glad to hear the segmentation fault is reproducible and will be fixed.
>>>>>
>>>>> In any case, one can just avoid the old parameter name
>>>>> (rmaps_base_schedule_policy),
>>>>> and use instead the new parameter name
>>>>> (rmaps_base_mapping_policy)
>>>>> without any problem in OMPI 1.8.3.
>>>>>
>>>>
>>>> Fix is in the nightly 1.8 tarball - I'll release a 1.8.4 soon to cover
>>>> the problem.
>>>>
>>>>> **
>>>>>
>>>>> Thanks Ralph for sending the new (OMPI 1.8)
>>>>> parameter names for process binding.
>>>>>
>>>>> My recollection is that some time ago somebody (Jeff perhaps?)
>>>>> posted here a link to a presentation (PDF or PPT) explaining the
>>>>> new style of process binding, but I couldn't find it in the
>>>>> list archives.
>>>>> Maybe the link could be part of the FAQ (if not already there)?
>>>>
>>>> I don't think it is, but I'll try to add it over the next day or so.
>>>>
>>>>>
>>>>> **
>>>>>
>>>>> The Open MPI runtime environment is really great.
>>>>> However, to take advantage of it one often has to do parameter guessing,
>>>>> and to do time-consuming tests by trial and error,
>>>>> because the main sources of documentation are
>>>>> the terse output of ompi_info, and several sparse
>>>>> references in the FAQ.
>>>>> (Some of them outdated?)
>>>>>
>>>>> In addition, the runtime environment has evolved over time,
>>>>> which is certainly a good thing.
>>>>> However, along with this evolution, several runtime parameters
>>>>> changed both name and functionality, new ones were introduced,
>>>>> old ones were deprecated, which can be somewhat confusing,
>>>>> and can lead to an ineffective use of the runtime environment.
>>>>> (In 1.8.3 I was using several deprecated parameters from 1.6.5
>>>>> that seem to be silently ignored at runtime.
>>>>> I only noticed the problem because that segmentation fault happened.)
>>>>>
>>>>> I know asking for thorough documentation is foolish,
>>>>
>>>> Not really - it is something we need to get better about :-(
>>>>
>>>>> but I guess a simple table of runtime parameter names and valid values
>>>>> in the FAQ, maybe sorted by their purpose/function, along with a few
>>>>> examples of use, could help a lot.
>>>>> Some of this material is now spread across several FAQ entries, but not so
>>>>> easy to find/compare.
>>>>> That doesn't need to be a comprehensive table, but commonly used
>>>>> items like selecting the btl, selecting interfaces,
>>>>> dealing with process binding,
>>>>> modifying/enriching the stdout/stderr output
>>>>> (tagging output, increasing verbosity, etc.),
>>>>> probably have their place there.
>>>>
>>>> Yeah, we fell down on this one. The changes were announced with each
>>>> step in the 1.7 series, but if you step from 1.6 directly to 1.8, you'll
>>>> get caught flat-footed.
>>>> We honestly didn't think of that case, and so we
>>>> mentally assumed that "of course people have been following the series -
>>>> they know what happened".
>>>>
>>>> You know what they say about those who "assume" :-/
>>>>
>>>> I'll try to get something into the FAQ about the entire new mapping,
>>>> ranking, and binding system. It is actually VERY powerful, allowing you
>>>> to specify pretty much any placement pattern you can imagine and bind it
>>>> to whatever level you desire. It was developed in response to requests
>>>> from researchers who wanted to explore application performance versus
>>>> placement strategies - but we provided some simplified options to
>>>> support more common usage patterns.
>>>>
>>>>
>>>>>
>>>>>
>>>>> Many thanks,
>>>>> Gus Correa
>>>>>
>>>>>
>>>>> On 10/15/2014 11:12 AM, Jeff Squyres (jsquyres) wrote:
>>>>>> We talked off-list -- fixed this on master and just filed
>>>>>> https://github.com/open-mpi/ompi-release/pull/33 to get this into the
>>>>>> v1.8 branch.
>>>>>>
>>>>>>
>>>>>> On Oct 14, 2014, at 7:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Oct 14, 2014, at 5:32 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>>>>>
>>>>>>>> Dear Open MPI fans and experts
>>>>>>>>
>>>>>>>> This is just a note in case other people run into the same problem.
>>>>>>>>
>>>>>>>> I just built Open MPI 1.8.3.
>>>>>>>> As usual I put my old settings on openmpi-mca-params.conf,
>>>>>>>> with no further thinking.
>>>>>>>> Then I compiled the connectivity_c.c program and tried
>>>>>>>> to run it with mpiexec.
>>>>>>>> That is a routine that never failed before.
>>>>>>>>
>>>>>>>> Bummer!
>>>>>>>> I've got a segmentation fault right away.
>>>>>>>
>>>>>>> Strange - it works fine from the cmd line:
>>>>>>>
>>>>>>> 07:27:04 (v1.8) /home/common/openmpi/ompi-release$ mpirun -n 1 -mca rmaps_base_schedule_policy core hostname
>>>>>>> --------------------------------------------------------------------------
>>>>>>> A deprecated MCA variable value was specified in the environment or
>>>>>>> on the command line. Deprecated MCA variables should be avoided;
>>>>>>> they may disappear in future releases.
>>>>>>>
>>>>>>> Deprecated variable: rmaps_base_schedule_policy
>>>>>>> New variable: rmaps_base_mapping_policy
>>>>>>> --------------------------------------------------------------------------
>>>>>>> bend001
>>>>>>>
>>>>>>> HOWEVER, I can replicate that behavior when it is in the default
>>>>>>> params file! I don't see the immediate cause of the difference, but
>>>>>>> will investigate.
>>>>>>>
>>>>>>>>
>>>>>>>> After some head scratching, checking my environment, etc,
>>>>>>>> I thought I might have configured OMPI incorrectly.
>>>>>>>> Hence, I tried to get information from ompi_info.
>>>>>>>> Oh well, ompi_info also segfaulted!
>>>>>>>>
>>>>>>>> It took me a while to realize that the runtime parameter
>>>>>>>> configuration file was the culprit.
>>>>>>>>
>>>>>>>> When I inserted the runtime parameter settings one by one,
>>>>>>>> the segfault came with this one:
>>>>>>>>
>>>>>>>> rmaps_base_schedule_policy = core
>>>>>>>>
>>>>>>>> Ompi_info (when I got it to work) told me that the parameter above
>>>>>>>> is now a deprecated synonym of:
>>>>>>>>
>>>>>>>> rmaps_base_mapping_policy = core
>>>>>>>>
>>>>>>>> In any case, the old synonym doesn't work and makes ompi_info and
>>>>>>>> mpiexec segfault (and I'd guess anything else that requires the
>>>>>>>> OMPI runtime components).
>>>>>>>> Only the new parameter name works.
>>>>>>>
>>>>>>> That's because the segfault is happening in the printing of the
>>>>>>> deprecation warning.
>>>>>>>
>>>>>>>>
>>>>>>>> ***
>>>>>>>>
>>>>>>>> I am also missing in the ompi_info output the following
>>>>>>>> (OMPI 1.6.5) parameters (not reported by ompi_info --all --all):
>>>>>>>>
>>>>>>>
>>>>>>> 1) orte_process_binding ===> hwloc_base_binding_policy
>>>>>>>
>>>>>>> 2) orte_report_bindings ===> hwloc_base_report_bindings
>>>>>>>
>>>>>>> 3) opal_paffinity_alone ===> gone, use
>>>>>>> hwloc_base_binding_policy=none if you don't want any binding
>>>>>>>
>>>>>>>>
>>>>>>>> Are they gone forever?
>>>>>>>>
>>>>>>>> Are there replacements for them, with approximately the same
>>>>>>>> functionality?
>>>>>>>>
>>>>>>>> Is there a list comparing the new (1.8) vs. old (1.6)
>>>>>>>> OMPI runtime parameters, and/or any additional documentation
>>>>>>>> about the new style of OMPI 1.8 runtime parameters?
>>>>>>>
>>>>>>> Will try to add this to the web site
>>>>>>>
>>>>>>>>
>>>>>>>> Since there seems to have been a major revamping of the OMPI
>>>>>>>> runtime parameters, that would be a great help.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Gus Correa
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>>>>>> <mailto:us...@open-mpi.org>
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25497.php
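
For anyone else hopping straight from 1.6 to 1.8: the renames mentioned in this thread can be applied to an old openmpi-mca-params.conf before the runtime ever reads it. Below is a rough Python sketch of that idea, not an official Open MPI tool. The function name migrate_params and the "name = value" line layout are assumptions made for illustration, and the table holds only the four parameters discussed above, so it is far from a complete list. It changes names only; whether an old value is still valid under the new name has to be checked separately with ompi_info.

```python
import re

# Renames reported in this thread (Open MPI 1.6 -> 1.8). Not a complete list.
RENAMED = {
    "rmaps_base_schedule_policy": "rmaps_base_mapping_policy",
    "orte_process_binding": "hwloc_base_binding_policy",
    "orte_report_bindings": "hwloc_base_report_bindings",
}

# Parameters reported as removed, with the advice given in the thread.
REMOVED = {
    "opal_paffinity_alone":
        "gone; use hwloc_base_binding_policy=none if you don't want any binding",
}

# Assumed file layout: one "name = value" assignment per line.
# Comment lines (starting with '#') and blank lines do not match.
_ASSIGN = re.compile(r"^(\s*)([A-Za-z0-9_]+)(\s*=.*)$")


def migrate_params(text):
    """Hypothetical helper: return (new_text, notes) for a params file body."""
    out_lines = []
    notes = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = _ASSIGN.match(line)
        if match is None:
            out_lines.append(line)  # comments, blanks, unrecognized lines
            continue
        indent, name, rest = match.groups()
        if name in RENAMED:
            notes.append(f"line {lineno}: {name} -> {RENAMED[name]}")
            line = f"{indent}{RENAMED[name]}{rest}"
        elif name in REMOVED:
            notes.append(f"line {lineno}: {name} removed ({REMOVED[name]})")
            line = f"# {line}"  # comment out rather than guess a replacement
        out_lines.append(line)
    return "\n".join(out_lines), notes
```

Feeding it the text of the old file and printing the notes gives a quick list of what changed. A removed parameter is commented out instead of being replaced, since the thread only says what to use if no binding is wanted.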