Re: [OMPI devel] Coll/tuned: Is coll_tuned_use_dynamic_rules MCA parameter still usefull?

2022-08-04 Thread George Bosilca via devel
The idea here is that the dynamic rules are defined by an entire set of parameters, and that we want a quick way to allow OMPI to ignore them all. If we follow your suggestion and remove coll_tuned_use_dynamic_rules, then turning on/off dynamic rules involves a lot of changes into the MCA file (ins

Re: [OMPI devel] Possible Bug / Invalid Read in Ialltoallw

2022-05-04 Thread George Bosilca via devel
Damien, As Gilles indicated, an example would be great. Meanwhile, as you already have access to the root cause with a debugger, can you check which branch of the if on the communicator type is taken in the ompi_coll_base_retain_datatypes_w function? What is the communicator type ? Intra or i

Re: [OMPI devel] Eager and Rendezvous Implementation

2021-10-19 Thread George Bosilca via devel
Hemmatpour wrote: > > Hi George, > > Thank you very much for your reply. I use UCX for the communication. Is it > somewhere in pml_ucx.c? > > Thanks, > > > > On Tue, Oct 19, 2021 at 4:41 PM George Bosilca > wrote: > >> Masoud, >> >> The protocol s

Re: [OMPI devel] Eager and Rendezvous Implementation

2021-10-19 Thread George Bosilca via devel
Masoud, The protocol selection and implementation in OMPI is only available for the PML OB1, other PMLs make their own internal selection that is usually maintained in some other code base. For OB1, the selection starts in ompi/mca/pml/ob1/pml_ob1_sendreq.c in the function mca_pml_ob1_send_reques
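As a rough illustration of what "protocol selection" means here, the sketch below picks between an eager and a rendezvous path from the message length alone. It is not the code in pml_ob1_sendreq.c: the enum, the function name select_protocol and the eager_limit parameter are hypothetical, and OB1 distinguishes more cases than these two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol labels; the real OB1 code has more cases. */
typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } proto_t;

/* Pick a protocol from the message length alone: payloads that fit
 * within eager_limit are sent right away, larger ones first
 * negotiate with the receiver. */
static proto_t select_protocol(size_t msg_len, size_t eager_limit)
{
    assert(eager_limit > 0);
    return (msg_len <= eager_limit) ? PROTO_EAGER : PROTO_RENDEZVOUS;
}
```

With an assumed limit of 4096 bytes, select_protocol(100, 4096) takes the eager path and select_protocol(4097, 4096) takes the rendezvous path.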

Re: [OMPI devel] Question regarding the completion of btl_flush

2021-09-29 Thread George Bosilca via devel
the question is which one is broken (and then we’ll have > to figure out how to fix…). > > > > Brian > > > > On 9/28/21, 7:11 PM, "devel on behalf of George Bosilca via devel" < > devel-boun...@lists.open-mpi.org on behalf of devel@lists.open-mpi.org>

Re: [OMPI devel] Question regarding the completion of btl_flush

2021-09-28 Thread George Bosilca via devel
Based on my high-level understanding of the code path and according to the UCX implementation of the flush, the required level of completion is local. George. On Tue, Sep 28, 2021 at 19:26 Zhang, Wei via devel wrote: > Dear All, > > > > I have a question regarding the completion semantics of

Re: [OMPI devel] [EXTERNAL] RE: How to display device selection or routing info

2021-08-20 Thread George Bosilca via devel
Larry, There is no simple answer to your question as it depends on many factors, both software and hardware. A user-selectable PML (our high-level messaging layer) component decides which protocol is used to move the data around, and over which hardware. At this level you have a choice between OB1 (w

Re: [OMPI devel] mpirun alternative

2021-03-09 Thread George Bosilca via devel
egex > "ip-[2:10]-0-16-120,[2:10].0.35.43,[2:10].0.35.42@0(3)" -mca orte_hnp_uri > "2752512000.0;tcp://10.0.16.120:44789" -mca plm "rsh" --tree-spawn -mca > routed "radix" -mca orte_parent_uri "2752512000.0;tcp://10.0.16.120:44789" > -mca

Re: [OMPI devel] mpirun alternative

2021-03-05 Thread George Bosilca via devel
Gabriel, You should be able to. Here are at least 2 different ways of doing this. 1. Purely MPI. Start singletons (or smaller groups), and connect via sockets using MPI_Comm_join. You can setup your own DNS-like service, with the goal of having the independent MPI jobs leave a trace there, such t

Re: [OMPI devel] Warning

2020-05-15 Thread George Bosilca via devel
Luis, With some low frequency we remove warnings from the code. In this particular instance the meaning of the code is correct, the ompi_info_t structure starts with an opal_info_t, but removing the warnings is good policy. In general we can either cast the ompi_info_t pointer directly to an opal

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread George Bosilca via devel
John, The common denominator across all these errors is an error from connect while trying to connect to 10.71.2.58 on port 1024. Who is 10.71.2.58 ? Is the firewall open ? Are connections to port 1024 allowed ? George. On Mon, May 4, 2020 at 11:36 AM John DelSignore via devel < devel@lists

Re: [OMPI devel] MPI_Info args to spawn - resolving deprecated values?

2020-04-08 Thread George Bosilca via devel
Deprecate, warn and convert seems reasonable. But for how long ? As the number of automatic conversions OMPI supports has shown a tendency to increase, and as these conversions happen all over the code base, we might want to setup a well defined path to deprecation, what and when has been deprecat

Re: [OMPI devel] --mca coll choices

2020-04-07 Thread George Bosilca via devel
All the collective decisions are done on the first collective on each communicator. So basically you can change the MCA or pvar before the first collective in a communicator to affect how the decision selection is made. I have posted a few examples over the years on the mailing list. George. On

Re: [OMPI devel] Dynamic topologies using MPI_Dist_graph_create

2020-04-06 Thread George Bosilca via devel
Bradley, You call them through a blocking MPI function, so the operation is completed by the time you return from the MPI call. So, in short, you should be safe calling dist_graph_create in a loop. The segfault indicates a memory issue with some of the internals of the treematch. Do

Re: [OMPI devel] Open MPI BTL TCP interface mapping

2020-01-09 Thread George Bosilca via devel
Will, The 7134 issue is complex in its interactions with the rest of the TCP BTL, and I could not find the time to look at it carefully enough (or test it on AWS). But maybe you can address my main concern here. The #7134 interface selection will have an impact on the traffic distribution among the dif

Re: [OMPI devel] Reachable framework integration

2020-01-02 Thread George Bosilca via devel
Ralph, I think the first use is still pending reviews (more precisely my review) at https://github.com/open-mpi/ompi/pull/7134. George. On Wed, Jan 1, 2020 at 9:53 PM Ralph Castain via devel < devel@lists.open-mpi.org> wrote: > Hey folks > > I can't find where the opal/reachable framework is

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread George Bosilca via devel
y case, it > did after over 3 years running the code without interruption. I doubt > anyone had ever run the code for such a long sample interval. We found out > because we missed recording an important earthquake a week after the race > condition was tripped. Murphy's law trium

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread George Bosilca via devel
If the issue was some kind of memory consistency between threads, then printing that variable in the context of the debugger would show the value of debugger_event_active being false. volatile is not a memory barrier, it simply forces a load for each access of the data, allowing us to weakly sync

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread George Bosilca via devel
I don't think there is any need for protection around that variable. It will change value only once (in a callback triggered from opal_progress), and the volatile guarantees that loads will be issued for every access, so the waiting thread will eventually notice the change. George. On Tue, Nov 12
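The waiting pattern discussed in this thread can be sketched as follows. This is a single-threaded stand-in, not the Open MPI code: event_active, fake_progress and wait_for_event are hypothetical names, and the fake progress function simply clears the flag after a fixed number of polls. The volatile qualifier only forces a fresh load on every test of the flag; it gives no ordering guarantee for other data, which would need C11 atomics or explicit barriers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag, flipped exactly once by a progress callback. */
static volatile bool event_active = true;
static int polls_left = 3;

/* Stand-in for a progress function whose callback eventually
 * clears the flag. */
static void fake_progress(void)
{
    if (--polls_left == 0) {
        event_active = false;
    }
}

/* Spin until the flag flips. Because the flag is volatile, the
 * compiler must reload it on every iteration and cannot hoist the
 * test out of the loop. Returns the number of polls performed. */
static int wait_for_event(void)
{
    int polls = 0;
    while (event_active) {
        fake_progress();
        ++polls;
    }
    return polls;
}
```

Since the flag changes value only once, a stale read can only delay the exit from the loop; it cannot produce a wrong result.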

Re: [OMPI devel] Anyone have any thoughts about cache-alignment issue in osc/sm?

2019-09-13 Thread George Bosilca via devel
I think we can remove the header, we don't use it anymore. I commented on the issue. George. On Thu, Sep 12, 2019 at 5:23 PM Geoffrey Paulsen via devel < devel@lists.open-mpi.org> wrote: > Does anyone have any thoughts about the cache-alignment issue in osc/sm, > reported in https://github.co

Re: [OMPI devel] Memory performance with Bcast

2019-03-21 Thread George Bosilca
> progression, or do I have to call the matching Bcast? >> >> Anyone from Mellanox here, who knows how HCOLL does this internally? >> Especially on the EDR architecture. Is there any hardware aid? >> >> Thanks! >> >> Marcin >> >> >> On 3/2

Re: [OMPI devel] Memory performance with Bcast

2019-03-20 Thread George Bosilca
If you have support for FCA then it might happen that the collective will use the hardware support. In any case, most of the bcast algorithms have a logarithmic behavior, so there will be at most O(log(P)) memory accesses on the root. If you want to take a look at the code in OMPI to understand wh
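To make the logarithmic bound concrete, here is a small sketch that counts the rounds of a binomial-tree broadcast, which is also the number of sends the root performs in that tree. The function name bcast_rounds is hypothetical, and a binomial tree is only one of the algorithms a collective component may pick.

```c
#include <assert.h>

/* Number of rounds a binomial-tree broadcast needs to reach nprocs
 * processes: the set of processes holding the data doubles every
 * round, so the result is ceil(log2(nprocs)). */
static int bcast_rounds(int nprocs)
{
    assert(nprocs >= 1);
    int rounds = 0;
    /* long long keeps the doubling from overflowing for large nprocs */
    for (long long reached = 1; reached < nprocs; reached *= 2) {
        ++rounds;
    }
    return rounds;
}
```

For the 97-process run mentioned elsewhere in this list, bcast_rounds(97) gives 7, so the root touches its buffer for at most 7 sends in this scheme.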

Re: [OMPI devel] Proposal: Github "stale" bot

2019-03-19 Thread George Bosilca
:+1: George. On Tue, Mar 19, 2019 at 12:45 PM Jeff Squyres (jsquyres) via devel < devel@lists.open-mpi.org> wrote: > I have proposed the use of the Github Probot "stale" bot: > > https://probot.github.io/apps/stale/ > https://github.com/open-mpi/ompi/pull/6495 > > The short version of

Re: [OMPI devel] Error in TCP BTL??

2018-10-01 Thread George Bosilca
https://github.com/open-mpi/ompi/pull/5819 will ease the pain. I couldn't figure out what exactly triggers this, but apparently recent versions of OSX refuse to bind to port 0. George. On Mon, Oct 1, 2018 at 4:12 PM Jeff Squyres (jsquyres) via devel < devel@lists.open-mpi.org> wrote: > I ge

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread George Bosilca
Sorry, I missed the 4.0 on the PR (despite being the first thing in the title). George. > On Sep 20, 2018, at 22:15 , Ralph H Castain wrote: > > That’s why we are leaving it in master - only removing it from release branch > > Sent from my iPhone > > On Sep 20, 201

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread George Bosilca
Why not simply ompi_ignore it ? Removing a component to bring it back later would force us to lose all history. I would rather add an .ompi_ignore and give power users an opportunity to continue playing with it. George. On Thu, Sep 20, 2018 at 8:04 PM Ralph H Castain wrote: > I already

Re: [OMPI devel] Network simulation from within OpenMPI

2018-09-17 Thread George Bosilca
Millian, The level of interposition you need depends on what exactly you are trying to simulate and at which granularity. If you want to simulate the different protocols (small, eager, PUT, GET, pipelining) supported by our default PML OB1, then you need to provide a BTL (with the exclusive flag a

Re: [OMPI devel] Collective communication algorithms

2018-03-26 Thread George Bosilca
Mikhail, Some of these algorithms have been left out for practical reasons: they did not behave better than existing algorithms in any case. Others (such as Traff's butterfly or double tree) were left out because the implementation efforts shifted to other types of collective, or because there was a lac

Re: [OMPI devel] Default tag for OFI MTL

2018-03-04 Thread George Bosilca
is is just a > fallback for potential new ones. FI_DIRECTED_RECV is necessary to > discriminate the source at RX time when the source is not in the tag. > > c) I will include build_time_plan_B you just suggested ;) > > > > Thanks, again. > > > > _MAC > >

Re: [OMPI devel] Default tag for OFI MTL

2018-03-03 Thread George Bosilca
Hi Matias, Relaxing the restriction on the number of ranks is definitely a good thing. The cost will be reflected in the number of communicators and tags, and we must be careful how we balance this. Assuming context_id is the communicator cid, with 10 bits you can only support 1024. A little lo
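The bit-budget arithmetic behind "10 bits gives 1024" can be checked with a short sketch. The field widths below (10 bits of cid, 22 bits of source, 32 bits of tag in a 64-bit word) and every name are assumptions made for illustration only; they are not the layout used by the OFI MTL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths inside a 64-bit match word. */
#define CID_BITS 10u
#define SRC_BITS 22u
#define TAG_BITS 32u

/* Number of distinct identifiers that fit in `bits` bits. */
static uint64_t id_space(unsigned bits)
{
    assert(bits < 64);
    return UINT64_C(1) << bits;
}

/* Pack the three fields as [cid | source | tag], cid in the top bits. */
static uint64_t pack_match_bits(uint32_t cid, uint32_t src, uint32_t tag)
{
    assert(cid < id_space(CID_BITS) && src < id_space(SRC_BITS));
    return ((uint64_t)cid << (SRC_BITS + TAG_BITS))
         | ((uint64_t)src << TAG_BITS)
         | tag;
}

/* Recover the communicator id from a packed word. */
static uint32_t unpack_cid(uint64_t bits)
{
    return (uint32_t)(bits >> (SRC_BITS + TAG_BITS));
}
```

Every bit given to the source field is taken from the cid or the tag, which is the balance the message refers to: widening SRC_BITS by one halves either the number of communicators or the number of tags.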

Re: [OMPI devel] OSC module change

2017-11-28 Thread George Bosilca
or count. > > I guess a different question would be what you need the communicator for. > It shouldn’t have any useful semantic meaning, so why isn’t a silent > implementation detail for the monitoring component? > > Brian > > > On Nov 28, 2017, at 8:45 AM, George Bosilca w

[OMPI devel] OSC module change

2017-11-28 Thread George Bosilca
Devels, We would like to change the definition of the OSC module to move the communicator one level up from the different module structures into the base OSC module. The reason for this, as well as a lengthy discussion on other possible solutions can be found in https://github.com/open-mpi/ompi/pu

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread George Bosilca
Samuel, You are right, we use qsort to sort the keys, but the qsort only applies to participants with the same color. So the complexity of the qsort reaches its worst case only when most of the processes participate with the same color. What I think is the OMPI problem in this area is the selection
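The per-color sort can be sketched as follows. This is not the Open MPI implementation: split_entry_t and both function names are hypothetical. The ordering rule (by key, ties broken by rank in the parent communicator) is the one MPI_Comm_split specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One (color, key, rank) triple per participating process. */
typedef struct {
    int color;
    int key;
    int rank; /* rank in the parent communicator */
} split_entry_t;

/* Order by key, breaking ties by parent rank. */
static int cmp_key_then_rank(const void *a, const void *b)
{
    const split_entry_t *x = a;
    const split_entry_t *y = b;
    if (x->key != y->key) {
        return (x->key > y->key) - (x->key < y->key);
    }
    return (x->rank > y->rank) - (x->rank < y->rank);
}

/* Copy the entries of one color into out (caller provides room for
 * n entries) and sort only those. Returns the group size. */
static size_t order_color_group(const split_entry_t *all, size_t n,
                                int color, split_entry_t *out)
{
    assert(all != NULL && out != NULL);
    size_t m = 0;
    for (size_t i = 0; i < n; ++i) {
        if (all[i].color == color) {
            out[m++] = all[i];
        }
    }
    qsort(out, m, sizeof(out[0]), cmp_key_then_rank);
    return m;
}
```

The sort cost depends on the group size m rather than on n, so it only approaches the full n log n when nearly every process picks the same color.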

Re: [OMPI devel] Open MPI3.0

2017-10-22 Thread George Bosilca
in order to fix that. > > > fwiw, nightly tarballs for v3.0.x, v3.1.x and master are affected > > > Cheers, > > > Gilles > > > On 10/23/2017 5:47 AM, George Bosilca wrote: > >> Did we include by mistake the PMIX config header >> (opal/mca/pmix/pmix2

[OMPI devel] Open MPI3.0

2017-10-22 Thread George Bosilca
Did we include by mistake the PMIX config header (opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h) in the 3.0 release ? George.

Re: [OMPI devel] Jenkins nowhere land again

2017-10-03 Thread George Bosilca
We have an unused mac that we can add to the pool. I'll be more than happy to help set it up. George. On Tue, Oct 3, 2017 at 5:43 PM, Barrett, Brian via devel < devel@lists.open-mpi.org> wrote: > My MacOS box is back up and jobs are progressing again. The queue got kind > of long, so it migh

Re: [OMPI devel] Stale PRs

2017-08-31 Thread George Bosilca
Ralph, I updated the TCP-related pending PR. It offers a better solution than what we have today, unfortunately not perfect as it would require additions to the configure. Waiting for reviews. George. On Thu, Aug 31, 2017 at 10:12 AM, r...@open-mpi.org wrote: > Thanks to those who made a fi

[OMPI devel] PMIX visibility

2017-07-24 Thread George Bosilca
The last PMIX import broke the master on all platforms that support visibility. I have pushed a patch that solves __most__ of the issues (that I could find). I say most because there is a big one left that requires a significant change in PMIX design. This problem arises from the use of the pmix_setenv

[OMPI devel] orterun busted

2017-06-23 Thread George Bosilca
Ralph, I got consistent segfaults during the infrastructure teardown in orterun (I noticed them on OSX). After digging a little bit it turns out that the opal_buffer_t class has been cleaned up in orte_finalize before orte_proc_info_finalize is called, leading to calling the destructors

Re: [OMPI devel] Master warnings

2017-06-13 Thread George Bosilca
e9d533e62ecb should fix these warnings. They are harmless, as we cannot reach the context needed for them to have an impact because collective communications with 0 bytes are trimmed out in the MPI layer. Thanks for reporting. George. On Tue, Jun 13, 2017 at 12:43 PM, r...@open-mpi.org

Re: [OMPI devel] ompi_info "developer warning"

2017-06-05 Thread George Bosilca
I do care a little as the default size for most terminals is still 80 chars. I would prefer your second choice, where we replace "disabled" by "-", to losing information on the useful part of the message. George. On Mon, Jun 5, 2017 at 9:45 AM, wrote: > George, > > > > it seems today the limit i

Re: [OMPI devel] ompi_info "developer warning"

2017-06-05 Thread George Bosilca
So we are finally getting rid of the 80 chars per line limit? George. On Sun, Jun 4, 2017 at 11:23 PM, r...@open-mpi.org wrote: > Really? Sigh - frustrating. I’ll change itas it gets irritating to keep > get this warning. > > Frankly, I find I’m constantly doing --all because otherwise I ha

Re: [OMPI devel] about ompi_datatype_is_valid

2017-06-01 Thread George Bosilca
You have to pass it an allocated datatype, and it tells you if the pointed-to object is a valid MPI datatype for communications (aka it has a corresponding type with a well defined size, extent and alignment). There is no construct in C able to tell you if a random number is a valid C "object". Ge

Re: [OMPI devel] PMIX busted

2017-05-31 Thread George Bosilca
utogen? > > > > On May 31, 2017, at 7:02 AM, George Bosilca wrote: > > > > I have problems compiling the current master. Anyone else has similar > issues ? > > > > George. > > > > > > CC base/ptl_base_frame.lo > > In file included from

[OMPI devel] PMIX busted

2017-05-31 Thread George Bosilca
I have problems compiling the current master. Does anyone else have similar issues ? George. CC base/ptl_base_frame.lo In file included from /Users/bosilca/unstable/ompi/trunk/ompi/opal/mca/pmix/pmix2x/pmix/src/threads/thread_usage.h:31:0, from /Users/bosilca/unstable/ompi/t

Re: [OMPI devel] NetPIPE performance curves

2017-05-09 Thread George Bosilca
Dave, I think I know the reason, or at least part of the reason, for these spikes. As an example, when we select between the different protocols to use to exchange the message between peers, we only use predefined lengths, and we completely disregard buffer alignment. I was planning to address th

Re: [OMPI devel] about MPI_ANY_SOURCE in MPI_Sendrecv_replace

2017-05-09 Thread George Bosilca
PR#3500 (https://github.com/open-mpi/ompi/pull/3500) should fix the problem. It is not optimal, but it is simple and works in all cases. George. On Tue, May 9, 2017 at 2:39 PM, George Bosilca wrote: > Please go ahead and open an issue, I will attach the PR once I have the > core re

Re: [OMPI devel] about MPI_ANY_SOURCE in MPI_Sendrecv_replace

2017-05-09 Thread George Bosilca
ed one? > > Dahai > > On Fri, May 5, 2017 at 1:27 PM, George Bosilca <mailto:bosi...@icl.utk.edu>> wrote: > Indeed, our current implementation of the MPI_Sendrecv_replace prohibits the > use of MPI_ANY_SOURCE. Will work a patch later today. > > George.

Re: [OMPI devel] Open MPI 3.x branch naming

2017-05-05 Thread George Bosilca
If we rebranch from master for every "major" release it makes sense to rename the branch. In the long term renaming seems like the way to go, and thus the pain of altering everything that depends on the naming will exist at some point. I am in favor of doing it asap (but I have no stakes in the gam

Re: [OMPI devel] about MPI_ANY_SOURCE in MPI_Sendrecv_replace

2017-05-05 Thread George Bosilca
Indeed, our current implementation of the MPI_Sendrecv_replace prohibits the use of MPI_ANY_SOURCE. I will work on a patch later today. George. On Fri, May 5, 2017 at 11:49 AM, Dahai Guo wrote: > The following code causes memory fault problem. The initial check shows > that it seemed caused by *o

Re: [OMPI devel] count = -1 for reduce

2017-05-05 Thread George Bosilca
main(int argc, char** argv) >> { >> int r[1], s[1]; >> MPI_Init(&argc,&argv); >> >> s[0] = 1; >> r[0] = -1; >> MPI_Reduce(s,r,0,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD); >> printf("%d\n",r[0]); >> MPI_Red

Re: [OMPI devel] count = -1 for reduce

2017-05-04 Thread George Bosilca
figure --prefix=${installdir} \ > > --enable-orterun-prefix-by-default > > > > Dahai > > > > On Thu, May 4, 2017 at 4:45 PM, George Bosilca > wrote: > > Dahai, > > > > You are right the segfault is unexpected. I can't replicate this

Re: [OMPI devel] count = -1 for reduce

2017-05-04 Thread George Bosilca
Dahai, You are right, the segfault is unexpected. I can't replicate this on my mac. On what architecture are you seeing this issue ? How was your OMPI compiled ? Please post the output of ompi_info. Thanks, George. On Thu, May 4, 2017 at 5:42 PM, Dahai Guo wrote: > Those messages are what I lik

Re: [OMPI devel] NetPIPE performance curves

2017-05-03 Thread George Bosilca
ect line so it is more specific >> than "Re: Contents of devel digest..." >> >> >> Today's Topics: >> >>1. NetPIPE performance curves (Dave Turner) >>2. Re: NetPIPE performance curves (George Bosilca) >>3. remote spawn - hav

Re: [OMPI devel] NetPIPE performance curves

2017-05-02 Thread George Bosilca
David, Are you using the OB1 PML or one of our IB-enabled MTLs (UCX or MXM) ? I have access to similar cards, and I can't replicate your results. I do see a performance loss, but nowhere near what you have seen (it is going down to 47Gb instead of 50Gb). George. On Tue, May 2, 2017 at 4:40 PM,

Re: [OMPI devel] MPI_Type_Dup specification

2017-05-02 Thread George Bosilca
In my interpretation of the text you mention I have not considered the type name as an intrinsic property of the datatype (unlike ub, lb, size and many others). Thus, I took the freedom to alter it in a meaningful way for debugging purposes. This should not affect the users if they set the name, as

Re: [OMPI devel] TCP BTL's multi-link behavior

2017-04-26 Thread George Bosilca
I wouldn't put too much faith in my memory either. What I recall is that the multi-link code was written mainly for the PS4, where the supervisor would limit the bandwidth per socket to about 60% of the hardware capabilities. Thus, by using multiple links (in fact sockets between a set of peers) we

Re: [OMPI devel] Segfault during a free in reduce_scatter using basic component

2017-03-28 Thread George Bosilca
Emmanuel, I tried with both 2.x and master (they are only syntactically different with regard to reduce_scatter) and I can't reproduce your issue. I ran the OSU test with the following command line: mpirun -n 97 --mca coll basic,libnbc,self --mca pml ob1 ./osu_reduce_scatter -m 524288: -i 1 -x 0

Re: [OMPI devel] PVAR definition - Variable type regarding the PVAR class

2017-03-22 Thread George Bosilca
Clement, I would continue to use the SIZE_T atomics but define the PVARs type accordingly (UNSIGNED_LONG or UNSIGNED_LONG_LONG depending on which of their size match the one from size_t). George. On Wed, Mar 22, 2017 at 6:45 AM, Clement FOYER wrote: > Hi everyone, > > I'm facing an issue wi
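One way to read this advice is to pick the variable type by comparing sizes, as in the sketch below. The enum and the function name are hypothetical placeholders, not the MCA or MPI_T constants; a real component would map the result onto whatever type identifiers its framework provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the framework's type identifiers. */
typedef enum {
    PVAR_TYPE_UNSIGNED_LONG,
    PVAR_TYPE_UNSIGNED_LONG_LONG,
    PVAR_TYPE_NONE
} pvar_type_t;

/* Return the unsigned integer type whose size matches nbytes,
 * preferring unsigned long when both candidates have that size. */
static pvar_type_t pvar_type_for_size(size_t nbytes)
{
    if (nbytes == sizeof(unsigned long)) {
        return PVAR_TYPE_UNSIGNED_LONG;
    }
    if (nbytes == sizeof(unsigned long long)) {
        return PVAR_TYPE_UNSIGNED_LONG_LONG;
    }
    return PVAR_TYPE_NONE;
}
```

Calling pvar_type_for_size(sizeof(size_t)) then selects the declaration that matches the counter actually updated by the size_t atomics, on platforms where size_t has the width of one of the two candidates.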

Re: [OMPI devel] Open MPI, ssh and limits

2017-03-03 Thread George Bosilca
Isn't this supposed to be part of cluster 101? I would rather add it to our faq, maybe in a slightly more generic way (not only focused on 'ulimit -c'). Otherwise we will be bound to define what is forwarded and what is not, and potentially create chaos for knowledgeable users (that know how

Re: [OMPI devel] Segfault on MPI init

2017-02-10 Thread George Bosilca
was cad4c03). > > FYI, Clément Foyer is talking with George Bosilca about this problem. > > > Cyril. > > Le 08/02/2017 à 16:46, Jeff Squyres (jsquyres) a écrit : >> What version of Open MPI are you running? >> >> The error is indicating that Open MPI is tr

Re: [OMPI devel] Disable progress engine

2017-01-24 Thread George Bosilca
If you do not explicitly enable asynchronous progress (which btw is only supported by some of the BTLs), there will be little support for asynchronous progress in Open MPI. Indeed, as long as the library is not in an MPI call, all progress is stalled, no matching is done, and the only possible pro

Re: [OMPI devel] MPI_Bcast algorithm

2017-01-19 Thread George Bosilca
Enim, Before going into all these efforts to implement a new function, let me describe what is there already and that you can use to achieve something similar. It will not be exactly what you describe, because changing a particular collective algorithm dynamically on a communicator might not be as

Re: [OMPI devel] OMPI v1.10.6

2017-01-18 Thread George Bosilca
https://github.com/open-mpi/ompi/issues/2750 George. On Wed, Jan 18, 2017 at 12:57 PM, r...@open-mpi.org wrote: > Last call for v1.10.6 changes - we still have a few pending for review, > but none marked as critical. If you want them included, please push for a > review _now_ > > Thanks > R

Re: [OMPI devel] Wtime is 0.0

2016-12-21 Thread George Bosilca
Jan, You can use MPI_Type_match_size ( https://www.open-mpi.org/doc/v2.0/man3/MPI_Type_match_size.3.php) to find the MPI type that meets certain requirements. George. On Wed, Dec 21, 2016 at 3:16 AM, 🐋 Jan Hegewald wrote: > Hi Jeff et al, > > > On 20 Dec 2016, at 20:23, Jeff Squyres (jsquy

Re: [OMPI devel] LD_PRELOAD a C-coded shared object with a FORTRAN application

2016-12-12 Thread George Bosilca
Indeed this is the best solution. If you really want a clean portable solution, take a look at any of the files in ompi/mpi/fortran/mpif-h directory, to see how we define the 4 different versions of the Fortran interface. George. On Mon, Dec 12, 2016 at 10:42 AM, Clement FOYER wrote: > Thank

Re: [OMPI devel] Current progress threads status in Open MPI

2016-11-22 Thread George Bosilca
Christoph, This is work in progress. Right now, only the TCP BTL has an integrated progress thread, but we are working on a more general solution that will handle all BTLs (and possibly some of the MTLs). If you want more info, or want to volunteer for beta-testing, please ping me offline. Thanks,

Re: [OMPI devel] MPI_Win_lock semantic

2016-11-21 Thread George Bosilca
Gilles, I looked at the test and I think the current behavior is indeed correct. What matters for an exclusive lock is that all operations in an epoch (everything surrounded by lock/unlock) are atomically applied to the destination (and are not interleaved with other updates). As Nathan stated, MP

Re: [OMPI devel] MPI_Win_lock semantic

2016-11-21 Thread George Bosilca
Why is MPI_Win_flush required to ensure the lock is acquired ? According to the standard MPI_Win_flush "completes all outstanding RMA operations initiated by the calling process to the target rank on the specified window", which can be read as being a no-op if no pending operations exist. George

Re: [OMPI devel] Removing from opal_hashtable while iterating over the elements

2016-11-18 Thread George Bosilca
Absolutely, if you keep the pointer to the previous or next element, it is safe to remove an element. If you are in the process of completely emptying the hashtable you can just keep removing the head element. George On Nov 18, 2016 6:51 AM, "Clement FOYER" wrote: > Hi everyone, > > I was wonde
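The "keep a pointer to the next element" rule can be shown on a plain singly linked chain, used here as a stand-in for one bucket of a hash table. This is not the opal_hash_table API: node_t and remove_matching are hypothetical, and the nodes are owned by the caller, so nothing is freed.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal chain node; a hash bucket would hold a list like this. */
typedef struct node {
    int key;
    struct node *next;
} node_t;

/* Unlink every node whose key matches while walking the chain.
 * The successor is saved before the current node is touched, so
 * removing the current node never invalidates the iteration.
 * Returns the number of nodes removed. */
static int remove_matching(node_t **head, int key)
{
    assert(head != NULL);
    int removed = 0;
    node_t **link = head;
    while (*link != NULL) {
        node_t *cur = *link;
        node_t *next = cur->next; /* saved before cur is unlinked */
        if (cur->key == key) {
            *link = next;         /* bypass cur */
            cur->next = NULL;
            ++removed;
        } else {
            link = &cur->next;
        }
    }
    return removed;
}
```

Emptying the whole chain is the degenerate case in which every node matches, which corresponds to repeatedly removing the head element.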

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread George Bosilca
Yes, my understanding is that unsystematic contributors will not have to sign the contributor agreement, but instead will have to provide a signed patch. George. On Wed, Oct 12, 2016 at 9:29 AM, Pavel Shamis wrote: > Does it mean that contributors don't have to sign contributor agreement ? >

Re: [OMPI devel] use of OBJ_NEW and related calls

2016-10-10 Thread George Bosilca
These macros are defined in opal/class/opal_object.h. We are using them all over the OMPI code base, including OPAL, ORTE, OSHMEM and OMPI. These calls are indeed somewhat similar to an OO language, the intent was to have a thread-safe way to refcount objects to keep them around for as long as they
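For readers unfamiliar with the pattern, here is a generic sketch of thread-safe reference counting in C11. It does not reproduce the OBJ_NEW family of macros or the layout of opal_object_t: obj_t, obj_new, obj_retain, obj_release and the counting destructor are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted base object with an optional destructor. */
typedef struct obj {
    atomic_int refcount;
    void (*destruct)(struct obj *);
} obj_t;

/* Counter and destructor used only to observe when destruction runs. */
static int destructed_count = 0;
static void count_destruct(obj_t *o)
{
    (void)o;
    ++destructed_count;
}

/* Allocate an object holding one reference; NULL on allocation failure. */
static obj_t *obj_new(void (*destruct)(obj_t *))
{
    obj_t *o = malloc(sizeof(*o));
    if (o != NULL) {
        atomic_init(&o->refcount, 1);
        o->destruct = destruct;
    }
    return o;
}

/* Take an additional reference. */
static void obj_retain(obj_t *o)
{
    assert(o != NULL);
    atomic_fetch_add(&o->refcount, 1);
}

/* Drop one reference; the holder of the last one runs the destructor
 * and frees the memory. Returns 1 if the object was destroyed. */
static int obj_release(obj_t *o)
{
    assert(o != NULL);
    if (atomic_fetch_sub(&o->refcount, 1) == 1) {
        if (o->destruct != NULL) {
            o->destruct(o);
        }
        free(o);
        return 1;
    }
    return 0;
}
```

The atomic decrement returns the previous count, so exactly one caller observes the transition from 1 to 0 and performs the cleanup, whichever thread that happens to be.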

Re: [OMPI devel] OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-23 Thread George Bosilca
RFC applied via 93fa94f9. On Fri, Sep 23, 2016 at 7:13 AM, George Bosilca wrote: > It turns out the OMPI behavior today was divergent from what is written in > the README. We already explicitly state that > > - If specified, the "btl_tcp_if_exclude" parameter must incl

Re: [OMPI devel] OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-23 Thread George Bosilca
> OK then, > I recommend we explicitly state in the README that loopback interface can > no more be omitted from btl_tcp_if_exclude when running on multiple nodes > > Cheers, > > Gilles > > > On Thursday, September 22, 2016, George Bosilca > wrote: > >> T

Re: [OMPI devel] OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-22 Thread George Bosilca
self is excluded, the app will not start and it is trivial > to append to the error message a note asking to ensure btl/self was > not excluded. > in this case, i do not think we have a mechanism to issue a warning > message (e.g. "ensure lo is excluded") when hiccups occur.

Re: [OMPI devel] OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-21 Thread George Bosilca
alternative to --mca btl xxx > (for example --networks shm,infiniband. today Open MPI does not > provide any alternative to btl/self. also infiniband can be used via > btl/openib, mtl/mxm or libfabric, which makes it painful to > blacklist). i cannot remember the outcome of the discus

Re: [OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-21 Thread George Bosilca
On Wed, Sep 21, 2016 at 11:23 AM, Jeff Squyres (jsquyres) < jsquy...@cisco.com> wrote: > > I would have agreed with you if the current code was doing a better > decision of what is local and what not. But it is not, it simply remove all > 127.x.x.x interfaces (opal/util/net.c:222). Thus, the only

Re: [OMPI devel] OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-21 Thread George Bosilca
cp). my first impression is that i am not so comfortable > with that, and we could add yet an other MCA parameter so btl/openib > disqualifies itself for intra node communications. > > > Cheers, > > Gilles > > On Thu, Sep 22, 2016 at 12:56 AM, George Bosilca > wrote: > >

Re: [OMPI devel] OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-21 Thread George Bosilca
or not excluded) for > inter node communication > > Cheers, > > Gilles > > "Jeff Squyres (jsquyres)" wrote: > >On Sep 21, 2016, at 10:56 AM, George Bosilca wrote: > >> > >> No, because 127.x.x.x is by default part of the exclude, so it will > ne

Re: [OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-21 Thread George Bosilca
code does, is preventing a power-user from using the loopback (despite being explicitly enabled via the corresponding MCA parameters). George. > > > > > On Sep 21, 2016, at 7:59 AM, George Bosilca wrote: > > > > The current code in the TCP BTL prevents local executi

[OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-21 Thread George Bosilca
The current code in the TCP BTL prevents local execution on a laptop not exposing a public IP address, by unconditionally disqualifying all interfaces with local addresses. This is not done based on MCA parameters but instead is done deep inside the IP matching logic, independent of what the user s

Re: [OMPI devel] Deadlock in sync_wait_mt(): Proposed patch

2016-09-21 Thread George Bosilca
Nice catch. Keeping the first check only works because the signaling field prevents us from releasing the condition too early. I added some comments around the code (131fe42d). George. On Wed, Sep 21, 2016 at 5:33 AM, Nathan Hjelm wrote: > Yeah, that looks like a bug to me. We need to keep the

Re: [OMPI devel] Sample of merging ompi and ompi-release

2016-09-19 Thread George Bosilca
:+1: George. On Mon, Sep 19, 2016 at 6:56 PM, Jeff Squyres (jsquyres) wrote: > (we can discuss all of this on the Webex tomorrow) > > Here's a sample repo where I merged ompi and ompi-release: > > https://github.com/open-mpi/ompi-all-the-branches > > Please compare it to: > > https:/

Re: [OMPI devel] [MPI_Tools] How to branch components

2016-09-08 Thread George Bosilca
On Thu, Sep 8, 2016 at 10:17 AM, Clément wrote: > Hi every one, > > I'm currently working on a monitoring component for OpenMPI. After > browsing the MPI standard, in order to know what the MPI_Tools interface > looks like, and how it's working, I had a look at the code I recieved and > something

Re: [OMPI devel] Hanging tests

2016-09-06 Thread George Bosilca
I can make MPI_Issend_rtoa deadlock with vader and sm. George. On Tue, Sep 6, 2016 at 12:06 PM, r...@open-mpi.org wrote: > FWIW: those tests hang for me with TCP (I don’t have openib on my > cluster). I’ll check it with your change as well > > > On Sep 6, 2016, at 1:29 AM, Gilles Gouaillarde

Re: [OMPI devel] Question about Open MPI bindings

2016-09-05 Thread George Bosilca
behavior you describe, then you simply tell ORTE to > “--map-by core --bind-to core” > > On Sep 5, 2016, at 11:05 AM, George Bosilca wrote: > > On Sat, Sep 3, 2016 at 10:34 AM, r...@open-mpi.org > wrote: > >> Interesting - well, it looks like ORTE is working correctly. The map

Re: [OMPI devel] OMPI devel] Question about Open MPI bindings

2016-09-05 Thread George Bosilca
to 1,9. > All of this matches what the OS reported. > > So it looks like it is report-bindings that is messed up for some reason. > > > On Sep 3, 2016, at 7:14 AM, George Bosilca wrote: > > $mpirun -np 3 --tag-output --bind-to core --report-bindings > --display-devel-map -

Re: [OMPI devel] Question about Open MPI bindings

2016-09-05 Thread George Bosilca
hion (such as rank 0 on core 0, rank 1 on core 1 and rank 2 on core 2) ? George. > > > On Sep 3, 2016, at 7:14 AM, George Bosilca wrote: > > $mpirun -np 3 --tag-output --bind-to core --report-bindings > --display-devel-map --mca rmaps_base_verbose 10 true > > [dancer.icl

Re: [OMPI devel] Question about Open MPI bindings

2016-09-03 Thread George Bosilca
arc00:07612] MCW rank 2 bound to socket 0[core 1[hwt 0]], > socket 0[core 9[hwt 0]]: [../B./../../../../../../../B. > ][../../../../../../../../../..] On Sat, Sep 3, 2016 at 9:44 AM, r...@open-mpi.org wrote: > Okay, can you add --display-devel-map --mca rmaps_base_verbose 10 to your > cmd li

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
v2 cannot be used in Open MPI yet) > > You might also want to try > mpirun --tag-output --bind-to xxx --report-bindings grep Cpus_allowed_list > /proc/self/status > > So you can confirm both openmpi and /proc/self/status report the same thing > > Hope this helps a bi
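The cross-check suggested above (compare what `--report-bindings` prints with `Cpus_allowed_list` in `/proc/self/status`) can be scripted. The sketch below is an illustration only, assuming Linux and the usual kernel list format such as `0-3,8`; the helper names `parse_cpu_list` and `allowed_cpus_from_status` are made up for this example.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus

def allowed_cpus_from_status(status_text):
    """Extract Cpus_allowed_list from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    raise ValueError("Cpus_allowed_list not found")
```

Each rank can read its own `/proc/self/status`, pass the text to `allowed_cpus_from_status`, and print the resulting set next to the cores listed by `--report-bindings`; a mismatch points at the reporting rather than at the binding itself.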

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
ore 11[hwt 0-1]]: > [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..] > > > I have seen the segfault when something fails early in the setup procedure > - planned to fix that this weekend. > > > On Sep 2, 2016, at 9:09 PM, George Bosilca wrote: > >

[OMPI devel] Question about Open MPI bindings

2016-09-02 Thread George Bosilca
While investigating the ongoing issue with the OMPI messaging layer, I ran into some trouble with process binding. I read the documentation, but I still find this puzzling. Disclaimer: all experiments were done with current master (9c496f7) compiled in optimized mode. The hardware: a single node 20 c

Re: [OMPI devel] [2.0.1rc1] ppc64 atomics (still) broken w/ xlc-12.1

2016-08-27 Thread George Bosilca
v2.x: https://github.com/open-mpi/ompi-release/pull/1344 master: https://github.com/open-mpi/ompi/commit/a6d515b Thanks, George. On Sat, Aug 27, 2016 at 12:45 PM, George Bosilca wrote: > Paul, > > Sorry for the half-fix. I'll submit a patch and PRs to the releases asap

Re: [OMPI devel] [2.0.1rc1] ppc64 atomics (still) broken w/ xlc-12.1

2016-08-27 Thread George Bosilca
Paul, Sorry for the half-fix. I'll submit a patch and PRs to the releases asap. George. On Sat, Aug 27, 2016 at 4:14 AM, Paul Hargrove wrote: > I didn't get to test 2.0.1rc1 with xlc-12.1 until just now because I need > a CRYPTOCard for access (== not fully automated like my other tests). >

Re: [OMPI devel] Performance analysis proposal

2016-08-26 Thread George Bosilca
or jumping in so late. > > > > > > Honestly, there's no problem with making a repo in the open-mpi github > org. It's just as trivial to make one there as anywhere else. > > > > > > Let me know if you want one. > > > > > > > >

Re: [OMPI devel] Performance analysis proposal

2016-08-26 Thread George Bosilca
'll send > them later this week. > > Do you think of regular calls or per agreement? > > On Friday, August 26, 2016, George Bosilca wrote: > >> We are serious about this. However, we not only have to define a set of >> meaningful tests (which we don

Re: [OMPI devel] Performance analysis proposal

2016-08-26 Thread George Bosilca
; tests and drop the ball here? > > On Friday, August 26, 2016, George Bosilca wrote: > > Arm repo is a good location until we converge to a well-defined set of >> tests. >> >> George. >> >> >> On Thu, Aug 25, 2016 at 1:44 PM, Artem Polyakov &

Re: [OMPI devel] Performance analysis proposal

2016-08-25 Thread George Bosilca
Arm repo is a good location until we converge to a well-defined set of tests. George. On Thu, Aug 25, 2016 at 1:44 PM, Artem Polyakov wrote: > That's a good question. I have results myself and I don't know where to > place them. > I think that Arm's repo is not a right place to collect the d

Re: [OMPI devel] Coll/sync component missing???

2016-08-22 Thread George Bosilca
ration is a per-communicator boolean), so i do not see how > s->in_operation can be true in a valid MPI program. > > > Though the first point can be seen as a "matter of style", i am pretty > curious about the second one. > > > Cheers, > > > Gilles > >

Re: [OMPI devel] Coll/sync component missing???

2016-08-20 Thread George Bosilca
Ralph, Bringing back the coll/sync is a cheap shot at hiding a real issue behind a smokescreen. As Nathan described in his email, Open MPI's lack of flow control on eager messages is the real culprit here, and the loop around any one-to-many collective (bcast and scatter*) was only helping to exa
