The idea here is that the dynamic rules are defined by an entire set of
parameters, and that we want a quick way to allow OMPI to ignore them all.
If we follow your suggestion and remove coll_tuned_use_dynamic_rules, then
turning dynamic rules on/off involves a lot of changes to the MCA file
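For context, the switch in question can be flipped either per-run or persistently; a hedged sketch (coll_tuned_use_dynamic_rules and coll_tuned_dynamic_rules_filename are the coll/tuned MCA parameters referenced above; the rules file path is a placeholder):

```
# Per-run, on the mpirun command line:
#   mpirun --mca coll_tuned_use_dynamic_rules 1 \
#          --mca coll_tuned_dynamic_rules_filename /path/to/rules.conf ...
#
# Or persistently, in $HOME/.openmpi/mca-params.conf:
coll_tuned_use_dynamic_rules = 1
coll_tuned_dynamic_rules_filename = /path/to/rules.conf
```

Setting coll_tuned_use_dynamic_rules to 0 disables the entire rule set at once, which is the quick off-switch argued for above.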
Damien,
As Gilles indicated, an example would be great. Meanwhile, since you already
have access to the root cause with a debugger, can you check which branch of
the if statement on the communicator type in the
ompi_coll_base_retain_datatypes_w function is taken. What is the
communicator type ? Intra or
No, the PML UCX in OMPI is just a shim layer of translation from our API
into UCX API. The selection of the communication protocol you are
interested in is deep inside the UCX code. You will need to talk with UCX
developers about that.
George.
On Tue, Oct 19, 2021 at 10:57 AM Masoud
Masoud,
The protocol selection and implementation in OMPI is only available for the
PML OB1, other PMLs make their own internal selection that is usually
maintained in some other code base.
For OB1, the selection starts in ompi/mca/pml/ob1/pml_ob1_sendreq.c in the
function
> question is which one is broken (and then we’ll have
> to figure out how to fix…).
>
> Brian
>
> On 9/28/21, 7:11 PM, "devel on behalf of George Bosilca via devel" <
> devel-boun...@lists.open-mpi.org on behalf of devel@lists.open-mpi.org>
>
Based on my high-level understanding of the code path and according to the
UCX implementation of the flush, the required level of completion is local.
George.
On Tue, Sep 28, 2021 at 19:26 Zhang, Wei via devel
wrote:
> Dear All,
>
> I have a question regarding the completion semantics
Larry,
There is no simple answer to your question, as it depends on many factors,
both software and hardware. A user-selectable PML (our high-level messaging
layer) component decides which protocol is used to move the data
around, and over which hardware. At this level you have a choice between OB1
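As a concrete illustration of that choice, a sketch of selecting the PML explicitly at launch time (ob1 and ucx are real Open MPI component names; ./app and the BTL list are placeholders, and which components are actually available depends on the build):

```
# Force the OB1 PML over the TCP BTL:
mpirun --mca pml ob1 --mca btl tcp,self ./app

# Or hand the whole message path to UCX instead:
mpirun --mca pml ucx ./app
```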
Gabriel,
Awesome, good luck.
I have no idea which are or are not necessary for a proper functioning
daemon. To me all of the ones you have here seem critical. Ralph would be a
better source of information regarding the daemons' requirements.
Thanks,
George.
On Tue, Mar 9, 2021 at 10:25 AM
Gabriel,
You should be able to. Here are at least 2 different ways of doing this.
1. Purely MPI. Start singletons (or smaller groups), and connect via
sockets using MPI_Comm_join. You can set up your own DNS-like service, with
the goal of having the independent MPI jobs leave a trace there, such
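A minimal sketch of option 1, assuming two singletons that already agreed on a host/port out of band (the port number and the server/client role selection are placeholders for the DNS-like rendezvous service mentioned above; error checking omitted):

```c
#include <mpi.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define RENDEZVOUS_PORT 7777  /* placeholder */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int fd, is_server = (argc > 1 && 0 == strcmp(argv[1], "server"));
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(RENDEZVOUS_PORT),
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    if (is_server) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 1);
        fd = accept(lfd, NULL, NULL);
        close(lfd);
    } else {
        fd = socket(AF_INET, SOCK_STREAM, 0);
        connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    }

    /* Both singletons now hold the two ends of one TCP connection;
     * MPI_Comm_join turns it into an intercommunicator. */
    MPI_Comm intercomm;
    MPI_Comm_join(fd, &intercomm);

    /* ... communicate over intercomm, then ... */
    MPI_Comm_disconnect(&intercomm);
    close(fd);
    MPI_Finalize();
    return 0;
}
```

Run one process with the "server" argument and the other without; each then sees the other job through the resulting intercommunicator.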
Luis,
From time to time we remove warnings from the code. In this
particular instance the meaning of the code is correct, as the ompi_info_t
structure starts with an opal_info_t, but removing the warnings is good
policy.
In general we can either cast the ompi_info_t pointer directly to an
John,
The common denominator across all these errors is an error from connect
while trying to connect to 10.71.2.58 on port 1024. Who is 10.71.2.58 ? Is
the firewall open ? Is port 1024 allowed for connections ?
George.
On Mon, May 4, 2020 at 11:36 AM John DelSignore via devel <
Deprecate, warn and convert seems reasonable. But for how long ?
As the number of automatic conversions OMPI supports has shown a tendency
to increase, and as these conversions happen all over the code base, we
might want to set up a well-defined path to deprecation, what and when has
been
All the collective decisions are done on the first collective on each
communicator. So basically you can change the MCA or pvar before the first
collective in a communicator to affect how the decision selection is made.
I have posted a few examples over the years on the mailing list.
George.
On
Bradley,
You call them through a blocking MPI function, so the operation is
completed by the time you return from the MPI call. So, short story, you
should be safe calling the dist_graph_create in a loop.
The segfault indicates a memory issue with some of the internals of the
treematch. Do
Will,
The 7134 issue is complex in its interactions with the rest of the TCP BTL,
and I could not find the time to look at it carefully enough (or test it on
AWS). But maybe you can address my main concern here. The #7134 interface
selection will have an impact on the traffic distribution among the
Ralph,
I think the first use is still pending reviews (more precisely my review)
at https://github.com/open-mpi/ompi/pull/7134.
George.
On Wed, Jan 1, 2020 at 9:53 PM Ralph Castain via devel <
devel@lists.open-mpi.org> wrote:
> Hey folks
>
> I can't find where the opal/reachable framework
> did after over 3 years running the code without interruption. I doubt
> anyone had ever run the code for such a long sample interval. We found out
> because we missed recording an important earthquake a week after the race
> condition was tripped. Murphy's law triumphs again. :)
>
>
If the issue was some kind of memory consistency problem between threads, then
printing that variable in the context of the debugger would show the value
of debugger_event_active being false.
volatile is not a memory barrier, it simply forces a load for each access
of the data, allowing us to weakly
I don't think there is a need for any protection around that variable. It will
change value only once (in a callback triggered from opal_progress), and
the volatile guarantees that loads will be issued for every access, so the
waiting thread will eventually notice the change.
George.
On Tue, Nov
I think we can remove the header, we don't use it anymore. I commented on
the issue.
George.
On Thu, Sep 12, 2019 at 5:23 PM Geoffrey Paulsen via devel <
devel@lists.open-mpi.org> wrote:
> Does anyone have any thoughts about the cache-alignment issue in osc/sm,
> reported in