The idea here is that the dynamic rules are defined by an entire set of
parameters, and that we want a quick way to allow OMPI to ignore them all.
If we follow your suggestion and remove coll_tuned_use_dynamic_rules, then
turning on/off dynamic rules involves a lot of changes to the MCA file
(ins
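To make the point concrete, here is a hypothetical MCA parameter file fragment; coll_tuned_use_dynamic_rules is the parameter named above, while the rules-file parameter name and path are assumptions for illustration:

```ini
# Illustrative MCA file fragment. Only coll_tuned_use_dynamic_rules is
# taken from the discussion above; the other name and the path are
# assumptions. Setting the first parameter to 0 disables the whole set
# at once, without editing the rest of the file.
coll_tuned_use_dynamic_rules = 1
coll_tuned_dynamic_rules_filename = /path/to/rules.conf
```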
Damien,
As Gilles indicated, an example would be great. Meanwhile, as you already
have access to the root cause with a debugger, can you check which branch of
the if on the communicator type in the
ompi_coll_base_retain_datatypes_w function is taken. What is the
communicator type ? Intra or i
Hemmatpour
wrote:
>
> Hi George,
>
> Thank you very much for your reply. I use UCX for the communication. Is it
> somewhere in pml_ucx.c?
>
> Thanks,
>
>
>
> On Tue, Oct 19, 2021 at 4:41 PM George Bosilca
> wrote:
>
>> Masoud,
>>
>> The protocol s
Masoud,
The protocol selection and implementation in OMPI is only available for the
PML OB1; other PMLs make their own internal selection, which is usually
maintained in some other code base.
For OB1, the selection starts in ompi/mca/pml/ob1/pml_ob1_sendreq.c in the
function mca_pml_ob1_send_reques
the question is which one is broken (and then we’ll have
> to figure out how to fix…).
>
>
>
> Brian
>
>
>
> On 9/28/21, 7:11 PM, "devel on behalf of George Bosilca via devel" <
> devel-boun...@lists.open-mpi.org on behalf of devel@lists.open-mpi.org>
Based on my high-level understanding of the code path and according to the
UCX implementation of the flush, the required level of completion is local.
George.
On Tue, Sep 28, 2021 at 19:26 Zhang, Wei via devel
wrote:
> Dear All,
>
>
>
> I have a question regarding the completion semantics of
Larry,
There is no simple answer to your question as it depends on many software
and hardware factors. A user-selectable PML (our high-level messaging
layer) component decides which protocol to use to move the data
around on which hardware. At this level you have a choice between OB1
(w
egex
> "ip-[2:10]-0-16-120,[2:10].0.35.43,[2:10].0.35.42@0(3)" -mca orte_hnp_uri
> "2752512000.0;tcp://10.0.16.120:44789" -mca plm "rsh" --tree-spawn -mca
> routed "radix" -mca orte_parent_uri "2752512000.0;tcp://10.0.16.120:44789"
> -mca
Gabriel,
You should be able to. Here are at least 2 different ways of doing this.
1. Purely MPI. Start singletons (or smaller groups), and connect via
sockets using MPI_Comm_join. You can set up your own DNS-like service, with
the goal of having the independent MPI jobs leave a trace there, such t
Luis,
Every so often we remove warnings from the code. In this
particular instance the meaning of the code is correct, the ompi_info_t
structure starts with an opal_info_t, but removing the warnings is good
policy.
In general we can either cast the ompi_info_t pointer directly to an
opal
John,
The common denominator across all these errors is an error from connect
while trying to connect to 10.71.2.58 on port 1024. Who is 10.71.2.58 ? Is
the firewall open ? Is connecting to port 1024 allowed ?
George.
On Mon, May 4, 2020 at 11:36 AM John DelSignore via devel <
devel@lists
Deprecate, warn and convert seems reasonable. But for how long ?
As the number of automatic conversions OMPI supports has shown a tendency
to increase, and as these conversions happen all over the code base, we
might want to set up a well-defined path to deprecation, what and when has
been deprecat
All the collective decisions are done on the first collective on each
communicator. So basically you can change the MCA or pvar before the first
collective in a communicator to affect how the decision selection is made.
I have posted a few examples over the years on the mailing list.
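As a concrete illustration of setting the MCA parameters before the run (the algorithm parameter name follows the coll/tuned naming scheme but is an assumption here, as are the process count and binary name):

```
# Hypothetical example: force a specific bcast algorithm in coll/tuned.
# Because the decision is fixed at the first collective on each
# communicator, the parameters must be in place before that point,
# e.g. on the mpirun command line.
mpirun --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_bcast_algorithm 6 \
       -np 16 ./my_app
```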
George.
On
Bradley,
You call them through a blocking MPI function, so the operation is
completed by the time you return from the MPI call. Short story: you
should be safe calling dist_graph_create in a loop.
The segfault indicates a memory issue with some of the internals of the
treematch. Do
Will,
The 7134 issue is complex in its interactions with the rest of the TCP BTL,
and I could not find the time to look at it carefully enough (or test it on
AWS). But maybe you can address my main concern here. #7134's interface
selection will have an impact on the traffic distribution among the
dif
Ralph,
I think the first use is still pending reviews (more precisely my review)
at https://github.com/open-mpi/ompi/pull/7134.
George.
On Wed, Jan 1, 2020 at 9:53 PM Ralph Castain via devel <
devel@lists.open-mpi.org> wrote:
> Hey folks
>
> I can't find where the opal/reachable framework is
y case, it
> did after over 3 years running the code without interruption. I doubt
> anyone had ever run the code for such a long sample interval. We found out
> because we missed recording an important earthquake a week after the race
> condition was tripped. Murphy's law trium
If the issue was some kind of memory consistency between threads, then
printing that variable in the context of the debugger would show the value
of debugger_event_active being false.
volatile is not a memory barrier, it simply forces a load for each access
of the data, allowing us to weakly sync
I don't think there is a need for any protection around that variable. It will
change value only once (in a callback triggered from opal_progress), and
the volatile guarantees that loads will be issued for every access, so the
waiting thread will eventually notice the change.
George.
On Tue, Nov 12
I think we can remove the header, we don't use it anymore. I commented on
the issue.
George.
On Thu, Sep 12, 2019 at 5:23 PM Geoffrey Paulsen via devel <
devel@lists.open-mpi.org> wrote:
> Does anyone have any thoughts about the cache-alignment issue in osc/sm,
> reported in https://github.co
gt; progression, or do I have to call the matching Bcast?
>>
>> Anyone from Mellanox here, who knows how HCOLL does this internally?
>> Especially on the EDR architecture. Is there any hardware aid?
>>
>> Thanks!
>>
>> Marcin
>>
>>
>> On 3/2
If you have support for FCA then it might happen that the collective will
use the hardware support. In any case, most of the bcast algorithms have a
logarithmic behavior, so there will be at most O(log(P)) memory accesses on
the root.
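A toy sketch (not OMPI's actual implementation) of a binomial-tree broadcast makes the logarithmic round count concrete:

```python
import math

def binomial_bcast_rounds(nprocs):
    """Simulate a binomial-tree broadcast from rank 0 and count rounds."""
    have = {0}      # ranks that already hold the data
    rounds = 0
    while len(have) < nprocs:
        # in round k, every rank r that has the data sends to r + 2**k
        new = set()
        for r in have:
            peer = r + 2 ** rounds
            if peer < nprocs:
                new.add(peer)
        have |= new
        rounds += 1
    return rounds

# the number of rounds (memory accesses on the root) is ceil(log2(P))
for p in (2, 8, 97, 1024):
    assert binomial_bcast_rounds(p) == math.ceil(math.log2(p))
```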
If you want to take a look at the code in OMPI to understand wh
:+1:
George.
On Tue, Mar 19, 2019 at 12:45 PM Jeff Squyres (jsquyres) via devel <
devel@lists.open-mpi.org> wrote:
> I have proposed the use of the Github Probot "stale" bot:
>
> https://probot.github.io/apps/stale/
> https://github.com/open-mpi/ompi/pull/6495
>
> The short version of
https://github.com/open-mpi/ompi/pull/5819 will ease the pain. I couldn't
figure out what exactly triggers this, but apparently recent versions of OSX
refuse to bind with port 0.
George.
On Mon, Oct 1, 2018 at 4:12 PM Jeff Squyres (jsquyres) via devel <
devel@lists.open-mpi.org> wrote:
> I ge
Sorry, I missed the 4.0 on the PR (despite being the first thing in the title).
George.
> On Sep 20, 2018, at 22:15 , Ralph H Castain wrote:
>
> That’s why we are leaving it in master - only removing it from release branch
>
> Sent from my iPhone
>
> On Sep 20, 201
Why not simply ompi_ignore it ? Removing a component to bring it back later
would force us to lose all history. I would rather add an .ompi_ignore
and give power users an opportunity to continue playing with it.
George.
On Thu, Sep 20, 2018 at 8:04 PM Ralph H Castain wrote:
> I already
Millian,
The level of interposition you need depends on what exactly you are trying
to simulate and at which granularity. If you want to simulate the different
protocols (small, eager, PUT, GET, pipelining) supported by our default PML
OB1, then you need to provide a BTL (with the exclusive flag a
Mikhail,
Some of these algorithms have been left out for practical reasons: they
did not behave better than existing algorithms in any case. Some others
(such as Traff's butterfly or double tree) were left out because the
implementation efforts shifted to other types of collective, or because there was a lac
is is just a
> fallback for potential new ones. FI_DIRECTED_RECV is necessary to
> discriminate the source at RX time when the source is not in the tag.
>
> c) I will include build_time_plan_B you just suggested ;)
>
>
>
> Thanks, again.
>
>
>
> _MAC
>
&g
Hi Matias,
Relaxing the restriction on the number of ranks is definitely a good
thing. The cost will be reflected on the number of communicators and tags,
and we must be careful how we balance this.
Assuming context_id is the communicator cid, with 10 bits you can only
support 1024. A little lo
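To make the trade-off concrete, here is a small sketch of packing a communicator id and a tag into one 32-bit match field; the field widths and layout are hypothetical, not OMPI's actual wire format:

```python
# Illustrative only: with 10 bits reserved for the communicator id,
# at most 2**10 = 1024 communicators can be addressed, and the
# remaining bits bound the usable tag range.
CID_BITS = 10
TAG_BITS = 32 - CID_BITS

def pack(cid, tag):
    assert 0 <= cid < (1 << CID_BITS), "only 1024 communicators fit"
    assert 0 <= tag < (1 << TAG_BITS)
    return (cid << TAG_BITS) | tag

def unpack(match):
    return match >> TAG_BITS, match & ((1 << TAG_BITS) - 1)

m = pack(1023, 4242)
assert unpack(m) == (1023, 4242)
assert (1 << CID_BITS) == 1024   # the limit mentioned above
```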
or count.
>
> I guess a different question would be what you need the communicator for.
> It shouldn’t have any useful semantic meaning, so why isn’t it a silent
> implementation detail for the monitoring component?
>
> Brian
>
>
> On Nov 28, 2017, at 8:45 AM, George Bosilca w
Devels,
We would like to change the definition of the OSC module to move the
communicator one level up from the different module structures into the
base OSC module. The reason for this, as well as a lengthy discussion of
other possible solutions, can be found in
https://github.com/open-mpi/ompi/pu
Samuel,
You are right, we use qsort to sort the keys, but the qsort only applies
to participants with the same color. So the complexity of the qsort reaches
its worst case only when most of the processes participate with the
same color.
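A toy sketch of that ordering rule (a hypothetical helper, not OMPI's code): ranks are grouped by color, and only within each color are they sorted by (key, original rank), so the sort cost is bounded by the largest color group:

```python
# Mimic the MPI_Comm_split ordering rule: group ranks by color, then
# order each group by (key, rank). Sorting never spans colors.
def split_order(color_key_by_rank):
    groups = {}
    for rank, (color, key) in enumerate(color_key_by_rank):
        groups.setdefault(color, []).append((key, rank))
    return {c: [r for _, r in sorted(members)]
            for c, members in groups.items()}

# 6 ranks, two colors; rank order inside a color follows the keys
order = split_order([(0, 5), (1, 0), (0, 1), (1, 2), (0, 5), (1, 1)])
assert order[0] == [2, 0, 4]   # keys 1, 5, 5 (ties broken by rank)
assert order[1] == [1, 5, 3]   # keys 0, 1, 2
```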
What I think is OMPI's problem in this area is the selection
in order to fix that.
>
>
> fwiw, nightly tarballs for v3.0.x, v3.1.x and master are affected
>
>
> Cheers,
>
>
> Gilles
>
>
> On 10/23/2017 5:47 AM, George Bosilca wrote:
>
>> Did we include by mistake the PMIX config header
>> (opal/mca/pmix/pmix2
Did we include by mistake the PMIX config header
(opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h) in the 3.0 release ?
George.
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
We have an unused mac that we can add to the pool. I'll be more than happy
to help set it up.
George.
On Tue, Oct 3, 2017 at 5:43 PM, Barrett, Brian via devel <
devel@lists.open-mpi.org> wrote:
> My MacOS box is back up and jobs are progressing again. The queue got kind
> of long, so it migh
Ralph,
I updated the TCP-related pending PR. It offers a better solution than what
we have today, unfortunately not perfect as it would require additions to
the configure. Waiting for reviews.
George.
On Thu, Aug 31, 2017 at 10:12 AM, r...@open-mpi.org wrote:
> Thanks to those who made a fi
The last PMIX import broke the master on all platforms that support
visibility. I have pushed a patch that solves __most__ of the issues (that
I could find). I say most because there is a bit left that requires a
significant change in the PMIX design.
This problem arises from the use of the pmix_setenv
Ralph,
I got consistent segfaults during the infrastructure teardown in
orterun (I noticed them on OSX). After digging a little bit it turns out
that the opal_buffer_t class has been cleaned up in orte_finalize before
orte_proc_info_finalize is called, leading to calling the destructors
e9d533e62ecb should fix these warnings. They are harmless, as we cannot be
reaching the context needed for them to have an impact, because collective
communications with 0 bytes are trimmed out in the MPI layer.
Thanks for reporting.
George.
On Tue, Jun 13, 2017 at 12:43 PM, r...@open-mpi.org
I do care a little as the default size for most terminals is still 80 chars.
I would prefer your second choice, where we replace "disabled" by "-", over
losing information on the useful part of the message.
George.
On Mon, Jun 5, 2017 at 9:45 AM, wrote:
> George,
>
>
>
> it seems today the limit i
So we are finally getting rid of the 80 chars per line limit?
George.
On Sun, Jun 4, 2017 at 11:23 PM, r...@open-mpi.org wrote:
> Really? Sigh - frustrating. I’ll change it as it gets irritating to keep
> get this warning.
>
> Frankly, I find I’m constantly doing --all because otherwise I ha
You have to pass it an allocated datatype, and it tells you if the pointer
object is a valid MPI datatype for communications (aka it has a
corresponding type with a well defined size, extent and alignment).
There is no construct in C able to tell you if a random number is a valid C
"object".
Ge
utogen?
>
>
> > On May 31, 2017, at 7:02 AM, George Bosilca wrote:
> >
> > I have problems compiling the current master. Anyone else has similar
> issues ?
> >
> > George.
> >
> >
> > CC base/ptl_base_frame.lo
> > In file included from
I have problems compiling the current master. Anyone else has similar
issues ?
George.
CC base/ptl_base_frame.lo
In file included from
/Users/bosilca/unstable/ompi/trunk/ompi/opal/mca/pmix/pmix2x/pmix/src/threads/thread_usage.h:31:0,
from
/Users/bosilca/unstable/ompi/t
Dave,
I think I know the reason, or at least part of the reason, for these
spikes. As an example, when we select between the different protocols to
use to exchange the message between peers, we only use predefined lengths,
and we completely disregard buffer alignment.
I was planning to address th
PR#3500 (https://github.com/open-mpi/ompi/pull/3500) should fix the
problem. It is not optimal, but it is simple and works in all cases.
George.
On Tue, May 9, 2017 at 2:39 PM, George Bosilca wrote:
> Please go ahead and open an issue, I will attach the PR once I have the
> core re
ed one?
>
> Dahai
>
> On Fri, May 5, 2017 at 1:27 PM, George Bosilca <mailto:bosi...@icl.utk.edu>> wrote:
> Indeed, our current implementation of the MPI_Sendrecv_replace prohibits the
> use of MPI_ANY_SOURCE. Will work a patch later today.
>
> George.
If we rebranch from master for every "major" release it makes sense to
rename the branch. In the long term renaming seems like the way to go, and
thus the pain of altering everything that depends on the naming will exist
at some point. I am in favor of doing it asap (but I have no stakes in the
gam
Indeed, our current implementation of the MPI_Sendrecv_replace prohibits
the use of MPI_ANY_SOURCE. Will work a patch later today.
George.
On Fri, May 5, 2017 at 11:49 AM, Dahai Guo wrote:
> The following code causes memory fault problem. The initial check shows
> that it seemed caused by *o
main(int argc, char** argv)
>> {
>> int r[1], s[1];
>> MPI_Init(&argc,&argv);
>>
>> s[0] = 1;
>> r[0] = -1;
>> MPI_Reduce(s,r,0,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
>> printf("%d\n",r[0]);
>> MPI_Red
figure --prefix=${installdir} \
> > --enable-orterun-prefix-by-default
> >
> > Dahai
> >
> > On Thu, May 4, 2017 at 4:45 PM, George Bosilca
> wrote:
> > Dahai,
> >
> > You are right the segfault is unexpected. I can't replicate this
Dahai,
You are right the segfault is unexpected. I can't replicate this on my mac.
On what architecture are you seeing this issue ? How was your OMPI compiled ?
Please post the output of ompi_info.
Thanks,
George.
On Thu, May 4, 2017 at 5:42 PM, Dahai Guo wrote:
> Those messages are what I lik
ect line so it is more specific
>> than "Re: Contents of devel digest..."
>>
>>
>> Today's Topics:
>>
>>1. NetPIPE performance curves (Dave Turner)
>>2. Re: NetPIPE performance curves (George Bosilca)
>>3. remote spawn - hav
David,
Are you using the OB1 PML or one of our IB-enabled MTLs (UCX or MXM) ? I
have access to similar cards, and I can't replicate your results. I do see
a performance loss, but nowhere near what you have seen (it is going down
to 47Gb instead of 50Gb).
George.
On Tue, May 2, 2017 at 4:40 PM,
In my interpretation of the text you mention I have not considered the type
name as an intrinsic property of the datatype (unlike ub, lb, size and many
others). Thus, I took the liberty of altering it in a meaningful way for
debugging purposes. This should not affect the users if they set the name,
as
I wouldn't put too much faith in my memory either. What I recall is that the
multi-link code was written mainly for the PS4, where the supervisor would
limit the bandwidth per socket to about 60% of the hardware capabilities.
Thus, by using multiple links (in fact sockets between a set of peers) we
Emmanuel,
I tried with both 2.x and master (they are only syntactically different
with regard to reduce_scatter) and I can't reproduce your issue. I run the
OSU test with the following command line:
mpirun -n 97 --mca coll basic,libnbc,self --mca pml ob1
./osu_reduce_scatter -m 524288: -i 1 -x 0
Clement,
I would continue to use the SIZE_T atomics but define the PVARs type
accordingly (UNSIGNED_LONG or UNSIGNED_LONG_LONG, depending on which one's
size matches that of size_t).
George.
On Wed, Mar 22, 2017 at 6:45 AM, Clement FOYER
wrote:
> Hi everyone,
>
> I'm facing an issue wi
Isn't this supposed to be part of cluster 101?
I would rather add it to our FAQ, maybe in a slightly more generic way (not
only focused on 'ulimit -c'). Otherwise we will be bound to define
what is forwarded and what is not, and potentially create chaos for
knowledgeable users (that know how
was cad4c03).
>
> FYI, Clément Foyer is talking with George Bosilca about this problem.
>
>
> Cyril.
>
> Le 08/02/2017 à 16:46, Jeff Squyres (jsquyres) a écrit :
>> What version of Open MPI are you running?
>>
>> The error is indicating that Open MPI is tr
If you did not explicitly enable asynchronous progress (which btw is only
supported by some of the BTLs), there will be little support for
asynchronous progress in Open MPI. Indeed, as long as the library is not in
an MPI call, all progress is stalled, no matching is done, and the only
possible pro
Enim,
Before going into all these efforts to implement a new function, let me
describe what is already there that you can use to achieve something
similar. It will not be exactly what you describe, because changing a
particular collective algorithm dynamically on a communicator is not
as
https://github.com/open-mpi/ompi/issues/2750
George.
On Wed, Jan 18, 2017 at 12:57 PM, r...@open-mpi.org wrote:
> Last call for v1.10.6 changes - we still have a few pending for review,
> but none marked as critical. If you want them included, please push for a
> review _now_
>
> Thanks
> R
Jan,
You can use MPI_Type_match_size (
https://www.open-mpi.org/doc/v2.0/man3/MPI_Type_match_size.3.php) to find
the MPI type that meets certain requirements.
George.
On Wed, Dec 21, 2016 at 3:16 AM, 🐋 Jan Hegewald
wrote:
> Hi Jeff et al,
>
> > On 20 Dec 2016, at 20:23, Jeff Squyres (jsquy
Indeed this is the best solution. If you really want a clean portable
solution, take a look at any of the files in ompi/mpi/fortran/mpif-h
directory, to see how we define the 4 different versions of the Fortran
interface.
George.
On Mon, Dec 12, 2016 at 10:42 AM, Clement FOYER
wrote:
> Thank
Christoph,
This is work in progress. Right now, only the TCP BTL has an integrated
progress thread, but we are working on a more general solution that will
handle all BTLs (and possibly some of the MTLs). If you want more info, or
want to volunteer for beta-testing, please ping me offline.
Thanks,
Gilles,
I looked at the test and I think the current behavior is indeed correct.
What matters for an exclusive lock is that all operations in an epoch
(everything surrounded by lock/unlock) are atomically applied to the
destination (and are not interleaved with other updates). As Nathan stated,
MP
Why is MPI_Win_flush required to ensure the lock is acquired ? According to
the standard MPI_Win_flush "completes all outstanding RMA operations
initiated by the calling process to the target rank on the specified
window", which can be read as being a noop if no pending operations exists.
George
Absolutely, if you keep the pointer to the previous or next element, it is
safe to remove an element. If you are in the process of completely emptying
the hashtable you can just keep removing the head element.
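In Python terms (an analogy, not the OPAL hashtable API), the "keep removing the head" pattern looks like:

```python
# Instead of deleting entries while iterating (which invalidates the
# iterator), repeatedly remove the current head until the table is empty.
table = {"a": 1, "b": 2, "c": 3}
removed = []
while table:
    key = next(iter(table))           # the current "head" element
    removed.append((key, table.pop(key)))

assert table == {}
assert len(removed) == 3
```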
George
On Nov 18, 2016 6:51 AM, "Clement FOYER" wrote:
> Hi everyone,
>
> I was wonde
Yes, my understanding is that unsystematic contributors will not have to
sign the contributor agreement, but instead will have to provide a signed
patch.
George.
On Wed, Oct 12, 2016 at 9:29 AM, Pavel Shamis
wrote:
> Does it mean that contributors don't have to sign contributor agreement ?
>
These macros are defined in opal/class/opal_object.h. We are using them all
over the OMPI code base, including OPAL, ORTE, OSHMEM and OMPI. These calls
are indeed somewhat similar to an OO language, the intent was to have a
thread-safe way to refcount objects to keep them around for as long as they
RFC applied via 93fa94f9.
On Fri, Sep 23, 2016 at 7:13 AM, George Bosilca wrote:
> It turns out the OMPI behavior today was divergent from what is written in
> the README. We already explicitly state that
>
> - If specified, the "btl_tcp_if_exclude" parameter must incl
t; OK then,
> I recommend we explicitly state in the README that loopback interface can
> no more be omitted from btl_tcp_if_exclude when running on multiple nodes
>
> Cheers,
>
> Gilles
>
>
> On Thursday, September 22, 2016, George Bosilca
> wrote:
>
>> T
self is excluded, the app will not start and it is trivial
> to append to the error message a note asking to ensure btl/self was
> not excluded.
> in this case, i do not think we have a mechanism to issue a warning
> message (e.g. "ensure lo is excluded") when hiccups occur.
alternative to --mca btl xxx
> (for example --networks shm,infiniband. today Open MPI does not
> provide any alternative to btl/self. also infiniband can be used via
> btl/openib, mtl/mxm or libfabric, which makes it painful to
> blacklist). i cannot remember the outcome of the discus
On Wed, Sep 21, 2016 at 11:23 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:
> > I would have agreed with you if the current code was doing a better
> decision of what is local and what not. But it is not, it simply removes all
> 127.x.x.x interfaces (opal/util/net.c:222). Thus, the only
cp). my first impression is that i am not so comfortable
> with that, and we could add yet an other MCA parameter so btl/openib
> disqualifies itself for intra node communications.
>
>
> Cheers,
>
> Gilles
>
> On Thu, Sep 22, 2016 at 12:56 AM, George Bosilca
> wrote:
> &g
or not excluded) for
> inter node communication
>
> Cheers,
>
> Gilles
>
> "Jeff Squyres (jsquyres)" wrote:
> >On Sep 21, 2016, at 10:56 AM, George Bosilca wrote:
> >>
> >> No, because 127.x.x.x is by default part of the exclude, so it will
> ne
code does, is preventing a power-user from using the loopback
(despite being explicitly enabled via the corresponding MCA parameters).
George.
>
>
>
> > On Sep 21, 2016, at 7:59 AM, George Bosilca wrote:
> >
> > The current code in the TCP BTL prevents local executi
The current code in the TCP BTL prevents local execution on a laptop not
exposing a public IP address, by unconditionally disqualifying all
interfaces with local addresses. This is not done based on MCA parameters
but instead is done deep inside the IP matching logic, independent of what
the user s
Nice catch. Keeping the first check only works because the signaling field
prevents us from releasing the condition too early. I added some comments
around the code (131fe42d).
George.
On Wed, Sep 21, 2016 at 5:33 AM, Nathan Hjelm wrote:
> Yeah, that looks like a bug to me. We need to keep the
:+1:
George.
On Mon, Sep 19, 2016 at 6:56 PM, Jeff Squyres (jsquyres) wrote:
> (we can discuss all of this on the Webex tomorrow)
>
> Here's a sample repo where I merged ompi and ompi-release:
>
> https://github.com/open-mpi/ompi-all-the-branches
>
> Please compare it to:
>
> https:/
On Thu, Sep 8, 2016 at 10:17 AM, Clément wrote:
> Hi every one,
>
> I'm currently working on a monitoring component for OpenMPI. After
> browsing the MPI standard, in order to know what the MPI_Tools interface
> looks like, and how it's working, I had a look at the code I recieved and
> something
I can make MPI_Issend_rtoa deadlock with vader and sm.
George.
On Tue, Sep 6, 2016 at 12:06 PM, r...@open-mpi.org wrote:
> FWIW: those tests hang for me with TCP (I don’t have openib on my
> cluster). I’ll check it with your change as well
>
>
> On Sep 6, 2016, at 1:29 AM, Gilles Gouaillarde
behavior you describe, then you simply tell ORTE to
> “--map-by core --bind-to core”
>
> On Sep 5, 2016, at 11:05 AM, George Bosilca wrote:
>
> On Sat, Sep 3, 2016 at 10:34 AM, r...@open-mpi.org
> wrote:
>
>> Interesting - well, it looks like ORTE is working correctly. The map
to 1,9.
> All of this matches what the OS reported.
>
> So it looks like it is report-bindings that is messed up for some reason.
>
>
> On Sep 3, 2016, at 7:14 AM, George Bosilca wrote:
>
> $mpirun -np 3 --tag-output --bind-to core --report-bindings
> --display-devel-map -
hion (such as rank 0 on core 0,
rank 1 on core 1 and rank 2 on core 2) ?
George.
>
>
> On Sep 3, 2016, at 7:14 AM, George Bosilca wrote:
>
> $mpirun -np 3 --tag-output --bind-to core --report-bindings
> --display-devel-map --mca rmaps_base_verbose 10 true
>
> [dancer.icl
arc00:07612] MCW rank 2 bound to socket 0[core 1[hwt 0]],
> socket 0[core 9[hwt 0]]: [../B./../../../../../../../B.
> ][../../../../../../../../../..]
On Sat, Sep 3, 2016 at 9:44 AM, r...@open-mpi.org wrote:
> Okay, can you add --display-devel-map --mca rmaps_base_verbose 10 to your
> cmd li
v2 cannot be used in Open MPI yet)
>
> You might also want to try
> mpirun --tag-output --bind-to xxx --report-bindings grep Cpus_allowed_list
> /proc/self/status
>
> So you can confirm both openmpi and /proc/self/status report the same thing
>
> Hope this helps a bi
ore 11[hwt 0-1]]:
> [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
>
>
> I have seen the segfault when something fails early in the setup procedure
> - planned to fix that this weekend.
>
>
> On Sep 2, 2016, at 9:09 PM, George Bosilca wrote:
>
>
While investigating the ongoing issue with the OMPI messaging layer, I ran
into some trouble with process binding. I read the documentation, but I still
find this puzzling.
Disclaimer: all experiments were done with current master (9c496f7)
compiled in optimized mode. The hardware: a single node 20 c
v2.x: https://github.com/open-mpi/ompi-release/pull/1344
master: https://github.com/open-mpi/ompi/commit/a6d515b
Thanks,
George.
On Sat, Aug 27, 2016 at 12:45 PM, George Bosilca
wrote:
> Paul,
>
> Sorry for the half-fix. I'll submit a patch and PRs to the releases asap
Paul,
Sorry for the half-fix. I'll submit a patch and PRs to the releases asap.
George.
On Sat, Aug 27, 2016 at 4:14 AM, Paul Hargrove wrote:
> I didn't get to test 2.0.1rc1 with xlc-12.1 until just now because I need
> a CRYPTOCard for access (== not fully automated like my other tests).
>
or jumping in so late.
> > >
> > > Honestly, there's no problem with making a repo in the open-mpi github
> org. It's just as trivial to make one there as anywhere else.
> > >
> > > Let me know if you want one.
> > >
> > >
> >
'll send
> them later this week.
>
> Do you think of regular calls or per agreement?
>
> пятница, 26 августа 2016 г. пользователь George Bosilca написал:
>
>> We are serious about this. However, we not only have to define a set of
>> meaningful tests (which we don
; tests and drop the ball here?
>
> пятница, 26 августа 2016 г. пользователь George Bosilca написал:
>
> Arm repo is a good location until we converge to a well-defined set of
>> tests.
>>
>> George.
>>
>>
>> On Thu, Aug 25, 2016 at 1:44 PM, Artem Polyakov
&
Arm repo is a good location until we converge to a well-defined set of
tests.
George.
On Thu, Aug 25, 2016 at 1:44 PM, Artem Polyakov wrote:
> That's a good question. I have results myself and I don't know where to
> place them.
> I think that Arm's repo is not a right place to collect the d
ration is a per-communicator boolean), so i do not see how
> s->in_operation can be true in a valid MPI program.
>
>
> Though the first point can be seen as a "matter of style", i am pretty
> curious about the second one.
>
>
> Cheers,
>
>
> Gilles
>
>
Ralph,
Bringing back the coll/sync is a cheap shot at hiding a real issue behind a
smoke curtain. As Nathan described in his email, Open MPI's lack of flow
control on eager messages is the real culprit here, and the loop around any
one-to-many collective (bcast and scatter*) was only helping to exa