George,
I will revisit this.
If I added a const modifier where the standard does not require one, it was
not intentional; it was a mistake.
Thanks for the report.
Gilles
On Wednesday, September 16, 2015, George Bosilca
wrote:
> Gilles,
>
> Your commit 6e6a3e96 is only partially correct. There is n
Hello everyone,
My name is Kay. I’m a huge "oom-pi" fan, but only recently have I been looking
at it from a developer's perspective.
I would appreciate it if somebody could show me the entry point for
understanding how orterun and the user program interact, and more importantly
how to change the way they interact.
The rea
Hi all,
We experimented with an MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
support level) in which several threads submit many MPI_Irecv() requests
simultaneously, and encountered an intermittent bug:
OMPI_ERR_TEMP_OUT_OF_RESOURCE after MCA_PML_OB1_RECV_REQUEST_ALLOC()
because OMPI_FREE_LIS
something is borked right now on master in the management of inter vs.
intra communicators. It looks like intra communicators are wrongly
selecting the inter coll module thinking that it is an inter
communicator, and we have hangs because of that. I attach a small
replicator, where a bcast of a
The formatting of the code got all messed up. Please send a diff and I
will take a look. ompi free list no longer exists in master or the next
release branch but the change may be worthwhile for the opal free list
code.
-Nathan
On Wed, Sep 16, 2015 at 04:03:44PM +0300, Алексей Рыжих wrote:
>
The reproducer is working for me with master on OS X 10.10. Some changes
to ompi_comm_set went in yesterday. Are you on the latest hash?
-Nathan
On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> something is borked right now on master in the management of inter vs. intra
> communica
Yes, I did a fresh pull this morning; for me it deadlocks reliably with 2
or more processes.
Thanks
Edgar
On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
The reproducer is working for me with master on OS X 10.10. Some changes
to ompi_comm_set went in yesterday. Are you on the latest hash?
-Nathan
O
I just realized my branch is behind master. Updating now and will retest.
-Nathan
On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> yes, I did fresh pull this morning, for me it deadlocks reliably for 2 and
> more processes.
>
> Thanks
> Edgar
>
> On 9/16/2015 10:42 AM, Nathan H
Alexey,
This is not necessarily the fix for all cases. Most of the internal uses of
the free_list can easily accommodate the fact that no more elements are
available. Based on your description of the problem I would assume you
encounter this problem once the MCA_PML_OB1_RECV_REQUEST_ALLOC is ca
While looking into a possible fix for this problem, we should also clean up
the leftovers from OMPI_FREE_LIST in the trunk.
$ find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
./opal/mca/btl/usnic/btl_usnic_compat.h:161:OMPI_FREE_LIST_GET_MT(list, (item))
./ompi/mca/pml/bfo/pml_b
Hi Folks,
I had to update my password for NERSC systems and that broke the credentials
the IU jenkins was using to launch on those nodes. Should be working again.
Sorry for the inconvenience,
Howard
iboffload and bfo are opal ignored by default. Neither exists in the
release branch.
-Nathan
On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
>While looking into a possible fix for this problem we should also cleanup
>in the trunk the leftover from the OMPI_FREE_LIST.
>
I see the problem. Before my changes ompi_comm_dup signalled that the
communicator was not an inter-communicator by setting remote_size to
0. The remote size is now from the remote group if one was supplied
(which is the case with intra-communicators) so ompi_comm_dup needs to
make sure NULL is pa
As they don't even compile why are we keeping them around?
George.
On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm wrote:
>
> iboffload and bfo are opal ignored by default. Neither exists in the
> release branch.
>
> -Nathan
>
> On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
Not sure. I give a +1 for blowing them away. We can bring them back
later if needed.
-Nathan
On Wed, Sep 16, 2015 at 01:19:24PM -0400, George Bosilca wrote:
>As they don't even compile why are we keeping them around?
> George.
>On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm wrote:
The bfo was my creation many years ago. Can we keep it around for a little
longer? If we blow it away, then we should probably clean up all the code I
also have in the openib BTL for supporting failover. There is also some
configure code that would have to go as well.
Rolf
>-Original Me
George,
You are right. The sequence of calls in our test is MPI_Irecv ->
mca_pml_ob1_irecv -> MCA_PML_OB1_RECV_REQUEST_ALLOC. We will try to use
OMPI_FREE_LIST_WAIT_MT.
We saw the following problem in OMPI_FREE_LIST_WAIT_MT. It returned NULL in
case when thread A was suspended after the call
Sorry, “We saw the following problem in OMPI_FREE_LIST_GET_MT…”.
*From:* Владимир Трущин [mailto:vdtrusc...@compcenter.org]
*Sent:* Wednesday, September 16, 2015 10:09 PM
*To:* 'Open MPI Developers'
*Subject:* RE: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()
George,
You are right. T
On Wed, Sep 16, 2015 at 3:11 PM, Владимир Трущин
wrote:
> Sorry, “We saw the following problem in OMPI_FREE_LIST_GET_MT…”.
>
That's exactly what the WAIT macro is supposed to solve: wait (grow the
free list and call opal_progress) until an item becomes available.
George.
>
>
> *From:* Владим
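To make the GET versus WAIT distinction discussed above concrete, here is a minimal single-threaded sketch in C. It is not the Open MPI implementation: every name in it (toy_free_list_t, toy_get, toy_wait, toy_complete_pending) is hypothetical, and the progress callback merely stands in for what opal_progress does in the real code. It only illustrates the idea that a "get" makes one attempt and may return NULL, while a "wait" keeps trying to grow the list and drive progress until an item is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy free list. NOT Open MPI code; all names here are hypothetical.
 * It is not thread-safe and only illustrates the GET/WAIT semantics. */
typedef struct toy_item {
    struct toy_item *next;
} toy_item_t;

typedef struct {
    toy_item_t *head;   /* stack of currently available items */
    size_t allocated;   /* number of items created so far */
    size_t max_items;   /* upper bound on items, 0 means unlimited */
    size_t grow_by;     /* items to add on each grow attempt */
} toy_free_list_t;

/* Try to add up to grow_by items; returns 1 if at least one was added. */
static int toy_grow(toy_free_list_t *fl) {
    size_t added = 0;
    for (size_t i = 0; i < fl->grow_by; i++) {
        if (fl->max_items != 0 && fl->allocated >= fl->max_items) break;
        toy_item_t *it = malloc(sizeof *it);
        if (it == NULL) break;
        it->next = fl->head;
        fl->head = it;
        fl->allocated++;
        added++;
    }
    return added > 0;
}

/* Hand an item back to the list. */
static void toy_return(toy_free_list_t *fl, toy_item_t *it) {
    it->next = fl->head;
    fl->head = it;
}

/* GET-style: one grow attempt if empty, then pop. May return NULL,
 * so the caller must be prepared to handle "temporarily out of resource". */
static toy_item_t *toy_get(toy_free_list_t *fl) {
    if (fl->head == NULL) {
        (void)toy_grow(fl);
    }
    toy_item_t *it = fl->head;
    if (it != NULL) {
        fl->head = it->next;
        it->next = NULL;
    }
    return it;
}

/* WAIT-style: retry until an item is available, calling a progress
 * function between attempts so outstanding work can release items.
 * If progress never frees anything and the list is at its maximum,
 * this loops forever. */
static toy_item_t *toy_wait(toy_free_list_t *fl,
                            void (*progress)(void *), void *ctx) {
    assert(progress != NULL);
    toy_item_t *it;
    while ((it = toy_get(fl)) == NULL) {
        progress(ctx);
    }
    return it;
}

/* Example progress function: "completes" one pending item by
 * returning it to the list. */
typedef struct {
    toy_free_list_t *fl;
    toy_item_t *pending;
} toy_pending_t;

static void toy_complete_pending(void *ctx) {
    toy_pending_t *p = ctx;
    if (p->pending != NULL) {
        toy_return(p->fl, p->pending);
        p->pending = NULL;
    }
}
```

With a list capped at two items, a third toy_get returns NULL once both items are in use, whereas toy_wait with toy_complete_pending returns as soon as the pending item is handed back. A caller that cannot tolerate NULL would therefore use the wait variant.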
Did something change in the group structure in the last 24-48 hours?
--enable-sparse-groups groups are currently broken:
make[2]: Entering directory `/home/jsquyres/git/ompi/ompi/debuggers'
CC libdebuggers_la-ompi_debuggers.lo
In file included from ../../ompi/communicator/communicator
Yes - Nathan made some changes related to the add_procs code. I doubt that
configure option was checked...
On Wed, Sep 16, 2015 at 7:13 PM, Jeff Squyres (jsquyres) wrote:
> Did something change in the group structure in the last 24-48 hours?
>
> --enable-sparse-groups groups are currently broken:
Edgar
Do you have a simple test we could run with jenkins ghprb that would catch
this going forward?
I could add it to some of the checks we run on your UH slave node.
Howard
--
sent from my smart phonr so no good type.
Howard
On Sep 16, 2015 12:36 PM, "Nathan Hjelm" wrote:
>
> I se
Actually, Edgar attached a simple reproducer to the first message in this
thread.
On Wed, Sep 16, 2015 at 7:27 PM, Howard Pritchard
wrote:
> Edgar
>
> Do you have a simple test we could run with jenkins ghprb that would catch
> this going forward?
>
> I could add it to some of the checks we run
Thanks, Ralph. I will add it to one of the UH jenkins scripts.
--
sent from my smart phonr so no good type.
Howard
On Sep 16, 2015 10:28 PM, "Ralph Castain" wrote:
> Actually, Edgar attached a simple reproducer to the first message in this
> thread.
>
>
> On Wed, Sep 16, 2015 at 7:27 P