Hi,
I'm implementing a new BTL component, and
1. I read the TCP code and ran into the three fragment lists:
/* free list of fragment descriptors */
ompi_free_list_t tcp_frag_eager;
ompi_free_list_t tcp_frag_max;
ompi_free_list_t tcp_frag_user;
I've looked it up, and found that
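As far as I can tell, the list is chosen by fragment size. A minimal sketch of my reading of the allocation path (illustrative only -- the real code uses per-BTL macros, and the names below are from my reading of the v1.5 tree, so they may not match trunk exactly):

/* Sketch: btl_alloc picks the eager or max list by size; the user list
 * is used by prepare_src/prepare_dst for descriptors that wrap user
 * buffers instead of copying them. */
ompi_free_list_item_t *item;
int rc;

if (size <= btl->btl_eager_limit) {
    OMPI_FREE_LIST_GET(&mca_btl_tcp_component.tcp_frag_eager, item, rc);
} else if (size <= btl->btl_max_send_size) {
    OMPI_FREE_LIST_GET(&mca_btl_tcp_component.tcp_frag_max, item, rc);
}

/* ...while prepare_src()/prepare_dst() would do: */
OMPI_FREE_LIST_GET(&mca_btl_tcp_component.tcp_frag_user, item, rc);

Is that the intended split between the three lists?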
On Mar 9, 2012, at 08:38 , Alex Margolin wrote:
> Hi,
>
> I'm implementing a new BTL component, and
>
> 1. I read the TCP code and ran into the three fragment lists:
>
> /* free list of fragment descriptors */
> ompi_free_list_t tcp_frag_eager;
> ompi_free_list_t tcp_frag_max;
> ompi_free_list_t tcp_frag_user;
Fixed in r26122. I tested locally with the ibm test suite, and it looks
good. MTT should highlight if there are any other issues - but I doubt
there will be.
-- Josh
On Thu, Mar 8, 2012 at 5:16 PM, Josh Hursey wrote:
> Good point (I did not even look at ompi_comm_compare, I was using this for
>
I just made an interesting observation:
When binding the processes to two neighboring cores (L2 sharing), NetPIPE
*sometimes* shows pretty good results: ~0.5us
$ mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0 : -np 1 hwloc-bind -v core:1 ./NPmpi_ompi1.5.5
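To verify where each process actually lands, I also print the binding from inside the benchmark. A small standalone check using the standard hwloc API (nothing here is OMPI-specific):

#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    char *str;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* report the set of PUs this process is currently bound to */
    hwloc_get_cpubind(topo, set, HWLOC_CPUBIND_PROCESS);
    hwloc_bitmap_asprintf(&str, set);
    printf("bound to PU set %s\n", str);

    free(str);
    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return 0;
}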
George --
I believe that this is the subject of a few long-standing tickets (i.e., what
to do when running out of registered memory -- right now, we hang, for a few
reasons). I think that this is Mellanox's attempt to at least warn the user
that we have run out of registered memory, and will t
Not exactly: the PML invokes the mpool, which invokes the registration function.
If registration fails the mpool will deregister from its lru (if possible) and
try again. So, it is not an error if ibv_reg_mr fails unless it fails because
the process is starved of registered memory (or truly run
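In pseudo-C the loop looks roughly like this (all names are illustrative, not the actual mpool source):

/* Illustrative sketch of the registration path described above. */
static int mpool_register_with_retry(my_mpool_t *mpool, void *addr,
                                     size_t len, my_reg_t **reg)
{
    for (;;) {
        /* the registration function is e.g. ibv_reg_mr underneath */
        int rc = mpool->register_mem(mpool, addr, len, reg);
        if (OMPI_SUCCESS == rc) {
            return OMPI_SUCCESS;
        }
        /* registration failed: free space by deregistering the
         * least-recently-used entry, then try again */
        if (!mpool_lru_evict_one(mpool)) {
            /* nothing left on the lru: the process really is starved
             * of registered memory, report the failure up to the PML */
            return OMPI_ERR_OUT_OF_RESOURCE;
        }
    }
}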
On Mar 9, 2012, at 12:59 , Nathan Hjelm wrote:
> Not exactly, the PML invokes the mpool which invokes the registration
> function. If registration fails the mpool will deregister from its lru (if
> possible) and try again. So, it is not an error if ibv_reg_mr fails unless it
> fails because the process is starved of registered memory (or truly run
On Mar 9, 2012, at 1:14 PM, George Bosilca wrote:
>> The hang occurs because there is nothing on the lru to deregister and
>> ibv_reg_mr (or GNI_MemRegister in the uGNI case) fails. The PML then puts
>> the request on its rdma pending list and continues. If any message comes in
>> the rdma pend
On Fri, 9 Mar 2012, Jeffrey Squyres wrote:
> On Mar 9, 2012, at 1:14 PM, George Bosilca wrote:
>> The hang occurs because there is nothing on the lru to deregister and
>> ibv_reg_mr (or GNI_MemRegister in the uGNI case) fails. The PML then puts the
>> request on its rdma pending list and continues. I
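In the same illustrative pseudo-C as before, the failure path is:

/* Illustrative sketch (names made up): what the PML does when the
 * retry loop gives up because the lru is empty. */
if (OMPI_SUCCESS != mpool_register_with_retry(mpool, addr, len, &reg)) {
    /* park the request; it is only retried when a later deregistration
     * frees registered memory -- if that never happens, we hang */
    opal_list_append(&pml->rdma_pending, (opal_list_item_t *) sendreq);
    return OMPI_SUCCESS; /* deferred, not (yet) an error */
}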
On Mar 9, 2012, at 1:32 PM, Nathan Hjelm wrote:
> An mpool that is aware of local processes' lrus will solve the problem in
> most cases (all that I have seen)
I agree -- don't let words in my emails make you think otherwise. I think this
will fix "most" problems, but undoubtedly, some will st
On Fri, 9 Mar 2012, Jeffrey Squyres wrote:
> On Mar 9, 2012, at 1:32 PM, Nathan Hjelm wrote:
>> An mpool that is aware of local processes' lrus will solve the problem in most
>> cases (all that I have seen)
> I agree -- don't let words in my emails make you think otherwise. I think this will fix
>> Depending on the timing, this might go to 1.6 (1.5.5 has waited for too
>> long, and this is not a regression). Keep in mind that the problem has been
>> around for *a long, long time*, which is why I approved the diag message
>> (i.e., because a real solution is still nowhere in sight). Th
[Comment at bottom]
>-----Original Message-----
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Nathan Hjelm
>Sent: Friday, March 09, 2012 2:23 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106
>
>
>
>On Fri, 9 Mar 2012, J
On Mar 9, 2012, at 14:23 , Nathan Hjelm wrote:
> BTW, can anyone tell me why each mpool defines mca_mpool_base_resources_t
> instead of defining mca_mpool_blah_resources_t. The current design makes it
> impossible to support more than one mpool in a btl. I can delete a bunch of
> code if I can
On Fri, 9 Mar 2012, George Bosilca wrote:
On Mar 9, 2012, at 14:23 , Nathan Hjelm wrote:
BTW, can anyone tell me why each mpool defines mca_mpool_base_resources_t
instead of defining mca_mpool_blah_resources_t. The current design makes it
impossible to support more than one mpool in a btl
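To make the clash concrete, here is a simplified picture (the second component is hypothetical):

/* mpool_rdma/mpool_rdma.h -- simplified */
struct mca_mpool_base_resources_t {
    char *pool_name;
    void *reg_data;
};

/* mpool_other/mpool_other.h -- hypothetical second mpool */
struct mca_mpool_base_resources_t {   /* same tag: redefinition! */
    struct ibv_pd *pd;
};

A btl cannot include both headers in one translation unit, so it cannot hand the right resources to more than one mpool. If each component instead defined mca_mpool_rdma_resources_t, mca_mpool_other_resources_t, etc., both could be visible at once.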
I tested my grdma mpool with the openib btl and IMB Alltoall/Alltoallv on a
system that consistently hangs. If I give the connection module the ability to
evict from the lru, grdma prevents both the out-of-registered-memory hang AND
problems creating QPs (due to exhaustion of registered memory).
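The hook itself is small; sketched with illustrative names (not the committed code):

/* If QP creation fails because registered memory is exhausted, ask
 * the mpool to evict one lru entry and retry. */
struct ibv_qp *qp;
while (NULL == (qp = ibv_create_qp(pd, &init_attr))) {
    if (!mpool_lru_evict_one(mpool)) {
        break; /* truly out of registered memory */
    }
}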