[OMPI devel] MCA BTL Fragment lists

2012-03-09 Thread Alex Margolin
Hi, I'm implementing a new BTL component, and

1. I read the TCP code and ran into the three fragment lists:

   /* free list of fragment descriptors */
   ompi_free_list_t tcp_frag_eager;
   ompi_free_list_t tcp_frag_max;
   ompi_free_list_t tcp_frag_user;

I've looked it up, and found that
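
The three lists map onto the BTL send path: eager fragments carry the first (small) part of a message, max fragments carry payloads up to btl_max_send_size, and user fragments are payload-less descriptors that point at the application's own buffer. Below is a minimal self-contained model of that split; it is not Open MPI's ompi_free_list_t API, and the type names, helper, and sizes are illustrative assumptions only:

    #include <stdlib.h>

    #define EAGER_LIMIT   (4 * 1024)    /* assumed btl_eager_limit   */
    #define MAX_SEND_SIZE (64 * 1024)   /* assumed btl_max_send_size */

    typedef struct frag {
        struct frag *next;   /* intrusive free-list link            */
        size_t       size;   /* payload capacity (0 for user frags) */
        char         data[]; /* inline payload buffer               */
    } frag_t;

    typedef struct {
        frag_t *head;        /* LIFO stack of free fragments */
    } free_list_t;

    /* Pre-populate a list with n fragments of `payload` bytes each. */
    static void free_list_grow(free_list_t *fl, size_t n, size_t payload)
    {
        for (size_t i = 0; i < n; i++) {
            frag_t *f = malloc(sizeof(frag_t) + payload);
            f->size = payload;
            f->next = fl->head;
            fl->head = f;
        }
    }

    int main(void)
    {
        free_list_t frag_eager = {0}, frag_max = {0}, frag_user = {0};

        free_list_grow(&frag_eager, 128, EAGER_LIMIT);   /* small eager sends  */
        free_list_grow(&frag_max,    32, MAX_SEND_SIZE); /* large sends        */
        free_list_grow(&frag_user,  128, 0);             /* descriptor only    */
        return 0;  /* cleanup elided for brevity */
    }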

Re: [OMPI devel] MCA BTL Fragment lists

2012-03-09 Thread George Bosilca
On Mar 9, 2012, at 08:38, Alex Margolin wrote:
> Hi,
>
> I'm implementing a new BTL component, and
>
> 1. I read the TCP code and ran into the three fragment lists:
>
>    /* free list of fragment descriptors */
>    ompi_free_list_t tcp_frag_eager;
>    ompi_free_list_t tcp_frag_max;
>    ompi_free_list_t tcp_frag_user;

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26118

2012-03-09 Thread Josh Hursey
Fixed in r26122. I tested locally with the ibm test suite, and it looks good. MTT should highlight if there are any other issues - but I doubt there will be.

-- Josh

On Thu, Mar 8, 2012 at 5:16 PM, Josh Hursey wrote:
> Good point (I did not even look at ompi_comm_compare, I was using this for

Re: [OMPI devel] poor btl sm latency

2012-03-09 Thread Matthias Jurenz
I just made an interesting observation: when binding the processes to two neighboring cores (L2 sharing), NetPIPE *sometimes* shows pretty good results: ~0.5us

$ mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0 : -np 1 hwloc-bind -v core:1 ./NPmpi_ompi1.5.

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Jeffrey Squyres
George -- I believe that this is the subject of a few long-standing tickets (i.e., what to do when running out of registered memory -- right now, we hang, for a few reasons). I think that this is Mellanox's attempt to at least warn the user that we have run out of registered memory, and will t

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Nathan Hjelm
Not exactly: the PML invokes the mpool, which invokes the registration function. If registration fails, the mpool will deregister from its lru (if possible) and try again. So it is not an error if ibv_reg_mr fails, unless it fails because the process is starved of registered memory (or truly run
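
In code form, the flow Nathan describes looks roughly like the loop below. ibv_reg_mr() is the real verbs call; the helper name and eviction stub are hypothetical stand-ins for the mpool walking its lru and calling ibv_dereg_mr() on an unused cached registration, not the mpool's actual implementation:

    #include <infiniband/verbs.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical helper: deregister the least-recently-used cached
     * region; returns false when the lru is empty. */
    static bool mpool_evict_one_lru(void)
    {
        return false; /* a real mpool would pop its lru and ibv_dereg_mr() it */
    }

    static struct ibv_mr *register_with_eviction(struct ibv_pd *pd,
                                                 void *addr, size_t len)
    {
        struct ibv_mr *mr;
        int access = IBV_ACCESS_LOCAL_WRITE |
                     IBV_ACCESS_REMOTE_READ |
                     IBV_ACCESS_REMOTE_WRITE;

        while (NULL == (mr = ibv_reg_mr(pd, addr, len, access))) {
            /* Not an error yet: free registered memory by evicting an
             * unused lru entry, then retry the registration. */
            if (!mpool_evict_one_lru()) {
                /* lru empty: truly out of registered memory; the caller
                 * (the PML) queues the request on its rdma pending list. */
                return NULL;
            }
        }
        return mr;
    }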

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread George Bosilca
On Mar 9, 2012, at 12:59, Nathan Hjelm wrote:
> Not exactly, the PML invokes the mpool which invokes the registration
> function. If registration fails the mpool will deregister from its lru (if
> possible) and try again. So, it is not an error if ibv_reg_mr fails unless it
> fails because th

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Jeffrey Squyres
On Mar 9, 2012, at 1:14 PM, George Bosilca wrote:
>> The hang occurs because there is nothing on the lru to deregister and
>> ibv_reg_mr (or GNI_MemRegister in the uGNI case) fails. The PML then puts
>> the request on its rdma pending list and continues. If any message comes in
>> the rdma pend
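
A rough model of that pending-list behavior (all names below are invented for illustration; this is not the PML's real code): requests whose registration failed are parked and only retried when some registration is released, so if nothing is ever released, the queue never drains -- the hang being discussed:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct pending_req {
        struct pending_req *next;
        void  *addr;
        size_t len;
    } pending_req_t;

    static pending_req_t *rdma_pending = NULL;

    /* Called when a registration attempt fails: park the request. */
    static void rdma_pending_push(pending_req_t *req)
    {
        req->next = rdma_pending;
        rdma_pending = req;
    }

    /* Called when some registration is released: retry queued requests.
     * If there is nothing on the lru to deregister, no registration is
     * ever released, this is never invoked, and the queue never drains. */
    static void rdma_pending_progress(bool (*try_register)(void *, size_t))
    {
        pending_req_t **cur = &rdma_pending;
        while (*cur != NULL) {
            if (try_register((*cur)->addr, (*cur)->len)) {
                *cur = (*cur)->next; /* registered: dequeue and proceed */
            } else {
                cur = &(*cur)->next; /* still starved: leave it queued */
            }
        }
    }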

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Nathan Hjelm
On Fri, 9 Mar 2012, Jeffrey Squyres wrote:
> On Mar 9, 2012, at 1:14 PM, George Bosilca wrote:
>> The hang occurs because there is nothing on the lru to deregister and
>> ibv_reg_mr (or GNI_MemRegister in the uGNI case) fails. The PML then puts
>> the request on its rdma pending list and continues. I

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Jeffrey Squyres
On Mar 9, 2012, at 1:32 PM, Nathan Hjelm wrote:
> An mpool that is aware of local processes' lru's will solve the problem in
> most cases (all that I have seen)

I agree -- don't let words in my emails make you think otherwise. I think this will fix "most" problems, but undoubtedly, some will st

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Nathan Hjelm
On Fri, 9 Mar 2012, Jeffrey Squyres wrote:
> On Mar 9, 2012, at 1:32 PM, Nathan Hjelm wrote:
>> An mpool that is aware of local processes' lru's will solve the problem in
>> most cases (all that I have seen)
> I agree -- don't let words in my emails make you think otherwise. I think this will fix

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Shamis, Pavel
>> Depending on the timing, this might go to 1.6 (1.5.5 has waited for too
>> long, and this is not a regression). Keep in mind that the problem has been
>> around for *a long, long time*, which is why I approved the diag message
>> (i.e., because a real solution is still nowhere in sight). Th

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Rolf vandeVaart
[Comment at bottom]

>-----Original Message-----
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Nathan Hjelm
>Sent: Friday, March 09, 2012 2:23 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106
>
>On Fri, 9 Mar 2012, J

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread George Bosilca
On Mar 9, 2012, at 14:23, Nathan Hjelm wrote:
> BTW, can anyone tell me why each mpool defines mca_mpool_base_resources_t
> instead of defining mca_mpool_blah_resources_t? The current design makes it
> impossible to support more than one mpool in a btl. I can delete a bunch of
> code if I can
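
To make the clash Nathan is asking about concrete: every mpool component's header defines the one shared mca_mpool_base_resources_t tag, so a BTL that includes two mpools' headers gets two conflicting definitions of the same type. A hypothetical per-component naming (the fields below are invented for illustration) would let a single BTL carry resources for both:

    typedef struct {            /* hypothetical per-component name */
        void *reg_data;
    } mca_mpool_rdma_resources_t;

    typedef struct {            /* hypothetical per-component name */
        char *pool_name;
    } mca_mpool_grdma_resources_t;

    struct my_btl_module {      /* sketch of a BTL using two mpools */
        mca_mpool_rdma_resources_t  rdma_res;
        mca_mpool_grdma_resources_t grdma_res;
    };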

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Nathan Hjelm
On Fri, 9 Mar 2012, George Bosilca wrote:
> On Mar 9, 2012, at 14:23, Nathan Hjelm wrote:
>> BTW, can anyone tell me why each mpool defines mca_mpool_base_resources_t
>> instead of defining mca_mpool_blah_resources_t? The current design makes it
>> impossible to support more than one mpool in a btl

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Nathan Hjelm
I tested my grdma mpool with the openib btl and IMB Alltoall/Alltoallv on a system that consistently hangs. If I give the connection module the ability to evict from the lru, grdma prevents both the out-of-registered-memory hang AND problems creating QPs (due to exhaustion of registered memory).