I would start by adjusting btl_openib_receive_queues. The default uses a
per-peer QP, which can eat up a lot of memory as the number of peers
grows. I recommend using no per-peer queues and several shared receive
queues (SRQs) instead. We use S,4096,1024:S,12288,512:S,65536,512
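A note on what that spec allocates: each S,<size>,<count> entry creates a
shared receive queue with <count> receive buffers of <size> bytes, so a
rough lower bound on the receive-buffer memory it pins can be computed
straight from the string. This is a sketch, not exact openib accounting:
the BTL also pins header and flow-control/credit buffers, so the real
registered-memory figure is somewhat higher.

```python
# Rough estimate of receive-buffer memory pinned by a
# btl_openib_receive_queues spec. Ignores per-buffer header
# overhead and credit/flow-control buffers, so the true
# registered-memory footprint is somewhat larger.
spec = "S,4096,1024:S,12288,512:S,65536,512"

total = 0
for queue in spec.split(":"):
    fields = queue.split(",")
    size, count = int(fields[1]), int(fields[2])
    total += size * count

print(total / 2**20, "MiB")  # 4 + 6 + 32 = 42.0 MiB
```

The spec itself is set like any other MCA parameter, e.g. on the command
line with mpirun --mca btl_openib_receive_queues
S,4096,1024:S,12288,512:S,65536,512, or in an openmpi-mca-params.conf
file.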

-Nathan

On Thu, 12 Jan 2012, V. Ram wrote:

Open MPI IB Gurus,

I have some slightly older InfiniBand-equipped nodes which have less RAM
than we'd like, and on which we tend to run jobs that can span 16-32
nodes of this type.  The jobs themselves tend to run on the heavy side
in terms of their own memory requirements.

When we ran these jobs under an older Intel MPI, they managed to stay
within the available RAM without paging out to disk.  Now, using Open
MPI 1.5.3, we can end up paging to disk or even running out of memory
with the exact same codes, jobs, and node distributions.

I suspect I can reduce overall memory consumption by tuning the
IB-related memory that Open MPI consumes.  I've looked at the FAQ entry
http://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage
but I'm still not certain where I should start.  Again, this is all for
1.5.3 (we are willing to update to 1.5.4, or to 1.5.5 when it is
released, if that would help).

1. It looks like there are several independent IB BTL MCA parameters to
try adjusting: i. mpool_rdma_rcache_size_limit, ii.
btl_openib_free_list_max, iii. btl_openib_max_send_size, iv.
btl_openib_eager_rdma_num, v. btl_openib_max_eager_rdma, vi.
btl_openib_eager_limit.  Have I missed any other parameters that impact
InfiniBand-related memory usage?  These parameters are listed as
affecting registered memory.  Are there parameters that affect
unregistered IB-related memory consumption on the part of Open MPI
itself?

2. Where should I start with this?  For example, is it worth trying to
adjust any of the eager parameters, or does the bulk of the memory
requirement come from mpool_rdma_rcache_size_limit?

3. Are there any gross/overall "master" parameters that will set limits,
but keep the various buffers in intelligent proportion to one another,
or will I need to manually adjust each set of buffers independently?  If
the latter, are there any guidelines on the relative proportions between
buffers, or overall recommendations?
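The scaling behind the per-peer vs. shared trade-off in the reply at the
top of this thread can be sketched in a few lines.  The buffer sizes and
counts below are made-up illustrative values, not the openib BTL
defaults; the point is only that per-peer receive memory grows linearly
with the number of peers, while an SRQ pool is fixed:

```python
# Toy model of how receive-buffer memory scales with peer count.
# Sizes/counts are illustrative, not the openib BTL defaults.
def per_peer_bytes(npeers, bufsize=4096, nbufs=256):
    # Per-peer QPs: every peer gets its own set of receive buffers.
    return npeers * bufsize * nbufs

def srq_bytes(bufsize=4096, nbufs=1024):
    # Shared receive queue: one pool serves all peers.
    return bufsize * nbufs

for n in (16, 128, 512):
    print(n, "peers:", per_peer_bytes(n) // 2**20, "MiB per-peer vs",
          srq_bytes() // 2**20, "MiB shared")
```

With these toy numbers, 512 peers would pin 512 MiB of per-peer receive
buffers but only 4 MiB in the shared pool, which is why dropping the
per-peer QP helps most on larger jobs.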

Thank you very much.

--
http://www.fastmail.fm - A fast, anti-spam email service.

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users