Some folks from ORNL did studies on OMPI memory usage a few years ago, but
I am not sure whether those studies are openly available. OMPI manages all
the MCA parameters, user-facing requests, unexpected messages, and
temporary buffers for collectives and IO. Their footprint is, and I might
be slightly extrapolating here, roughly linear in the number of existing
communicators and of non-blocking and persistent requests.
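
If you want to see that dependence for yourself, a quick, Linux-specific
experiment is to dup a communicator many times and watch the RSS grow. The
sketch below is just an illustration I put together (it reads VmRSS from
/proc/self/status); it is not anything that ships with OMPI:

/* comm_rss.c: rough illustration only. Duplicate a communicator many times
 * and watch the process RSS grow (Linux-specific). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

static long vmrss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    if (!f) return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

int main(int argc, char **argv)
{
    MPI_Comm dups[1000];
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) printf("after MPI_Init:           %ld kB\n", vmrss_kb());

    for (i = 0; i < 1000; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &dups[i]);
    if (rank == 0) printf("after 1000 MPI_Comm_dup:  %ld kB\n", vmrss_kb());

    for (i = 0; i < 1000; i++)
        MPI_Comm_free(&dups[i]);
    MPI_Finalize();
    return 0;
}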

As a general statement, low-level communication libraries are not supposed
to use much memory, and the amount should be capped to some extent
(logically, by the number of endpoints or connections). In particular, UCX
has a memory-tracking mechanism similar to OMPI's, via ucs_malloc and
friends. Take a look at ucs/debug/memtrack.c to figure out how to enable it
(maybe enabling statistics, i.e. building with ENABLE_STATS, is enough).
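
If your UCX build has those hooks compiled in, the dumps are, as far as I
remember, driven by environment variables along the lines of
UCX_MEMTRACK_DEST and UCX_STATS_DEST/UCX_STATS_TRIGGER; please verify the
exact names against memtrack.c and stats.c for your UCX version. Something
like the following, set before MPI_Init (or exported through mpirun), should
be all that is needed:

/* Rough sketch: ask UCX to dump memtrack/stats information at exit.
 * The variable names below are from memory; verify them against
 * ucs/debug/memtrack.c and ucs/stats/stats.c in your UCX tree. They only
 * do anything if UCX was configured with the corresponding debug/stats
 * support. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Must be set before MPI_Init(), i.e. before UCX reads its config. */
    setenv("UCX_MEMTRACK_DEST", "stdout", 1);   /* per-allocation accounting */
    setenv("UCX_STATS_DEST",    "stdout", 1);   /* statistics counters       */
    setenv("UCX_STATS_TRIGGER", "exit",   1);   /* dump when the job ends    */

    MPI_Init(&argc, &argv);
    /* ... application ... */
    MPI_Finalize();
    return 0;
}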

  George.




On Mon, Apr 17, 2023 at 1:16 PM Brian Dobbins <bdobb...@gmail.com> wrote:

>
> Hi George,
>
>   Got it, thanks for the info - I naively hadn't even considered that, of
> course, all the related libraries likely have their *own* allocators.  So,
> for *OpenMPI*, it sounds like I can use my own opal_[mc]alloc calls, with
> a new build that turns memory debugging on, to tally up and report the
> total size of OpenMPI allocations, and that seems pretty straightforward.
> But I'd guess that for a data-heavy MPI application, the majority of the
> memory will be in transport-level buffers, and that's (for me) likely the
> UCX layer, so I should look to that community / code for quantifying how
> large those buffers get inside my application?
>
>   Thanks again, and apologies for what is surely a woeful misuse of the
> correct terminology here on some of this stuff.
>
>   - Brian
>
>
> On Mon, Apr 17, 2023 at 11:05 AM George Bosilca <bosi...@icl.utk.edu>
> wrote:
>
>> Brian,
>>
>> OMPI does not have an official mechanism to report how much memory OMPI
>> allocates. But there is hope:
>>
>> 1. We have a mechanism to help debug memory issues
>> (OPAL_ENABLE_MEM_DEBUG). You could enable it and then provide your own
>> flavor of memory tracking in opal/util/malloc.c (a rough sketch of the
>> idea follows this list).
>> 2. You can use a traditional malloc-trapping mechanism (valgrind, malt,
>> mtrace, ...), and investigate the stack to detect where each allocation
>> was issued and then count (a minimal mtrace example is further below).
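>>
>> To make the first option concrete, the kind of tally I have in mind looks
>> like the sketch below. The names are mine and purely illustrative; the
>> real entry points live in opal/util/malloc.c and already carry file/line
>> arguments when OPAL_ENABLE_MEM_DEBUG is on:
>>
>> /* Illustrative only: a byte-counting wrapper of the sort you could wire
>>  * into opal/util/malloc.c. Not thread-safe, which is usually acceptable
>>  * for a quick experiment. */
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> static size_t current_bytes = 0;   /* bytes currently allocated */
>> static size_t peak_bytes    = 0;   /* high-water mark           */
>>
>> void *tracked_malloc(size_t size, const char *file, int line)
>> {
>>     /* Over-allocate by one size_t so the size is known at free time. */
>>     size_t *p = malloc(size + sizeof(size_t));
>>     if (p == NULL) return NULL;
>>     *p = size;
>>     current_bytes += size;
>>     if (current_bytes > peak_bytes) peak_bytes = current_bytes;
>>     return p + 1;
>> }
>>
>> void tracked_free(void *addr, const char *file, int line)
>> {
>>     if (addr == NULL) return;
>>     size_t *p = (size_t *)addr - 1;
>>     current_bytes -= *p;
>>     free(p);
>> }
>>
>> void tracked_report(void)
>> {
>>     fprintf(stderr, "tracked: current %zu bytes, peak %zu bytes\n",
>>             current_bytes, peak_bytes);
>> }
>>
>> Calling tracked_report() at MPI_Finalize time (or via atexit) then gives
>> you a per-process total for whatever goes through these wrappers.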
>>
>> The first approach would only give you the memory used by OMPI itself,
>> not by the other libraries we are using (PMIx, HWLOC, UCX, ...). The
>> second might be a little more generic, but it depends on external tools
>> and might take a little time to set up.
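>>
>> For the second option, glibc's mtrace is probably the lightest-weight
>> starting point: point MALLOC_TRACE at a file, bracket the region you care
>> about with mtrace()/muntrace(), and post-process the log with the
>> mtrace(1) script. A minimal sketch (Linux/glibc only):
>>
>> #include <mpi.h>
>> #include <mcheck.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char **argv)
>> {
>>     char logname[64];
>>     int rank;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     /* One log per rank; mtrace() reads MALLOC_TRACE when it is called. */
>>     snprintf(logname, sizeof(logname), "malloc_trace.%d.log", rank);
>>     setenv("MALLOC_TRACE", logname, 1);
>>     mtrace();
>>
>>     /* ... the communication-heavy part of the application ... */
>>
>>     muntrace();
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> Note that this only records what happens between the two calls, so the
>> allocations made inside MPI_Init itself are not captured unless you start
>> tracing before it (at the cost of a single shared log name, since the rank
>> is not known yet).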
>>
>> George.
>>
>>
>> On Fri, Apr 14, 2023 at 3:31 PM Brian Dobbins via users <
>> users@lists.open-mpi.org> wrote:
>>
>>>
>>> Hi all,
>>>
>>>   I'm wondering if there's a simple way to get statistics from OpenMPI
>>> as to how much memory the *MPI* layer in an application is taking.  For
>>> example, I'm running a model and I can get the RSS size at various points
>>> in the code, and that reflects the user data for the application *plus*,
>>> surely, buffers for MPI messages that are either allocated at runtime or,
>>> maybe, drawn from a pool set up at start-up.  The memory use (which I
>>> assume is tied to internal buffers?) differs considerably with *how* I
>>> run MPI - e.g., TCP vs UCX, and, with UCX, UD vs RC mode.
>>>
>>>   Here's an example of this:
>>>
>>> 60km (163842 columns), 2304 ranks [OpenMPI]
>>> UCX transport changed via environment variable only
>>> (no recompilation; all runs done on the same nodes)
>>> Memory shown after the ATM-TO-MED step [RSS memory in MB]
>>>
>>> Standard decomposition:
>>>
>>>   UCX_TLS value      ud       default      rc
>>>   Run 1            347.03     392.08     750.32
>>>   Run 2            346.96     391.86     748.39
>>>   Run 3            346.89     392.18     750.23
>>>
>>>   I'd love a way to trace how much *MPI alone* is using, since here I'm
>>> still measuring the *process's* RSS.  My feeling is that if, for
>>> example, I'm running on N nodes and have a 1GB dataset + (for the sake of
>>> discussion) 100MB of MPI info, then at 2N, with good scaling of domain
>>> memory, that's 500MB + 100MB, at 4N it's 250MB + 100MB, and eventually, at
>>> 16N, the MPI memory dominates.  As a result, when we scale out, even with
>>> perfect scaling of *domain* memory, at some point memory associated
>>> with MPI will cause this curve to taper off, and potentially invert.  But
>>> I'm admittedly *way* out of date on how modern MPI implementations
>>> allocate buffers.
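>>>
>>> In symbols (my own shorthand, nothing official: D for the decomposable
>>> data in the example, M for the fixed MPI footprint, k for the factor by
>>> which we scale out):
>>>
>>>   \mathrm{mem}(k) \approx \frac{D}{k} + M, \qquad \text{MPI dominates once } k \gtrsim \frac{D}{M}
>>>
>>> which, with the toy numbers above (D = 1GB, M = 100MB), happens somewhere
>>> around k = 10.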
>>>
>>>   In short, any tips on ways to better characterize MPI memory use would
>>> be *greatly* appreciated!  If this is purely at the UCX (or other
>>> transport) level, that's good to know too.
>>>
>>>   Thanks,
>>>   - Brian
>>>
>>>
>>>
