Hi George,

  Got it, thanks for the info - I naively hadn't even considered that of
course all the related libraries likely have their *own* allocators.  So,
for *OpenMPI*, it sounds like I can use my own opal_[mc]alloc calls, with a
new build turning mem debugging on, to tally up and report the total size
of OpenMPI allocations, and that seems pretty straightforward.  But I'd
guess that for a data-heavy MPI application, the majority of the memory
will be in transport-level buffers, and that's (for me) likely the UCX
layer, so I should look to that community / code for quantifying how large
those buffers get inside my application?
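
  For my own bookkeeping, the tally I have in mind is nothing fancier than
the sketch below - the function names are just illustrative of where the
OPAL_ENABLE_MEM_DEBUG path in opal/util/malloc.c could call into it, not
actual OpenMPI code:

  /* Illustrative sketch only: a running tally that the mem-debug hooks
   * could feed.  The names here are made up; the real integration point
   * would be wherever opal/util/malloc.c records allocations when
   * OPAL_ENABLE_MEM_DEBUG is turned on. */
  #include <stddef.h>
  #include <stdio.h>

  static size_t tally_bytes  = 0;   /* cumulative bytes requested */
  static size_t tally_allocs = 0;   /* number of allocation calls */

  static void tally_record(size_t size)          /* called per allocation */
  {
      tally_bytes  += size;
      tally_allocs += 1;
  }

  static void tally_report(const char *where)    /* e.g. at finalize time */
  {
      fprintf(stderr, "[opal tally] %s: %zu allocations, %zu bytes total\n",
              where, tally_allocs, tally_bytes);
  }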

  Thanks again, and apologies for what is surely a woeful misuse of the
correct terminology here on some of this stuff.

  - Brian


On Mon, Apr 17, 2023 at 11:05 AM George Bosilca <bosi...@icl.utk.edu> wrote:

> Brian,
>
> OMPI does not have an official mechanism to report how much memory OMPI
> allocates. But, there is hope:
>
> 1. We have a mechanism to help debug memory issues
> (OPAL_ENABLE_MEM_DEBUG). You could enable it and then provide your own
> flavor of memory tracking in opal/util/malloc.c
> 2. You can use a traditional malloc trapping mechanism (valgrind, malt,
> mtrace,...), and investigate the stack to detect where the allocation was
> issued and then count.
>
> The first approach would only give you the memory used by OMPI itself, not
> the other libraries we are using (PMIx, HWLOC, UCX, ...). The second might
> be a little more generic, but depends on external tools and might take a
> little time to set up.
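>
> For the second route, glibc's mtrace() is probably the lowest-effort way to
> start - roughly: bracket the run with mtrace()/muntrace(), point the
> MALLOC_TRACE environment variable at a log file, and post-process the log
> with the mtrace script afterwards to get per-call-site totals.
>
>     /* Rough mtrace example (glibc): run with MALLOC_TRACE=/some/path set,
>      * then inspect the log with `mtrace ./app /some/path` afterwards. */
>     #include <mcheck.h>
>     #include <mpi.h>
>
>     int main(int argc, char **argv)
>     {
>         mtrace();               /* start logging malloc/free to $MALLOC_TRACE */
>         MPI_Init(&argc, &argv);
>         /* ... application work ... */
>         MPI_Finalize();
>         muntrace();             /* stop logging */
>         return 0;
>     }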
>
> George.
>
>
> On Fri, Apr 14, 2023 at 3:31 PM Brian Dobbins via users <
> users@lists.open-mpi.org> wrote:
>
>>
>> Hi all,
>>
>>   I'm wondering if there's a simple way to get statistics from OpenMPI as
>> to how much memory the *MPI* layer in an application is taking.  For
>> example, I'm running a model and I can get the RSS size at various points
>> in the code, and that reflects the user data for the application, *plus*,
>> surely, buffers for MPI messages that are either allocated at runtime or,
>> maybe, a pool from start-up.  The memory use - which I assume is tied to
>> internal buffers? - differs considerably with *how* I run MPI - e.g., TCP vs
>> UCX, and, with UCX, UD vs RC mode.
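>>
>>   (For context: grabbing RSS at a given point in the code is nothing
>> fancier than reading VmRSS from /proc/self/status - a Linux-only helper
>> along these lines:)
>>
>>   /* Linux-only: return current resident set size in kB, from /proc. */
>>   #include <stdio.h>
>>   #include <string.h>
>>
>>   static long rss_kb(void)
>>   {
>>       FILE *f = fopen("/proc/self/status", "r");
>>       char line[256];
>>       long kb = -1;
>>       if (!f) return -1;
>>       while (fgets(line, sizeof(line), f)) {
>>           if (strncmp(line, "VmRSS:", 6) == 0) {
>>               sscanf(line + 6, "%ld", &kb);   /* value is reported in kB */
>>               break;
>>           }
>>       }
>>       fclose(f);
>>       return kb;
>>   }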
>>
>>   Here's an example of this:
>>
>> 60km (163842 columns), 2304 ranks [OpenMPI], standard decomposition
>> UCX transport changed via the UCX_TLS environment variable only
>> (no recompilation; all runs done on the same nodes)
>> RSS memory in MB, measured after the ATM-TO-MED step:
>>
>>   UCX_TLS   ud        default    rc
>>   Run 1     347.03    392.08     750.32
>>   Run 2     346.96    391.86     748.39
>>   Run 3     346.89    392.18     750.23
>>
>>   I'd love a way to trace how much *MPI alone* is using, since here I'm
>> still measuring the *process's* RSS.  My feeling is that if, for
>> example, I'm running on N nodes and have a 1GB dataset + (for the sake of
>> discussion) 100MB of MPI info, then at 2N, with good scaling of domain
>> memory, that's 500MB + 100MB, at 4N it's 250MB + 100MB, and eventually, at
>> 16N, the MPI memory dominates.  As a result, when we scale out, even with
>> perfect scaling of *domain* memory, at some point memory associated with
>> MPI will cause this curve to taper off, and potentially invert.  But I'm
>> admittedly *way* out of date on how modern MPI implementations allocate
>> buffers.
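>>
>>   (Put more compactly: memory per node ~ D/N + M, where D is the total
>> domain data (1GB here), N the node count, and M the roughly fixed MPI
>> footprint (100MB here) - so once N grows past D/M, the MPI term dominates.)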
>>
>>   In short, any tips on ways to better characterize MPI memory use would
>> be *greatly* appreciated!  If this is purely on the UCX (or other
>> transport) level, that's good to know too.
>>
>>   Thanks,
>>   - Brian
>>