Hi George,

Got it, thanks for the info - I naively hadn't even considered that of course all the related libraries likely have their *own* allocators. So, for *OpenMPI*, it sounds like I can use my own opal_[mc]alloc calls in a new build with memory debugging turned on, to tally up and report the total size of OpenMPI's allocations - that seems pretty straightforward. But I'd guess that for a data-heavy MPI application, the majority of the memory will be in transport-level buffers, and for me that's likely the UCX layer - so should I look to that community / code base for quantifying how large those buffers get inside my application?
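For concreteness, the kind of tally I'm picturing is something like the sketch below. This isn't OpenMPI's actual code - just a size-tracking wrapper of the sort I'd imagine dropping into opal/util/malloc.c, with names entirely of my own invention:

    /* Hypothetical size-tracking wrapper (NOT OpenMPI's actual code).
     * Idea: stash the requested size in a header ahead of each
     * allocation so the matching free() can subtract it again. */
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    static size_t tally_current = 0;  /* bytes currently allocated */
    static size_t tally_peak    = 0;  /* high-water mark */

    void *tally_malloc(size_t size)
    {
        /* One max_align_t-sized header keeps the returned pointer aligned. */
        max_align_t *p = malloc(size + sizeof(max_align_t));
        if (p == NULL) return NULL;
        *(size_t *)p = size;
        tally_current += size;
        if (tally_current > tally_peak) tally_peak = tally_current;
        return p + 1;
    }

    void tally_free(void *ptr)
    {
        if (ptr == NULL) return;
        max_align_t *p = (max_align_t *)ptr - 1;
        tally_current -= *(size_t *)p;
        free(p);
    }

    void tally_report(void)
    {
        fprintf(stderr, "MPI-layer tally: current=%zu bytes, peak=%zu bytes\n",
                tally_current, tally_peak);
    }

(A calloc variant would follow the same pattern; the report could be dumped per rank at finalize time.)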
Thanks again, and apologies for what is surely a woeful misuse of the correct terminology here on some of this stuff.

- Brian


On Mon, Apr 17, 2023 at 11:05 AM George Bosilca <bosi...@icl.utk.edu> wrote:

> Brian,
>
> OMPI does not have an official mechanism to report how much memory OMPI
> allocates. But there is hope:
>
> 1. We have a mechanism to help debug memory issues
> (OPAL_ENABLE_MEM_DEBUG). You could enable it and then provide your own
> flavor of memory tracking in opal/util/malloc.c
> 2. You can use a traditional malloc-trapping mechanism (valgrind, malt,
> mtrace, ...), and inspect the stack to detect where each allocation was
> issued, and then count.
>
> The first approach would only give you the memory used by OMPI itself, not
> the other libraries we are using (PMIx, HWLOC, UCX, ...). The second might
> be a little more generic, but it depends on external tools and might take
> a little time to set up.
>
>   George.
>
>
> On Fri, Apr 14, 2023 at 3:31 PM Brian Dobbins via users <
> users@lists.open-mpi.org> wrote:
>
>> Hi all,
>>
>> I'm wondering if there's a simple way to get statistics from OpenMPI as
>> to how much memory the *MPI* layer in an application is taking. For
>> example, I'm running a model and I can get the RSS size at various points
>> in the code, and that reflects the user data for the application, *plus*,
>> surely, buffers for MPI messages that are either allocated at runtime or,
>> maybe, a pool from start-up. The memory use - which I assume is tied to
>> internal buffers? - differs considerably with *how* I run MPI - e.g., TCP
>> vs UCX, and, with UCX, UD vs RC mode.
>>
>> Here's an example of this:
>>
>>   60km (163842 columns), 2304 ranks [OpenMPI]
>>   UCX transport changed via environment variable
>>   (No recompilation; all runs done on the same nodes)
>>   Showing memory after the ATM-TO-MED step
>>   [RSS memory in MB]
>>
>>   Standard decomposition
>>   UCX_TLS value       ud    default        rc
>>   Run 1           347.03     392.08    750.32
>>   Run 2           346.96     391.86    748.39
>>   Run 3           346.89     392.18    750.23
>>
>> I'd love a way to trace how much *MPI alone* is using, since here I'm
>> still measuring the *process's* RSS. My feeling is that if, for example,
>> I'm running on N nodes and have a 1GB dataset + (for the sake of
>> discussion) 100MB of MPI info, then at 2N, with good scaling of domain
>> memory, that's 500MB + 100MB, at 4N it's 250MB + 100MB, and eventually, at
>> 16N, the MPI memory dominates. As a result, when we scale out, even with
>> perfect scaling of *domain* memory, at some point the memory associated
>> with MPI will cause this curve to taper off, and potentially invert. But
>> I'm admittedly *way* out of date on how modern MPI implementations
>> allocate buffers.
>>
>> In short, any tips on ways to better characterize MPI memory use would
>> be *greatly* appreciated! And if this is purely at the UCX (or other
>> transport) level, that's good to know too.
>>
>> Thanks,
>>   - Brian
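P.S. In case it helps anyone else reading this later: my understanding of the malloc-trapping route George mentions as option 2, using glibc's mtrace, is something like the sketch below. mtrace() is glibc/Linux-specific (from <mcheck.h>), it logs to whatever file the MALLOC_TRACE environment variable points at, and the log can be post-processed with the mtrace(1) script - I haven't tried it against our model yet, so treat this as a sketch, not a recipe:

    /* Sketch of the malloc-trapping route (glibc only). Run with, e.g.,
     *   MALLOC_TRACE=/tmp/mpi_malloc.log mpirun -n 4 ./app
     * then post-process the log with the mtrace(1) script. */
    #include <mcheck.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        mtrace();                /* start logging every malloc/free from here */
        MPI_Init(&argc, &argv);  /* allocations inside MPI_Init get logged too */

        /* ... application work ... */

        MPI_Finalize();
        muntrace();              /* stop logging */
        return 0;
    }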