Hi Gilles, you caught the bug! With this patch, the memory leak disappears on a single node. The cluster is currently overloaded; I will launch a multi-node test as soon as possible. Below is the memory used by rank 0 before (blue) and after (red) the patch.
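(For reference, curves like these can be produced with a small Linux-only probe — a sketch, not the script actually used for the figure — that reads the VmRSS line from /proc/self/status; call it from rank 0 every few hundred iterations and plot the printed values.)

subroutine print_rss(tag)
   implicit none
   character(len=*), intent(in) :: tag
   character(len=128) :: line
   integer :: unit, ios
   ! VmRSS in /proc/self/status is the current resident set size (Linux only)
   open(newunit=unit, file='/proc/self/status', action='read', iostat=ios)
   if (ios /= 0) return
   do
      read(unit, '(A)', iostat=ios) line
      if (ios /= 0) exit
      if (line(1:6) == 'VmRSS:') then
         print *, tag, ' ', trim(line)
         exit
      end if
   end do
   close(unit)
end subroutine print_rss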
Thanks

Patrick

On 10/12/2020 at 10:15, Gilles Gouaillardet via users wrote:
> Patrick,
>
> First, thank you very much for sharing the reproducer.
>
> Yes, please open a GitHub issue so we can track this.
>
> I cannot fully understand where the leak is coming from, but so far
>
> - the code fails on master built with --enable-debug (the datatype
> engine reports an error) but not with the v3.1.x branch
> (this suggests there could be an error in the latest Open MPI ... or
> in the code)
>
> - the attached patch seems to have a positive effect, can you please
> give it a try?
>
> Cheers,
>
> Gilles
>
> On 12/7/2020 6:15 PM, Patrick Bégou via users wrote:
>> Hi,
>>
>> I've written a small piece of code to show the problem. It is based
>> on my application, but reduced to 2D and using integer arrays for
>> testing. The figure below shows the max RSS size of the rank 0
>> process over 20000 iterations on 8 and 16 cores, with the openib and
>> tcp drivers. The more processes I have, the larger the memory leak.
>> I used the same binaries for the 4 runs and OpenMPI 3.1 (same
>> behavior with 4.0.5). The code is attached. I'll try to check type
>> deallocation as soon as possible.
>>
>> Patrick
>>
>> On 04/12/2020 at 01:34, Gilles Gouaillardet via users wrote:
>>> Patrick,
>>>
>>> Based on George's idea, a simpler check is to retrieve the Fortran
>>> index via the (standard) MPI_Type_c2f() function after you create a
>>> derived datatype.
>>>
>>> If the index keeps growing forever even after you MPI_Type_free(),
>>> then this clearly indicates a leak. Unfortunately, this simple test
>>> cannot be used to definitively rule out any memory leak.
>>>
>>> Note you can also
>>>
>>> mpirun --mca pml ob1 --mca btl tcp,self ...
>>>
>>> in order to force communications over TCP/IP and hence rule out any
>>> memory leak that could be triggered by your fast interconnect.
>>>
>>> In any case, a reproducer will greatly help us debug this issue.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 12/4/2020 7:20 AM, George Bosilca via users wrote:
>>>> Patrick,
>>>>
>>>> I'm afraid there is no simple way to check this. The main reason
>>>> is that OMPI uses handles for MPI objects, and these handles are
>>>> not tracked by the library; they are supposed to be provided by
>>>> the user for each call. In your case, as you have already called
>>>> MPI_Type_free on the datatype, you cannot produce a valid handle.
>>>>
>>>> There might be a trick. If the datatype is manipulated with any
>>>> Fortran MPI functions, then we convert the handle (which in fact
>>>> is a pointer) to an index into a pointer array structure. Thus,
>>>> the index will remain in use, and can therefore be used to convert
>>>> back into a valid datatype pointer, until OMPI completely releases
>>>> the datatype. Look into the ompi_datatype_f_to_c_table table to
>>>> see the datatypes that exist and get their pointers, and then use
>>>> these pointers as arguments to ompi_datatype_dump() to see if any
>>>> of these existing datatypes are the ones you defined.
>>>>
>>>> George.
>>>>
>>>> On Thu, Dec 3, 2020 at 4:44 PM Patrick Bégou via users
>>>> <users@lists.open-mpi.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to track down a memory leak introduced by my new
>>>> implementation of communications based on MPI_Alltoallw and
>>>> MPI_Type_create_subarray calls. Arrays of subarray types are
>>>> created/destroyed at each time step and used for communications.
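(Schematically, the create/use/free pattern Patrick describes looks like the minimal sketch below. This is not his actual code: the 1D integer layout, block size, peer-offset scheme, and loop counts are invented for illustration.)

program alltoallw_subarray_sketch
   use mpi
   implicit none
   integer, parameter :: blk = 4          ! made-up block size
   integer :: ierr, comm, nprocs, rank, step, j
   integer, allocatable :: sendbuf(:), recvbuf(:)
   integer, allocatable :: stypes(:), rtypes(:), counts(:), displs(:)
   integer :: sizes(1), subsizes(1), starts(1)

   call MPI_Init(ierr)
   comm = MPI_COMM_WORLD
   call MPI_Comm_size(comm, nprocs, ierr)
   call MPI_Comm_rank(comm, rank, ierr)

   allocate(sendbuf(nprocs*blk), recvbuf(nprocs*blk))
   allocate(stypes(nprocs), rtypes(nprocs), counts(nprocs), displs(nprocs))
   sendbuf = rank
   counts  = 1     ! one instance of each derived type per peer
   displs  = 0     ! byte offsets are carried by the subarray types

   do step = 1, 1000                 ! stand-in for the time loop
      do j = 1, nprocs               ! one send/recv type per peer
         sizes(1)    = nprocs*blk
         subsizes(1) = blk
         starts(1)   = (j-1)*blk     ! 0-based start of peer j's block
         call MPI_Type_create_subarray(1, sizes, subsizes, starts, &
              MPI_ORDER_FORTRAN, MPI_INTEGER, stypes(j), ierr)
         call MPI_Type_commit(stypes(j), ierr)
         call MPI_Type_create_subarray(1, sizes, subsizes, starts, &
              MPI_ORDER_FORTRAN, MPI_INTEGER, rtypes(j), ierr)
         call MPI_Type_commit(rtypes(j), ierr)
      end do

      call MPI_Alltoallw(sendbuf, counts, displs, stypes, &
                         recvbuf, counts, displs, rtypes, comm, ierr)

      do j = 1, nprocs               ! every type is freed again ...
         call MPI_Type_free(stypes(j), ierr)
         call MPI_Type_free(rtypes(j), ierr)
      end do                         ! ... so RSS should stay flat
   end do

   call MPI_Finalize(ierr)
end program alltoallw_subarray_sketch

Each time step creates and frees 2*nprocs datatypes, so resident memory should stay flat; in the affected Open MPI builds it grows with the iteration count instead, and faster with more processes.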
>>>>
>>>> On my laptop the code runs fine (running 15000 time iterations on
>>>> 32 processes with oversubscription), but on our cluster the memory
>>>> used by the code increases until the OOM killer stops the job. On
>>>> the cluster we use IB QDR for communications.
>>>>
>>>> Same GCC/gfortran 7.3 (built from sources), same Open MPI sources
>>>> (3.1 and 4.0.5 tested), same Fortran sources on the laptop and on
>>>> the cluster.
>>>>
>>>> With GCC/gfortran 4.8 and OpenMPI 1.7.3 on the cluster, the
>>>> problem does not show up (resident memory does not increase, and
>>>> we ran 100000 time iterations).
>>>>
>>>> The MPI_Type_free man page says that it "marks the datatype object
>>>> associated with datatype for deallocation". But how can I check
>>>> that the deallocation is really done?
>>>>
>>>> Thanks for any suggestions.
>>>>
>>>> Patrick
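(The check Gilles suggested can be turned into a self-contained test along the following lines — a sketch, assuming Open MPI, where the Fortran INTEGER handle is itself the index MPI_Type_c2f() would return; the shapes and iteration count are arbitrary. If MPI_Type_free() really releases each datatype, freed indices are reused and the printed handle stays bounded; if the value keeps growing, the datatypes leak.)

program type_leak_check
   use mpi
   implicit none
   integer :: ierr, newtype, i
   integer :: sizes(2), subsizes(2), starts(2)

   call MPI_Init(ierr)
   sizes    = (/ 64, 64 /)   ! arbitrary global array shape
   subsizes = (/ 32, 32 /)   ! arbitrary local block
   starts   = (/ 0, 0 /)     ! subarray starts are 0-based

   do i = 1, 20000
      call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
           MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
      call MPI_Type_commit(newtype, ierr)
      ! A bounded handle value means the handle slot is being
      ! reclaimed after MPI_Type_free; unbounded growth is a leak.
      if (mod(i, 1000) == 0) print *, 'iteration', i, 'handle =', newtype
      call MPI_Type_free(newtype, ierr)
   end do

   call MPI_Finalize(ierr)
end program type_leak_check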