Hi Gilles,

You caught the bug! With this patch, on a single node, the memory leak
disappears. The cluster is currently overloaded; as soon as possible I
will launch a multi-node test.
Below is the memory used by rank 0 before (blue) and after (red) the patch.

Thanks

Patrick


On 10/12/2020 at 10:15, Gilles Gouaillardet via users wrote:
> Patrick,
>
>
> First, thank you very much for sharing the reproducer.
>
>
> Yes, please open a github issue so we can track this.
>
>
> I cannot fully understand where the leak is coming from, but so far
>
>  - the code fails on master built with --enable-debug (the data engine
> reports an error) but not with the v3.1.x branch
>
>   (this suggests there could be an error in the latest Open MPI ... or
> in the code)
>
>  - the attached patch seems to have a positive effect, can you please
> give it a try?
>
>
> Cheers,
>
>
> Gilles
>
>
>
> On 12/7/2020 6:15 PM, Patrick Bégou via users wrote:
>> Hi,
>>
>> I've written a small piece of code to show the problem. It is based on
>> my application, but in 2D and using integer arrays for testing.
>> The figure below shows the max RSS size of the rank 0 process over
>> 20000 iterations on 8 and 16 cores, with the openib and tcp drivers.
>> The more processes I have, the larger the memory leak. I use the same
>> binaries for the 4 runs and Open MPI 3.1 (same behavior with 4.0.5).
>> The code is attached. I'll try to check type deallocation as soon as
>> possible.
>>
>> Patrick
>>
>>
>>
>>
>> On 04/12/2020 at 01:34, Gilles Gouaillardet via users wrote:
>>> Patrick,
>>>
>>>
>>> based on George's idea, a simpler check is to retrieve the Fortran
>>> index via the (standard) MPI_Type_c2f() function
>>>
>>> after you create a derived datatype.
>>>
>>>
>>> If the index keeps growing forever even after you MPI_Type_free(),
>>> then this clearly indicates a leak.
>>>
>>> Unfortunately, this simple test cannot be used to definitely rule
>>> out any memory leak.
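>>>
>>> For example, since the reproducer is Fortran, the INTEGER handle you
>>> get back for a derived datatype should already be that index, so a
>>> small loop is enough to watch it. A minimal sketch (program name,
>>> sizes and iteration counts are arbitrary, not taken from the actual
>>> code):
>>>
>>> program check_type_indexes
>>>    use mpi
>>>    implicit none
>>>    integer :: ierr, newtype, it
>>>    integer :: sizes(2), subsizes(2), starts(2)
>>>
>>>    call MPI_Init(ierr)
>>>    sizes = (/ 64, 64 /) ; subsizes = (/ 32, 32 /) ; starts = (/ 0, 0 /)
>>>
>>>    do it = 1, 10000
>>>       call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
>>>            MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
>>>       call MPI_Type_commit(newtype, ierr)
>>>       ! If this handle keeps growing even though the type is freed
>>>       ! below, the Fortran indexes are not being recycled, i.e. a leak.
>>>       if (mod(it, 1000) == 0) print *, 'iteration', it, 'handle =', newtype
>>>       call MPI_Type_free(newtype, ierr)
>>>    end do
>>>
>>>    call MPI_Finalize(ierr)
>>> end program check_type_indexes
>>>
>>> If indexes are recycled correctly, the printed value should stay
>>> (roughly) constant across iterations.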
>>>
>>>
>>> Note you can also
>>>
>>> mpirun --mca pml ob1 --mca btl tcp,self ...
>>>
>>> in order to force communications over TCP/IP and hence rule out any
>>> memory leak that could be triggered by your fast interconnect.
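>>>
>>> For example (the binary name and process count here are only
>>> placeholders for your reproducer):
>>>
>>> mpirun --mca pml ob1 --mca btl tcp,self -np 16 ./my_reproducer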
>>>
>>>
>>>
>>> In any case, a reproducer will greatly help us debug this issue.
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>>
>>> On 12/4/2020 7:20 AM, George Bosilca via users wrote:
>>>> Patrick,
>>>>
>>>> I'm afraid there is no simple way to check this. The main reason is
>>>> that OMPI uses handles for MPI objects, and these handles are not
>>>> tracked by the library; they are supposed to be provided by the user
>>>> for each call. In your case, as you already called MPI_Type_free on
>>>> the datatype, you cannot produce a valid handle.
>>>>
>>>> There might be a trick. If the datatype is manipulated with any
>>>> Fortran MPI functions, then we convert the handle (which in fact is
>>>> a pointer) to an index into a pointer array structure. Thus, the
>>>> index will remain used, and can therefore be used to convert back
>>>> into a valid datatype pointer, until OMPI completely releases the
>>>> datatype. Look into the ompi_datatype_f_to_c_table table to see the
>>>> datatypes that exist and get their pointers, and then use these
>>>> pointers as arguments to ompi_datatype_dump() to see if any of
>>>> these existing datatypes are the ones you define.
>>>>
>>>> George.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Dec 3, 2020 at 4:44 PM Patrick Bégou via users
>>>> <users@lists.open-mpi.org> wrote:
>>>>
>>>>     Hi,
>>>>
>>>>     I'm trying to track down a memory leak that appeared with my new
>>>>     implementation of communications based on MPI_Alltoallw and
>>>>     MPI_Type_create_subarray calls. Arrays of subarray types are
>>>>     created/destroyed at each time step and used for communications.
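>>>>
>>>>     For illustration only, a minimal sketch of that per-time-step
>>>>     pattern (the names, block size and exchange geometry are made up,
>>>>     not taken from the real application; each rank exchanges an
>>>>     nloc x nloc integer block with every rank):
>>>>
>>>>     program alltoallw_subarray_pattern
>>>>        use mpi
>>>>        implicit none
>>>>        integer, parameter :: nloc = 8
>>>>        integer :: ierr, rank, nproc, j, step
>>>>        integer :: sizes(2), subsizes(2), starts(2)
>>>>        integer, allocatable :: A(:,:), B(:,:)
>>>>        integer, allocatable :: sendtypes(:), recvtypes(:), counts(:), displs(:)
>>>>
>>>>        call MPI_Init(ierr)
>>>>        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
>>>>        call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
>>>>
>>>>        allocate(A(nloc, nloc*nproc), B(nloc, nloc*nproc))
>>>>        allocate(sendtypes(nproc), recvtypes(nproc), counts(nproc), displs(nproc))
>>>>        A = rank ; B = -1
>>>>        counts = 1 ; displs = 0   ! offsets are carried by the subarray types
>>>>
>>>>        do step = 1, 3            ! the real code does this at every time step
>>>>           sizes    = (/ nloc, nloc*nproc /)
>>>>           subsizes = (/ nloc, nloc /)
>>>>           do j = 1, nproc
>>>>              starts = (/ 0, (j-1)*nloc /)   ! block column j (0-based starts)
>>>>              call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
>>>>                   MPI_ORDER_FORTRAN, MPI_INTEGER, sendtypes(j), ierr)
>>>>              call MPI_Type_commit(sendtypes(j), ierr)
>>>>              call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
>>>>                   MPI_ORDER_FORTRAN, MPI_INTEGER, recvtypes(j), ierr)
>>>>              call MPI_Type_commit(recvtypes(j), ierr)
>>>>           end do
>>>>
>>>>           call MPI_Alltoallw(A, counts, displs, sendtypes, &
>>>>                              B, counts, displs, recvtypes, MPI_COMM_WORLD, ierr)
>>>>
>>>>           do j = 1, nproc        ! every type is freed after each exchange
>>>>              call MPI_Type_free(sendtypes(j), ierr)
>>>>              call MPI_Type_free(recvtypes(j), ierr)
>>>>           end do
>>>>        end do
>>>>
>>>>        call MPI_Finalize(ierr)
>>>>     end program alltoallw_subarray_pattern
>>>>
>>>>     This is only meant to make the pattern under discussion concrete;
>>>>     the attached reproducer is the authoritative test case.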
>>>>
>>>>     On my laptop the code runs fine (15000 time iterations on 32
>>>>     processes with oversubscription), but on our cluster the memory
>>>>     used by the code increases until the OOM killer stops the job. On
>>>>     the cluster we use IB QDR for communications.
>>>>
>>>>     Same GCC/gfortran 7.3 (built from sources), same Open MPI sources
>>>>     (3.1 and 4.0.5 tested), same sources of the Fortran code on the
>>>>     laptop and on the cluster.
>>>>
>>>>     Using GCC/gfortran 4.8 and Open MPI 1.7.3 on the cluster does not
>>>>     show the problem (resident memory does not increase and we ran
>>>>     100000 time iterations).
>>>>
>>>>     The MPI_Type_free manual says that it "/Marks the datatype object
>>>>     associated with datatype for deallocation/". But how can I check
>>>>     that the deallocation is really done?
>>>>
>>>>     Thanks for any suggestions.
>>>>
>>>>     Patrick
>>>>
>>
