PS: Also pulling in Nico (CC'd) who is working on the network stack.

On Thu, Aug 17, 2017 at 11:23 AM, Ufuk Celebi <u...@apache.org> wrote:
> Hey Gwenhael,
>
> the network buffers are recycled automatically after a job terminates.
> If this does not happen, it would be quite a major bug.
>
> To help debug this:
>
> - Which version of Flink are you using?
> - Does the job fail immediately after submission or later during execution?
> - Is the following correct: the batch job that eventually fails
> because of missing network buffers runs without problems if you
> submit it to a fresh cluster with the same memory configuration?
>
> The network buffers are recycled after the task managers report the
> task as finished. If you immediately submit the next batch, there is
> a slight chance that the buffers have not been recycled yet. As a
> possible temporary workaround, could you try waiting for a short
> amount of time before submitting the next batch?
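>
> For instance, a minimal sketch of that loop (not tested; the 10
> seconds is an arbitrary guess, and buildBatchJob stands in for
> however you construct the job):
>
>     for (List<String> batch : batches) {
>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>         buildBatchJob(env, batch); // hypothetical: adds your operators for these 3 datehours
>         env.execute("datehours " + batch);
>         Thread.sleep(10_000L); // crude pause so the task managers can recycle buffers
>                                // (handle InterruptedException in real code)
>     }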
>
> I think we should also be able to run the job without splitting it up
> after increasing the network memory configuration. Did you already try
> this?
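>
> For example, in flink-conf.yaml (the exact keys depend on your Flink
> version, and the numbers below are only placeholders):
>
>     # up to Flink 1.2:
>     taskmanager.network.numberOfBuffers: 36864
>     # Flink 1.3+ (values in bytes):
>     taskmanager.network.memory.min: 268435456
>     taskmanager.network.memory.max: 1073741824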
>
> Best,
>
> Ufuk
>
> On Thu, Aug 17, 2017 at 10:38 AM, Gwenhael Pasquiers
> <gwenhael.pasqui...@ericsson.com> wrote:
>> Hello,
>>
>> We’re hitting a limit with the numberOfBuffers.
>>
>> In quite a complex job, we perform a lot of operations, with a lot of
>> operators, on a lot of folders (datehours).
>>
>> In order to split the job into smaller “batches” (to limit the required
>> “numberOfBuffers”), I loop over the batches (handling the datehours 3 by
>> 3); for each batch I create a new env and then call the execute() method.
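>>
>> Roughly, the structure looks like this (simplified, with made-up names):
>>
>>     for (List<String> datehours : batchesOf3) {
>>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>         // ... build the (many) operators reading these 3 datehour folders ...
>>         env.execute();
>>     }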
>>
>> However, it looks like there is no cleanup: after a while, if the number
>> of batches is too big, there is an error saying that the numberOfBuffers
>> isn’t high enough. It kind of looks like a leak. Is there a way to clean
>> the buffers up?
