Hi,

Thanks for getting back to us with the semi-solution!

Sorry for not responding earlier; I was trying to figure this out with some of 
my colleagues.

> I think the DeleteOnExit problem will mean it needs to be restarted every few 
> weeks, but that's acceptable for now.

I hope that by the time this becomes annoying, the Hadoop issue will have been 
fixed for you somehow (perhaps by AWS moving to Hadoop 3.3+?).

Piotrek

> On 3 Feb 2020, at 15:54, Mark Harris <mark.har...@hivehome.com> wrote:
> 
> Hi all,
> 
> The out-of-memory heap dump had the answer: the job was failing with an 
> OutOfMemoryError because the activeBuckets members of 3 instances of 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were filling 
> a large enough share of the taskmanager's memory that no progress was being 
> made. Increasing the memory available to the TM seems to have fixed the 
> problem.
> 
> I think the DeleteOnExit problem will mean it needs to be restarted every few 
> weeks, but that's acceptable for now.
> 
> Thanks again,
> 
> Mark
> From: Mark Harris <mark.har...@hivehome.com>
> Sent: 30 January 2020 14:36
> To: Piotr Nowojski <pi...@ververica.com>
> Cc: Cliff Resnick <cre...@gmail.com>; David Magalhães 
> <speeddra...@gmail.com>; Till Rohrmann <trohrm...@apache.org>; 
> flink-u...@apache.org <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks 
> for S3a files
>  
> Hi,
> 
> Thanks for your help with this. 🙂
> 
> The EMR cluster has three 15 GB VMs, and the Flink cluster is started with:
> 
> /usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3
> 
> Usually the task runs for about 15 minutes before it restarts, typically 
> with a "java.lang.OutOfMemoryError: Java heap space" exception.
> 
> The figures came from a MemoryAnalyzer session on a manual memory dump from 
> one of the taskmanagers. The total size of that heap was only 1.8 GB. In that 
> heap, 1.7 GB is taken up by the static field "files" in DeleteOnExitHook, 
> which is a LinkedHashSet containing the 9 million strings.
> 
> A full example of one of the paths is 
> /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per 
> char[], for a solid 1.2 GB of chars. Then there is 200 MB for their String 
> wrappers and another 361 MB for LinkedHashMap$Entry objects. Despite 
> valiantly holding on to an array of 16777216 HashMap$Node elements, the 
> LinkedHashMap itself can only contribute another 20 MB or so.
> I goofed by not taking that 85% figure from MemoryAnalyzer, which tells me 
> DeleteOnExitHook is responsible for 96.98% of the heap dump.
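A rough cross-check of these figures, as a Python sketch. It assumes a 64-bit HotSpot JVM with compressed oops and compressed class pointers (12-byte object headers, 4-byte references, 8-byte alignment) and Java 8's char[]-backed String; these layout sizes are assumptions, not values read from the heap dump, and the helper names are hypothetical. Under those assumptions each registered path costs about 184 bytes, several times the ~30 bytes of the path text alone.

```python
# Rough per-entry cost of java.io.DeleteOnExitHook.files.
# ASSUMPTIONS: 64-bit HotSpot, compressed oops + compressed class pointers,
# Java 8 String backed by char[]. Not values taken from the heap dump.

ALIGN = 8  # HotSpot aligns every object to 8 bytes by default


def aligned(size: int) -> int:
    """Round an object size up to the next multiple of ALIGN."""
    return (size + ALIGN - 1) // ALIGN * ALIGN


def char_array_bytes(length: int) -> int:
    # 12-byte header + 4-byte length field, then 2 bytes per UTF-16 char
    return aligned(16 + 2 * length)


# String: header + char[] reference + cached int hash
STRING_BYTES = aligned(12 + 4 + 4)
# LinkedHashMap$Entry: header + int hash + key, value, next, before, after refs
# (the HashSet's value is one shared object, so it adds nothing per entry)
ENTRY_BYTES = aligned(12 + 4 * 6)


def per_path_bytes(path: str) -> int:
    """Approximate heap retained by one registered path."""
    return char_array_bytes(len(path)) + STRING_BYTES + ENTRY_BYTES


sample = "/tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp"
entries = 9_041_060

print(len(sample), char_array_bytes(len(sample)))          # 52 120
print(per_path_bytes(sample))                              # 184
print(f"{entries * per_path_bytes(sample) / 1e9:.2f} GB")  # 1.66 GB
```

Summing only the per-entry objects gives roughly 1.66 GB, in line with the ~1.7 GB retained by the "files" field; the backing table array comes on top of that.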
> 
> Looking at the files it managed to write before this started to happen 
> regularly, it looks like they're being written approximately every 3 minutes. 
> I'll triple-check our config, but I'm reasonably sure the job is configured 
> to checkpoint every 15 minutes. Could something else be causing it to write?
> 
> This may all be a red herring: something else may be taking up the 
> taskmanagers' memory that didn't make it into that heap dump. I plan to 
> repeat the analysis on a heap dump created by -XX:+HeapDumpOnOutOfMemoryError 
> shortly.
> 
> Best regards,
> 
> Mark
> 
> From: Piotr Nowojski <pi...@ververica.com>
> Sent: 30 January 2020 13:44
> To: Mark Harris <mark.har...@hivehome.com>
> Cc: Cliff Resnick <cre...@gmail.com>; David Magalhães 
> <speeddra...@gmail.com>; Till Rohrmann <trohrm...@apache.org>; 
> flink-u...@apache.org <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks 
> for S3a files
>  
> Hi,
> 
> What is your job setup? Size of the nodes, memory settings of the Flink/JVM?
> 
> 9 041 060 strings is an awfully small number to bring down a whole cluster. 
> With each tmp string taking ~30 bytes, that's only 271 MB. Is this really 85% 
> of the heap? Also, with a parallelism of 6 and checkpoints every 15 minutes, 
> 9 000 000 leaked strings should only accumulate after about a month, assuming 
> 500-600 buckets in total (and a separate file per bucket).
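The arithmetic behind this estimate, written out as a small Python sketch. It assumes, as the estimate does, that each of the 6 parallel subtasks writes one temporary file per bucket per checkpoint and that each such file leaves exactly one path behind in DeleteOnExitHook; this file-per-bucket model is an assumption, not something verified against Flink or S3A internals, and the function names are illustrative only.

```python
# Back-of-the-envelope estimates for the DeleteOnExitHook leak.
# ASSUMPTION: each parallel subtask writes one temporary file per bucket per
# checkpoint, and each file leaves one path string in DeleteOnExitHook.


def leaked_bytes(entries: int, bytes_per_entry: int) -> int:
    """Heap used by the leaked path strings at a given per-entry size."""
    return entries * bytes_per_entry


def days_to_leak(target_entries: int, parallelism: int, buckets: int,
                 checkpoint_minutes: float) -> float:
    """Days until the hook holds target_entries paths under the model above."""
    files_per_checkpoint = parallelism * buckets
    checkpoints_per_day = 24 * 60 / checkpoint_minutes
    return target_entries / (files_per_checkpoint * checkpoints_per_day)


print(leaked_bytes(9_041_060, 30) / 1e6)  # ~271 MB at ~30 bytes per string
for buckets in (500, 600):
    print(buckets, days_to_leak(9_000_000, 6, buckets, 15))
```

Both bucket counts land at roughly 26 to 31 days, i.e. about a month. Under this model, checkpointing more often or spreading data over more buckets shortens that time proportionally.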
> 
> Piotrek 
> 
>> On 30 Jan 2020, at 14:21, Mark Harris <mark.har...@hivehome.com> wrote:
>> 
>> Trying a few different approaches to the fs.s3a.fast.upload settings has 
>> brought me no joy: the taskmanagers end up simply crashing or complaining of 
>> high GC load. Heap dumps suggest that this time they're clogged with buffers 
>> instead, which makes sense.
>> 
>> Our job has a parallelism of 6 and checkpoints every 15 minutes; if anything, 
>> we'd like to checkpoint more frequently. I suspect the partition structure we 
>> bucket into could also play a part: at any given moment we could be receiving 
>> data for up to 280 buckets at once.
>> Could this be a factor?
>> 
>> Best regards,
>> 
>> Mark
>> From: Piotr Nowojski <pi...@ververica.com>
>> Sent: 27 January 2020 16:16
>> To: Cliff Resnick <cre...@gmail.com>
>> Cc: David Magalhães <speeddra...@gmail.com>; Mark Harris 
>> <mark.har...@hivehome.com>; Till Rohrmann <trohrm...@apache.org>; 
>> flink-u...@apache.org <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
>> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks 
>> for S3a files
>>  
>> Hi,
>> 
>> I think checkpointing less frequently and decreasing the parallelism of the 
>> operators using the S3AOutputStream class would help mitigate the issue.
>> 
>> I don't know of other solutions. I would suggest asking Steve L. directly in 
>> the bug ticket [1], as he is the one who fixed the issue. If there is no 
>> workaround, maybe it would be possible to put pressure on the Hadoop 
>> developers to backport the fix to older versions?
>> 
>> Piotrek
>> 
>> [1] https://issues.apache.org/jira/browse/HADOOP-15658 
>> 
>>> On 27 Jan 2020, at 15:41, Cliff Resnick <cre...@gmail.com> wrote:
>>> 
>>> I know from experience that Flink's shaded S3A FileSystem does not read 
>>> core-site.xml, though I don't remember offhand which file(s) it does read. 
>>> However, since it's shaded, maybe this could be fixed by building a Flink FS 
>>> against Hadoop 3.3.0? Last I checked, I think it used 3.1.0.
>>> 
>>> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <speeddra...@gmail.com> wrote:
>>> Does StreamingFileSink use core-site.xml? When I was using it, it didn't 
>>> load any configuration from core-site.xml.
>>> 
>>> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <mark.har...@hivehome.com> wrote:
>>> Hi Piotr,
>>> 
>>> Thanks for the link to the issue.
>>> 
>>> Do you know if there's a workaround? I've tried setting the following in my 
>>> core-site.xml:
>>> 
>>> fs.s3a.fast.upload.buffer=true
>>> 
>>> To try and avoid writing the buffer files, but the taskmanager breaks with 
>>> the same problem.
>>> 
>>> Best regards,
>>> 
>>> Mark
>>> From: Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr Nowojski 
>>> <pi...@ververica.com>
>>> Sent: 22 January 2020 13:29
>>> To: Till Rohrmann <trohrm...@apache.org>
>>> Cc: Mark Harris <mark.har...@hivehome.com>; flink-u...@apache.org 
>>> <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
>>> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks 
>>> for S3a files
>>>  
>>> Hi,
>>> 
>>> This is probably a known Hadoop issue [1]. Unfortunately, it was only 
>>> fixed in 3.3.0.
>>> 
>>> Piotrek
>>> 
>>> [1] https://issues.apache.org/jira/browse/HADOOP-15658 
>>> 
>>>> On 22 Jan 2020, at 13:56, Till Rohrmann <trohrm...@apache.org> wrote:
>>>> 
>>>> Thanks for reporting this issue, Mark. I'm pulling Klou into this 
>>>> conversation, as he knows more about the StreamingFileSink. @Klou, does the 
>>>> StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>>>> 
>>>> Cheers,
>>>> Till
>>>> 
>>>> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <mark.har...@hivehome.com> wrote:
>>>> Hi,
>>>> 
>>>> We're using Flink 1.7.2 on an EMR cluster (emr-5.22.0), which runs Hadoop 
>>>> "Amazon 2.8.5". We've recently noticed that some TaskManagers fail 
>>>> (causing all the jobs running on them to fail) with a 
>>>> "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager 
>>>> (and the jobs that should be running on it) remains down until manually 
>>>> restarted.
>>>> 
>>>> I managed to take and analyze a memory dump from one of the afflicted 
>>>> taskmanagers. 
>>>> 
>>>> It showed that 85% of the heap was made up of the 
>>>> java.io.DeleteOnExitHook.files hash set. The majority of the strings in 
>>>> that set (9041060 out of ~9041100) pointed to files whose paths began with 
>>>> /tmp/hadoop-yarn/s3a/s3ablock
>>>> 
>>>> The problem seems to affect jobs that use the StreamingFileSink: all of 
>>>> the taskmanager crashes have been on taskmanagers running at least one 
>>>> job using this sink, and a cluster running only a single taskmanager / job 
>>>> that uses the StreamingFileSink crashed with the "GC overhead limit 
>>>> exceeded" error.
>>>> 
>>>> I've had a look for advice on handling this error more broadly without 
>>>> luck.
>>>> 
>>>> Any suggestions or advice gratefully received.
>>>> 
>>>> Best regards,
>>>> 
>>>> Mark Harris
>>>> 
>>>> 
>>>> 
>>>> The information contained in or attached to this email is intended only 
>>>> for the use of the individual or entity to which it is addressed. If you 
>>>> are not the intended recipient, or a person responsible for delivering it 
>>>> to the intended recipient, you are not authorised to and must not 
>>>> disclose, copy, distribute, or retain this message or any part of it. It 
>>>> may contain information which is confidential and/or covered by legal 
>>>> professional or other privilege under applicable law. 
>>>> 
>>>> The views expressed in this email are not necessarily the views of 
>>>> Centrica plc or its subsidiaries, and the company, its directors, officers 
>>>> or employees make no representation or accept any liability for its 
>>>> accuracy or completeness unless expressly stated to the contrary. 
>>>> 
>>>> Additional regulatory disclosures may be found here: 
>>>> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email 
>>>> <https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email> 
>>>> 
>>>> PH Jones is a trading name of British Gas Social Housing Limited. British 
>>>> Gas Social Housing Limited (company no: 01026007), British Gas Trading 
>>>> Limited (company no: 03078711), British Gas Services Limited (company no: 
>>>> 3141243), British Gas Insurance Limited (company no: 06608316), British 
>>>> Gas New Heating Limited (company no: 06723244), British Gas Services 
>>>> (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading) 
>>>> Limited (company no: 02877397) are all wholly owned subsidiaries of 
>>>> Centrica plc (company no: 3033654). Each company is registered in England 
>>>> and Wales with a registered office at Millstream, Maidenhead Road, 
>>>> Windsor, Berkshire SL4 5GD. 
>>>> 
>>>> British Gas Insurance Limited is authorised by the Prudential Regulation 
>>>> Authority and regulated by the Financial Conduct Authority and the 
>>>> Prudential Regulation Authority. British Gas Services Limited and Centrica 
>>>> Energy (Trading) Limited are authorised and regulated by the Financial 
>>>> Conduct Authority. British Gas Trading Limited is an appointed 
>>>> representative of British Gas Services Limited which is authorised and 
>>>> regulated by the Financial Conduct Authority.
>>> 
>> 
> 