Hi,

Thanks for getting back with the semi solution! Sorry that I was not
responding before - I was trying to figure this out with some of my colleagues.

> I think the DeleteOnExit problem will mean it needs to be restarted every few
> weeks, but that's acceptable for now.

I hope that by the time you find this annoying, the Hadoop issue will be fixed
somehow for you (AWS using Hadoop 3.3+?)

Piotrek

> On 3 Feb 2020, at 15:54, Mark Harris <mark.har...@hivehome.com> wrote:
>
> Hi all,
>
> The out-of-memory heap dump had the answer - the job was failing with an
> OutOfMemoryError because the activeBuckets members of 3 instances of
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were filling
> a significant enough part of the memory of the taskmanager that no progress
> was being made. Increasing the memory available to the TM seems to have fixed
> the problem.
>
> I think the DeleteOnExit problem will mean it needs to be restarted every few
> weeks, but that's acceptable for now.
>
> Thanks again,
>
> Mark
> From: Mark Harris <mark.har...@hivehome.com>
> Sent: 30 January 2020 14:36
> To: Piotr Nowojski <pi...@ververica.com>
> Cc: Cliff Resnick <cre...@gmail.com>; David Magalhães
> <speeddra...@gmail.com>; Till Rohrmann <trohrm...@apache.org>;
> flink-u...@apache.org <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks
> for S3a files
>
> Hi,
>
> Thanks for your help with this. 🙂
>
> The EMR cluster has 3 15GB VMs, and the flink cluster is started with:
>
> /usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3
>
> Usually the task runs for about 15 minutes before it restarts, usually with
> a "java.lang.OutOfMemoryError: Java heap space" exception.
>
> The figures came from a MemoryAnalyzer session on a manual memory dump from
> one of the taskmanagers. The total size of that heap was only 1.8gb.
> In that heap, 1.7gb is taken up by the static field "files" in
> DeleteOnExitHook, which is a linked hash set containing the 9 million strings.
>
> A full example of one of the paths is
> /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per
> char[] for a solid 1.2gb of chars. Then 200mb for their String wrappers and
> another 361MB for LinkedHashMap$Entry objects. Despite valiantly holding on
> to an array of 16777216 HashMap$Node elements, the LinkedHashMap can only
> contribute another 20MB or so.
> I goofed in not taking that 85% figure from MemoryAnalyzer - it tells me
> DeleteOnExitHook is responsible for 96.98% of the heap dump.
>
> Looking at the files it managed to write before this started to happen
> regularly, it looks like they're being written approximately every 3 minutes.
> I'll triple check our config, but I'm reasonably sure the job is configured
> to checkpoint every 15 minutes - could something else be causing it to write?
>
> This may all be a red herring - something else may be taking up the
> taskmanager's memory which didn't make it into that heap dump. I plan to
> repeat the analysis on a heapdump created by -XX:+HeapDumpOnOutOfMemoryError
> shortly.
>
> Best regards,
>
> Mark
>
> From: Piotr Nowojski <pi...@ververica.com>
> Sent: 30 January 2020 13:44
> To: Mark Harris <mark.har...@hivehome.com>
> Cc: Cliff Resnick <cre...@gmail.com>; David Magalhães
> <speeddra...@gmail.com>; Till Rohrmann <trohrm...@apache.org>;
> flink-u...@apache.org <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks
> for S3a files
>
> Hi,
>
> What is your job setup? Size of the nodes, memory settings of the Flink/JVM?
>
> 9 041 060 strings is an awfully small number to bring down a whole cluster.
> With each tmp string having ~30 bytes, that’s only 271MB. Is this really 85%
> of the heap?
> And also, with parallelism of 6 and checkpoints every 15 minutes,
> 9 000 000 leaked strings should accumulate only after one month, assuming
> 500-600 buckets in total. (Also assuming that there is a separate file
> per each bucket).
>
> Piotrek
>
>> On 30 Jan 2020, at 14:21, Mark Harris <mark.har...@hivehome.com> wrote:
>>
>> Trying a few different approaches to the fs.s3a.fast.upload settings has
>> brought me no joy - the taskmanagers end up simply crashing or complaining of
>> high GC load. Heap dumps suggest that this time they're clogged with buffers
>> instead, which makes sense.
>>
>> Our job has parallelism of 6 and checkpoints every 15 minutes - if anything,
>> we'd like to increase the frequency of those checkpoints. I suspect
>> this could be affected by the partition structure we were bucketing to as
>> well, and at any given moment we could be receiving data for up to 280
>> buckets at once.
>> Could this be a factor?
>>
>> Best regards,
>>
>> Mark
>> From: Piotr Nowojski <pi...@ververica.com>
>> Sent: 27 January 2020 16:16
>> To: Cliff Resnick <cre...@gmail.com>
>> Cc: David Magalhães <speeddra...@gmail.com>;
>> Mark Harris <mark.har...@hivehome.com>;
>> Till Rohrmann <trohrm...@apache.org>;
>> flink-u...@apache.org <flink-u...@apache.org>;
>> kkloudas <kklou...@apache.org>
>> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks
>> for S3a files
>>
>> Hi,
>>
>> I think reducing the frequency of the checkpoints and decreasing the
>> parallelism of the things using the S3AOutputStream class would help to
>> mitigate the issue.
>>
>> I don’t know about other solutions. I would suggest asking this question
>> directly to Steve L.
>> in the bug ticket [1], as he is the one that fixed the issue. If there is
>> no workaround, maybe it would be possible to put pressure on the Hadoop
>> guys to backport the fix to older versions?
>>
>> Piotrek
>>
>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>>
>>> On 27 Jan 2020, at 15:41, Cliff Resnick <cre...@gmail.com> wrote:
>>>
>>> I know from experience that Flink's shaded S3A FileSystem does not
>>> reference core-site.xml, though I don't remember offhand what file(s) it
>>> does reference. However since it's shaded, maybe this could be fixed by
>>> building a Flink FS referencing 3.3.0? Last I checked I think it referenced
>>> 3.1.0.
>>>
>>> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <speeddra...@gmail.com> wrote:
>>> Does StreamingFileSink use core-site.xml? When I was using it, it didn't
>>> load any configurations from core-site.xml.
>>>
>>> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <mark.har...@hivehome.com> wrote:
>>> Hi Piotr,
>>>
>>> Thanks for the link to the issue.
>>>
>>> Do you know if there's a workaround? I've tried setting the following in my
>>> core-site.xml:
>>>
>>> fs.s3a.fast.upload.buffer=true
>>>
>>> To try and avoid writing the buffer files, but the taskmanager breaks with
>>> the same problem.
>>>
>>> Best regards,
>>>
>>> Mark
>>> From: Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr Nowojski
>>> <pi...@ververica.com>
>>> Sent: 22 January 2020 13:29
>>> To: Till Rohrmann <trohrm...@apache.org>
>>> Cc: Mark Harris <mark.har...@hivehome.com>; flink-u...@apache.org
>>> <flink-u...@apache.org>; kkloudas <kklou...@apache.org>
>>> Subject: Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks
>>> for S3a files
>>>
>>> Hi,
>>>
>>> This is probably a known issue of Hadoop [1]. Unfortunately it was only
>>> fixed in 3.3.0.
>>>
>>> Piotrek
>>>
>>> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>>>
>>>> On 22 Jan 2020, at 13:56, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>
>>>> Thanks for reporting this issue Mark. I'm pulling Klou into this
>>>> conversation who knows more about the StreamingFileSink. @Klou does the
>>>> StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <mark.har...@hivehome.com> wrote:
>>>> Hi,
>>>>
>>>> We're using flink 1.7.2 on an EMR cluster v emr-5.22.0, which runs hadoop
>>>> v "Amazon 2.8.5". We've recently noticed that some TaskManagers fail
>>>> (causing all the jobs running on them to fail) with a
>>>> "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager
>>>> (and jobs that should be running on it) remain down until manually
>>>> restarted.
>>>>
>>>> I managed to take and analyze a memory dump from one of the afflicted
>>>> taskmanagers.
>>>>
>>>> It showed that 85% of the heap was made up of the
>>>> java.io.DeleteOnExitHook.files hashset. The majority of the strings in
>>>> that hashset (9041060 out of ~9041100) pointed to files that began
>>>> /tmp/hadoop-yarn/s3a/s3ablock
>>>>
>>>> The problem seems to affect jobs that make use of the StreamingFileSink -
>>>> all of the taskmanager crashes have been on a taskmanager running at
>>>> least one job using this sink, and a cluster running only a single
>>>> taskmanager / job that uses the StreamingFileSink crashed with the GC
>>>> overhead limit exceeded error.
>>>>
>>>> I've had a look for advice on handling this error more broadly without
>>>> luck.
>>>>
>>>> Any suggestions or advice gratefully received.
>>>>
>>>> Best regards,
>>>>
>>>> Mark Harris
>>>>
>>>>
>>>>
>>>> The information contained in or attached to this email is intended only
>>>> for the use of the individual or entity to which it is addressed. If you
>>>> are not the intended recipient, or a person responsible for delivering it
>>>> to the intended recipient, you are not authorised to and must not
>>>> disclose, copy, distribute, or retain this message or any part of it. It
>>>> may contain information which is confidential and/or covered by legal
>>>> professional or other privilege under applicable law.
>>>>
>>>> The views expressed in this email are not necessarily the views of
>>>> Centrica plc or its subsidiaries, and the company, its directors, officers
>>>> or employees make no representation or accept any liability for its
>>>> accuracy or completeness unless expressly stated to the contrary.
>>>>
>>>> Additional regulatory disclosures may be found here:
>>>> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>>>>
>>>> PH Jones is a trading name of British Gas Social Housing Limited.
>>>> British Gas Social Housing Limited (company no: 01026007), British Gas
>>>> Trading Limited (company no: 03078711), British Gas Services Limited
>>>> (company no: 3141243), British Gas Insurance Limited (company no:
>>>> 06608316), British Gas New Heating Limited (company no: 06723244), British
>>>> Gas Services (Commercial) Limited (company no: 07385984) and Centrica
>>>> Energy (Trading) Limited (company no: 02877397) are all wholly owned
>>>> subsidiaries of Centrica plc (company no: 3033654). Each company is
>>>> registered in England and Wales with a registered office at Millstream,
>>>> Maidenhead Road, Windsor, Berkshire SL4 5GD.
>>>>
>>>> British Gas Insurance Limited is authorised by the Prudential Regulation
>>>> Authority and regulated by the Financial Conduct Authority and the
>>>> Prudential Regulation Authority. British Gas Services Limited and Centrica
>>>> Energy (Trading) Limited are authorised and regulated by the Financial
>>>> Conduct Authority. British Gas Trading Limited is an appointed
>>>> representative of British Gas Services Limited which is authorised and
>>>> regulated by the Financial Conduct Authority.
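
For anyone checking the heap-dump figures quoted earlier in this thread, the arithmetic can be sketched in a few lines of Java. This is only an illustration, not code from Flink or Hadoop. The class name DeleteOnExitHeapEstimate and its helper methods are made up for this note, and the object sizes (16-byte array header, 24-byte String, 40-byte LinkedHashMap$Entry, 8-byte alignment) are assumptions that are typical for a 64-bit Java 8 HotSpot JVM with compressed oops. They were chosen because they roughly reproduce the per-object numbers Mark reported; they were not measured on his cluster.

```java
// Rough heap estimate for the java.io.DeleteOnExitHook.files set discussed in this thread.
// ASSUMPTION: 64-bit Java 8 HotSpot, compressed oops, 8-byte object alignment.
// All size constants below are typical values for that layout, not measurements.
final class DeleteOnExitHeapEstimate {

    static final long ARRAY_HEADER_BYTES = 16; // assumed char[] header size
    static final long STRING_BYTES = 24;       // assumed shallow size of java.lang.String
    static final long LINKED_ENTRY_BYTES = 40; // assumed size of LinkedHashMap$Entry

    // Round up to the next multiple of 8 (assumed object alignment).
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Shallow size of a char[] holding `length` UTF-16 chars (2 bytes each).
    static long charArrayBytes(int length) {
        return align8(ARRAY_HEADER_BYTES + 2L * length);
    }

    // Bytes kept alive per registered path: char[] + String wrapper + set entry.
    static long bytesPerPath(int pathLength) {
        return charArrayBytes(pathLength) + STRING_BYTES + LINKED_ENTRY_BYTES;
    }

    // Total for `pathCount` paths of equal length; the hash table array is not included.
    static long totalBytes(long pathCount, int pathLength) {
        return pathCount * bytesPerPath(pathLength);
    }

    public static void main(String[] args) {
        // Sample path and string count are taken from the messages in this thread.
        String sample = "/tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp";
        long count = 9_041_060L;

        long perPath = bytesPerPath(sample.length());
        long total = totalBytes(count, sample.length());

        System.out.printf("path length: %d chars%n", sample.length());
        System.out.printf("bytes per path: %d%n", perPath);
        System.out.printf("total: %d bytes (%.2f GB)%n", total, total / 1e9);
    }
}
```

Under those assumptions the sample path costs 184 bytes per entry, so 9 041 060 entries come to roughly 1.66 GB, which is in line with the 1.7gb Mark saw attributed to DeleteOnExitHook and well above the 271MB obtained from a ~30 bytes per string guess. The 16777216-slot hash table is deliberately left out of the sum.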