Hi Mark,

Have you tried setting your rolling policy to close inactive part files
after some time [1]?
If the part files in the buckets are inactive and there are no new part
files, then the state handle for those buckets will also be removed.
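
A minimal sketch of what that could look like for a row-format sink,
assuming the Flink 1.7-era builder API (DefaultRollingPolicy.create()) and
Flink on the classpath. The class name, output path, encoder and interval
values are placeholders, not recommendations, and the builder method names
differ in some later Flink versions. As far as I know this only applies to
row formats, since bulk formats roll on every checkpoint.

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

// Hypothetical helper class, for illustration only.
class RollingPolicyExample {

    // Builds a row-format sink whose part files are closed after a period
    // of inactivity, so that idle buckets can be released.
    static StreamingFileSink<String> buildSink(String outputPath) {
        return StreamingFileSink
            .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
            .withRollingPolicy(
                DefaultRollingPolicy.create()
                    // Placeholder: close a part file after 5 minutes without writes.
                    .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
                    // Placeholder: close a part file after at most 15 minutes.
                    .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
                    .build())
            .build();
    }
}
```

The resulting sink would be attached with stream.addSink(...) as usual.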

Cheers,
Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html



On Mon, Feb 3, 2020 at 3:54 PM Mark Harris <mark.har...@hivehome.com> wrote:

> Hi all,
>
> The out-of-memory heap dump had the answer - the job was failing with an
> OutOfMemoryError because the activeBuckets members of 3 instances of
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were
> filling a significant enough part of the memory of the taskmanager that no
> progress was being made. Increasing the memory available to the TM seems to
> have fixed the problem.
>
> I think the DeleteOnExit problem will mean it needs to be restarted every
> few weeks, but that's acceptable for now.
>
> Thanks again,
>
> Mark
> ------------------------------
> *From:* Mark Harris <mark.har...@hivehome.com>
> *Sent:* 30 January 2020 14:36
> *To:* Piotr Nowojski <pi...@ververica.com>
> *Cc:* Cliff Resnick <cre...@gmail.com>; David Magalhães <
> speeddra...@gmail.com>; Till Rohrmann <trohrm...@apache.org>;
> flink-u...@apache.org <flink-u...@apache.org>; kkloudas <
> kklou...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> Thanks for your help with this. 🙂
>
> The EMR cluster has 3 15GB VMs, and the flink cluster is started with:
>
> /usr/lib/flink/bin/yarn-session.sh -d -n 3 -tm 5760 -jm 5760 -s 3
>
> Usually the task runs for about 15 minutes before it restarts, most often
> with a "java.lang.OutOfMemoryError: Java heap space" exception.
>
> The figures came from a MemoryAnalyzer session on a manual memory dump
> from one of the taskmanagers. The total size of that heap was only 1.8gb.
> In that heap, 1.7gb is taken up by the static field "files" in
> DeleteOnExitHook, which is a linked hash set containing the 9 million
> strings.
>
> A full example of one of the paths is
> /tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp, at 120 bytes per
> char[], for a solid 1.2gb of chars. Then 200mb for their String wrappers and
> another 361MB for LinkedHashMap$Entry objects. Despite valiantly holding
> on to an array of 16777216 HashMap$Node elements, the LinkedHashMap can
> only contribute another 20MB or so.
> I goofed in not taking that 85% figure from MemoryAnalyzer - it tells
> me DeleteOnExitHook is responsible for 96.98% of the heap dump.
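
For what it's worth, those per-object figures can be sanity-checked with a
little arithmetic. The sketch below assumes a 64-bit HotSpot JVM on Java 8
with compressed oops (16-byte array header, 24-byte String object, 40-byte
LinkedHashMap.Entry, 8-byte alignment). These sizes are assumptions, not
values read from the heap dump, and the class and method names are made up
for illustration.

```java
// Rough per-entry heap cost of a path held in DeleteOnExitHook.files.
class DeleteOnExitHeapEstimate {
    // Assumed object sizes: 64-bit HotSpot, Java 8, compressed oops.
    static final long ARRAY_HEADER_BYTES = 16;
    static final long STRING_OBJECT_BYTES = 24;
    static final long LINKED_ENTRY_BYTES = 40;

    // Objects are assumed to be padded to a multiple of 8 bytes.
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Java 8 stores String contents as a char[] with 2 bytes per character.
    static long charArrayBytes(String s) {
        return alignTo8(ARRAY_HEADER_BYTES + 2L * s.length());
    }

    // char[] + String wrapper + LinkedHashMap entry for one path.
    static long bytesPerEntry(String s) {
        return charArrayBytes(s) + STRING_OBJECT_BYTES + LINKED_ENTRY_BYTES;
    }

    static long totalBytes(String samplePath, long entries) {
        return bytesPerEntry(samplePath) * entries;
    }

    public static void main(String[] args) {
        String path = "/tmp/hadoop-yarn/s3a/s3ablock-0001-6061210725685.tmp";
        long entries = 9_041_060L; // string count reported earlier in the thread
        System.out.printf("char[] bytes per path: %d%n", charArrayBytes(path));
        System.out.printf("bytes per entry:       %d%n", bytesPerEntry(path));
        System.out.printf("total bytes:           %d%n", totalBytes(path, entries));
    }
}
```

Under those assumptions the 52-character sample path costs 120 bytes for
its char[] and 184 bytes per entry overall, so 9 041 060 entries come to
roughly 1.66 GB, in line with the 1.7gb reported for the "files" field. The
char[] share alone works out to about 1.08 GB, slightly below the 1.2gb
quoted above.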
>
> Looking at the files it managed to write before this started to happen
> regularly, it looks like they're being written approximately every 3
> minutes. I'll triple check our config, but I'm reasonably sure the job is
> configured to checkpoint every 15 minutes - could something else be causing
> it to write?
>
> This may all be a red herring - something else may be taking up the
> taskmanager's memory which didn't make it into that heap dump. I plan to
> repeat the analysis on a heap dump created by
> -XX:+HeapDumpOnOutOfMemoryError shortly.
>
> Best regards,
>
> Mark
>
> ------------------------------
> *From:* Piotr Nowojski <pi...@ververica.com>
> *Sent:* 30 January 2020 13:44
> *To:* Mark Harris <mark.har...@hivehome.com>
> *Cc:* Cliff Resnick <cre...@gmail.com>; David Magalhães <
> speeddra...@gmail.com>; Till Rohrmann <trohrm...@apache.org>;
> flink-u...@apache.org <flink-u...@apache.org>; kkloudas <
> kklou...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> What is your job setup? Size of the nodes, memory settings of the
> Flink/JVM?
>
> 9 041 060 strings is an awfully small number to bring down a whole cluster.
> With each tmp string having ~30 bytes, that’s only 271MB. Is this really
> 85% of the heap? And also, with parallelism of 6 and checkpoints every 15
> minutes, 9 000 000 leaked strings should accumulate only after about one
> month, assuming a total of 500-600 buckets (and also assuming that there is
> a separate file per bucket).
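
The one-month estimate can be reproduced as follows. This is a rough sketch
that assumes every parallel subtask opens one new temporary file per bucket
at each checkpoint and that each file leaves one entry behind. The class and
method names are invented for illustration.

```java
// Back-of-the-envelope estimate of how fast leaked temp-file entries pile up.
class LeakRateEstimate {

    // Entries leaked per day: one per bucket, per subtask, per checkpoint.
    static double entriesPerDay(int buckets, int parallelism, double checkpointIntervalMinutes) {
        double checkpointsPerDay = 24 * 60 / checkpointIntervalMinutes;
        return buckets * parallelism * checkpointsPerDay;
    }

    // Days needed to accumulate the given number of entries.
    static double daysToReach(long targetEntries, int buckets, int parallelism,
                              double checkpointIntervalMinutes) {
        return targetEntries / entriesPerDay(buckets, parallelism, checkpointIntervalMinutes);
    }

    public static void main(String[] args) {
        long target = 9_041_060L; // strings found in the heap dump
        for (int buckets : new int[] {500, 600}) {
            System.out.printf("%d buckets: %.1f days%n",
                buckets, daysToReach(target, buckets, 6, 15));
        }
    }
}
```

With 500 to 600 buckets, parallelism 6 and 15-minute checkpoints this gives
roughly 26 to 31 days. If files were in fact being written about every 3
minutes, as observed elsewhere in this thread, the same number of entries
would accumulate in about a fifth of that time.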
>
> Piotrek
>
> On 30 Jan 2020, at 14:21, Mark Harris <mark.har...@hivehome.com> wrote:
>
> Trying a few different approaches to the fs.s3a.fast.upload settings has
> bought me no joy - the taskmanagers end up simply crashing or complaining
> of high GC load. Heap dumps suggest that this time they're clogged with
> buffers instead, which makes sense.
>
> Our job has parallelism of 6 and checkpoints every 15 minutes - if
> anything, we'd like to increase the frequency of those checkpoints.
> I suspect this could be affected by the partition structure we were
> bucketing to as well, and at any given moment we could be receiving data
> for up to 280 buckets at once.
> Could this be a factor?
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Piotr Nowojski <pi...@ververica.com>
> *Sent:* 27 January 2020 16:16
> *To:* Cliff Resnick <cre...@gmail.com>
> *Cc:* David Magalhães <speeddra...@gmail.com>; Mark Harris <
> mark.har...@hivehome.com>; Till Rohrmann <trohrm...@apache.org>;
> flink-u...@apache.org <flink-u...@apache.org>; kkloudas <
> kklou...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> I think reducing the frequency of the checkpoints and decreasing the
> parallelism of the operators using the S3AOutputStream class would help to
> mitigate the issue.
>
> I don’t know about other solutions. I would suggest asking this question
> directly to Steve L. in the bug ticket [1], as he is the one who fixed the
> issue. If there is no workaround, maybe it would be possible to put some
> pressure on the Hadoop folks to backport the fix to older versions?
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>
> On 27 Jan 2020, at 15:41, Cliff Resnick <cre...@gmail.com> wrote:
>
> I know from experience that Flink's shaded S3A FileSystem does not
> reference core-site.xml, though I don't remember offhand what file(s) it
> does reference. However, since it's shaded, maybe this could be fixed by
> building a Flink FS referencing 3.3.0? Last I checked I think it referenced
> 3.1.0.
>
> On Mon, Jan 27, 2020, 8:48 AM David Magalhães <speeddra...@gmail.com>
> wrote:
>
> Does StreamingFileSink use core-site.xml? When I was using it, it didn't
> load any configurations from core-site.xml.
>
> On Mon, Jan 27, 2020 at 12:08 PM Mark Harris <mark.har...@hivehome.com>
> wrote:
>
> Hi Piotr,
>
> Thanks for the link to the issue.
>
> Do you know if there's a workaround? I've tried setting the following in
> my core-site.xml:
>
> fs.s3a.fast.upload.buffer=true
>
> To try and avoid writing the buffer files, but the taskmanager breaks with
> the same problem.
>
> Best regards,
>
> Mark
> ------------------------------
> *From:* Piotr Nowojski <pi...@data-artisans.com> on behalf of Piotr
> Nowojski <pi...@ververica.com>
> *Sent:* 22 January 2020 13:29
> *To:* Till Rohrmann <trohrm...@apache.org>
> *Cc:* Mark Harris <mark.har...@hivehome.com>; flink-u...@apache.org <
> flink-u...@apache.org>; kkloudas <kklou...@apache.org>
> *Subject:* Re: GC overhead limit exceeded, memory full of DeleteOnExit
> hooks for S3a files
>
> Hi,
>
> This is probably a known issue of Hadoop [1]. Unfortunately it was only
> fixed in 3.3.0.
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/HADOOP-15658
>
> On 22 Jan 2020, at 13:56, Till Rohrmann <trohrm...@apache.org> wrote:
>
> Thanks for reporting this issue, Mark. I'm pulling Klou, who knows more
> about the StreamingFileSink, into this conversation. @Klou, does the
> StreamingFileSink rely on DeleteOnExitHooks to clean up files?
>
> Cheers,
> Till
>
> On Tue, Jan 21, 2020 at 3:38 PM Mark Harris <mark.har...@hivehome.com>
> wrote:
>
> Hi,
>
> We're using flink 1.7.2 on an EMR cluster v emr-5.22.0, which runs hadoop
> v "Amazon 2.8.5". We've recently noticed that some TaskManagers fail
> (causing all the jobs running on them to fail) with a
> "java.lang.OutOfMemoryError: GC overhead limit exceeded". The taskmanager
> (and jobs that should be running on it) remain down until manually
> restarted.
>
> I managed to take and analyze a memory dump from one of the afflicted
> taskmanagers.
>
> It showed that 85% of the heap was made up of
> the java.io.DeleteOnExitHook.files hashset. The majority of the strings in
> that hashset (9041060 out of ~9041100) pointed to files that began with
> /tmp/hadoop-yarn/s3a/s3ablock
>
> The problem seems to affect jobs that make use of the StreamingFileSink -
> all of the crashes have been on taskmanagers running at least
> one job using this sink, and a cluster running only a single taskmanager /
> job that uses the StreamingFileSink crashed with the GC overhead limit
> exceeded error.
>
> I've had a look for advice on handling this error more broadly without
> luck.
>
> Any suggestions or advice gratefully received.
>
> Best regards,
>
> Mark Harris
>
>
>
> The information contained in or attached to this email is intended only
> for the use of the individual or entity to which it is addressed. If you
> are not the intended recipient, or a person responsible for delivering it
> to the intended recipient, you are not authorised to and must not disclose,
> copy, distribute, or retain this message or any part of it. It may contain
> information which is confidential and/or covered by legal professional or
> other privilege under applicable law.
>
> The views expressed in this email are not necessarily the views of
> Centrica plc or its subsidiaries, and the company, its directors, officers
> or employees make no representation or accept any liability for its
> accuracy or completeness unless expressly stated to the contrary.
>
> Additional regulatory disclosures may be found here:
> https://www.centrica.com/privacy-cookies-and-legal-disclaimer#email
>
> PH Jones is a trading name of British Gas Social Housing Limited. British
> Gas Social Housing Limited (company no: 01026007), British Gas Trading
> Limited (company no: 03078711), British Gas Services Limited (company no:
> 3141243), British Gas Insurance Limited (company no: 06608316), British Gas
> New Heating Limited (company no: 06723244), British Gas Services
> (Commercial) Limited (company no: 07385984) and Centrica Energy (Trading)
> Limited (company no: 02877397) are all wholly owned subsidiaries of
> Centrica plc (company no: 3033654). Each company is registered in England
> and Wales with a registered office at Millstream, Maidenhead Road, Windsor,
> Berkshire SL4 5GD.
>
> British Gas Insurance Limited is authorised by the Prudential Regulation
> Authority and regulated by the Financial Conduct Authority and the
> Prudential Regulation Authority. British Gas Services Limited and Centrica
> Energy (Trading) Limited are authorised and regulated by the Financial
> Conduct Authority. British Gas Trading Limited is an appointed
> representative of British Gas Services Limited which is authorised and
> regulated by the Financial Conduct Authority.
>
