[ https://issues.apache.org/jira/browse/BEAM-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise resolved BEAM-8962.
--------------------------------
    Resolution: Fixed

> FlinkMetricContainer causes churn in the JobManager and lets the web frontend malfunction
> -----------------------------------------------------------------------------------------
>
>                 Key: BEAM-8962
>                 URL: https://issues.apache.org/jira/browse/BEAM-8962
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>             Fix For: 2.19.0
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The {{FlinkMetricContainer}} wraps the Beam metric container for reporting
> metrics, but also stores them as Flink accumulators. For high-parallelism
> jobs with over a thousand tasks and many built-in Beam metrics for every Beam
> step, this can add up to over 100MB of serialized data, which is stored in
> the JobManager's ExecutionGraph. That data then cannot even be sent over the
> wire because of the akka.framesize limit (10MB by default), which manifests
> as {{500 Internal Server Error}}s in the web frontend.
> We need to introduce an option to disable the reporting via accumulators.
> Accumulator reporting is mostly useful for batch workloads, where you can
> retrieve the final accumulator values at the end of the job. It adds a lot
> of memory and network overhead.
> Perhaps we could even turn off the accumulators for streaming jobs, or turn
> them off entirely and make them opt-in.
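
To make the size argument in the description concrete, here is a rough back-of-envelope sketch. Only the 10MB default for akka.framesize and the "over a thousand tasks" figure come from the issue. The number of steps, the metrics per step, and the serialized bytes per metric are illustrative assumptions, and the function names are hypothetical, not part of Beam or Flink. The model also simplifies by assuming every task reports metrics for every step.

```python
# Back-of-envelope estimate of accumulator payload size.
# All workload numbers below are illustrative assumptions, not measurements.

# akka.framesize default as stated in the issue (10MB).
AKKA_FRAMESIZE_BYTES = 10 * 1024 * 1024


def estimate_accumulator_bytes(tasks, steps, metrics_per_step, bytes_per_metric):
    """Estimate total serialized accumulator size in bytes.

    Simplified model: every task reports every metric of every step.
    """
    return tasks * steps * metrics_per_step * bytes_per_metric


def exceeds_framesize(payload_bytes, framesize_bytes=AKKA_FRAMESIZE_BYTES):
    """Return True if the payload is larger than the frame size limit."""
    return payload_bytes > framesize_bytes


# Hypothetical job: 1000 tasks, 50 Beam steps, 10 metrics per step,
# roughly 200 serialized bytes per metric (name plus value).
estimate = estimate_accumulator_bytes(
    tasks=1000, steps=50, metrics_per_step=10, bytes_per_metric=200
)
print(estimate, exceeds_framesize(estimate))
```

Under these assumed numbers the payload is an order of magnitude above the frame size limit, which is consistent with the failure mode described in the issue. Real per-metric sizes depend on metric names and serialization, so treat the result as an order-of-magnitude guide only.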