[ https://issues.apache.org/jira/browse/SPARK-46383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-46383.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44321
[https://github.com/apache/spark/pull/44321]

> Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()`
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-46383
>                 URL: https://issues.apache.org/jira/browse/SPARK-46383
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Utkarsh Agarwal
>            Assignee: Utkarsh Agarwal
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: Screenshot 2023-11-06 at 3.56.26 PM.png, screenshot-1.png
>
>
> `AccumulableInfo` is one of the top heap consumers in the driver's heap dumps
> for stages with many tasks. For a stage with a large number of tasks
> (O(100k)), we saw 30% of the heap usage stemming from
> `TaskInfo.accumulables()`.
> !screenshot-1.png|width=641,height=98!
> The `TaskSetManager` today keeps the `TaskInfo` objects
> ([ref1|https://github.com/apache/spark/blob/c1ba963e64a22dea28e17b1ed954e6d03d38da1e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L134],
> [ref2|https://github.com/apache/spark/blob/c1ba963e64a22dea28e17b1ed954e6d03d38da1e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L192]),
> and in turn the task metrics (`AccumulableInfo`), for every task attempt
> until the stage is completed. This means that for stages with a large number
> of tasks, we keep the metrics (`AccumulableInfo`) of every task around even
> after the task has completed and its metrics have been aggregated. Because
> each task carries a large number of metrics, stages with many tasks end up
> with a large heap usage in the form of task metrics.
> Ideally, we should clear a task's `TaskInfo` upon the task's completion,
> thereby reducing the driver's heap usage.
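
The idea in the description can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: it is written in Python rather than Scala, the class `TaskSetManagerSketch` and its methods are hypothetical names that mirror the description, and it is not the code from the pull request. It shows a manager that folds a finished task's metrics into per-stage totals and then stores a copy of the `TaskInfo` with its accumulables emptied, so per-task metrics do not stay on the heap until the stage ends.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class AccumulableInfo:
    """Hypothetical stand-in for one task metric (name and value only)."""
    name: str
    value: int


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical stand-in for the per-attempt record kept by the scheduler."""
    task_id: int
    accumulables: Tuple[AccumulableInfo, ...] = ()


@dataclass
class TaskSetManagerSketch:
    """Toy model: keeps one TaskInfo per task until the stage completes."""
    task_infos: Dict[int, TaskInfo] = field(default_factory=dict)
    stage_totals: Dict[str, int] = field(default_factory=dict)

    def task_started(self, task_id: int) -> None:
        self.task_infos[task_id] = TaskInfo(task_id)

    def task_finished(self, task_id: int, accumulables: Iterable[AccumulableInfo]) -> None:
        info = replace(self.task_infos[task_id], accumulables=tuple(accumulables))
        # Aggregate the task's metrics into the stage-level totals first.
        for acc in info.accumulables:
            self.stage_totals[acc.name] = self.stage_totals.get(acc.name, 0) + acc.value
        # Then keep only a copy with the accumulables dropped, so the
        # per-task metrics become eligible for garbage collection.
        self.task_infos[task_id] = replace(info, accumulables=())


manager = TaskSetManagerSketch()
for task_id in range(3):
    manager.task_started(task_id)
    manager.task_finished(task_id, [AccumulableInfo("recordsRead", 10)])
```

In this model the stage totals still reflect every task, while each retained `TaskInfo` holds an empty tuple of accumulables. Storing a stripped copy, rather than mutating the original, leaves any consumer that already holds the full object unaffected; whether the actual change takes that approach is not stated in this message.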