[ https://issues.apache.org/jira/browse/SPARK-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian updated SPARK-13872:
------------------------
    Attachment: Screen Shot 2016-03-11 at 5.42.32 PM.png

> Memory leak SortMergeOuterJoin
> ------------------------------
>
>                 Key: SPARK-13872
>                 URL: https://issues.apache.org/jira/browse/SPARK-13872
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Ian
>         Attachments: Screen Shot 2016-03-11 at 5.42.32 PM.png
>
>
> SortMergeJoin composes its partition/iterator from
> org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to
> UnsafeExternalRowSorter.
> UnsafeExternalRowSorter's implementation cleans up its resources when:
> 1. org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully
> iterated.
> 2. the task finishes execution.
> In the outer join case of SortMergeJoin, when the left or right iterator
> is not fully iterated, the only occasion for the resources to be cleaned
> up is at the end of the Spark task. This is probably OK most of the time;
> however, when a SortMergeOuterJoin is nested within a CartesianProduct, the
> "deferred" resource cleanup becomes a memory leak, amplified by the
> CartesianRDD's outer loop iteration.
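
The amplification described in the report can be illustrated with a toy model. The sketch below is plain Python, not Spark code: `ToyTaskContext`, `SortedIterator`, and `run_task` are hypothetical names invented for illustration, and a single counter stands in for the sorter's memory. It assumes only the two cleanup triggers named in the report (full iteration, or task completion) and shows how an outer loop that repeatedly creates partially consumed iterators accumulates unreleased buffers until the task ends.

```python
class ToyTaskContext:
    """Hypothetical stand-in for a task: runs cleanup callbacks at task end."""

    def __init__(self):
        self._listeners = []
        self.live_buffers = 0  # buffers allocated and not yet freed

    def on_completion(self, callback):
        self._listeners.append(callback)

    def complete(self):
        for callback in self._listeners:
            callback()
        self._listeners.clear()


class SortedIterator:
    """Hypothetical sorter-backed iterator holding one buffer until cleanup."""

    def __init__(self, rows, ctx):
        self._rows = iter(sorted(rows))
        self._ctx = ctx
        self._freed = False
        ctx.live_buffers += 1            # "allocate" the sort buffer
        ctx.on_completion(self.cleanup)  # trigger 2: task completion

    def cleanup(self):
        if not self._freed:              # idempotent: free at most once
            self._freed = True
            self._ctx.live_buffers -= 1

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._rows)
        except StopIteration:
            self.cleanup()               # trigger 1: fully iterated
            raise


def run_task(outer_rows, inner_rows, rows_consumed):
    """Return (peak live buffers during the task, live buffers after it)."""
    ctx = ToyTaskContext()
    peak = 0
    for _ in range(outer_rows):          # outer loop, as in a cartesian product
        it = SortedIterator(inner_rows, ctx)
        for _ in range(rows_consumed):   # the join may stop before exhaustion
            next(it, None)
        peak = max(peak, ctx.live_buffers)
    ctx.complete()
    return peak, ctx.live_buffers


print(run_task(100, [3, 1, 2], rows_consumed=2))  # (100, 0): buffers pile up
print(run_task(100, [3, 1, 2], rows_consumed=4))  # (0, 0): freed on exhaustion
```

In this model every buffer is eventually released, so nothing is leaked in the strict sense. The problem is that the peak grows linearly with the number of outer iterations whenever the inner iterator is abandoned early, which matches the behavior the report describes.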