[ https://issues.apache.org/jira/browse/SPARK-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian updated SPARK-13872:
------------------------
    Attachment: Screen Shot 2016-03-11 at 5.42.32 PM.png

> Memory leak SortMergeOuterJoin
> ------------------------------
>
>                 Key: SPARK-13872
>                 URL: https://issues.apache.org/jira/browse/SPARK-13872
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Ian
>         Attachments: Screen Shot 2016-03-11 at 5.42.32 PM.png
>
>
> SortMergeJoin composes its partition/iterator from
> org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to
> UnsafeExternalRowSorter.
> UnsafeExternalRowSorter's implementation cleans up its resources when:
> 1. the org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully
> iterated, or
> 2. the task finishes execution.
> In the outer-join case of SortMergeJoin, when the left or right iterator is
> not fully iterated, the only occasion for the resources to be cleaned up is
> at the end of the Spark task. This is probably OK most of the time;
> however, when a SortMergeOuterJoin is nested within a CartesianProduct, the
> deferred resource cleanup becomes a memory leak, amplified by the loop
> driven by CartesianRDD's outer iteration: every outer iteration that re-runs
> the nested join allocates another sorter, and none of them is released
> until the task ends.
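The cleanup contract above, and the way a Cartesian product amplifies it, can be modeled in a few lines. This is a hypothetical Python sketch, not Spark code: `TaskContext`, `BufferTracker`, `SortedIterator` and `cartesian_then_join` are illustrative stand-ins (loosely for Spark's TaskContext, UnsafeExternalRowSorter and CartesianRDD), and each "buffer" represents one sorter's memory. It assumes the inner sorted side is rebuilt on every outer iteration and that the outer join stops after reading a single row.

```python
# Hypothetical model of the cleanup behaviour described in SPARK-13872.
# None of these classes are Spark APIs; they only mimic the contract that a
# sorter's memory is freed when its iterator is exhausted or when the task ends.


class TaskContext:
    """Stand-in for a task context: runs cleanup callbacks at task completion."""

    def __init__(self):
        self._listeners = []

    def add_completion_listener(self, fn):
        self._listeners.append(fn)

    def mark_task_completed(self):
        for fn in self._listeners:
            fn()
        self._listeners.clear()


class BufferTracker:
    """Counts sort buffers that are currently allocated, plus the peak count."""

    def __init__(self):
        self.live = 0
        self.peak = 0

    def allocate(self):
        self.live += 1
        self.peak = max(self.peak, self.live)

    def release(self):
        self.live -= 1


class SortedIterator:
    """Stand-in for a sorter-backed row iterator holding one sort buffer."""

    def __init__(self, rows, ctx, tracker):
        self._rows = sorted(rows)
        self._pos = 0
        self._freed = False
        self._tracker = tracker
        tracker.allocate()
        # Cleanup path 2: the task finishes execution.
        ctx.add_completion_listener(self._free)

    def _free(self):
        if not self._freed:  # free at most once
            self._freed = True
            self._rows = []
            self._tracker.release()

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._rows):
            # Cleanup path 1: the iterator has been fully consumed.
            self._free()
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row


def cartesian_then_join(outer_rows, inner_rows, consume_all):
    """The outer loop plays the Cartesian product; each pass builds a fresh
    sorted inner side, as a nested sort-merge join would."""
    ctx, tracker = TaskContext(), BufferTracker()
    for _ in outer_rows:
        inner = SortedIterator(inner_rows, ctx, tracker)
        if consume_all:
            for _row in inner:  # exhausting the iterator frees its buffer
                pass
        else:
            next(inner, None)  # stop early, as an outer join may
    held_before_task_end = tracker.live
    ctx.mark_task_completed()  # deferred cleanup finally runs here
    return held_before_task_end, tracker.peak, tracker.live


print(cartesian_then_join(range(100), [3, 1, 2], consume_all=False))  # (100, 100, 0)
print(cartesian_then_join(range(100), [3, 1, 2], consume_all=True))   # (0, 1, 0)
```

In this toy model, partial consumption leaves one buffer live per outer row until the task ends, while full consumption keeps at most one buffer live at a time. The counts describe the model only; they are not measurements of Spark.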



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
