Repository: spark
Updated Branches:
  refs/heads/master 92e017fb8 -> 942847fd9


Bug Fix: missing unpersist call in RandomForest.scala

While training a Gradient Boosting Decision Tree on large-scale sparse data,
Spark spilled large amounts of data to disk, which led to the bug below:
    In version 1.1.0, the train method in DecisionTree.scala persists
treeInput in memory but never unpersists it, which caused heavy disk usage.
    In the GitHub version (probably 1.2.0), the train method in
RandomForest.scala likewise persists baggedInput without ever unpersisting it.

After adding the unpersist call, it works correctly.
https://issues.apache.org/jira/browse/SPARK-3918
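The bug is a broken persist/unpersist lifecycle: an RDD that is cached for an iterative job must be released once the job finishes. A minimal sketch of that lifecycle follows, using only the public RDD API (persist, unpersist, getStorageLevel) and SparkContext.getPersistentRDDs. It assumes spark-core is on the classpath; the object name PersistLifecycle, the helper withPersisted, and the choice of StorageLevel.MEMORY_AND_DISK are hypothetical illustrations, not code from RandomForest.scala.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistLifecycle {

  // Hypothetical helper (not part of Spark or MLlib): caches `input` while
  // `body` runs, then releases the cached blocks even if `body` throws.
  def withPersisted[T, R](input: RDD[T], level: StorageLevel)(body: RDD[T] => R): R = {
    input.persist(level)
    try {
      body(input)
    } finally {
      // Without this call the cached blocks keep occupying executor memory
      // (and disk, for MEMORY_AND_DISK) until Spark evicts or cleans them up
      // on its own, which is the kind of leak described above.
      input.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("persist-lifecycle"))
    try {
      val data = sc.parallelize(1 to 1000).map(_ * 2)
      // Three passes over the same RDD, mimicking an iterative trainer.
      val total = withPersisted(data, StorageLevel.MEMORY_AND_DISK) { rdd =>
        (1 to 3).map(_ => rdd.map(_.toLong).reduce(_ + _)).sum
      }
      println(s"total = $total")
      // After the helper returns, the RDD should no longer be cached.
      val released =
        data.getStorageLevel == StorageLevel.NONE && sc.getPersistentRDDs.isEmpty
      println(s"released: $released")
    } finally {
      sc.stop()
    }
  }
}
```

The commit itself simply calls baggedInput.unpersist() after the training loop. Wrapping the work in try/finally is a design choice of this sketch, so the cache is also released when training fails partway through.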

Author: omgteam <kimlong....@gmail.com>

Closes #2775 from omgteam/master and squashes the following commits:

815d543 [omgteam] adjust tab to spaces
1a36f83 [omgteam] Bug: fix without unpersist baggedInput in RandomForest.scala


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/942847fd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/942847fd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/942847fd

Branch: refs/heads/master
Commit: 942847fd94c920f7954ddf01f97263926e512b0e
Parents: 92e017f
Author: omgteam <kimlong....@gmail.com>
Authored: Mon Oct 13 09:59:41 2014 -0700
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Mon Oct 13 09:59:41 2014 -0700

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala  | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/942847fd/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala
index fa7a26f..ebbd8e0 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala
@@ -176,6 +176,8 @@ private class RandomForest (
       timer.stop("findBestSplits")
     }
 
+    baggedInput.unpersist()
+
     timer.stop("total")
 
     logInfo("Internal timing for DecisionTree:")

