Repository: spark
Updated Branches:
  refs/heads/master a90c5cd82 -> a819dab66
[SPARK-20670][ML] Simplify FPGrowth transform

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-20670

As suggested by Sean Owen in https://github.com/apache/spark/pull/17130, the transform code in FPGrowthModel can be simplified. As tested on some public datasets (http://fimi.ua.ac.be/data/), the performance of the new transform code is on par with or better than the old implementation.

## How was this patch tested?

Existing unit tests.

Author: Yuhao Yang <yuhao.y...@intel.com>

Closes #17912 from hhbyyh/fpgrowthTransform.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a819dab6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a819dab6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a819dab6

Branch: refs/heads/master
Commit: a819dab6681f3a16615039227865af188b3c3f2a
Parents: a90c5cd
Author: Yuhao Yang <yuhao.y...@intel.com>
Authored: Tue May 9 23:39:26 2017 -0700
Committer: Felix Cheung <felixche...@apache.org>
Committed: Tue May 9 23:39:26 2017 -0700

----------------------------------------------------------------------
 mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/a819dab6/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala b/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
index 8f00daa..12804d0 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
@@ -269,12 +269,8 @@ class FPGrowthModel private[ml] (
     val predictUDF = udf((items: Seq[_]) => {
       if (items != null) {
         val itemset = items.toSet
-        brRules.value.flatMap(rule =>
-          if (items != null && rule._1.forall(item => itemset.contains(item))) {
-            rule._2.filter(item => !itemset.contains(item))
-          } else {
-            Seq.empty
-          }).distinct
+        brRules.value.filter(_._1.forall(itemset.contains))
+          .flatMap(_._2.filter(!itemset.contains(_))).distinct
       } else {
         Seq.empty
       }}, dt)
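For illustration, the simplified rule-application logic from the diff can be sketched as a standalone snippet on plain Scala collections (no Spark dependencies; the `FPGrowthTransformSketch` object, the example rules, and the item names are hypothetical, not part of the patch). Each rule maps an antecedent itemset to consequent items; a prediction keeps the consequents of every rule whose antecedent is contained in the transaction, minus items already present:

```scala
// Hypothetical standalone sketch of the simplified FPGrowthModel.transform
// rule-application logic (plain collections, no Spark).
object FPGrowthTransformSketch {
  // Association rules as (antecedent, consequent) pairs; example data only.
  val rules: Seq[(Seq[String], Seq[String])] = Seq(
    (Seq("bread"), Seq("butter", "milk")),
    (Seq("bread", "butter"), Seq("jam")),
    (Seq("beer"), Seq("chips"))
  )

  def predict(items: Seq[String]): Seq[String] = {
    if (items != null) {
      val itemset = items.toSet
      // New form from the patch: first keep only rules whose antecedent is a
      // subset of the transaction, then collect consequent items not already
      // in it. Equivalent to the removed flatMap-with-if version, but the
      // redundant inner `items != null` check is gone.
      rules.filter(_._1.forall(itemset.contains))
        .flatMap(_._2.filter(!itemset.contains(_))).distinct
    } else {
      Seq.empty
    }
  }

  def main(args: Array[String]): Unit = {
    // Both antecedents {bread} and {bread, butter} match; "butter" is already
    // present, so only the novel consequents survive.
    println(predict(Seq("bread", "butter"))) // List(milk, jam)
  }
}
```

In the actual patch this body runs inside a `udf` over a broadcast variable `brRules`, so hoisting the subset test into `filter` also avoids building a `Seq.empty` per non-matching rule on every row.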