Repository: spark
Updated Branches:
  refs/heads/master a90c5cd82 -> a819dab66
[SPARK-20670][ML] Simplify FPGrowth transform

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-20670

As suggested by Sean Owen in https://github.com/apache/spark/pull/17130, the transform code in FPGrowthModel can be simplified. As tested on some public datasets (http://fimi.ua.ac.be/data/), the performance of the new transform code is on par with or better than the old implementation.

## How was this patch tested?

Existing unit tests.

Author: Yuhao Yang <yuhao.y...@intel.com>

Closes #17912 from hhbyyh/fpgrowthTransform.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a819dab6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a819dab6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a819dab6

Branch: refs/heads/master
Commit: a819dab6681f3a16615039227865af188b3c3f2a
Parents: a90c5cd
Author: Yuhao Yang <yuhao.y...@intel.com>
Authored: Tue May 9 23:39:26 2017 -0700
Committer: Felix Cheung <felixche...@apache.org>
Committed: Tue May 9 23:39:26 2017 -0700

----------------------------------------------------------------------
 mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/a819dab6/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala b/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
index 8f00daa..12804d0 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala
@@ -269,12 +269,8 @@ class FPGrowthModel private[ml] (
     val predictUDF = udf((items: Seq[_]) => {
       if (items != null) {
         val itemset = items.toSet
-        brRules.value.flatMap(rule =>
-          if (items != null && rule._1.forall(item => itemset.contains(item))) {
-            rule._2.filter(item => !itemset.contains(item))
-          } else {
-            Seq.empty
-          }).distinct
+        brRules.value.filter(_._1.forall(itemset.contains))
+          .flatMap(_._2.filter(!itemset.contains(_))).distinct
       } else {
         Seq.empty
       }}, dt)
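For illustration, the simplified rule-application logic from the diff can be sketched as a standalone snippet on plain Scala collections (no Spark dependencies; the `FPGrowthTransformSketch` object, the example rules, and the item names are hypothetical, not part of the patch). Each rule maps an antecedent itemset to consequent items; a prediction keeps the consequents of every rule whose antecedent is contained in the transaction, minus items already present:

```scala
// Hypothetical standalone sketch of the simplified FPGrowthModel.transform
// rule-application logic (plain collections, no Spark).
object FPGrowthTransformSketch {
  // Association rules as (antecedent, consequent) pairs; example data only.
  val rules: Seq[(Seq[String], Seq[String])] = Seq(
    (Seq("bread"), Seq("butter", "milk")),
    (Seq("bread", "butter"), Seq("jam")),
    (Seq("beer"), Seq("chips"))
  )

  def predict(items: Seq[String]): Seq[String] = {
    if (items != null) {
      val itemset = items.toSet
      // New form from the patch: first keep only rules whose antecedent is a
      // subset of the transaction, then collect consequent items not already
      // in it. Equivalent to the removed flatMap-with-if version, but the
      // redundant inner `items != null` check is gone.
      rules.filter(_._1.forall(itemset.contains))
        .flatMap(_._2.filter(!itemset.contains(_))).distinct
    } else {
      Seq.empty
    }
  }

  def main(args: Array[String]): Unit = {
    // Both antecedents {bread} and {bread, butter} match; "butter" is already
    // present, so only the novel consequents survive.
    println(predict(Seq("bread", "butter"))) // List(milk, jam)
  }
}
```

In the actual patch this body runs inside a `udf` over a broadcast variable `brRules`, so hoisting the subset test into `filter` also avoids building a `Seq.empty` per non-matching rule on every row.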