[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635557 --- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734634 It is hard to say what threshold to use. I couldn't think of a use case that requires a large window size, but I cannot say there is none. Another possible

[GitHub] spark pull request: SPARK-1240: handle the case of empty RDD when ...

2014-03-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/135#discussion_r10580151 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -310,6 +310,9 @@ abstract class RDD[T: ClassTag]( * Return a sampled subset

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-13 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/136 [SPARK-1241] Add sliding to RDD Sliding is useful for operations like creating n-grams, calculating total variation, numerical integration, etc. This is similar to https://github.com/apache
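To make the use cases named in the description concrete, here is a minimal sketch using the `sliding` method on plain Scala collections. It is not the RDD API proposed in the pull request, whose signature is not shown in this snippet; the object and method names (`SlidingSketch`, `bigrams`, `totalVariation`, `trapezoid`) are hypothetical and chosen only for illustration.

```scala
// Illustration on local Scala collections only; NOT the proposed RDD API.
// All names in this object are hypothetical.
object SlidingSketch {
  // n-grams with n = 2: pair each token with its successor.
  def bigrams(tokens: Seq[String]): List[(String, String)] =
    tokens.sliding(2).collect { case Seq(a, b) => (a, b) }.toList

  // Total variation: sum of absolute differences between neighbours.
  def totalVariation(xs: Seq[Double]): Double =
    xs.sliding(2).collect { case Seq(a, b) => math.abs(b - a) }.sum

  // Numerical integration with the trapezoidal rule over (x, y) samples.
  def trapezoid(xs: Seq[Double], ys: Seq[Double]): Double =
    xs.zip(ys).sliding(2).collect {
      case Seq((x0, y0), (x1, y1)) => (x1 - x0) * (y0 + y1) / 2.0
    }.sum

  def main(args: Array[String]): Unit = {
    println(bigrams(Seq("a", "b", "c")))
    println(totalVariation(Seq(1.0, 3.0, 2.0)))
    println(trapezoid(Seq(0.0, 1.0, 2.0), Seq(0.0, 1.0, 4.0)))
  }
}
```

The `collect` with a two-element pattern drops the short window that `sliding(2)` yields for inputs with fewer than two elements. On a distributed collection the windows that straddle partition boundaries need separate handling, which this local sketch does not address.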

[GitHub] spark pull request: SPARK-1240: handle the case of empty RDD when ...

2014-03-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/135#discussion_r10580942 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -457,6 +457,10 @@ class RDDSuite extends FunSuite with SharedSparkContext

[GitHub] spark pull request: SPARK-1240: handle the case of empty RDD when ...

2014-03-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/135#discussion_r10583451 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -457,6 +457,10 @@ class RDDSuite extends FunSuite with SharedSparkContext

[GitHub] spark pull request: SPARK-1240: handle the case of empty RDD when ...

2014-03-13 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/135#issuecomment-37587059 LGTM. Waiting for Jenkins. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-1237, 1238] Improve the computation of ...

2014-03-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/131#issuecomment-37490541 @srowen , this continues the work from https://github.com/apache/incubator-spark/pull/629 . Would you please help review the changes? Thanks!

[GitHub] spark pull request: [SPARK-1237, 1238] Improve the computation of ...

2014-03-12 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/131#issuecomment-37500737 @srowen , level 3 BLAS would certainly help improve the performance. DSYRK is for computing C <- A^T A + C, but I don't know whether we have it in jblas. However
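For readers unfamiliar with the routine, the symmetric rank-k update that the comment attributes to DSYRK, C <- A^T A + C, can be sketched with plain loops. This is an illustration of the arithmetic only, not a call into jblas or any BLAS binding; the names `SyrkSketch` and `syrkUpper` are hypothetical. Like DSYRK called with the upper-triangle option, it touches only the upper triangle of C, since the result is symmetric.

```scala
// Plain-loop sketch of C <- A^T A + C; NOT a BLAS or jblas call.
// Names are hypothetical and used only for illustration.
object SyrkSketch {
  /** Updates the upper triangle of the n x n matrix c in place with A^T A,
    * where a is an m x n matrix stored as an array of rows.
    */
  def syrkUpper(a: Array[Array[Double]], c: Array[Array[Double]]): Unit = {
    val n = c.length
    // Each row of A contributes the outer product row^T row to A^T A.
    for (row <- a; i <- 0 until n; j <- i until n) {
      c(i)(j) += row(i) * row(j)
    }
  }

  def main(args: Array[String]): Unit = {
    val a = Array(Array(1.0, 2.0), Array(3.0, 4.0))
    val c = Array.fill(2, 2)(0.0)
    syrkUpper(a, c)
    println(c.map(_.mkString(" ")).mkString("\n"))
  }
}
```

Computing only one triangle roughly halves the multiply-adds compared with a general matrix product, which is the main reason a dedicated routine exists for this update.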

[GitHub] spark pull request: [MLLIB-18] [WIP] Adding sparse data support an...

2014-03-11 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/117#issuecomment-37320613 @fommil I didn't realize the bottom of https://github.com/fommil/netlib-java/blob/master/LICENSE.txt is 3-clause BSD. It is Apache authorized, so I don't need to mention

[GitHub] spark pull request: MLI-2: Start adding k-fold cross validation to...

2014-03-10 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-37208033 MLI is not part of the Spark distribution. @pwendell Is it okay to use MLI's jira? All changes look good to me.

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10439393 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10439348 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10439706 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10439850 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10439997 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10440022 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10440050 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10440077 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10440273 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10440340 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10440961 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10441013 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10441324 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10441801 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10441994 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10442003 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10442083 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10442556 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10442463 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/DecisionTreeModel.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10442815 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10443033 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10443413 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10443049 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10443435 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10443452 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10443791 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10443898 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10443976 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10444261 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10444589 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,1055 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-10 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-37224328 @manishamde Thanks for updating the code style and adding more docs! I made a first pass over the code. For the code style, we do not have a good style checker

[GitHub] spark pull request: Principal Component Analysis

2014-03-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10450573 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +155,138 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: [MLLIB-18] [WIP] Adding sparse data support an...

2014-03-10 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/117 [MLLIB-18] [WIP] Adding sparse data support and update KMeans Continue our discussions from https://github.com/apache/incubator-spark/pull/575 This PR is WIP because it depends on a SNAPSHOT

[GitHub] spark pull request: [MLLIB-18] [WIP] Adding sparse data support an...

2014-03-10 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/117#issuecomment-37252898 Okay, sbt was able to fetch breeze_2.10-0.7-SNAPSHOT from Sonatype, so tests passed.

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360358 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360441 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360465 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360528 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360539 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -0,0 +1,915 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/79#discussion_r10360640 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala --- @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: MLI-2: Start adding k-fold cross validation to...

2014-03-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r10363910 --- Diff: core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala --- @@ -48,6 +48,20 @@ class RandomSamplerSuite extends FunSuite

[GitHub] spark pull request: MLI-2: Start adding k-fold cross validation to...

2014-03-06 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/18#issuecomment-36944266 LGTM, except the extra empty line. Do you mind creating a Spark JIRA for this PR?

[GitHub] spark pull request: MLI-2: Start adding k-fold cross validation to...

2014-02-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/18#discussion_r10174439 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -62,6 +67,20 @@ object MLUtils { } /** + * Return a k

[GitHub] spark pull request: Initialized the regVal for first iteration in ...

2014-02-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/40#discussion_r10178847 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -149,7 +149,13 @@ object GradientDescent extends Logging

[GitHub] spark pull request: Initialized the regVal for first iteration in ...

2014-02-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/40#discussion_r10178907 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/GradientDescentSuite.scala --- @@ -104,4 +104,45 @@ class GradientDescentSuite extends