[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1624#discussion_r15627471 --- Diff: python/pyspark/mllib/regression.py --- @@ -109,18 +109,45 @@ class LinearRegressionModel(LinearRegressionModelBase): True

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1624#discussion_r15627468 --- Diff: python/pyspark/mllib/regression.py --- @@ -109,18 +109,45 @@ class LinearRegressionModel(LinearRegressionModelBase): True

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1624#discussion_r15627480 --- Diff: python/pyspark/mllib/regression.py --- @@ -109,18 +109,45 @@ class LinearRegressionModel(LinearRegressionModelBase): True

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1673#issuecomment-50716977 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627517 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -69,25 +73,32 @@ object DecisionTreeRunner { opt

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627507 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -69,25 +73,32 @@ object DecisionTreeRunner { opt

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627514 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -69,25 +73,32 @@ object DecisionTreeRunner { opt

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627527 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +111,57 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627524 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +111,57 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627594 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +111,57 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627787 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +111,57 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627842 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +111,57 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627893 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -48,11 +50,13 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15627927 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -598,9 +598,12 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628159 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -612,27 +615,31 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628197 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -815,20 +822,10 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628238 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -845,33 +842,15 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628435 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/Node.scala --- @@ -91,4 +91,59 @@ class Node

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628444 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/Node.scala --- @@ -91,4 +91,59 @@ class Node

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628483 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/Node.scala --- @@ -91,4 +91,59 @@ class Node

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628492 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/Node.scala --- @@ -91,4 +91,59 @@ class Node

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628533 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/Node.scala --- @@ -91,4 +91,59 @@ class Node

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628610 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala --- @@ -31,6 +30,18 @@ import

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15628659 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala --- @@ -602,12 +609,78 @@ class DecisionTreeSuite extends FunSuite

[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1671#discussion_r15630475 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/HashingTF.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15630544 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionSuite.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15630602 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15630645 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegression.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15630714 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15630905 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15631754 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -598,9 +598,12 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1671#issuecomment-50735679 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15645443 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -66,6 +66,42 @@ class MatrixFactorizationModel

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15645502 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -66,6 +66,42 @@ class MatrixFactorizationModel

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15645648 --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java --- @@ -29,6 +29,8 @@ import

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15645652 --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java --- @@ -44,21 +46,27 @@ public void tearDown() { sc = null

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15645878 --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java --- @@ -171,4 +180,29 @@ public void runImplicitALSWithNegativeWeight

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1687#issuecomment-50769991 For the API, another option is to return `Array[Rating]` instead of `Array[(Int, Double)]`. This should help Java users and it is also compatible with batch predictions

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1687#issuecomment-50785472 I meant the final `userFeatures` and `productFeatures` stored in the matrix factorization model. If those two RDDs are kicked out from memory by later jobs, we have

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15657110 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -66,6 +66,44 @@ class MatrixFactorizationModel

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15657224 --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java --- @@ -28,6 +28,8 @@ import org.apache.spark.api.java.JavaRDD

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15657246 --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java --- @@ -28,6 +28,8 @@ import org.apache.spark.api.java.JavaRDD

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15657282 --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java --- @@ -163,12 +173,42 @@ public void runImplicitALSWithNegativeWeight

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1687#discussion_r15657314 --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java --- @@ -163,12 +173,42 @@ public void runImplicitALSWithNegativeWeight

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1687#issuecomment-50793121 @srowen For changing the storage level, I can submit another PR after this gets merged and ping you for review.

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15657914 --- Diff: python/pyspark/__init__.py --- @@ -49,6 +49,12 @@ Main entry point for accessing data stored in Apache Hive

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15657959 --- Diff: python/pyspark/mllib/linalg.py --- @@ -255,4 +255,6 @@ def _test(): exit(-1) if __name__ == "__main__": +import sys

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15658209 --- Diff: python/pyspark/mllib/random.py --- @@ -0,0 +1,222 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15665285 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -69,25 +73,32 @@ object DecisionTreeRunner { opt

[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1671#issuecomment-50810069 @mateiz Thanks for reviewing the code! I merged this into master.

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15667236 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +109,59 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15667261 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +109,59 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15667264 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +109,59 @@ object DecisionTreeRunner

[GitHub] spark pull request: Add normalizeByCol method to mllib.util.MLUtil...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1698#issuecomment-50814109 @andy327 This is covered in @dbtsai's PR: https://github.com/apache/spark/pull/1207, which is in review.

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15667925 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -42,8 +42,8 @@ class DecisionTree (private val strategy: Strategy

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1673#issuecomment-50816349 @jkbradley The changes look good to me. Thank you for the bug fixes and adding more docs! Waiting for @manishamde to make a final pass, and Jenkins.

[GitHub] spark pull request: [SPARK-2777][MLLIB] change ALS factors storage...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1700#issuecomment-50816428 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15670511 --- Diff: python/pyspark/mllib/random.py --- @@ -0,0 +1,182 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15670547 --- Diff: python/pyspark/mllib/random.py --- @@ -0,0 +1,182 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15670579 --- Diff: python/pyspark/mllib/random.py --- @@ -0,0 +1,182 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15670583 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -453,4 +455,98 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15670585 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -453,4 +455,98 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50820621 LGTM except minor inline comments.

[GitHub] spark pull request: [SPARK-2782][mllib] Bug fix for getRanks in Sp...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1710#issuecomment-50845244 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2782][mllib] Bug fix for getRanks in Sp...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1710#discussion_r15681269 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala --- @@ -55,20 +55,24 @@ object Statistics { /** * Compute

[GitHub] spark pull request: [SPARK-2782][mllib] Bug fix for getRanks in Sp...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1710#discussion_r15681299 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmanCorrelation.scala --- @@ -89,20 +89,17 @@ private[stat] object

[GitHub] spark pull request: [SPARK-2782][mllib] Bug fix for getRanks in Sp...

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1710#discussion_r15681334 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmanCorrelation.scala --- @@ -89,20 +89,17 @@ private[stat] object

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50845853 Merged into master. Thanks!!

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1673#discussion_r15681460 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala --- @@ -100,16 +109,57 @@ object DecisionTreeRunner

[GitHub] spark pull request: [SPARK-2756] [mllib] Decision tree bug fixes

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1673#issuecomment-50846546 Merged into master. Thanks!!

[GitHub] spark pull request: [SPARK-2782][mllib] Bug fix for getRanks in Sp...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1710#issuecomment-50847713 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: [MLLIB] SPARK-2311: Added additional GLMs (Poi...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1237#issuecomment-50848559 Sorry, I'm still working on it and will post the design doc to JIRA soon. Unfortunately, it may not make the v1.1 release.

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-50848722 @bgreeven The filename `mllib/src/main/scala/org/apache/spark/mllib/ann/GeneralizedSteepestDescendAlgorithm` doesn't have a `.scala` extension.

[GitHub] spark pull request: Add normalizeByCol method to mllib.util.MLUtil...

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1698#issuecomment-50849409 Your implementation calls `reduceByKey` and `cartesian`. Those are not cheap streamline operations. `map(x => (1, x)).reduceByKey` is the same as `reduce` except
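
The equivalence mentioned in the comment above can be illustrated locally. This is only a rough sketch in plain Python, not Spark code: `reduce_by_key` is a hypothetical helper written for this example, and the sample values are made up. It shows that tagging every element with the same constant key and then reducing per key yields the same value as a direct reduce.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for a keyed reduce: fold the values of each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# Made-up sample data for illustration only.
values = [3.0, 1.5, 4.0, 2.5]

# Pattern from the comment: give every element the same constant key, then reduce per key.
keyed_total = reduce_by_key(((1, x) for x in values), add)[1]

# Direct alternative: a plain reduce over the values, with no keying step.
direct_total = reduce(add, values)
```

Both `keyed_total` and `direct_total` hold the same sum; the keyed variant only adds the extra grouping step.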

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50850308 Yes, it is already a problem with breeze 0.7. But we didn't realize that hadoop 2.3 depends on commons-math3 in the Spark v1.0 release. If there is a way to avoid

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50850626 But is it needed for the v1.1 release? Spark v1.1 doesn't support Scala 2.11.

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-07-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50851243 That sounds good to me but I'm not familiar with the tasks related to Scala 2.11. Please run the discussion on https://issues.apache.org/jira/browse/SPARK-1812

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50854138 @witgo Could you update the pom to exclude `commons-math3` from dependencies? I tried it locally and LBFGS works well. It should be safe to remove `commons-math3

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/940#discussion_r15684374 --- Diff: mllib/pom.xml --- @@ -60,6 +60,14 @@ <groupId>junit</groupId> <artifactId>junit</artifactId> </exclusion>

[GitHub] spark pull request: [SPARK-2786][mllib] Python correlations

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1713#discussion_r15684424 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala --- @@ -49,43 +49,48 @@ private[stat] trait Correlation

[GitHub] spark pull request: [SPARK-2786][mllib] Python correlations

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1713#discussion_r15684617 --- Diff: python/pyspark/mllib/stat.py --- @@ -0,0 +1,103 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2786][mllib] Python correlations

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1713#discussion_r15684640 --- Diff: python/pyspark/mllib/stat.py --- @@ -0,0 +1,103 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2786][mllib] Python correlations

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1713#discussion_r15685005 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -456,6 +458,37 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-2786][mllib] Python correlations

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1713#discussion_r15684998 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -456,6 +458,37 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-2786][mllib] Python correlations

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1713#discussion_r15685050 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/api/python/PythonMLLibAPISuite.scala --- @@ -59,10 +59,25 @@ class PythonMLLibAPISuite extends FunSuite

[GitHub] spark pull request: [SPARK-2786][mllib] Python correlations

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1713#discussion_r15685067 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/api/python/PythonMLLibAPISuite.scala --- @@ -59,10 +59,25 @@ class PythonMLLibAPISuite extends FunSuite

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50892988 LGTM. Merged into master. Note that Jenkins didn't tell the full story because we have `commons-math3` in the test scope. I built the assembly jar and verified LBFGS work

[GitHub] spark pull request: [SPARK-1812] mllib - upgrade to breeze 0.8.1

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1703#issuecomment-50893224 @avati #940 is merged. Do you mind closing this PR? Thanks!

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50896710 Ah, I see. Tests were against individual build instead of the assembly jar. We should have integration tests in the future.

[GitHub] spark pull request: [SPARK-1812] upgrade dependency to scala-loggi...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1701#issuecomment-50897364 Jenkins, test this please.

[GitHub] spark pull request: [HOTFIX] downgrade breeze version to 0.7

2014-08-01 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1718 [HOTFIX] downgrade breeze version to 0.7 breeze-0.8.1 causes dependency issues, as discussed in #940.

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1361#issuecomment-50898744 @freeman-lab Could you try to merge the latest master and resolve conflicts? It may be because of the change to constructors.

[GitHub] spark pull request: Add normalizeByCol method to mllib.util.MLUtil...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1698#issuecomment-50899568 What if you have 10M columns? I agree that not sending data to the driver is a good practice. But the current operations `reduceByKey` and `cartesian` are not optimized

[GitHub] spark pull request: Add normalizeByCol method to mllib.util.MLUtil...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1698#issuecomment-50900786 Yes, I tried to implement AllReduce without having driver in the middle in https://github.com/apache/spark/pull/506 but it introduced complex dependencies. So I fall back

[GitHub] spark pull request: [MLlib] word2vec: Distributed Representation o...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-50901102 Jenkins, add to whitelist.

[GitHub] spark pull request: [MLlib] word2vec: Distributed Representation o...

2014-08-01 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-50901119 Jenkins, test this please.

[GitHub] spark pull request: [MLlib] word2vec: Distributed Representation o...

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15703584 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,376 @@ +/* +* Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLlib] word2vec: Distributed Representation o...

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15703588 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,376 @@ +/* +* Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLlib] word2vec: Distributed Representation o...

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15703583 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -0,0 +1,376 @@ +/* +* Licensed to the Apache Software Foundation

[GitHub] spark pull request: [MLlib] word2vec: Distributed Representation o...

2014-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1719#discussion_r15703684 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/Word2VecSuite.scala --- @@ -0,0 +1,40 @@ +/* +* Licensed to the Apache Software
