[GitHub] spark pull request: [SPARK-2568] RangePartitioner should run only ...

2014-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1562#discussion_r15439511 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -105,24 +108,91 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request: [SPARK-2568] RangePartitioner should run only ...

2014-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1562#discussion_r15439516 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -105,24 +108,91 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request: [SPARK-2568] RangePartitioner should run only ...

2014-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1562#discussion_r15439518 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -105,24 +108,91 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request: [SPARK-2568] RangePartitioner should run only ...

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1562#issuecomment-50266091 @rxin @mateiz I have one question about using `rdd.id` as random seed shift to avoid sampling the same sequence in each partition. It is a constant within a session. But
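
The seed-shift idea mentioned in the comment above can be illustrated with a small sketch. It is written in Python rather than Scala, the names `partition_rng`, `base_seed`, and `shift` are hypothetical, and combining them by addition is an assumption made for illustration, not the scheme used in the pull request.

```python
import random


def partition_rng(base_seed, partition_index, shift=0):
    """Return an RNG for one partition (hypothetical helper)."""
    # Assumption: seed, shift, and partition index are combined by addition.
    return random.Random(base_seed + shift + partition_index)


def draw(rng, n=3):
    """Draw n uniform samples from the given RNG."""
    return [rng.random() for _ in range(n)]


# Same seed in every partition: each partition draws the same sequence.
same = [draw(random.Random(42)) for _ in range(3)]

# Seed shifted per partition: the sequences differ between partitions.
shifted = [draw(partition_rng(42, i, shift=7)) for i in range(3)]
```

Plain addition lets different (shift, index) pairs collide on the same seed, so a real implementation would probably mix the values through a hash instead.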

[GitHub] spark pull request: [SPARK-2568] RangePartitioner should run only ...

2014-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1562#discussion_r15439751 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -105,24 +108,91 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request: [SPARK-2568] RangePartitioner should run only ...

2014-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1562#discussion_r15441176 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -105,24 +108,91 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request: [SPARK-2568] RangePartitioner should run only ...

2014-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1562#discussion_r15441180 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -105,24 +108,91 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request: [SPARK-2514] [mllib] Random RDD generator

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1520#issuecomment-50289239 LGTM. Merged into master. Thanks for adding random RDD generators!! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50290028 @dorx I removed commons-math3 from dependencies, separated `sampleByKey` and `sampleByKeyExact`, and corrected the math in waitlisting in sampling with replacement

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r15443033 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala --- @@ -40,27 +41,51 @@ class KMeansSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50291675 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50291855 @witgo Could you merge the latest master? Thanks!

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50303013 @witgo Thanks for checking the dependencies on the JIRA page! I list the dependency graph here so other people can see the difference easily. I think we need to figure out

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-50303211 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-50303204 There were some problems with pyspark. Let's call Jenkins again.

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-50303437 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-50363809 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-50379081 LGTM. I'm merging this into master! Thanks!!

[GitHub] spark pull request: [SPARK-1997] update breeze to version 0.8.1

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/940#issuecomment-50379548 @dlwh Thanks for the quick reply! The `commons-math3` problem is not which version to use but how to match the version hadoop depends on. We can switch to 3.1.1 to match

[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50380948 @witgo We don't need to checkpoint both users and products, but only the smaller one. For the initial version, it is fine to checkpoint either of them. We should al

[GitHub] spark pull request: Check if margin > 0, not if prob > 0.5

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1057#issuecomment-50433121 I think we should keep it as it is now and add support for setting thresholds.
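
The relationship between the two checks in this thread's title, and the threshold support suggested in the comment, can be sketched as follows. This is a minimal Python illustration, not MLlib's API: `sigmoid`, `margin_threshold`, and `predict` are hypothetical names. For the logistic function, `prob > 0.5` holds exactly when `margin > 0`, and a general probability threshold `t` corresponds to the margin cutoff `log(t / (1 - t))`.

```python
import math


def sigmoid(margin):
    """Logistic function mapping a margin to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def margin_threshold(prob_threshold):
    """Margin cutoff equivalent to a probability cutoff (the logit)."""
    return math.log(prob_threshold / (1.0 - prob_threshold))


def predict(margin, prob_threshold=0.5):
    """Classify by comparing the margin to the equivalent cutoff."""
    return 1 if margin > margin_threshold(prob_threshold) else 0
```

Comparing margins directly avoids evaluating the exponential at prediction time; the default threshold of 0.5 reduces to the `margin > 0` check.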

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506024 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/random/RandomRDDGenerators.scala --- @@ -35,6 +35,9 @@ object RandomRDDGenerators

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506028 --- Diff: python/pyspark/mllib/randomRDD.py --- @@ -0,0 +1,213 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506026 --- Diff: python/pyspark/mllib/randomRDD.py --- @@ -0,0 +1,213 @@ +# --- End diff -- Should the file name match Scala's?

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506022 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -453,4 +454,74 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506030 --- Diff: python/pyspark/mllib/randomRDD.py --- @@ -0,0 +1,213 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506031 --- Diff: python/pyspark/mllib/randomRDD.py --- @@ -0,0 +1,213 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50435025 LGTM except minor inline comments. For the file name, it should be possible to have a package named `random`, for example, `numpy.random`: http://docs.scipy.org/doc/numpy

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50435465 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1624#discussion_r15506490 --- Diff: python/pyspark/mllib/regression.py --- @@ -120,6 +120,23 @@ def train(cls, data, iterations=100, step=1.0, d._jrdd, iterations, step

[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-50441485 @dbtsai I thought of another way to do this and want to know your opinion. We can add an optional argument to `appendBias`: `appendBias(bias: Double = 1.0)`. If this is used
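
The suggested signature, `appendBias(bias: Double = 1.0)`, makes the bias value optional. The sketch below shows only that default-argument idea in Python; `append_bias` is a hypothetical stand-in operating on plain lists, and the rest of the truncated proposal is not reproduced.

```python
def append_bias(features, bias=1.0):
    """Return a new list with the bias term appended (hypothetical helper)."""
    # The input is copied so the caller's list is left unchanged.
    return list(features) + [bias]


default_bias = append_bias([0.5, 2.0])
custom_bias = append_bias([0.5, 2.0], bias=10.0)
```

Existing callers that pass no argument keep the current behaviour, while new callers can choose a different bias value.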

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508694 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508695 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508718 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508725 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508790 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508808 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508803 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T

[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/929#discussion_r15533428 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -255,6 +260,9 @@ class ALS private ( rank, lambda, alpha

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r15534415 --- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala --- @@ -88,14 +91,73 @@ private[spark] object SamplingUtils

[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/929#discussion_r15534935 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -255,6 +255,9 @@ class ALS private ( rank, lambda, alpha

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50528615 LGTM. Merged into master. Thanks!!

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50529499 Yes, having directories is the way to organize packages in Python. We can make a folder for `random` and include the Python files in `mllib/pom.xml`. Otherwise, user

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-50557454 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-50558162 @akopich The failed tests might be irrelevant to this PR. It would be nice if you can make the public interfaces minimal and provide a summary of them. For example, You

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15561486 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15561528 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegression.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565489 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565590 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegression.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565612 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565687 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565686 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565706 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565728 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565768 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565789 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLStreamingUtils.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565824 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionSuite.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565834 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionSuite.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565864 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionSuite.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565941 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionSuite.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565956 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionSuite.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15565983 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionSuite.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15566035 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegression.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15566050 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearRegression.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15566064 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15566077 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-50572101 Jenkins, add to whitelist.

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-50572111 Jenkins, test this please.

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-50572128 @bgreeven Jenkins will be automatically triggered for future updates.

[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/929#discussion_r15566629 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -255,6 +255,9 @@ class ALS private ( rank, lambda, alpha

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50574345 @dorx I tried `import pyspark.mllib.random` and it failed. It has to be `from pyspark.mllib import random`. And to use `RandomRDDGenerators`, I need to call

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1624#discussion_r15566835 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -42,6 +43,16 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50574626 LGTM. Waiting for Jenkins. Btw, @witgo if you have a big dataset to test, could you try to set the storage level of ratings and user/product in/out links to

[GitHub] spark pull request: [SPARK-2552][MLLIB] stabilize logistic functio...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1493#discussion_r15567017 --- Diff: python/pyspark/mllib/classification.py --- @@ -63,7 +63,10 @@ class LogisticRegressionModel(LinearModel): def predict(self, x

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50575901 We pinged Davies today. It seems to be a well-known problem with Python. There are ways to force import a standard module in Python 2, but they are all very messy

[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50575931 Jenkins, test this please.

[GitHub] spark pull request: Avoid numerical instability

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1652#issuecomment-50634715 Jenkins, test this please.

[GitHub] spark pull request: Avoid numerical instability

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1652#issuecomment-50634700 Jenkins, add to whitelist.

[GitHub] spark pull request: SPARK-2748 [MLLIB] [GRAPHX] Loss of precision ...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1659#issuecomment-50635193 LGTM. Merged into master. Thanks!!

[GitHub] spark pull request: [SPARK-2522] set default broadcast factory to ...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1437#issuecomment-50645071 @lianhuiwang I created a JIRA for it: https://issues.apache.org/jira/browse/SPARK-2755 . We can serialize the object to a stream instead of Array[Byte] directly.

[GitHub] spark pull request: Avoid numerical instability

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1652#issuecomment-50645844 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-2544][MLLIB] Improve ALS algorithm reso...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/929#issuecomment-50654846 Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50662945 I tried in Python 2.7 but it didn't work: ~~~ Python 2.7.7 (default, Jun 2 2014, 01:41:14) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40

[GitHub] spark pull request: SPARK-2341 [MLLIB] loadLibSVMFile doesn't hand...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1663#issuecomment-50663331 LGTM. Waiting for Jenkins.

[GitHub] spark pull request: SPARK-2341 [MLLIB] loadLibSVMFile doesn't hand...

2014-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1663#discussion_r15604802 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -102,36 +100,14 @@ object MLUtils { // Convenient methods for

[GitHub] spark pull request: SPARK-2341 [MLLIB] loadLibSVMFile doesn't hand...

2014-07-30 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1663#discussion_r15604829 --- Diff: python/pyspark/mllib/util.py --- @@ -29,15 +29,18 @@ class MLUtils: Helper methods to load, save and pre-process data used in MLlib

[GitHub] spark pull request: [SPARK-2511][MLLIB] add HashingTF and IDF

2014-07-30 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1671 [SPARK-2511][MLLIB] add HashingTF and IDF This is roughly the TF-IDF implementation used in the Databricks Cloud Demo: http://databricks.com/cloud/ . Both `HashingTF` and `IDF` are

[GitHub] spark pull request: SPARK-2341 [MLLIB] loadLibSVMFile doesn't hand...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1663#issuecomment-50689249 Yes, MiMa doesn't recognize package private classes. Please add those exclusion rules manually: ~~~ [error] * o

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50690034 @JoshRosen Ah ... I should copy it literally. Thanks! Do you know what is the oldest version of python that we support?

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50691239 @JoshRosen If we don't support 2.5, could we use `from __future__ import absolute_import`?

[GitHub] spark pull request: Decision tree bug fixes

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1673#issuecomment-50691630 Jenkins, test this please.

[GitHub] spark pull request: Decision tree bug fixes

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1673#issuecomment-50691618 Jenkins, add to whitelist.

[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-50691925 I think this is the approach LIBLINEAR uses. Yes, let's discuss tomorrow.

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-57116574 LGTM and tested with 1000 trees. I've merged it into master. Thanks @jkbradley for implementing RF and @codedeft @manishamde @chouqin for reviewing!

[GitHub] spark pull request: [MLlib] [SPARK-2885] DIMSUM: All-pairs similar...

2014-09-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1778#issuecomment-57204030 LGTM. Merged into master! Thanks @rezazadeh!

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-09-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18184054 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -284,6 +285,54 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-09-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18184065 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -284,6 +285,54 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-09-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18184061 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -284,6 +285,54 @@ class PythonMLLibAPI extends Serializable

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-09-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18184091 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,124 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3486][MLlib][PySpark] PySpark support f...

2014-09-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2356#discussion_r18184098 --- Diff: python/pyspark/mllib/Word2Vec.py --- @@ -0,0 +1,151 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more
