[GitHub] spark pull request: specify unidocGenjavadocVersion of 0.8

2014-10-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2893#issuecomment-60306344 Verified that this change doesn't affect `unidoc` with Java 6 and 7. Merged into master. Thanks! --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-60351026 @derrickburns The features are useful, so please don't delete the PR. Since this is a major refactor of `KMeans`, I need to allocate a block of time to review the code

[GitHub] spark pull request: [SPARK-2652] [PySpark] donot use KyroSerialize...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2916#issuecomment-60351266 LGTM. Merged into both master and branch-1.1. Thanks!

[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2909#issuecomment-60399506 @srowen Do you plan to fix more `unidoc` errors in this PR?

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362087 --- Diff: LICENSE --- @@ -1,4 +1,3 @@ - --- End diff -- The license file in Hadoop does have this empty line: https://github.com/apache

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362095 --- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala --- @@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362101 --- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala --- @@ -55,9 +55,10 @@ private[spark] class SumEvaluator(totalOutputs: Int

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362108 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala --- @@ -245,9 +245,9 @@ private[spark] object

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362102 --- Diff: core/src/main/scala/org/apache/spark/rdd/SampledRDD.scala --- @@ -53,9 +53,14 @@ private[spark] class SampledRDD[T: ClassTag

[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2907#issuecomment-60445965 @srowen Unit tests are in https://github.com/numbnut/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/rdd/RDDFunctionsSuite.scala I think

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-24 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/2937 [SPARK-4084] Reuse sort key in Sorter Sorter uses a generic-typed key for sorting. When the data is large, it creates lots of key objects, which is inefficient. We should reuse the key in Sorter
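
As a rough illustration of the idea in this pull request description (not the actual Spark code), the sketch below compares a sort that allocates a fresh key object on every key lookup with one that refills two preallocated key holders. `MutableKey`, `KeyedRecords`, and `insertion_sort` are hypothetical names invented for this example, and a simple insertion sort stands in for Spark's sorter.

```python
from dataclasses import dataclass


@dataclass
class MutableKey:
    """Hypothetical mutable key holder that can be refilled in place."""
    value: int = 0


class KeyedRecords:
    """Hypothetical data format: records stored flat as [key0, payload0, key1, payload1, ...]."""

    def __init__(self, flat):
        self.flat = flat
        self.allocations = 0  # counts how many key objects were created

    def new_key(self):
        self.allocations += 1
        return MutableKey()

    def get_key(self, pos, reuse=None):
        # Refill the caller's holder if one is given; otherwise allocate a new key.
        key = reuse if reuse is not None else self.new_key()
        key.value = self.flat[2 * pos]
        return key

    def swap(self, i, j):
        f = self.flat
        f[2 * i], f[2 * i + 1], f[2 * j], f[2 * j + 1] = (
            f[2 * j], f[2 * j + 1], f[2 * i], f[2 * i + 1])


def insertion_sort(data, reuse_keys):
    """Sort records in place by key and return the number of key objects allocated."""
    n = len(data.flat) // 2
    # With reuse enabled, two holders are allocated once and refilled per comparison.
    k0 = data.new_key() if reuse_keys else None
    k1 = data.new_key() if reuse_keys else None
    for i in range(1, n):
        j = i
        while j > 0:
            a = data.get_key(j - 1, k0)
            b = data.get_key(j, k1)
            if a.value <= b.value:
                break
            data.swap(j - 1, j)
            j -= 1
    return data.allocations
```

With reuse enabled, the allocation count stays at two regardless of input size, while the non-reusing variant allocates two keys per comparison. Whether this matters in practice depends on the runtime's allocation and garbage-collection costs, which this sketch does not measure.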

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19368695 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -87,15 +87,19 @@ class BernoulliSampler[T](lb: Double, ub: Double

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19368747 --- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala --- @@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double

[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2907#issuecomment-60459079 Sounds good. @numbnut Could you update the PR and change the following? 1) add @DeveloperApi to RDDFunctions 2) change the return type of `sliding` to `RDD

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60460281 Jenkins, add to whitelist.

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60460354 ok to test

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60463744 @yu-iskw I added you to the whitelist. Future commits from you should trigger Jenkins automatically. Just took a very brief scan over the code and really appreciate

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2928#issuecomment-60505131 test this please

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2937#discussion_r19378842 --- Diff: core/src/main/java/org/apache/spark/util/collection/Sorter.java --- @@ -587,10 +601,12 @@ private int gallopRight(K key, Buffer a, int base, int

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2937#discussion_r19378852 --- Diff: core/src/main/scala/org/apache/spark/util/collection/SortDataFormat.scala --- @@ -34,9 +34,20 @@ import scala.reflect.ClassTag */ // TODO

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2937#issuecomment-60505341 @aarondav I updated the PR based on your comment. See the description for renaming `Sorter.java` to `TimSort.java`.

[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2909#issuecomment-60505492 LGTM. Verified that `&lt;` shows up correctly in generated Scala and Java docs. Merged into master. Thanks!

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2928#issuecomment-60506206 @srowen Could you check `JavaAPISuite.sample`? We need to update that test as well.

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2937#issuecomment-60506230 test this please

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2952#issuecomment-60625545 @anantasty Could you also update the doc in `https://github.com/apache/spark/blob/master/docs/mllib-feature-extraction.md`? Thanks!

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2928#issuecomment-60638201 LGTM. Verified that `commons.math3` is shaded in the assembly jar. Merged into master. Thanks!

[GitHub] spark pull request: [MLlib] SPARK-3987: add test case on objective...

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2965#issuecomment-60702505 LGTM. Merged into both master and branch-1.1. Thanks!

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19451852 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} </div>

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19451869 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} </div>

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19451872 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} </div>

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19451976 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} </div>

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19452204 --- Diff: examples/src/main/python/mllib/word2vec.py --- @@ -0,0 +1,36 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19452346 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} </div>

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2937#issuecomment-60710392 Well, it won't pass `travis` ...

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2937#discussion_r19453604 --- Diff: core/src/test/scala/org/apache/spark/util/collection/SorterSuite.scala --- @@ -61,10 +65,33 @@ class SorterSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19453782 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,38 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} </div>

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19453785 --- Diff: examples/src/main/python/mllib/word2vec.py --- @@ -0,0 +1,47 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19453784 --- Diff: examples/src/main/python/mllib/word2vec.py --- @@ -0,0 +1,47 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454162 --- Diff: docs/mllib-feature-extraction.md --- @@ -95,8 +95,50 @@ tf.cache() val idf = new IDF(minDocFreq = 2).fit(tf) val tfidf: RDD[Vector

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454166 --- Diff: docs/mllib-feature-extraction.md --- @@ -267,4 +346,25 @@ val data1 = data.map(x => (x.label, normalizer1.transform(x.features))) val data2

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454164 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +204,20 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} </div>

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454179 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ Python package for feature in MLlib. +import sys +import warnings

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454172 --- Diff: docs/mllib-feature-extraction.md --- @@ -267,4 +346,25 @@ val data1 = data.map(x => (x.label, normalizer1.transform(x.features))) val data2

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454181 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ Python package for feature in MLlib. +import sys +import warnings

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454187 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ Python package for feature in MLlib. +import sys +import warnings

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454177 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala --- @@ -20,6 +20,7 @@ package org.apache.spark.mllib.feature import

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454180 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ Python package for feature in MLlib. +import sys +import warnings

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454191 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ Python package for feature in MLlib. +import sys +import warnings

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454184 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ Python package for feature in MLlib. +import sys +import warnings

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454193 --- Diff: python/pyspark/mllib/feature.py --- @@ -95,33 +385,26 @@ class Word2Vec(object): localDoc = [sentence, sentence] doc

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2937#discussion_r19454486 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -1237,12 +1237,27 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2937#issuecomment-60713736 @aarondav Does `Jenkins` count?

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19455165 --- Diff: python/pyspark/mllib/feature.py --- @@ -95,33 +385,26 @@ class Word2Vec(object): localDoc = [sentence, sentence] doc

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19455162 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ Python package for feature in MLlib. +import sys +import warnings

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2819#issuecomment-60737839 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-60789726 @erikerlandson The feature freeze deadline for v1.2 is this Sat. Just want to check with you and see whether you are going to update the PR this week.

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486445 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486453 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486469 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486477 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486450 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486463 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486472 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486480 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2952#issuecomment-60794001 ok to test

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-60793728 @erikerlandson Great! Thanks for the heads up.

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2952#issuecomment-60794164 test this please

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2942#issuecomment-60794676 @anantasty This PR is still in review. If you are interested in Python bindings for streaming algorithms, could you help add one for StreamingLinearRegression? Thanks

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490145 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490241 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490261 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490254 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490284 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490351 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490369 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490345 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490338 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeans.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490483 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490476 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490470 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490486 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490467 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490527 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490523 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490587 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19492147 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19492141 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19492205 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2942#issuecomment-60806389 @freeman-lab I made a quick pass over the implementation. It looks great! I will check the math and the test code with someone who knows everything about streaming k

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2868#discussion_r19518756 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/NodeIdCache.scala --- @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2868#discussion_r19518754 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/NodeIdCache.scala --- @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [FIX] disable benchmark code

2014-10-28 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/2990 [FIX] disable benchmark code I forgot to disable the benchmark code in #2937, which increased the Jenkins build time by a couple of minutes. @aarondav You can merge this pull request into a Git

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2942#issuecomment-60873448 Had an offline discussion with @freeman-lab . We decided to introduce the concept of `timeUnit` to describe decay. A `timeUnit` (like a second) could be either a `batch

[GitHub] spark pull request: [FIX] disable benchmark code

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2990#issuecomment-60874554 The failed test from streaming is a known flaky test. @tdas I've merged this one into master (because it will speed up Jenkins builds).

[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...

2014-10-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2992#issuecomment-60965172 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-4130][MLlib] Fixing libSVM parser bug w...

2014-10-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2996#discussion_r19554794 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -76,7 +76,7 @@ object MLUtils { .map { line => val items

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2978#issuecomment-60965726 test this please

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2978#issuecomment-60965682 ok to test

[GitHub] spark pull request: [SPARK-4148][PySpark] fix seed distribution an...

2014-10-29 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/3010 [SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample The current way of seed distribution makes the sequences sampled from partition i and i+1 offset by 1
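The "offset by 1" problem mentioned above can be reproduced with a small sketch. It assumes, purely for illustration, a scheme in which every partition starts from the same seed and skips ahead by its partition index; the helper names `offset_stream` and `mixed_stream` are hypothetical, and folding the index into the seed is one possible remedy, not necessarily what the pull request does.

```python
import random


def offset_stream(seed, partition, n):
    """Hypothetical scheme: same seed everywhere, skip `partition` draws."""
    rng = random.Random(seed)
    for _ in range(partition):
        rng.random()  # discard one draw per preceding partition
    return [rng.random() for _ in range(n)]


def mixed_stream(seed, partition, n):
    """Assumed remedy: fold the partition index into the seed itself."""
    rng = random.Random(seed ^ partition)
    return [rng.random() for _ in range(n)]


p0 = offset_stream(42, 0, 5)
p1 = offset_stream(42, 1, 5)
# Partition 1 replays partition 0's draws, shifted by one position.
print(p0[1:] == p1[:-1])

m0 = mixed_stream(42, 0, 5)
m1 = mixed_stream(42, 1, 5)
# With distinct seeds the two streams no longer overlap this way.
print(m0[1:] == m1[:-1])
```

Under the skip-ahead assumption, neighbouring partitions share all but one of their random draws, so their samples are strongly correlated; giving each partition its own seed removes that overlap.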

[GitHub] spark pull request: [SPARK-4150][PySpark] return self in rdd.setNa...

2014-10-29 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/3011 [SPARK-4150][PySpark] return self in rdd.setName Then we can do `rdd.setName('abc').cache().count()`. You can merge this pull request into a Git repository by running: $ git pull https
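Why returning `self` enables the chained call quoted above can be shown with a minimal sketch. `NamedDataset` is a hypothetical stand-in written for this example, not the PySpark `RDD` class, and its method names are assumptions.

```python
class NamedDataset:
    """Hypothetical RDD-like object used only to illustrate chaining."""

    def __init__(self, items):
        self.items = list(items)
        self.name = None
        self.cached = False

    def set_name(self, name):
        self.name = name
        return self  # returning self is what makes chaining possible

    def cache(self):
        self.cached = True
        return self

    def count(self):
        return len(self.items)


ds = NamedDataset([1, 2, 3])
n = ds.set_name("abc").cache().count()
print(n)
```

If `set_name` returned `None` instead, the call to `.cache()` would raise an `AttributeError`, which is the inconvenience the pull request title describes.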
