[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-59793960 Jenkins, add to whitelist. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-59794008 test this please

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59831875 Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-3569][SQL] Add metadata field to Struct...

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2701#discussion_r19125755 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/Metadata.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59877524 test this please

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59877519 Jenkins, add to whitelist.

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129869 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129854 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129863 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129850 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129866 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129859 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129853 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129857 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129865 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4023] [MLlib] [PySpark] convert rdd int...

2014-10-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2870#discussion_r19130591 --- Diff: python/pyspark/mllib/tests.py --- @@ -202,6 +204,16 @@ def test_regression(self): self.assertTrue(dt_model.predict(features[3]) >

[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59889793 @mdagost Thanks for working on the SerDe! I tested it locally and it works correctly, but the unit tests for the added methods are missing. Do you mind adding them? You

[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59956077 this is ok to test

[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59956125 test this please

[GitHub] spark pull request: [SPARK-4023] [MLlib] [PySpark] convert rdd int...

2014-10-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2870#issuecomment-59956388 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19183393 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-60011373 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: Fix for sampling error in NumPy v1.9 [SPARK-39...

2014-10-22 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2889#issuecomment-60114196 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-4055][MLlib] Inconsistent spelling 'MLl...

2014-10-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2903#issuecomment-60265674 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: specify unidocGenjavadocVersion of 0.8

2014-10-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2893#issuecomment-60306344 Verified that this change doesn't affect `unidoc` with Java 6 and 7. Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-10-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2634#issuecomment-60351026 @derrickburns The features are useful, so please don't delete the PR. Since this is a major refactor of `KMeans`, I need to allocate a block of time to review the

[GitHub] spark pull request: [SPARK-2652] [PySpark] donot use KyroSerialize...

2014-10-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2916#issuecomment-60351266 LGTM. Merged into both master and branch-1.1. Thanks!

[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2909#issuecomment-60399506 @srowen Do you plan to fix more `unidoc` errors in this PR?

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362087 --- Diff: LICENSE --- @@ -1,4 +1,3 @@ - --- End diff -- The license file in Hadoop does have this empty line: https://github.com/apache

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362095 --- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala --- @@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362101 --- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala --- @@ -55,9 +55,10 @@ private[spark] class SumEvaluator(totalOutputs: Int

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362108 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala --- @@ -245,9 +245,9 @@ private[spark] object

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19362102 --- Diff: core/src/main/scala/org/apache/spark/rdd/SampledRDD.scala --- @@ -53,9 +53,14 @@ private[spark] class SampledRDD[T: ClassTag]( if

[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2907#issuecomment-60445965 @srowen Unit tests are in https://github.com/numbnut/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/rdd/RDDFunctionsSuite.scala I think

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-24 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/2937 [SPARK-4084] Reuse sort key in Sorter Sorter uses generic-typed key for sorting. When data is large, it creates lots of key objects, which is not efficient. We should reuse the key in Sorter for

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19368695 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -87,15 +87,19 @@ class BernoulliSampler[T](lb: Double, ub: Double

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19368747 --- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala --- @@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double

[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2907#issuecomment-60459079 Sounds good. @numbnut Could you update the PR and change the following? 1) add @DeveloperApi to RDDFunctions 2) change the return type of `sliding` to `RDD

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60460281 Jenkins, add to whitelist.

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60460354 ok to test

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60463744 @yu-iskw I added you to the whitelist. Future commits from you should trigger Jenkins automatically. Just took a very brief scan over the code and really appreciate the

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2928#issuecomment-60505131 test this please

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2937#discussion_r19378842 --- Diff: core/src/main/java/org/apache/spark/util/collection/Sorter.java --- @@ -587,10 +601,12 @@ private int gallopRight(K key, Buffer a, int base, int

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2937#discussion_r19378852 --- Diff: core/src/main/scala/org/apache/spark/util/collection/SortDataFormat.scala --- @@ -34,9 +34,20 @@ import scala.reflect.ClassTag */ // TODO

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2937#issuecomment-60505341 @aarondav I updated the PR based on your comment. See the description for renaming `Sorter.java` to `TimSort.Java`.

[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2909#issuecomment-60505492 LGTM. Verified that `&lt;` shows up correctly in generated Scala and Java docs. Merged into master. Thanks!

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2928#issuecomment-60506206 @srowen Could you check `JavaAPISuite.sample`? We need to update that test as well.

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2937#issuecomment-60506230 test this please

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2952#issuecomment-60625545 @anantasty Could you also update the doc in `https://github.com/apache/spark/blob/master/docs/mllib-feature-extraction.md`? Thanks!

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2928#issuecomment-60638201 LGTM. Verified that `commons.math3` is shaded in the assembly jar. Merged into master. Thanks!

[GitHub] spark pull request: [MLlib] SPARK-3987: add test case on objective...

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2965#issuecomment-60702505 LGTM. Merged into both master and branch-1.1. Thanks!

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19451852 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighli

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19451869 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighli

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19451872 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighli

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19451976 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighli

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19452204 --- Diff: examples/src/main/python/mllib/word2vec.py --- @@ -0,0 +1,36 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19452346 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighli

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2937#issuecomment-60710392 Well, it won't pass `travis` ...

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2937#discussion_r19453604 --- Diff: core/src/test/scala/org/apache/spark/util/collection/SorterSuite.scala --- @@ -61,10 +65,33 @@ class SorterSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19453782 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,38 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighli

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19453785 --- Diff: examples/src/main/python/mllib/word2vec.py --- @@ -0,0 +1,47 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19453784 --- Diff: examples/src/main/python/mllib/word2vec.py --- @@ -0,0 +1,47 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454162 --- Diff: docs/mllib-feature-extraction.md --- @@ -95,8 +95,50 @@ tf.cache() val idf = new IDF(minDocFreq = 2).fit(tf) val tfidf: RDD[Vector

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454166 --- Diff: docs/mllib-feature-extraction.md --- @@ -267,4 +346,25 @@ val data1 = data.map(x => (x.label, normalizer1.transform(x.features))) val da

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454164 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +204,20 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighli

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454179 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ """ Python package for feature in MLlib. """

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454172 --- Diff: docs/mllib-feature-extraction.md --- @@ -267,4 +346,25 @@ val data1 = data.map(x => (x.label, normalizer1.transform(x.features))) val da

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454181 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ """ Python package for feature in MLlib. """

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454187 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ """ Python package for feature in MLlib. """

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454177 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala --- @@ -20,6 +20,7 @@ package org.apache.spark.mllib.feature import

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454180 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ """ Python package for feature in MLlib. """

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454191 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ """ Python package for feature in MLlib. """

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454184 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ """ Python package for feature in MLlib. """

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19454193 --- Diff: python/pyspark/mllib/feature.py --- @@ -95,33 +385,26 @@ class Word2Vec(object): >>> localDoc = [sentence, sentence]

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2937#discussion_r19454486 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -1237,12 +1237,27 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request: [SPARK-4084] Reuse sort key in Sorter

2014-10-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2937#issuecomment-60713736 @aarondav Does `Jenkins` count?

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19455165 --- Diff: python/pyspark/mllib/feature.py --- @@ -95,33 +385,26 @@ class Word2Vec(object): >>> localDoc = [sentence, sentence]

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2819#discussion_r19455162 --- Diff: python/pyspark/mllib/feature.py --- @@ -18,59 +18,348 @@ """ Python package for feature in MLlib. """

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2819#issuecomment-60737839 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-60789726 @erikerlandson The feature freeze deadline for v1.2 is this Sat. Just want to check with you and see whether you are going to update the PR this week.

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486445 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486453 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486469 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486477 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486450 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486463 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486472 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19486480 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2952#issuecomment-60794001 ok to test

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2455#issuecomment-60793728 @erikerlandson Great! Thanks for the heads up.

[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2952#issuecomment-60794164 test this please

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2942#issuecomment-60794676 @anantasty This PR is still in review. If you are interested in Python bindings for streaming algorithms, could you help add one for StreamingLinearRegression? Thanks

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2942#issuecomment-60795975 It should be in a separate JIRA (and hence a separate PR). Please create a JIRA for `StreamingLinearRegression` and ping me there. Thanks!

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490145 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section of

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490241 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section of

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490261 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section of

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490254 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section of

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490284 --- Diff: docs/mllib-clustering.md --- @@ -153,3 +153,75 @@ provided in the [Self-Contained Applications](quick-start.html#self-contained-ap section of

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490351 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]

2014-10-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2942#discussion_r19490369 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software
