Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2893#issuecomment-60306344
Verified that this change doesn't affect `unidoc` with Java 6 and 7. Merged
into master. Thanks!
---
If your project is set up for it, you can reply to this email
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2634#issuecomment-60351026
@derrickburns The features are useful, so please don't delete the PR. Since
this is a major refactor of `KMeans`, I need to allocate a block of time to
review the code
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2916#issuecomment-60351266
LGTM. Merged into both master and branch-1.1. Thanks!
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2909#issuecomment-60399506
@srowen Do you plan to fix more `unidoc` errors in this PR?
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19362087
--- Diff: LICENSE ---
@@ -1,4 +1,3 @@
-
--- End diff --
The license file in Hadoop does have this empty line:
https://github.com/apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19362095
--- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala
---
@@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19362101
--- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala
---
@@ -55,9 +55,10 @@ private[spark] class SumEvaluator(totalOutputs: Int
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19362108
--- Diff:
core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala
---
@@ -245,9 +245,9 @@ private[spark] object
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19362102
--- Diff: core/src/main/scala/org/apache/spark/rdd/SampledRDD.scala ---
@@ -53,9 +53,14 @@ private[spark] class SampledRDD[T: ClassTag
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2907#issuecomment-60445965
@srowen Unit tests are in
https://github.com/numbnut/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/rdd/RDDFunctionsSuite.scala
I think
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/2937
[SPARK-4084] Reuse sort key in Sorter
Sorter uses generic-typed key for sorting. When data is large, it creates
lots of key objects, which is not efficient. We should reuse the key in Sorter
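The key-reuse idea can be sketched as follows. This is a hypothetical Python illustration, not Spark's Java `Sorter` or Scala `SortDataFormat` API: `KeyHolder`, `PairFormat`, `new_key`, and `get_key` are names made up here, and a simple insertion sort stands in for the TimSort that `Sorter` implements. The point is that the sort loop allocates two key holders once and refills them for every comparison instead of creating a new key object per element.

```python
class KeyHolder:
    """Hypothetical mutable key container that can be refilled in place."""
    __slots__ = ("value",)

    def __init__(self):
        self.value = None


class PairFormat:
    """Hypothetical data format: records are (key, payload) tuples in a list."""

    def __init__(self):
        self.allocations = 0  # counts how many key holders were ever created

    def new_key(self):
        self.allocations += 1
        return KeyHolder()

    def get_key(self, data, pos, reuse=None):
        # Fill the caller's holder when one is supplied; allocate only otherwise.
        holder = reuse if reuse is not None else self.new_key()
        holder.value = data[pos][0]
        return holder

    def swap(self, data, i, j):
        data[i], data[j] = data[j], data[i]


def insertion_sort(fmt, data):
    # Two holders are allocated once and reused for every comparison.
    k1 = fmt.new_key()
    k2 = fmt.new_key()
    for i in range(1, len(data)):
        j = i
        while j > 0:
            a = fmt.get_key(data, j - 1, k1)
            b = fmt.get_key(data, j, k2)
            if a.value <= b.value:  # "<=" keeps equal keys in order (stable)
                break
            fmt.swap(data, j - 1, j)
            j -= 1
    return data
```

However many records are sorted, `fmt.allocations` stays at 2, which is the saving the PR description is after.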
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19368695
--- Diff:
core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala ---
@@ -87,15 +87,19 @@ class BernoulliSampler[T](lb: Double, ub: Double
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19368747
--- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala
---
@@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2907#issuecomment-60459079
Sounds good. @numbnut Could you update the PR and change the following?
1) add @DeveloperApi to RDDFunctions
2) change the return type of `sliding` to `RDD
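For readers unfamiliar with `sliding`, the sketch below shows the window semantics on local Python lists. It is a hypothetical illustration, not the `RDDFunctions` code: `sliding` and `sliding_partitioned` are made-up helpers, and it assumes windows are taken over the partitions concatenated in order, with each partition borrowing the first `window_size - 1` elements that follow it so that windows straddling a partition boundary are still produced.

```python
def sliding(items, window_size):
    """All consecutive windows of length window_size over a local list."""
    if window_size < 1:
        raise ValueError("window_size must be positive")
    return [items[i:i + window_size] for i in range(len(items) - window_size + 1)]


def sliding_partitioned(partitions, window_size):
    """Same windows, computed partition by partition (hypothetical scheme)."""
    if window_size < 1:
        raise ValueError("window_size must be positive")
    result = []
    for idx, part in enumerate(partitions):
        # Borrow the head of the following partitions (skipping empty ones).
        tail = []
        for nxt in partitions[idx + 1:]:
            if len(tail) >= window_size - 1:
                break
            tail.extend(nxt[: window_size - 1 - len(tail)])
        combined = list(part) + tail
        # Each window is emitted by the partition in which it starts.
        for start in range(len(part)):
            window = combined[start:start + window_size]
            if len(window) == window_size:
                result.append(window)
    return result
```

For example, `sliding_partitioned([[1, 2], [3], [4, 5]], 3)` yields the same windows as `sliding([1, 2, 3, 4, 5], 3)`.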
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2906#issuecomment-60460281
Jenkins, add to whitelist.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2906#issuecomment-60460354
ok to test
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2906#issuecomment-60463744
@yu-iskw I added you to the whitelist. Future commits from you should
trigger Jenkins automatically. Just took a very brief scan over the code and
really appreciate
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2928#issuecomment-60505131
test this please
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2937#discussion_r19378842
--- Diff: core/src/main/java/org/apache/spark/util/collection/Sorter.java
---
@@ -587,10 +601,12 @@ private int gallopRight(K key, Buffer a, int base,
int
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2937#discussion_r19378852
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/SortDataFormat.scala ---
@@ -34,9 +34,20 @@ import scala.reflect.ClassTag
*/
// TODO
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2937#issuecomment-60505341
@aarondav I updated the PR based on your comment. See the description for
renaming `Sorter.java` to `TimSort.java`.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2909#issuecomment-60505492
LGTM. Verified that `&lt;` shows up correctly in generated Scala and Java
docs. Merged into master. Thanks!
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2928#issuecomment-60506206
@srowen Could you check `JavaAPISuite.sample`? We need to update that test
as well.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2937#issuecomment-60506230
test this please
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2952#issuecomment-60625545
@anantasty Could you also update the doc in
`https://github.com/apache/spark/blob/master/docs/mllib-feature-extraction.md`?
Thanks!
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2928#issuecomment-60638201
LGTM. Verified that `commons.math3` is shaded in the assembly jar. Merged
into master. Thanks!
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2965#issuecomment-60702505
LGTM. Merged into both master and branch-1.1. Thanks!
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19451852
--- Diff: docs/mllib-feature-extraction.md ---
@@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) {
}
{% endhighlight %}
</div>
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19451869
--- Diff: docs/mllib-feature-extraction.md ---
@@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) {
}
{% endhighlight %}
</div>
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19451872
--- Diff: docs/mllib-feature-extraction.md ---
@@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) {
}
{% endhighlight %}
</div>
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19451976
--- Diff: docs/mllib-feature-extraction.md ---
@@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) {
}
{% endhighlight %}
</div>
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19452204
--- Diff: examples/src/main/python/mllib/word2vec.py ---
@@ -0,0 +1,36 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19452346
--- Diff: docs/mllib-feature-extraction.md ---
@@ -162,6 +162,28 @@ for((synonym, cosineSimilarity) <- synonyms) {
}
{% endhighlight %}
</div>
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2937#issuecomment-60710392
Well, it won't pass `travis` ...
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2937#discussion_r19453604
--- Diff:
core/src/test/scala/org/apache/spark/util/collection/SorterSuite.scala ---
@@ -61,10 +65,33 @@ class SorterSuite extends FunSuite
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19453782
--- Diff: docs/mllib-feature-extraction.md ---
@@ -162,6 +162,38 @@ for((synonym, cosineSimilarity) <- synonyms) {
}
{% endhighlight %}
</div>
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19453785
--- Diff: examples/src/main/python/mllib/word2vec.py ---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2952#discussion_r19453784
--- Diff: examples/src/main/python/mllib/word2vec.py ---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454162
--- Diff: docs/mllib-feature-extraction.md ---
@@ -95,8 +95,50 @@ tf.cache()
val idf = new IDF(minDocFreq = 2).fit(tf)
val tfidf: RDD[Vector
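The `IDF(minDocFreq = 2)` call in this snippet filters out rare terms. As a rough illustration, the plain-Python sketch below assumes the smoothed formula described in the MLlib feature-extraction guide, $\mathrm{IDF}(t, D) = \log\frac{|D| + 1}{\mathrm{DF}(t, D) + 1}$, with terms that appear in fewer than `minDocFreq` documents given an IDF of 0. The helpers `idf_weights` and `tfidf` are hypothetical and are not the MLlib API, which works on hashed term-frequency vectors rather than dictionaries.

```python
import math
from collections import Counter


def idf_weights(docs, min_doc_freq=0):
    """docs: list of token lists. Returns {term: idf} using the smoothed formula."""
    m = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return {
        term: math.log((m + 1) / (n + 1)) if n >= min_doc_freq else 0.0
        for term, n in df.items()
    }


def tfidf(doc, weights):
    """Raw term counts in one document, scaled by the IDF weights."""
    return {term: count * weights.get(term, 0.0) for term, count in Counter(doc).items()}
```

With `docs = [["a", "b"], ["b"], ["c"]]` and `min_doc_freq=2`, only `"b"` (present in two documents) keeps a nonzero weight, `log(4/3)`.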
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454166
--- Diff: docs/mllib-feature-extraction.md ---
@@ -267,4 +346,25 @@ val data1 = data.map(x => (x.label,
normalizer1.transform(x.features)))
val data2
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454164
--- Diff: docs/mllib-feature-extraction.md ---
@@ -162,6 +204,20 @@ for((synonym, cosineSimilarity) <- synonyms) {
}
{% endhighlight %}
</div>
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454179
--- Diff: python/pyspark/mllib/feature.py ---
@@ -18,59 +18,348 @@
Python package for feature in MLlib.
+import sys
+import warnings
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454172
--- Diff: docs/mllib-feature-extraction.md ---
@@ -267,4 +346,25 @@ val data1 = data.map(x => (x.label,
normalizer1.transform(x.features)))
val data2
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454181
--- Diff: python/pyspark/mllib/feature.py ---
@@ -18,59 +18,348 @@
Python package for feature in MLlib.
+import sys
+import warnings
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454187
--- Diff: python/pyspark/mllib/feature.py ---
@@ -18,59 +18,348 @@
Python package for feature in MLlib.
+import sys
+import warnings
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454177
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala ---
@@ -20,6 +20,7 @@ package org.apache.spark.mllib.feature
import
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454180
--- Diff: python/pyspark/mllib/feature.py ---
@@ -18,59 +18,348 @@
Python package for feature in MLlib.
+import sys
+import warnings
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454191
--- Diff: python/pyspark/mllib/feature.py ---
@@ -18,59 +18,348 @@
Python package for feature in MLlib.
+import sys
+import warnings
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454184
--- Diff: python/pyspark/mllib/feature.py ---
@@ -18,59 +18,348 @@
Python package for feature in MLlib.
+import sys
+import warnings
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19454193
--- Diff: python/pyspark/mllib/feature.py ---
@@ -95,33 +385,26 @@ class Word2Vec(object):
localDoc = [sentence, sentence]
doc
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2937#discussion_r19454486
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1237,12 +1237,27 @@ private[spark] object Utils extends Logging
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2937#issuecomment-60713736
@aarondav Does `Jenkins` count?
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19455165
--- Diff: python/pyspark/mllib/feature.py ---
@@ -95,33 +385,26 @@ class Word2Vec(object):
localDoc = [sentence, sentence]
doc
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2819#discussion_r19455162
--- Diff: python/pyspark/mllib/feature.py ---
@@ -18,59 +18,348 @@
Python package for feature in MLlib.
+import sys
+import warnings
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2819#issuecomment-60737839
LGTM. Merged into master. Thanks!
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2455#issuecomment-60789726
@erikerlandson The feature freeze deadline for v1.2 is this Sat. Just want
to check with you and see whether you are going to update the PR this week.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19486445
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19486453
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19486469
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19486477
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala
---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19486450
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19486463
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19486472
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala
---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19486480
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala
---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2952#issuecomment-60794001
ok to test
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2455#issuecomment-60793728
@erikerlandson Great! Thanks for the heads up.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2952#issuecomment-60794164
test this please
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60794676
@anantasty This PR is still in review. If you are interested in Python
bindings for streaming algorithms, could you help add one for
StreamingLinearRegression? Thanks
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490145
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490241
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490261
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490254
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490284
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490351
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490369
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490345
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490338
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeans.scala
---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490483
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490476
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490470
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490486
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490467
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490527
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490523
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490587
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19492147
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19492141
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19492205
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60806389
@freeman-lab I made a quick pass over the implementation. It looks great! I
will check the math and the test code with someone who knows everything about
streaming k
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2868#discussion_r19518756
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/NodeIdCache.scala ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2868#discussion_r19518754
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/NodeIdCache.scala ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/2990
[FIX] disable benchmark code
I forgot to disable the benchmark code in #2937, which increased the
Jenkins build time by a couple of minutes.
@aarondav
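A common way to keep benchmark code in a test suite without paying for it on every CI run is to make it opt-in. The sketch below is a Python `unittest` analogue only; the actual fix in this PR lives in Spark's Scala/Java test code and may use a different mechanism, and the `RUN_BENCHMARKS` environment variable and `SorterBenchmark` class are hypothetical names chosen for illustration.

```python
import os
import time
import unittest

# Hypothetical opt-in flag: benchmarks run only when RUN_BENCHMARKS=1 is set.
RUN_BENCHMARKS = os.environ.get("RUN_BENCHMARKS") == "1"


class SorterBenchmark(unittest.TestCase):
    @unittest.skipUnless(RUN_BENCHMARKS, "benchmark disabled by default; set RUN_BENCHMARKS=1")
    def test_sort_benchmark(self):
        data = list(range(1_000_000, 0, -1))  # reverse-sorted input
        start = time.perf_counter()
        data.sort()
        elapsed = time.perf_counter() - start
        print(f"sorted {len(data)} items in {elapsed:.3f}s")
        self.assertEqual(data[0], 1)
```

By default the benchmark is reported as skipped, so it stays in the code base for manual runs but costs nothing in regular builds.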
You can merge this pull request into a Git
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60873448
Had an offline discussion with @freeman-lab. We decided to introduce the
concept of `timeUnit` to describe decay. A `timeUnit` (like a second) could be
either a `batch
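The comment above is truncated, but the idea of a decay `timeUnit` can be sketched under stated assumptions: the unit is either a batch or a point, the old cluster weight is discounted by the decay factor once per batch, or once per new point when the unit is points, and the center is then a weighted average of the old center and the new batch mean. This is a hypothetical Python illustration, not the `StreamingKMeans` code under review; `discount`, `update_center`, and the string values `"batches"` and `"points"` are made-up names.

```python
def discount(decay_factor, time_unit, num_new_points):
    """Multiplier applied to the old cluster weight for one incoming batch."""
    if time_unit == "batches":
        return decay_factor                   # one decay step per batch
    if time_unit == "points":
        return decay_factor ** num_new_points  # one decay step per new point
    raise ValueError(f"unknown time unit: {time_unit!r}")


def update_center(center, weight, batch_mean, batch_count,
                  decay_factor, time_unit, num_new_points):
    """Blend the old center with this cluster's batch mean.

    batch_count is the number of new points assigned to this cluster;
    num_new_points is the total number of points in the batch.
    """
    discounted = weight * discount(decay_factor, time_unit, num_new_points)
    if batch_count == 0:
        return list(center), discounted  # nothing new: only the weight decays
    new_weight = discounted + batch_count
    new_center = [(c * discounted + x * batch_count) / new_weight
                  for c, x in zip(center, batch_mean)]
    return new_center, new_weight
```

For example, with a decay factor of 0.5 per batch, a center at `[0.0, 0.0]` with weight 2 that receives one point at `[3.0, 3.0]` moves to `[1.5, 1.5]` with weight 2.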
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2990#issuecomment-60874554
The failed test from streaming is a known flaky test. @tdas
I've merged this one into master (because it will speed up Jenkins builds).
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2992#issuecomment-60965172
LGTM. Merged into master. Thanks!
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2996#discussion_r19554794
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -76,7 +76,7 @@ object MLUtils {
.map { line =>
val items
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2978#issuecomment-60965726
test this please
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2978#issuecomment-60965682
ok to test
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/3010
[SPARK-4148][PySpark] fix seed distribution and add some tests for
rdd.sample
The current way of seed distribution makes the sequences sampled from
partition i and i+1 offset by 1
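To make "offset by 1" concrete, the hypothetical Python sketch below (not PySpark's actual sampler code) shows a flawed scheme in which partition i+1 draws exactly the same sequence as partition i, shifted by one position, followed by one common remedy: hashing the seed together with the partition index so neighboring partitions get unrelated streams. The PR's actual fix may differ; `naive_partition_stream` and `mixed_partition_stream` are made-up names.

```python
import hashlib
import random


def naive_partition_stream(seed, partition, n):
    # Flawed: every partition reads one shared stream and merely skips
    # `partition` draws, so partition i+1 equals partition i shifted by one.
    rng = random.Random(seed)
    for _ in range(partition):
        rng.random()
    return [rng.random() for _ in range(n)]


def mixed_partition_stream(seed, partition, n):
    # Remedy: derive a per-partition seed by hashing (seed, partition) so that
    # adjacent partitions start from unrelated generator states.
    digest = hashlib.sha256(f"{seed}:{partition}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.random() for _ in range(n)]
```

Both schemes are deterministic for a given seed, which keeps sampling reproducible, but only the hashed one avoids the overlap between neighboring partitions.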
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/3011
[SPARK-4150][PySpark] return self in rdd.setName
Then we can do `rdd.setName('abc').cache().count()`.
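The chaining works because each call returns the object it was called on. A minimal sketch of the pattern, using a hypothetical `ToyRDD` stand-in rather than PySpark's `RDD` class:

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD-like object; not PySpark's RDD."""

    def __init__(self, data):
        self._data = list(data)
        self.name = None
        self.is_cached = False

    def setName(self, name):
        self.name = name
        return self  # returning self is what makes chaining possible

    def cache(self):
        self.is_cached = True
        return self

    def count(self):
        return len(self._data)
```

If `setName` returned `None` instead, `ToyRDD([1, 2, 3]).setName('abc').cache()` would fail with an `AttributeError`, because `None` has no `cache` method.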