Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/2942
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enab
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61358857
LGTM. Merged into master. Thanks for adding streaming k-means!
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61358352
[Test build #22677 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22677/consoleFull)
for PR 2942 at commit
[`b2e5b4a`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61358356
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61356950
[Test build #22677 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22677/consoleFull)
for PR 2942 at commit
[`b2e5b4a`](https://githu
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61356758
@mengxr great updates! LGTM. Just need to update the docs/examples in a
couple of places, I think.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61356547
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61356545
[Test build #22673 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22673/consoleFull)
for PR 2942 at commit
[`078617c`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61354018
[Test build #22673 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22673/consoleFull)
for PR 2942 at commit
[`078617c`](https://githu
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61319592
@freeman-lab I made some changes:
https://github.com/freeman-lab/spark/pull/1 , which includes the following:
1. discount on previous counts
2. detecting dying
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61241985
[Test build #22607 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22607/consoleFull)
for PR 2942 at commit
[`0411bf5`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61241988
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61234935
[Test build #22607 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22607/consoleFull)
for PR 2942 at commit
[`0411bf5`](https://githu
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-61234517
@mengxr I implemented the new parameterization (and tried to make the docs
on it more intuitive), see what you think!
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60880665
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60880661
[Test build #22428 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22428/consoleFull)
for PR 2942 at commit
[`9f7aea9`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60876198
[Test build #22428 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22428/consoleFull)
for PR 2942 at commit
[`9f7aea9`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60875507
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60875506
[Test build #22426 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22426/consoleFull)
for PR 2942 at commit
[`374a706`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60875441
[Test build #22426 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22426/consoleFull)
for PR 2942 at commit
[`374a706`](https://githu
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60873448
Had an offline discussion with @freeman-lab . We decided to introduce the
concept of `timeUnit` to describe decay. A `timeUnit` (like a second) could be
either a `batch` o
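One way to read the `timeUnit` idea is as a discount on a cluster's accumulated weight, applied once per incoming batch. The sketch below is a hedged Python illustration, not the code from this PR: the half-life parameterization, the function names, and the `"points"` unit are assumptions introduced here (the comment above is cut off after `batch`).

```python
import math


def decay_factor_from_half_life(half_life: float) -> float:
    """Return a per-time-unit factor a such that a ** half_life == 0.5.

    Assumption: decay is specified as a half-life measured in time units.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return math.exp(math.log(0.5) / half_life)


def batch_discount(decay_factor: float, time_unit: str, num_new_points: int) -> float:
    """Discount applied to the old cluster weight when one batch arrives.

    Hypothetical unit names: "batches" decays once per batch, while
    "points" decays once per point contained in the batch.
    """
    if time_unit == "batches":
        return decay_factor
    if time_unit == "points":
        return decay_factor ** num_new_points
    raise ValueError(f"unknown time unit: {time_unit!r}")
```

With a half-life of 5 batches, for instance, data that arrived five batches ago would carry half the weight of the current batch under this reading.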
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60850276
@mengxr @coderxiang @rxin Thanks all for the feedback! I'm implementing
these changes.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60806389
@freeman-lab I made a quick pass over the implementation. It looks great! I
will check the math and the test code with someone who knows everything about
streaming k-means
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19492205
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19492141
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19492147
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490587
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490523
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490527
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490486
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490467
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490470
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490476
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490483
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490338
--- Diff:
examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeans.scala
---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software F
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490345
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490369
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490351
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490284
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section of th
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490261
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section of th
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490254
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section of th
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490241
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section of th
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19490145
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +153,75 @@ provided in the [Self-Contained
Applications](quick-start.html#self-contained-ap
section of th
Github user freeman-lab commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60796301
@anantasty Agreed, should be separate, but would be very cool to have! Ping
me as well, happy to provide feedback.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60795975
It should be in a separate JIRA (and hence a separate PR). Please create a
JIRA for `StreamingLinearRegression` and ping me there. Thanks!
Github user anantasty commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60795596
I would certainly be interested in doing that. I just wasn't sure if it was
better to do it as a separate PR/task.
On Oct 28, 2014 11:19 AM, "Xiangrui Meng" wro
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60794676
@anantasty This PR is still in review. If you are interested in Python
bindings for streaming algorithms, could you help add one for
StreamingLinearRegression? Thanks!
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19454435
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foun
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19454416
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foun
Github user coderxiang commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19446734
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Softwar
Github user coderxiang commented on a diff in the pull request:
https://github.com/apache/spark/pull/2942#discussion_r19446715
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala ---
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Softwar
Github user anantasty commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60554980
Should we create another PR for the Python bindings/example?
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60477107
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60477105
[Test build #22209 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22209/consoleFull)
for PR 2942 at commit
[`2086bdc`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2942#issuecomment-60475562
[Test build #22209 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22209/consoleFull)
for PR 2942 at commit
[`2086bdc`](https://githu
GitHub user freeman-lab opened a pull request:
https://github.com/apache/spark/pull/2942
Streaming KMeans [MLLIB][SPARK-3254]
This adds a Streaming KMeans algorithm to MLlib. It uses an update rule
that generalizes the mini-batch KMeans update to incorporate a decay factor,
which a
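The update described above can be sketched as a weighted average between the old center and the mean of the new batch, with the old weight first multiplied by a decay factor. This is a minimal Python illustration under assumed names (`update_center`, `decay`), not the Scala implementation in the PR.

```python
from typing import List, Tuple


def update_center(
    center: List[float],
    weight: float,
    batch_mean: List[float],
    batch_count: int,
    decay: float,
) -> Tuple[List[float], float]:
    """Blend an existing cluster center with the mean of newly assigned points.

    Assumed parameterization: `decay` in [0, 1] scales the weight of all
    previously seen data before the new batch is folded in.
    """
    discounted_weight = weight * decay
    new_weight = discounted_weight + batch_count
    if new_weight == 0:
        # No old mass and no new points: leave the center where it is.
        return list(center), 0.0
    # Fraction of the updated center contributed by the new batch.
    lam = batch_count / new_weight
    new_center = [(1.0 - lam) * c + lam * x for c, x in zip(center, batch_mean)]
    return new_center, new_weight
```

In this sketch, `decay = 1.0` weights every point seen so far equally, which is the plain mini-batch update, while `decay = 0.0` discards the history so the center jumps to the mean of the latest batch.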