[GitHub] spark issue #11974: [SPARK-14174][ML] Accelerate KMeans via Mini-Batch EM

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77397/

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #77397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77397/testReport)** for PR 11974 at commit

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #77397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77397/testReport)** for PR 11974 at commit

2017-05-25 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/11974 @sethah In general, MiniBatchKMeans generates a worse model, but in many cases the difference is small, while the speedup can be significant.
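
The trade-off described in this comment can be illustrated with a minimal sketch in plain Python. It is not the PR's implementation and does not use Spark: the function names (`lloyd`, `mini_batch`, `cost`), the 1-D synthetic data, and the per-center-count update rule are all assumptions chosen for illustration. Both variants are scored on the whole dataset.

```python
import random

def nearest(x, centers):
    # Index of the center closest to x (squared distance, 1-D points).
    return min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)

def cost(data, centers):
    # Sum of squared distances to the nearest center, over ALL points.
    return sum((x - centers[nearest(x, centers)]) ** 2 for x in data)

def lloyd(data, centers, iters):
    # Full-batch k-means: every iteration touches every point.
    centers = list(centers)
    for _ in range(iters):
        sums = [0.0] * len(centers)
        counts = [0] * len(centers)
        for x in data:
            j = nearest(x, centers)
            sums[j] += x
            counts[j] += 1
        # Keep the old center if a cluster received no points.
        centers = [sums[j] / counts[j] if counts[j] else centers[j]
                   for j in range(len(centers))]
    return centers

def mini_batch(data, centers, iters, batch_size, seed=0):
    # Hypothetical mini-batch variant: each iteration uses only a sample
    # and moves centers with a per-center learning rate of 1 / count.
    rng = random.Random(seed)
    centers = list(centers)
    counts = [0] * len(centers)
    for _ in range(iters):
        batch = rng.sample(data, batch_size)
        assigned = [nearest(x, centers) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
    return centers

# Synthetic data: two 1-D clusters (an assumption for the demo).
rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(500)] + [rng.gauss(10, 1) for _ in range(500)]
init = [data[0], data[1]]

full = lloyd(data, init, iters=10)
mb = mini_batch(data, init, iters=10, batch_size=50)

print("initial cost:   ", cost(data, init))
print("full-batch cost:", cost(data, full))
print("mini-batch cost:", cost(data, mb))
```

The mini-batch run visits 50 points per iteration instead of 1000, which is where any speedup would come from. How far its cost ends up from the full-batch cost depends on the data and the seed.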

2017-05-25 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/11974 @sethah Agree that if I/O is the bottleneck, the speedup should be small. The cost in the results doc is computed on the whole dataset, not on the sampled ones. Since I think comparing cost on

2017-05-25 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/11974 Mini-batching in Spark generally isn't that efficient, since to extract a mini-batch you still need to iterate over the entire dataset - and that means reading it from disk if it doesn't fit into
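
The point about extraction cost can be sketched in plain Python. This is a stand-in, not Spark code: `bernoulli_sample` is a hypothetical helper, and it assumes a sampler that decides record by record whether to keep each row. Under that assumption, a 1% mini-batch still visits every record.

```python
import random

def bernoulli_sample(records, fraction, seed=0):
    # Keep each record independently with probability `fraction`.
    # Returns the sample and the number of records that had to be visited.
    rng = random.Random(seed)
    visited = 0
    sample = []
    for rec in records:
        visited += 1  # every record is read, kept or not
        if rng.random() < fraction:
            sample.append(rec)
    return sample, visited

data = list(range(10_000))
batch, visited = bernoulli_sample(data, fraction=0.01)

print("records visited:", visited)
print("batch size:     ", len(batch))
```

If the records live on disk rather than in memory, the loop above corresponds to a full read of the dataset per mini-batch, which is the I/O cost raised in the comment.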

2017-05-25 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11974 cc @srowen @setha also

2017-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77287/

2017-05-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #77287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77287/testReport)** for PR 11974 at commit

2017-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Merged build finished. Test PASSed.

2017-05-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #77287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77287/testReport)** for PR 11974 at commit

2017-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Merged build finished. Test PASSed.

2017-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77244/

2017-05-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #77244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77244/testReport)** for PR 11974 at commit

2017-05-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #77244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77244/testReport)** for PR 11974 at commit

2017-05-23 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/11974 @MLnick `@Since` updated, and the performance test result is attached to the JIRA. Thanks for reviewing!

2017-05-17 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11974 I commented on the JIRA about posting performance test results. The `@Since` tags should also be updated to `2.3.0`.

2016-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70743/

2016-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Merged build finished. Test PASSed.

2016-12-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #70743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70743/testReport)** for PR 11974 at commit

2016-12-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #70743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70743/testReport)** for PR 11974 at commit

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69405/

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Merged build finished. Test PASSed.

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #69405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69405/consoleFull)** for PR 11974 at commit

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #69405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69405/consoleFull)** for PR 11974 at commit

2016-11-30 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/11974 Jenkins, retest this please

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #69402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69402/consoleFull)** for PR 11974 at commit

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69402/

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11974 Merged build finished. Test FAILed.

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11974 **[Test build #69402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69402/consoleFull)** for PR 11974 at commit