Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-86228166
LGTM. Merged into master. Thanks!!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/4986
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-86628986
@mengxr thanks for the merge! For supporting this in PySpark, we would need
support for MatrixUDT, which would need support for sparse matrices, right? I
could not find
GitHub user MechCoder opened a pull request:
https://github.com/apache/spark/pull/4986
[SPARK-5987] [MLlib] Save/load for GaussianMixtureModels
Should be self-explanatory.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MechCoder/
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-78360616
cc: @mengxr @jkbradley
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-78360647
[Test build #28481 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28481/consoleFull)
for PR 4986 at commit
[`4898d57`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-78377018
[Test build #28481 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28481/consoleFull)
for PR 4986 at commit
[`4898d57`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-78377037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r26355062
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/GaussianMixtureSuite.scala
---
@@ -138,4 +126,36 @@ class GaussianMixtureSuite extends FunSui
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r26355058
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,87 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r26355052
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,87 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r26355055
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,87 @@ class GaussianMixtureModel(
p(i)
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r26415328
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/GaussianMixtureSuite.scala
---
@@ -138,4 +126,36 @@ class GaussianMixtureSuite extends Fun
Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r26416521
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,87 @@ class GaussianMixtureModel(
p
Github user shaneknapp commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79307244
jenkins, test this please
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79307000
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79310014
@mengxr I am not sure whether we should flatten it or not. Would it be worth it if
the number of clusters is large? Also, I think it would be better if we deal
with MatrixUDT af
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79314317
[Test build #28584 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28584/consoleFull)
for PR 4986 at commit
[`9aaa535`](https://githu
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79350101
The number of clusters won't be very large. Flattening an
`Array[Array[Double]]` doesn't copy the data, so there is no overhead. The
content of parquet file is easy to ins
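The flattening question above can be illustrated with a small, Spark-free Scala sketch. It assumes each mixture component is saved as one record holding its weight, its mean, and its d × d covariance flattened into a single column-major `Array[Double]` (column-major is the layout Spark's `DenseMatrix` uses). `ComponentRecord`, `flattenColumnMajor`, `unflattenColumnMajor`, and `toRecords` are hypothetical names for illustration, not the code merged in this PR.

```scala
object FlattenSketch {
  // Hypothetical per-component record: one row per Gaussian in the mixture.
  final case class ComponentRecord(weight: Double, mu: Array[Double], sigma: Array[Double])

  // Flatten a square matrix given as rows into column-major order:
  // entry (i, j) lands at index i + j * d.
  def flattenColumnMajor(rows: Array[Array[Double]]): Array[Double] = {
    val d = rows.length
    require(rows.forall(_.length == d), "covariance matrix must be square")
    Array.tabulate(d * d)(idx => rows(idx % d)(idx / d))
  }

  // Inverse of flattenColumnMajor: rebuild the rows of a d x d matrix.
  def unflattenColumnMajor(values: Array[Double], d: Int): Array[Array[Double]] = {
    require(values.length == d * d, "expected exactly d * d values")
    Array.tabulate(d, d)((i, j) => values(i + j * d))
  }

  // Build one flat record per mixture component (hypothetical helper).
  def toRecords(
      weights: Array[Double],
      mus: Array[Array[Double]],
      sigmas: Array[Array[Array[Double]]]): Array[ComponentRecord] = {
    require(weights.length == mus.length && mus.length == sigmas.length,
      "weights, means and covariances must describe the same number of components")
    weights.indices.map { c =>
      ComponentRecord(weights(c), mus(c), flattenColumnMajor(sigmas(c)))
    }.toArray
  }
}
```

With one row per component, each saved row is just a weight plus two plain numeric arrays, which keeps the stored data easy to inspect.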
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79373376
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28
Github user shaneknapp commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79375733
jenkins, test this please
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79381832
[Test build #28588 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28588/consoleFull)
for PR 4986 at commit
[`9aaa535`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79471594
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79471469
[Test build #28588 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28588/consoleFull)
for PR 4986 at commit
[`9aaa535`](https://gith
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79934810
@mengxr Fixed! Anything else?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-79939969
[Test build #28607 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28607/consoleFull)
for PR 4986 at commit
[`4321743`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-80037314
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-80037247
[Test build #28607 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28607/consoleFull)
for PR 4986 at commit
[`4321743`](https://gith
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-84269574
@mengxr I rebased over master and used MatrixUDT. Please review! :)
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-84269931
[Test build #28937 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28937/consoleFull)
for PR 4986 at commit
[`23d707e`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-84272950
[Test build #28938 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28938/consoleFull)
for PR 4986 at commit
[`505bd57`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-84279647
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-84279643
[Test build #28937 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28937/consoleFull)
for PR 4986 at commit
[`23d707e`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-84283018
[Test build #28938 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28938/consoleFull)
for PR 4986 at commit
[`505bd57`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-84283019
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-84913707
What would be the reason to add a save/load version 1.0? What changes are
expected in future versions?
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85086722
We want to allow the model data to be extended (with defaults to allow
backwards compatibility). There might be unforeseeable reasons to change the
format, too.
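The versioning rationale above can be sketched as a loader that dispatches on the saved (class name, format version) pair and fills in defaults for fields an older format lacks. This is a Spark-free illustration under stated assumptions: `Metadata`, `Model`, `extraParam`, and a format "2.0" are all hypothetical and do not describe Spark's actual API or any planned format.

```scala
object VersionedLoadSketch {
  // Hypothetical metadata saved next to the model data.
  final case class Metadata(className: String, formatVersion: String, k: Int)

  // Hypothetical loaded model. `extraParam` stands in for a field that a
  // later format version might add; it is not a real GaussianMixtureModel field.
  final case class Model(weights: Vector[Double], extraParam: Double)

  val GmmClassName = "org.apache.spark.mllib.clustering.GaussianMixtureModel"
  val DefaultExtraParam = 0.0

  def load(meta: Metadata, data: Map[String, Vector[Double]]): Model = {
    val model = (meta.className, meta.formatVersion) match {
      // Format 1.0 predates extraParam, so the loader fills in a default.
      case (GmmClassName, "1.0") =>
        Model(data("weight"), DefaultExtraParam)
      // Hypothetical format 2.0 stores extraParam explicitly.
      case (GmmClassName, "2.0") =>
        Model(data("weight"), data("extraParam").head)
      case (cls, ver) =>
        throw new IllegalArgumentException(
          s"Cannot load class $cls with format version $ver")
    }
    // Sanity check: the number of components must match the metadata.
    require(model.weights.length == meta.k, s"expected ${meta.k} components")
    model
  }
}
```

Because old data is always routed through its own version branch, extending the format later does not break models saved with version 1.0.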
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r26976590
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,82 @@ class GaussianMixtureModel(
p(i)
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85493045
@mengxr Sorry, I misunderstood your comment before. Should look good now.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85495571
[Test build #29089 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29089/consoleFull)
for PR 4986 at commit
[`33c84f9`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85537761
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85537736
[Test build #29089 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29089/consoleFull)
for PR 4986 at commit
[`33c84f9`](https://gith
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27061444
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27061601
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27061662
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27061886
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/GaussianMixtureSuite.scala
---
@@ -138,4 +126,36 @@ class GaussianMixtureSuite extends FunSui
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85660194
@mengxr Fixed!
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85661121
[Test build #29101 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29101/consoleFull)
for PR 4986 at commit
[`e7a14cb`](https://githu
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068166
--- Diff: docs/mllib-clustering.md ---
@@ -182,6 +183,10 @@ val parsedData = data.map(s =>
Vectors.dense(s.trim.split(' ').map(_.toDouble)))
// Cluster th
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068186
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068180
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068202
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068181
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068205
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068179
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068197
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068176
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -41,10 +47,16 @@ import org.apache.spark.rdd.RDD
@Expe
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068206
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4986#discussion_r27068191
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
---
@@ -83,5 +95,81 @@ class GaussianMixtureModel(
p(i)
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85686561
[Test build #29101 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29101/consoleFull)
for PR 4986 at commit
[`e7a14cb`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85686572
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user MechCoder commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85830433
@mengxr I have addressed your comments. Please have a look!
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85830511
[Test build #29148 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29148/consoleFull)
for PR 4986 at commit
[`7d2cd56`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85865530
[Test build #29148 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29148/consoleFull)
for PR 4986 at commit
[`7d2cd56`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4986#issuecomment-85865557
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29