Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72604505
Thanks @mengxr for your help.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not h
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/4059
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72604421
LGTM. Merged into master. Thanks for adding GMM Python API!
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72602454
[Test build #26607 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26607/consoleFull)
for PR 4059 at commit
[`c973ab3`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72602459
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72595692
[Test build #26607 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26607/consoleFull)
for PR 4059 at commit
[`c973ab3`](https://githu
Github user FlytxtRnD commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23982975
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -285,6 +286,59 @@ class PythonMLLibAPI extends Serializable {
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23982793
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +89,98 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user FlytxtRnD commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23981473
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +89,98 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23943651
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +89,98 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23943538
--- Diff: python/pyspark/mllib/stat/distribution.py ---
@@ -0,0 +1,25 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+#
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23943359
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +89,98 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23943114
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -285,6 +286,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23943111
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -285,6 +286,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23943123
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -285,6 +286,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23943118
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -285,6 +286,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23943115
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -285,6 +286,59 @@ class PythonMLLibAPI extends Serializable {
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72458892
Please review and merge.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72458186
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72458177
[Test build #26515 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26515/consoleFull)
for PR 4059 at commit
[`fa0a142`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72449146
[Test build #26515 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26515/consoleFull)
for PR 4059 at commit
[`fa0a142`](https://githu
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72445559
Is it possible to start a test build in Jenkins without updating the PR?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72442051
[Test build #26509 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26509/consoleFull)
for PR 4059 at commit
[`d5b36ab`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72442062
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72434076
[Test build #26509 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26509/consoleFull)
for PR 4059 at commit
[`d5b36ab`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72431354
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72431346
[Test build #26502 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26502/consoleFull)
for PR 4059 at commit
[`ac134f1`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72423418
[Test build #26502 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26502/consoleFull)
for PR 4059 at commit
[`ac134f1`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72420713
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72420711
[Test build #26499 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26499/consoleFull)
for PR 4059 at commit
[`2e9f12a`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72420623
[Test build #26499 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26499/consoleFull)
for PR 4059 at commit
[`2e9f12a`](https://githu
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72408935
Yes, Array should work.
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72406403
So I will go with the current approach. I tried to change Array to
ArrayBuffer, but it ends up in exceptions. So can I go with Array itself?
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72405813
They are not attributes but public methods. Did you try `mu()` and
`sigma()`? I think the current approach looks good except minor issues
commented. We can try other appro
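The point about methods versus attributes can be illustrated with a small sketch. Scala `val` members compile to accessor methods on the JVM, so from Python the value is obtained by calling the member rather than reading it. The class below, `FakeGaussianProxy`, is a hypothetical pure-Python stand-in written only for this illustration; it is not a py4j or Spark class, and the values passed to it are made up.

```python
class FakeGaussianProxy:
    """Hypothetical stand-in for a JVM object proxy; not a py4j class."""

    def __init__(self, mu, sigma):
        self._mu = mu
        self._sigma = sigma

    def mu(self):
        # Accessor method, mirroring how a Scala `val mu` is exposed.
        return self._mu

    def sigma(self):
        # Accessor method, mirroring how a Scala `val sigma` is exposed.
        return self._sigma


g = FakeGaussianProxy([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])

member = g.mu   # a callable member, not the mean vector itself
mean = g.mu()   # calling the member returns the value
cov = g.sigma()
```

Reading `g.mu` without parentheses yields a callable, which is why attribute-style access does not return the data.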
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72404786
Instead of passing mu & sigma as arrays, I tried to directly pass
"gaussians" (Array[MultivariateGaussian]) from PythonMLLibAPI. But I was not
able to access the attrib
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72289335
@FlytxtRnD #4290 is merged. So please fetch and merge master, and rename
`GaussianMixtureEM` to `GaussianMixture` in your PR. Thanks!
Github user FlytxtRnD commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23828921
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -284,6 +285,59 @@ class PythonMLLibAPI extends Serializable {
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72083636
Btw, I wanted to make sure you knew about
[https://issues.apache.org/jira/browse/SPARK-5400], which I plan to do soon.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-72075492
@FlytxtRnD I've merged the PR that refactors `mllib.stat`. It should be
straightforward to add `distribution.py` under `mllib/stat/` now.
Github user FlytxtRnD commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754619
--- Diff: python/pyspark/mllib/tests.py ---
@@ -167,6 +167,32 @@ def test_kmeans_deterministic(self):
# TODO: Allow small numeric difference
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754373
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -284,6 +285,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754387
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +88,84 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754398
--- Diff: python/pyspark/mllib/tests.py ---
@@ -167,6 +167,32 @@ def test_kmeans_deterministic(self):
# TODO: Allow small numeric difference.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754380
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -284,6 +285,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754369
--- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754367
--- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754375
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -284,6 +285,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754381
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -284,6 +285,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754368
--- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754372
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -284,6 +285,59 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754389
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +88,84 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23754395
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +88,84 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23753388
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +88,84 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user FlytxtRnD commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23752384
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +88,84 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23751738
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +88,84 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-71984393
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-71984377
[Test build #26303 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26303/consoleFull)
for PR 4059 at commit
[`2e14d82`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-71978609
[Test build #26303 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26303/consoleFull)
for PR 4059 at commit
[`2e14d82`](https://githu
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-71978435
The PR is updated according to https://github.com/apache/spark/pull/4088
which modifies GaussianMixtureModel to expose instances of MultivariateGaussian
rather than se
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-71146302
@mengxr Thank you for the review and comments. I am changing the code
according to #3923 (tgaloppo).
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399968
--- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py ---
@@ -0,0 +1,65 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399808
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +86,68 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399900
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +86,68 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399765
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -280,6 +280,48 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399804
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -86,6 +86,68 @@ def train(cls, rdd, k, maxIterations=100, runs=1,
initializationMode="k-means||"
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399756
--- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py ---
@@ -0,0 +1,65 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399719
--- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py ---
@@ -0,0 +1,65 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399766
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -280,6 +280,48 @@ class PythonMLLibAPI extends Serializable {
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399715
--- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py ---
@@ -0,0 +1,65 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4059#discussion_r23399730
--- Diff: examples/src/main/python/mllib/gaussian_mixture_model.py ---
@@ -0,0 +1,65 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70230909
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70230901
[Test build #25648 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25648/consoleFull)
for PR 4059 at commit
[`c1d4c71`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70223642
[Test build #25648 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25648/consoleFull)
for PR 4059 at commit
[`c1d4c71`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70208901
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70208896
[Test build #25634 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25634/consoleFull)
for PR 4059 at commit
[`f82750b`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70205823
[Test build #25634 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25634/consoleFull)
for PR 4059 at commit
[`f82750b`](https://githu
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70205738
@jkbradley py4j serialization issue has been solved by the commit
https://github.com/apache/spark/commit/8ead999fd627b12837fb2f082a0e76e9d121d269
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70142339
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70142336
[Test build #25609 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25609/consoleFull)
for PR 4059 at commit
[`5c83825`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70142185
[Test build #25609 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25609/consoleFull)
for PR 4059 at commit
[`5c83825`](https://githu
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70141761
add to whitelist
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4059#issuecomment-70077652
Can one of the admins verify this patch?
GitHub user FlytxtRnD opened a pull request:
https://github.com/apache/spark/pull/4059
[SPARK-5012][MLLib][PySpark]Python API for Gaussian Mixture Model
Python API for the Gaussian Mixture Model clustering algorithm in MLLib.
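
For readers unfamiliar with the model, here is a minimal sketch of what a Gaussian mixture computes for one data point: a soft assignment (responsibility) to each component. It is a pure-Python, one-dimensional illustration, not the code from this pull request and not the MLlib API. The function names and the example weights, means, and standard deviations are assumptions chosen for the example, and `sigmas` here are scalar standard deviations rather than covariance matrices.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, mus, sigmas):
    """Soft cluster membership of point x under a 1-D Gaussian mixture."""
    # Weighted density of x under each component.
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    # Normalize so the memberships sum to 1.
    return [j / total for j in joint]


# Illustrative two-component mixture (values are made up).
weights = [0.5, 0.5]
mus = [0.0, 10.0]
sigmas = [1.0, 1.0]

r = responsibilities(0.5, weights, mus, sigmas)
r_mid = responsibilities(5.0, weights, mus, sigmas)
```

A point close to the first mean is assigned almost entirely to the first component, while a point halfway between the two means is split evenly.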
You can merge this pull request into a Git repository by