Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68475553
@jkbradley Please assign me SPARK-5017, and I will take care of this in
preparation for 5018 and 5019.
---
If your project is set up for it, you can reply to this
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68476266
Done :)
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68398244
@tgaloppo @FlytxtRnD I made some JIRAs for the to-do items above.
I'd say the most important are:
* [Change predictMembership() to take an RDD, not the
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68401923
@jkbradley Please assign 5017, 5018, 5019, and 5020 to me. Regarding 5018,
can you refer me to other PRs that are bringing in common distributions? I
can work toward
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68414406
@tgaloppo It's ideal if we assign and fix one JIRA at a time (as separate
PRs). Can I start by assigning one of your choosing?
For 5018, there is only [one
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68415536
@jkbradley No problem. Let's start with 5020, and I'll move on from there.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68299685
[Test build #555 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/555/consoleFull)
for PR 3022 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68307573
[Test build #555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/555/consoleFull)
for PR 3022 at commit
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68312524
@tgaloppo Thanks for the updates, and thanks for all of your work in
getting this ready!
LGTM
CC: @mengxr
After this is merged, I'll make
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68313864
@jkbradley Thank you for your help and feedback along the way. Please
assign some (or all) of those tickets to me and I will continue to improve the
implementation.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/3022
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68315285
@tgaloppo I've merged this into master. Thanks for contributing GMM!
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-68335194
@tgaloppo Good work.
@mengxr Thanks for giving us a chance to be a part of this contribution
Github user FlytxtRnD commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67816287
Sorry for the late reply. predictLabels() and predictMembership() look fine. But
what about moving computeSoftAssignments() to the GaussianMixtureModelEM
class (in KMeans,
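Soft assignments here mean the per-component posterior probabilities for a point; a hard label (as returned by something like predictLabels()) is then the index of the largest one. A minimal sketch of that computation for a single point, assuming the component densities have already been evaluated; `SoftAssignmentSketch`, `softAssignments`, and `predictLabel` are hypothetical names, not the MLlib API.

```scala
// Hypothetical helper names; this is an illustration, not the MLlib API.
object SoftAssignmentSketch {
  // Posterior membership of one point in each component:
  //   p(j | x) = w_j * f_j(x) / sum_i (w_i * f_i(x))
  // `densities(j)` is the already-evaluated density f_j(x).
  def softAssignments(weights: Array[Double], densities: Array[Double]): Array[Double] = {
    require(weights.length == densities.length, "need one density per component")
    val unnormalized = weights.zip(densities).map { case (w, d) => w * d }
    val total = unnormalized.sum
    require(total > 0.0, "every weighted density is zero (possible underflow)")
    unnormalized.map(_ / total)
  }

  // Hard label: index of the component with the largest posterior.
  def predictLabel(weights: Array[Double], densities: Array[Double]): Int = {
    val p = softAssignments(weights, densities)
    p.indices.maxBy(i => p(i))
  }
}
```

For example, equal weights with densities 0.2 and 0.6 give memberships of roughly 0.25 and 0.75, so the hard label is component 1.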
Github user FlytxtRnD commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22163213
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software
Github user FlytxtRnD commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22163250
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,93 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22184915
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,94 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22185023
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22185641
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,242 @@
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67880399
@tgaloppo MLUtils.EPSILON is actually private[util]. I think it would be
fine to change it to be private[mllib]. CC: @mengxr
@tgaloppo I strongly recommend
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67885369
Ok. I changed the privacy of EPSILON and am now using it in this code.
I changed the name from GaussianMixtureModelEM to GaussianMixtureEM.
I've changed
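Since EPSILON comes up twice in this thread: a minimal sketch of how a machine-epsilon constant of this kind is commonly computed, assuming the intended value is the gap between 1.0 and the next larger double. `EpsilonSketch` is a hypothetical name, and this is not claimed to be the actual MLUtils source.

```scala
// Hypothetical object; illustrates the idea, not MLUtils' actual code.
object EpsilonSketch {
  // Smallest power of two `eps` for which 1.0 + eps is still distinguishable
  // from 1.0 in double precision: keep halving while 1.0 + eps / 2 != 1.0.
  lazy val EPSILON: Double = {
    var eps = 1.0
    while ((1.0 + (eps / 2.0)) != 1.0) {
      eps /= 2.0
    }
    eps
  }
}
```

The result equals `java.lang.Math.ulp(1.0)` (about 2.22e-16). A typical use is as a small floor added to a density or denominator so that it can never be exactly zero.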
Github user tgaloppo commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22136408
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user tgaloppo commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22136877
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67717252
I've performed most of the requested changes. I do not see the BLAS
function mentioned (dsyr), so I left this as a TODO. Also, I could not find
EPSILON in MLUtils.
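For context on the dsyr TODO: the level-2 BLAS routine DSYR performs the symmetric rank-1 update A := alpha * x * x^T + A, which is the operation needed to accumulate responsibility-weighted outer products when re-estimating covariances. A plain-Scala sketch of that update, assuming a dense column-major n x n array; `RankOneUpdateSketch.syr` is a hypothetical helper, not a netlib or MLlib signature.

```scala
// Hypothetical plain-Scala stand-in for BLAS DSYR; not a real library API.
object RankOneUpdateSketch {
  /**
   * In-place update a := alpha * x * x^T + a, where `a` holds an n x n
   * matrix in column-major order (element (i, j) at index i + j * n).
   * BLAS DSYR only touches one triangle of a symmetric matrix; this sketch
   * updates both triangles so a symmetric `a` stays fully populated.
   */
  def syr(alpha: Double, x: Array[Double], a: Array[Double]): Unit = {
    val n = x.length
    require(a.length == n * n, "a must hold an n x n matrix")
    var j = 0
    while (j < n) {
      val scaled = alpha * x(j)
      var i = 0
      while (i < n) {
        a(i + j * n) += scaled * x(i) // (i, j) += alpha * x_i * x_j
        i += 1
      }
      j += 1
    }
  }
}
```

In an M-step, one such call per point with alpha set to that point's responsibility accumulates sum_i gamma_i x_i x_i^T; dividing by the summed responsibilities and subtracting mu mu^T then gives the component covariance.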
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67486366
Ok, I have addressed (I think) all of those issues, with the exception of
modifying GaussianMixtureModel to carry instances of MultivariateGaussian. I
do like that
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22058276
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22058461
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,50 @@
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67527000
OK, that sounds good. Feel free to make a JIRA for that issue. Thanks for
the updates! I'll take a look.
Github user tgaloppo commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22059411
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22061331
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala ---
@@ -0,0 +1,39 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22061399
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala ---
@@ -0,0 +1,39 @@
Github user tgaloppo commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22066566
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala ---
@@ -0,0 +1,39 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22067710
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,94 @@
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67549445
@tgaloppo Thanks for the updates! It looks quite good to me. My main
remaining question is: What do you think about having predict() return the
cluster centers
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67550199
Sorry, I forgot to comment on this issue. That would be fine with me. The
prediction methods were contributed by @FlytxtRnD, so perhaps we can solicit
their opinion
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67580723
Here are some results I got using the text8-100 dataset. It's just a local
test (1 worker), but we can do larger-scale tests in the future.
numInstances
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67582826
Excellent. 100 features is probably a bit of a stretch for the
algorithm... the density at any point (especially with respect to the initial
random Gaussians) is going
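The point about tiny densities has a concrete numerical side: with many features, a Gaussian density value can fall below the smallest representable double and become exactly 0.0, and if every component underflows, the normalizing sum is 0.0 as well. One common remedy, shown here as an assumption rather than as what this PR implements, is to keep log-densities and normalize with the log-sum-exp trick. `LogSpaceSketch` and its methods are hypothetical names.

```scala
// Hypothetical helpers illustrating log-space normalization; not MLlib API.
object LogSpaceSketch {
  // log(sum_i exp(v(i))), computed stably by factoring out the maximum term.
  def logSumExp(v: Array[Double]): Double = {
    val m = v.max
    if (m.isNegInfinity) m
    else m + math.log(v.map(x => math.exp(x - m)).sum)
  }

  // Posterior memberships from log-weights and per-component log-densities;
  // exponentiation happens only after subtracting the log-normalizer.
  def softAssignmentsFromLogs(logWeights: Array[Double], logDensities: Array[Double]): Array[Double] = {
    require(logWeights.length == logDensities.length, "need one log-density per component")
    val joint = logWeights.zip(logDensities).map { case (lw, ld) => lw + ld }
    val logNorm = logSumExp(joint)
    joint.map(j => math.exp(j - logNorm))
  }
}
```

With log-densities of -1000 and -1001, `math.exp` of either underflows to 0.0, yet the log-space version still returns memberships of about 0.731 and 0.269.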
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22083547
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,65 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22083563
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,244 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22083551
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,65 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22083546
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,65 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22083566
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,244 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22083570
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,244 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22083574
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,244 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22083578
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,244 @@
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67583382
I agree about 100 features being too big for clustering, but I wanted to
get some sense of scaling w.r.t. features. (It basically makes the matrix
inverses take a
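A rough per-iteration cost model makes the feature-scaling point concrete, assuming full covariance matrices that are inverted (or factored) once per EM iteration, with $n$ points, $k$ components, and $d$ features:

$$
\text{cost per iteration} \approx
\underbrace{O(k\,d^{3})}_{\text{invert/factor } k \text{ covariances}}
+ \underbrace{O(n\,k\,d^{2})}_{\text{E-step: } (x-\mu_j)^{\top}\Sigma_j^{-1}(x-\mu_j)}
+ \underbrace{O(n\,k\,d^{2})}_{\text{M-step: weighted outer products}}
$$

Moving from $d = 10$ to $d = 100$ therefore multiplies the inversion term by 1000 and the per-point terms by 100.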
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67583648
OK, I believe those are my last comments!
Github user tgaloppo commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22084037
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,244 @@
Github user tgaloppo commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22084185
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,65 @@
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67586420
Great! I've pushed the requested changes. I will open a ticket on Jira
about making the MultivariateGaussian more widely applicable.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092909
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,65 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092917
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,94 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092919
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,94 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092921
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,94 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092926
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092915
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,50 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092908
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,65 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092933
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092923
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,94 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092920
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,94 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092918
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,94 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092924
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092941
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092938
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092962
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala ---
@@ -0,0 +1,39 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092955
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092934
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092952
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092954
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092956
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092927
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092942
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092953
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092935
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092959
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092957
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092963
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala ---
@@ -0,0 +1,39 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092960
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092939
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22092937
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,248 @@
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67387864
I have replaced the accumulators with RDD.aggregate functionality.
I added functionality allowing the user to provide their own initial GMM,
bypassing the random
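The switch from accumulators to RDD.aggregate follows the usual zero/seqOp/combOp pattern: each partition folds its points into a statistics object with seqOp, and combOp merges the per-partition results. A Spark-free sketch for 1-D data, assuming the soft assignments for each point are already computed; `AggregateSketch`, `Stats`, and `aggregateLocally` are hypothetical names, and the local fold only stands in for `rdd.aggregate(zero(k))(seqOp, combOp)`.

```scala
// Hypothetical names throughout; a Spark-free sketch of the aggregate pattern.
object AggregateSketch {
  // Per-component sufficient statistics for 1-D data: sum of responsibilities,
  // responsibility-weighted sum of x, and responsibility-weighted sum of x^2.
  final case class Stats(w: Vector[Double], sx: Vector[Double], sxx: Vector[Double])

  private def add(a: Vector[Double], b: Vector[Double]): Vector[Double] =
    a.zip(b).map { case (u, v) => u + v }

  def zero(k: Int): Stats =
    Stats(Vector.fill(k)(0.0), Vector.fill(k)(0.0), Vector.fill(k)(0.0))

  // seqOp: fold one point x, with its soft assignments `resp`, into the stats.
  def seqOp(s: Stats, point: (Double, Vector[Double])): Stats = {
    val (x, resp) = point
    Stats(add(s.w, resp), add(s.sx, resp.map(_ * x)), add(s.sxx, resp.map(_ * x * x)))
  }

  // combOp: merge the stats produced by two partitions.
  def combOp(a: Stats, b: Stats): Stats =
    Stats(add(a.w, b.w), add(a.sx, b.sx), add(a.sxx, b.sxx))

  // Local stand-in for rdd.aggregate(zero(k))(seqOp, combOp): fold each
  // "partition" with seqOp, then merge the partition results with combOp.
  def aggregateLocally(partitions: Seq[Seq[(Double, Vector[Double])]], k: Int): Stats =
    partitions.map(_.foldLeft(zero(k))(seqOp)).foldLeft(zero(k))(combOp)
}
```

From the merged sums, component j's mixing weight, mean, and variance follow as w_j / n, sx_j / w_j, and sxx_j / w_j - (sx_j / w_j)^2, where n is the total number of points.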
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016170
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,56 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016173
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,56 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016175
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala ---
@@ -0,0 +1,56 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016191
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016185
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016184
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016180
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016178
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016195
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016189
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016194
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016187
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22016183
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user tgaloppo commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22017550
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,284 @@
Github user tgaloppo commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r22018162
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,50 @@
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67445452
Working on these changes; still a few left.
Great feedback; really helping to improve my Scala!
Github user tgaloppo commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-67158536
I've merged in the predict() method from @FlytxtRnD
I am working on the changeover from accumulators to RDD.aggregate; I should
have this up soon.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21859900
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala ---
@@ -0,0 +1,234 @@
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3022#discussion_r21859898
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala ---
@@ -0,0 +1,41 @@