[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-31 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68475553 @jkbradley Please assign me SPARK-5017, and I will take care of this in preparation for 5018 and 5019. --- If your project is set up for it, you can reply to this

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68476266 Done :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-30 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68398244 @tgaloppo @FlytxtRnD I made some JIRAs for the to-do items above. I'd say the most important are: * [Change predictMembership() to take an RDD, not the

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-30 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68401923 @jkbradley Please assign 5017, 5018, 5019, and 5020 to me. Regarding 5018, can you refer me to other PR's that are bringing in common distributions? I can work toward

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-30 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68414406 @tgaloppo It's ideal if we assign fix one JIRA at a time (as separate PRs). Can I start by assigning one of your choosing? For 5018, there is only [one

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-30 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68415536 @jkbradley No problem. Let's start with 5020, and I'll move on from there. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68299685 [Test build #555 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/555/consoleFull) for PR 3022 at commit

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68307573 [Test build #555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/555/consoleFull) for PR 3022 at commit

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-29 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68312524 @tgaloppo Thanks for the updates, and thanks for all of your work in getting this ready! LGTM CC: @mengxr After this is merged, I'll make

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-29 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68313864 @jkbradley Thank you for your help and feedback along the way. Please assign some (or all) of those tickets to me and I will continue to improve the implementation.

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-29 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3022 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68315285 @tgaloppo I've merged this into master. Thanks for contributing GMM! --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-29 Thread FlytxtRnD
Github user FlytxtRnD commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-68335194 @tgaloppo Good Work @mengxr Thanks for giving us a chance to be a part of this contribution --- If your project is set up for it, you can reply to this email and

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread FlytxtRnD
Github user FlytxtRnD commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67816287 Sorry for late reply.predictLabels() and predictMembership() looks fine.But what about moving the computeSoftAssignments() to GaussianMixtureModelEM class(in KMeans,

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread FlytxtRnD
Github user FlytxtRnD commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22163213 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread FlytxtRnD
Github user FlytxtRnD commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22163250 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22184915 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22185023 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22185641 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,242 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67880399 @tgaloppo MLUtils.EPSILON is actually private[util]. I think it would be fine to change it to be private[mllib]. CC: @mengxr @tgaloppo I strongly recommend

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-22 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67885369 Ok. I changed the privacy of EPSILON and am now using it in this code. I changed the name from GaussianMixtureModelEM to GaussianMixtureEM. I've changed

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-19 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22136408 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-19 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22136877 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-19 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67717252 I've performed most of the requested changes. I do not see the BLAS function mentioned (dsyr), so I left this as a TODO. Also, I could not find EPSILON in MLUtils.

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67486366 Ok, I have addressed (I think) all of those issues, with the exception of modifying GaussianMixtureModel to carry instances of MultivariateGaussian. I do like that

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22058276 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22058461 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67527000 OK, that sounds good. Feel free to make a JIRA for that issue. Thanks for the updates! I'll take a look. --- If your project is set up for it, you can reply to

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22059411 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22061331 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22061399 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22066566 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22067710 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67549445 @tgaloppo Thanks for the updates! It looks quite good to me. My main remaining question is: What do you think about having predict() return the cluster centers

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67550199 Sorry, I forgot to comment on this issue. That would be fine with me. The prediction methods were contributed by @FlytxtRnD , so perhaps we can solicit their opinion

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67580723 Here are some results I got using the text8-100 dataset. It's just a local test (1 worker), but we can do larger-scale tests in the future. numInstances

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67582826 Excellent. 100 features is probably a bit of a stretch for the algorithm,,, the density at any point (especially with respect to the initial random gaussians) is going

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22083547 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22083563 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22083551 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22083546 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22083566 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22083570 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22083574 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22083578 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67583382 I agree about 100 features being too big for clustering, but I wanted to get some sense of scaling w.r.t. features. (It basically makes the matrix inverses take a

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67583648 OK, I believe those are my last comments! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22084037 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22084185 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67586420 Great! I've pushed the requested changes. I will open a ticket on Jira about making the MultivariateGaussian more widely applicable. --- If your project is set up

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092909 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092917 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092919 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092921 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092926 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092915 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092908 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092933 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092923 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092920 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092918 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092924 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092941 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092938 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092962 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092955 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092934 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092952 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092954 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092956 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092927 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092942 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092953 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092935 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092959 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092957 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092963 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/impl/MultivariateGaussian.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092960 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092939 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22092937 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67387864 I have replaced the accumulators with RDD.aggregate functionality. I added functionality allowing the user to provide their own initial GMM, bypassing the random

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016170 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016173 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016175 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DenseGmmEM.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016191 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016185 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016184 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016180 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016178 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016195 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016189 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016194 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016187 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22016183 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22017550 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,284 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r22018162 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-17 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67445452 Working on these changes; still a few left. Great feedback; really helping to improve my scala! --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-16 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-67158536 I've merged in the predict() method from @FlytxtRnD I am working on the changeover from accumulators to RDD.aggregate; I should have this up soon. --- If your

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21859900 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModelEM.scala --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs

2014-12-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r21859898 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache

  1   2   >