[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-03-10 Thread akopich
Github user akopich closed the pull request at: https://github.com/apache/spark/pull/1269

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-03-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-78190574 @akopich Since this is no longer an active PR, could you please close it? It was very helpful to have this PR as a major basis for the initial LDA PR. If you

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-03-10 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-78184948 @renchengchang What do you mean by "topic vector"? A vector of p(t|d) \forall t? If so, you can find these vectors in `RDD[DocumentParameters]` which is returned by

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-03-10 Thread renchengchang
Github user renchengchang commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-78184357 Thanks. I have a question: if there is no document id, how can I know the relation between the topic vector and the raw text? From: Avanesov Valeriy [

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-03-10 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-78050367 @renchengchang 1. Hi. 2. Don't use code from this PR. Use either LDA (which is merged with mllib) or https://github.com/akopich/dplsa which is a further developme

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-03-10 Thread renchengchang
Github user renchengchang commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-78023208 @akopich how to assign document id?

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-03-10 Thread renchengchang
Github user renchengchang commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-78023104 how to assign document id?

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-02-02 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-72565428 @IlyaKozlov Would you like your email included in the git commit for the initial LDA PR? If so, please let me (or @mengxr) know ASAP. Thanks!

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-01-13 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-69802157 @akopich We'll make sure to do that. Thanks for letting us know.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-01-12 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-69560296 @jkbradley, @mengxr, please include @IlyaKozlov as an author too. He's helped a lot with the implementation.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2015-01-08 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-69236610 @akopich I had hoped to get this into MLlib, but after more consideration, I believe it is too complex. Currently, what would be ideal is a simple implementation of L

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-30 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-68401156 @akopich The right way to do pseudo-randomness is to do: ``` val randomSeed = ... // if you want to pass in a seed indx.mapPartitionsWithIndex { case (partiti
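
The snippet above is cut off in this archive, so what follows is only a sketch of the per-partition seeding pattern it appears to describe, not the code from the comment. It is plain Scala with no Spark dependency: partitions are simulated as nested sequences, and the names `PartitionSeeding`, `randomsForPartition`, `generate`, and the `baseSeed + partitionIndex` scheme are illustrative assumptions. In Spark, the body of `randomsForPartition` would sit inside the function passed to `RDD.mapPartitionsWithIndex`.

```scala
import scala.util.Random

object PartitionSeeding {
  // Hypothetical helper: build one generator per partition, seeded from a
  // base seed combined with the partition index (an assumed scheme).
  def randomsForPartition(baseSeed: Long,
                          partitionIndex: Int,
                          elements: Iterator[Int]): Iterator[(Int, Double)] = {
    val rng = new Random(baseSeed + partitionIndex)
    elements.map(i => (i, rng.nextDouble()))
  }

  // Simulates running the helper over every partition of a dataset.
  def generate(baseSeed: Long, partitions: Seq[Seq[Int]]): Seq[(Int, Double)] =
    partitions.zipWithIndex.flatMap { case (partition, index) =>
      randomsForPartition(baseSeed, index, partition.iterator).toList
    }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(0, 1, 2), Seq(3, 4, 5))
    val firstRun = generate(42L, partitions)
    val secondRun = generate(42L, partitions)
    // Same seed and same partitioning give the same values on every run.
    println(firstRun == secondRun)
  }
}
```

Because each partition derives its generator from the base seed and its own index, recomputing a partition reproduces the same values, which is the property one usually wants when a partition may be evaluated more than once.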

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67673044 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67673031 [Test build #24650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24650/consoleFull) for PR 1269 at commit [`e0fcc6f`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67664969 By the way, this may be off-topic, but it is related to generating the initial approximation. Suppose one has `indxs : RDD[Int]` and is about to create an RDD of ran

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67662012 [Test build #24650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24650/consoleFull) for PR 1269 at commit [`e0fcc6f`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67661902 I've fixed the perplexity for robust PLSA and updated the perplexity value in the comment above. Now they are almost the same.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67659544 @akopich I've filed a JIRA to investigate that test failure, since it looks like a flaky streaming test: https://issues.apache.org/jira/browse/SPARK-4905

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67656496 And the tests fail again in an obscure manner...

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67656167 [Test build #24647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24647/consoleFull) for PR 1269 at commit [`0d7469b`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67656178 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67647224 [Test build #24647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24647/consoleFull) for PR 1269 at commit [`0d7469b`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-19 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67643630 I've performed a sanity check on the dataset I've described above. PLSA: the tm project obtains a perplexity of `2358` and this implementation ends up with `2311`.
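
For readers unfamiliar with the metric being compared here, below is a minimal sketch of the usual perplexity definition for a PLSA-style model: `exp(-(1/N) * sum over d,w of n_dw * ln p(w|d))`, where `p(w|d) = sum over t of p(w|t) * p(t|d)` and `N` is the total token count. The names `PerplexitySketch`, `counts`, `phi`, and `theta` are hypothetical and are not taken from this PR or from the tm project, either of which may compute the metric differently (for example, on held-out data).

```scala
object PerplexitySketch {
  // counts(d)(w): number of occurrences of word w in document d (n_dw)
  // phi(t)(w):    p(w|t), word distribution of topic t
  // theta(d)(t):  p(t|d), topic distribution of document d
  def perplexity(counts: Array[Array[Int]],
                 phi: Array[Array[Double]],
                 theta: Array[Array[Double]]): Double = {
    var logLikelihood = 0.0
    var totalTokens = 0L
    for (d <- counts.indices; w <- counts(d).indices if counts(d)(w) > 0) {
      // p(w|d) is the mixture of topic-word probabilities weighted by p(t|d).
      val pWordGivenDoc = theta(d).indices.map(t => theta(d)(t) * phi(t)(w)).sum
      logLikelihood += counts(d)(w) * math.log(pWordGivenDoc)
      totalTokens += counts(d)(w)
    }
    math.exp(-logLikelihood / totalTokens)
  }

  def main(args: Array[String]): Unit = {
    // One document, one topic, uniform over 4 words: perplexity is close to 4.
    val counts = Array(Array(1, 2, 3, 4))
    val phi = Array(Array(0.25, 0.25, 0.25, 0.25))
    val theta = Array(Array(1.0))
    println(perplexity(counts, phi, theta))
  }
}
```

Lower values mean the model assigns higher probability to the observed tokens, which is why two implementations trained on the same data are expected to land close to each other.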

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67535645 Yes, "accuracy" meant some kind of metric like perplexity. I agree perplexity does not correlate exactly with human perception, but it's as good as it gets (assuming n

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-18 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67493934 How do you compare accuracy? Perplexity means nothing but perplexity -- topic models may be reliably compared only via an application task (e.g. classification, recommendati

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67427124 I've been looking at the various topic modeling PRs (3 currently) to try to get a sense of how they compare in terms of accuracy and speed. By "scaling," I really mean

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread akopich
Github user akopich closed the pull request at: https://github.com/apache/spark/pull/1269

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread akopich
GitHub user akopich reopened a pull request: https://github.com/apache/spark/pull/1269 [SPARK-2199] [mllib] topic modeling I have implemented Probabilistic Latent Semantic Analysis (PLSA) and Robust PLSA with support of additive regularization (that actually means that I've impleme

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67415235 What do you mean by scaling tests? Tests measuring the dependence of computation time on the number of machines? Are there scaling tests for GraphX LDA implementations? Or sho

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67414223 @akopich I don't think you need to add the tm project to Spark, but a comparison of perplexity would be a good sanity check. Other implementations might be better for

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67410274 ``` - filter pushdown - boolean *** FAILED *** (249 milliseconds)``` I have no idea why this could happen. Should I rebase again?

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67409456 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67409452 [Test build #24555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24555/consoleFull) for PR 1269 at commit [`b6f852e`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67399691 @jkbradley Thank you for the explanation about setters. The tm implementation was tested (it was successfully used in one of my projects), but it was tested with Scala 2.11,

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67398235 [Test build #24555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24555/consoleFull) for PR 1269 at commit [`b6f852e`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1269#discussion_r22003887 --- Diff: mllib/pom.xml --- @@ -112,6 +112,11 @@ test-jar test + +colt --- End diff -- Use comm

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread akopich
Github user akopich commented on a diff in the pull request: https://github.com/apache/spark/pull/1269#discussion_r22003692 --- Diff: mllib/pom.xml --- @@ -112,6 +112,11 @@ test-jar test + +colt --- End diff -- In Diri

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/1269#discussion_r22003577 --- Diff: mllib/pom.xml --- @@ -112,6 +112,11 @@ test-jar test + +colt --- End diff -- Where

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-67083325 @akopich Thanks for the updates! (Much easier to see the diff now) The decision about setters vs. constructor arguments was from [this JIRA (design doc linke

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66642585 [Test build #24371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24371/consoleFull) for PR 1269 at commit [`0764aaa`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66642598 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66629318 [Test build #24371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24371/consoleFull) for PR 1269 at commit [`0764aaa`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66627541 [Test build #24369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24369/consoleFull) for PR 1269 at commit [`c54afc9`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66627555 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66619759 [Test build #24369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24369/consoleFull) for PR 1269 at commit [`c54afc9`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66617936 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66617927 [Test build #24367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24367/consoleFull) for PR 1269 at commit [`8e953e7`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66616637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66616629 [Test build #24365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24365/consoleFull) for PR 1269 at commit [`4ac42d1`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66613030 @jkbradley I moved Dirichlet to mllib/stats and added setters to `TokenEnumerator`. BTW, why was it decided to use setters instead of constructors? We can

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66612337 [Test build #24367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24367/consoleFull) for PR 1269 at commit [`8e953e7`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66611223 [Test build #24366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24366/consoleFull) for PR 1269 at commit [`e5f4a7b`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66611226 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66611133 [Test build #24366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24366/consoleFull) for PR 1269 at commit [`e5f4a7b`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66608522 [Test build #24365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24365/consoleFull) for PR 1269 at commit [`4ac42d1`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66529072 @akopich Thanks for the updates. It looks like rebasing did not work correctly (looking at the 10K+ lines in this PR!). It should be possible to fix with rebase + co

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66517421 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66517414 [Test build #24316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24316/consoleFull) for PR 1269 at commit [`0a0a1ca`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66500637 [Test build #24316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24316/consoleFull) for PR 1269 at commit [`0a0a1ca`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66499176 [Test build #24315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24315/consoleFull) for PR 1269 at commit [`cd22b86`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66499180 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66498980 [Test build #24315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24315/consoleFull) for PR 1269 at commit [`cd22b86`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66498633 (5) Enumerator BTW, the names `TokenIndexer` and `TokenIndex` look confusing (though these classes rely on `breeze.util.Index`). So I renamed it to `TokenE

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66478011 Succeeded at the third attempt. (5) Enumerator @jkbradley, as you can see, I moved `Enumerator` to `mllib/features` folder and renamed it to `TokenIndexer`.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66475387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66475370 [Test build #24311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24311/consoleFull) for PR 1269 at commit [`b3f7a0d`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66458603 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66458584 [Test build #24309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24309/consoleFull) for PR 1269 at commit [`af9bcc8`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66456752 [Test build #24311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24311/consoleFull) for PR 1269 at commit [`b3f7a0d`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66450671 [Test build #24308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24308/consoleFull) for PR 1269 at commit [`7f9b7c3`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66450681 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66448735 [Test build #24309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24309/consoleFull) for PR 1269 at commit [`af9bcc8`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66442416 [Test build #24308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24308/consoleFull) for PR 1269 at commit [`7f9b7c3`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-09 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66325635 @chazchandler, thank you very much for your help. I shouldn't have rebased on master. The rebase on 1.2 was successful.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-09 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66321327 @karlhigley, yes, I've heard something about abstract classes. Though I see no way to employ this concept here.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-09 Thread chazchandler
Github user chazchandler commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66311502 @akopich , rebasing can be tricky, especially if you've been off on a branch for a while. `git reflog` can be helpful in getting back to previous states if you end

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-09 Thread karlhigley
Github user karlhigley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66309244 Re: > (2) Regular and Robust in the same class >It's possible to implement, but I don't want to turn class hierarchy inside out. It just violates OOP principles

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-09 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66306112 It seems like something went wrong. I've got multiple compilation errors like ``` [error] /home/valerij/contribute/spark/core/src/main/scala/org/apache/spar

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-09 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66299321 @chazchandler, thank you very much for your quick reply! It did the trick. Now I'm a bit confused about the ml/ folder. What's it for?

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-09 Thread chazchandler
Github user chazchandler commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66296113 re: (5) you can add a remote: `git remote add upstream https://github.com/apache/spark.git` fetch the latest state: `git fetch upstream`

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-09 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66292305 @jkbradley Tests fail again... Stab in the dark: it looks like something has changed in the testing environment. (2) Regular and Robust in the same cl

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66219789 QA results for PR 1269: - This patch FAILED unit tests. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/542/conso

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66211199 QA tests have started for PR 1269. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/542/consoleFull

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66210825 @akopich The test failure seems unrelated (from a Python SQL test). I'll re-run the tests. (2) Regular and Robust in the same class Would i

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66127106 @jkbradley, could you please have a look at the logs -- I have no idea why the PySpark tests failed.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66126247 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66126236 QA results for PR 1269: - This patch FAILED unit tests. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24223/consol

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66114412 QA tests have started for PR 1269. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24223/consoleFull

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66113681 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66113672 [Test build #24222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24222/consoleFull) for PR 1269 at commit [`4a4a4f8`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66112914 [Test build #24222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24222/consoleFull) for PR 1269 at commit [`4a4a4f8`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66112025 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66112020 [Test build #24221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24221/consoleFull) for PR 1269 at commit [`24b11a5`](https://gith

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66111941 [Test build #24221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24221/consoleFull) for PR 1269 at commit [`24b11a5`](https://githu

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66109601 (1) Users implementing their own regularizers OK. I'd prefer to set all the methods private[mllib] for regularizers. (2) Regular and Robust in the same c

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-04 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-65682412 @akopich Thanks for the responses! Follow-ups: (1) Users implementing their own regularizers You're right that this would be nice to have. If we incl

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-02 Thread akopich
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-65223123 @jkbradley, thank you for your comments! It seems like we should discuss API for this set of models first. As far as I can understand, you are not about to pro

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-11-06 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-62070537 @akopich This would be a great set of models to add to MLlib! I wanted to follow up about some comments I saw before, and see if we can get this PR moving again.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-58441096 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-10-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-58441093 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21486/consoleFull) for PR 1269 at commit [`4d36c74`](https://github.com/a
