Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-65223123
@jkbradley, thank you for your comments!
It seems like we should discuss API for this set of models first.
As far as I can understand, you are not about to
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66109601
(1) Users implementing their own regularizers
OK. I'd prefer to make all the regularizer methods `private[mllib]`.
(2) Regular and Robust in the
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66127106
@jkbradley, could you please have a look at the logs -- I have no idea why
the PySpark tests failed.
---
If your project is set up for it, you can reply to this email and have
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66292305
@jkbradley
Tests fail again...
Stab in the dark: it looks like something has changed in the testing
environment.
(2) Regular and Robust in the same
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66299321
@chazchandler, thank you very much for your quick reply! It did the trick.
Now I'm a bit confused about ml/ folder. What's it for?
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66306112
It seems like something went wrong. I've got multiple compilation errors
like
```
[error]
/home/valerij/contribute/spark/core/src/main/scala/org/a
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66321327
@karlhigley, yes, I've heard something about abstract classes. However, I see
no way to employ this concept here.
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66325635
@chazchandler, thank you very much for your help. I shouldn't have rebased
on master. Rebasing on 1.2 was successful.
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66478011
Succeeded on the third attempt.
(5) Enumerator
@jkbradley, as you can see, I moved `Enumerator` to `mllib/features` folder
and renamed it to `TokenIndexer
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66498633
(5) Enumerator
BTW, the names `TokenIndexer` and `TokenIndex` look confusing (though these
classes rely on `breeze.util.Index`).
So I renamed it to
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-66613030
@jkbradley
I moved Dirichlet to mllib/stats and added setters to `TokenEnumerator`.
BTW, why was it decided to use setter instead of constructors? We
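For context, MLlib's convention is chainable setters rather than long constructors: each setter validates one parameter and returns `self`, so defaults can evolve without breaking existing callers. A minimal sketch of that pattern -- the `TokenEnumerator` class and its `rare_token_threshold` parameter are assumptions here, not the PR's actual code:

```python
# Hypothetical sketch of the MLlib-style setter convention.
# A constructor with many positional arguments would force every caller
# to change when a new parameter is added; chainable setters avoid that.

class TokenEnumerator:
    def __init__(self):
        # sensible default; callers only override what they need
        self.rare_token_threshold = 2

    def set_rare_token_threshold(self, threshold):
        if threshold < 0:
            raise ValueError("threshold must be non-negative")
        self.rare_token_threshold = threshold
        return self  # returning self enables chained calls

# setters chain fluently, mirroring e.g. `new LDA().setK(10)` in MLlib
enumerator = TokenEnumerator().set_rare_token_threshold(5)
```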
Github user akopich commented on a diff in the pull request:
https://github.com/apache/spark/pull/1269#discussion_r22003692
--- Diff: mllib/pom.xml ---
@@ -112,6 +112,11 @@
test-jar
test
+
+colt
--- End diff --
In
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67399691
@jkbradley Thank you for the explanation about setters.
The tm implementation was tested (it was successfully used in one of my
projects), but it was tested with Scala 2.11
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67410274
``` - filter pushdown - boolean *** FAILED *** (249 milliseconds)```
I have no idea why this could happen. Should I rebase again?
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67415235
What do you mean by scaling tests? Tests measuring the dependence of
computation time on the number of machines? Are there scaling tests for GraphX LDA
implementations? Or
GitHub user akopich reopened a pull request:
https://github.com/apache/spark/pull/1269
[SPARK-2199] [mllib] topic modeling
I have implemented Probabilistic Latent Semantic Analysis (PLSA) and Robust
PLSA with support for additive regularization (that actually means that I
Github user akopich closed the pull request at:
https://github.com/apache/spark/pull/1269
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67493934
How do you compare accuracy? Perplexity means nothing but perplexity --
topic models may be reliably compared only via an application task (e.g.
classification
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67643630
I've performed a sanity check on the dataset I've described above.
PLSA: tm project obtains perplexity of `2358` and this implementation ends
up
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67656496
And tests fail again in an obscure manner...
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67661902
I've fixed perplexity for robust PLSA and updated the perplexity value in the
comment above. Now they are almost the same.
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-67664969
By the way, maybe it's off topic, but this is related to initial
approximation generation.
Suppose one has `indxs : RDD[Int]` and is about to create an R
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-78050367
@renchengchang
1. Hi.
2. Don't use code from this PR. Use either LDA (which is merged into mllib)
or https://github.com/akopich/dplsa which is a fu
Github user akopich commented on the pull request:
https://github.com/apache/spark/pull/1269#issuecomment-78184948
@renchengchang
What do you mean by "topic vector"? A vector of p(t|d) \forall t? If so,
you can find these vectors in `RDD[DocumentParameters]` which is r
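To illustrate what such a "topic vector" is (the `DocumentParameters` class is the PR's, but everything below is a hypothetical sketch, not its actual code): p(t|d) is just the document's per-topic weights normalized to sum to one.

```python
# Hypothetical sketch: given one document's unnormalized topic weights
# (as might be stored per document in DocumentParameters), the topic
# vector p(t|d) is obtained by normalizing the weights to sum to 1.

def topic_vector(weights):
    total = sum(weights)
    return [w / total for w in weights]

print(topic_vector([2.0, 6.0, 2.0]))  # -> [0.2, 0.6, 0.2]
```

In the distributed setting this would run per element of the RDD, e.g. something like `documentParameters.map(p => normalize(p.weights))`.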
Github user akopich closed the pull request at:
https://github.com/apache/spark/pull/1269