Re: [scikit-learn] Using perplexity from LatentDirichletAllocation for cross validation of Topic Models

2017-10-06 Thread chyi-kwei yau
Hi Markus, I found that in the current LDA implementation we include "E[log p(beta | eta) - log q(beta | lambda)]" in the approx bound function and use it to calculate perplexity. This term was not included in the likelihood function in Blei's C implementation, which may explain the difference.
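As a minimal sketch of the API being discussed: `LatentDirichletAllocation.perplexity` is computed from the variational bound, which (per the message above) includes the topic-prior term. The tiny corpus here is made up for illustration; only `perplexity` and the standard fit/transform calls are real scikit-learn API.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus purely for illustration
docs = [
    "apple banana apple fruit",
    "banana fruit salad apple",
    "dog cat dog pet",
    "cat pet dog animal",
]
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# perplexity() is derived from scikit-learn's variational bound, which
# includes E[log p(beta | eta) - log q(beta | lambda)]; Blei's C code
# omits that term, so the two numbers are not directly comparable.
perp = lda.perplexity(X)
print(perp)
```

Because of the extra term in the bound, comparing this value against perplexities from other LDA implementations should be done with caution.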

Re: [scikit-learn] LatentDirichletAllocation failing to find topics in NLTK Gutenberg corpus?

2017-09-17 Thread chyi-kwei yau
Hi Markus, I tried your code and I think the issue is that there are only 18 docs in the Gutenberg corpus. If you print out the transformed doc-topic distribution, you will see that many topics are not used. Since no words are assigned to those topics, their weights will be equal to `topic_word_pri
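A small sketch of the diagnostic suggested above, on a made-up corpus rather than the Gutenberg one: with far fewer documents than topics, inspecting the output of `transform` (here via `fit_transform`) shows that many topic columns receive almost no mass.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Deliberately tiny corpus: 4 docs but 10 topics, mimicking the
# "more topics than documents can support" situation described above.
docs = [
    "whale sea ship captain",
    "sea ship voyage whale",
    "garden flower spring rain",
    "flower rain garden bloom",
]
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
doc_topic = lda.fit_transform(X)  # rows are per-document topic distributions

# Each row sums to 1; printing it shows most of the 10 topics get
# only a small residual weight in every document, i.e. they are unused.
print(doc_topic.round(3))
dominant = doc_topic.max(axis=0)
print("topics with weight > 0.5 in some doc:", int((dominant > 0.5).sum()))
```

Unused topics end up with word weights driven only by the prior, which is why their top words look arbitrary.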