Re: [scikit-learn] LatentDirichletAllocation failing to find topics in NLTK Gutenberg corpus?

2017-09-18 Thread Andreas Mueller
For very few documents, Gibbs sampling is likely to work better - or rather, Gibbs sampling usually works better given enough runtime, and for so few documents, runtime is not an issue. The length of the documents doesn't matter, only the size of the vocabulary. Also, hyperparameter choices might

Re: [scikit-learn] LatentDirichletAllocation failing to find topics in NLTK Gutenberg corpus?

2017-09-18 Thread Markus Konrad
Hi Chyi-Kwei, thanks for digging into this. I made similar observations with Gensim when using only a small number of (big) documents. Gensim also uses the Online Variational Bayes approach (Hoffman et al.). So could it be that the Hoffman et al. method is problematic in such scenarios? I found th
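
For anyone who wants to try the Gibbs sampling route suggested earlier in this thread on a handful of documents, here is a minimal sketch of a collapsed Gibbs sampler for LDA in plain Python. It is not scikit-learn's or Gensim's implementation (both use variational inference). The function name `collapsed_gibbs_lda`, the toy documents, and the values of `alpha`, `beta` and `n_iter` are all assumptions made up for illustration, and they would need tuning on real data.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch only).

    docs: list of documents, each a list of integer word ids.
    alpha, beta: symmetric Dirichlet priors (assumed values, not tuned).
    Returns the document-topic and topic-word count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]

                # Resample the topic and put the token back.
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word


# Toy corpus (hypothetical): four short documents over a 6-word vocabulary.
toy_docs = [
    [0, 1, 0, 1, 2, 0, 1],
    [3, 4, 3, 4, 5, 3, 4],
    [0, 1, 2, 2, 1, 0],
    [3, 5, 4, 4, 5, 3],
]
doc_topic, topic_word = collapsed_gibbs_lda(toy_docs, n_topics=2, vocab_size=6)

for d, counts in enumerate(doc_topic):
    print("doc", d, "topic counts:", counts)
```

Each sweep costs time proportional to the number of tokens times the number of topics, so with only a few documents it is cheap to run many sweeps, which is the runtime point made above.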