For very few documents, Gibbs sampling is likely to work better. More
precisely, Gibbs sampling usually works better given enough runtime, and
for so few documents, runtime is not an issue.
The length of the documents doesn't matter, only the size of the vocabulary.
Also, hyperparameter choices might
Hi Chyi-Kwei,
Thanks for digging into this. I made similar observations with Gensim
when using only a small number of (big) documents. Gensim also uses the
Online Variational Bayes approach (Hoffman et al.). So could it be that
the Hoffman et al. method is problematic in such scenarios? I found th