Hi,
I don't have practical experience with the MLlib LDA implementation, but
regarding the variations in the topic matrix: LDA makes use of stochastic
processes. If you call setSeed(seed) with the same seed value during
initialization, your results should be identical, though.
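
To illustrate the seeding point, here is a toy sketch in plain Python. It is
not MLlib code: init_topic_matrix is a hypothetical helper that only mimics a
random initialization of a topic-word matrix, and the sizes and seed values
are made up for the example. It is meant to show the general principle that a
fixed seed makes the random draws repeatable, nothing more.

```python
import random

def init_topic_matrix(k, vocab_size, seed):
    """Randomly initialise a k x vocab_size topic-word matrix (toy stand-in for an LDA init)."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the draws
    rows = []
    for _ in range(k):
        weights = [rng.random() for _ in range(vocab_size)]
        total = sum(weights)
        rows.append([w / total for w in weights])  # normalise each topic row to sum to 1
    return rows

a = init_topic_matrix(k=3, vocab_size=5, seed=42)
b = init_topic_matrix(k=3, vocab_size=5, seed=42)
c = init_topic_matrix(k=3, vocab_size=5, seed=7)

print(a == b)  # same seed: the two matrices are compared for equality
print(a == c)  # different seed: the draws differ
```

Whether a real distributed run is bit-for-bit identical may also depend on
factors outside the seed (for example how the data is partitioned), so I would
treat the seed as necessary rather than as a guarantee.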
May I ask what
Hi,
We're considering using the Spark MLlib (v >= 1.5) LDA implementation for
topic modelling. We plan to train the model on a data set of about 12 M
documents with a vocabulary size of 200-300 k items. Documents are relatively
short, typically containing fewer than 10 words, but the number can range