Re: MLlib LDA implementation questions

2015-09-11 Thread Carsten Schnober
Hi, I don't have practical experience with the MLlib LDA implementation, but regarding the variations in the topic matrix: LDA makes use of stochastic processes, so results can differ between runs. If you call setSeed(seed) with the same seed value during initialization, your results should be identical, though. May I ask what
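
To illustrate the point about seeding, here is a minimal toy sketch in plain Python. It is not MLlib code and does not use the Spark API: the function `init_topic_matrix` and its parameters are hypothetical names chosen for illustration. It only shows the general idea that a stochastic initialization driven by a fixed seed produces the same topic matrix on every run, while a different seed produces a different one.

```python
import random


def init_topic_matrix(num_topics, vocab_size, seed):
    """Toy stand-in for a stochastic LDA initialization (hypothetical, not MLlib)."""
    # A private generator keeps the result independent of the global RNG state.
    rng = random.Random(seed)
    matrix = []
    for _ in range(num_topics):
        weights = [rng.random() for _ in range(vocab_size)]
        total = sum(weights)
        # Normalize so each topic row is a distribution over the vocabulary.
        matrix.append([w / total for w in weights])
    return matrix


run_a = init_topic_matrix(3, 5, seed=42)
run_b = init_topic_matrix(3, 5, seed=42)
run_c = init_topic_matrix(3, 5, seed=7)

same_seed_identical = run_a == run_b      # same seed, same matrix
different_seed_differs = run_a != run_c   # different seed, different matrix
```

In MLlib itself the corresponding knob is the setSeed(seed) call mentioned above. Whether a full training run is bit-for-bit reproducible can also depend on factors this toy ignores, such as how the data is partitioned.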

Fwd: MLlib LDA implementation questions

2015-09-11 Thread Marko Asplund
Hi, We're considering using the Spark MLlib (v >= 1.5) LDA implementation for topic modelling. We plan to train the model on a data set of about 12 M documents with a vocabulary size of 200-300 k items. Documents are relatively short, typically containing fewer than 10 words, but the number can range