Hi Marko,

I haven't looked into your case in much detail, but one immediate thought is: have you tried the OnlineLDAOptimizer? Its implementation and resulting LDA model (LocalLDAModel) are quite different (they don't depend on GraphX and assume the model fits on a single machine), so you may see performance differences.
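
A minimal sketch of what switching optimizers could look like, assuming Spark 1.5 or later (where, as far as I know, LocalLDAModel gained topicDistributions). The object name, the toy corpus, and the miniBatchFraction value are made up for illustration and are not taken from your code:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{LDA, LocalLDAModel, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object OnlineLdaSketch {

  // Train with the online variational optimizer. With this optimizer the
  // returned model is a LocalLDAModel (no GraphX-backed distributed model).
  def train(corpus: RDD[(Long, Vector)], k: Int, miniBatchFraction: Double): LocalLDAModel =
    new LDA()
      .setK(k)
      .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(miniBatchFraction))
      .run(corpus)
      .asInstanceOf[LocalLDAModel]

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("online-lda-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Toy corpus: (document id, term-count vector) over a 4-word vocabulary.
      val corpus: RDD[(Long, Vector)] = sc.parallelize(Seq(
        0L -> Vectors.dense(2.0, 1.0, 0.0, 0.0),
        1L -> Vectors.dense(3.0, 2.0, 0.0, 1.0),
        2L -> Vectors.dense(0.0, 0.0, 4.0, 2.0),
        3L -> Vectors.dense(0.0, 1.0, 3.0, 3.0)
      )).cache()

      // The corpus is tiny, so every document is used in each mini-batch.
      val model = train(corpus, k = 2, miniBatchFraction = 1.0)

      // Query the topic distribution of a single unseen document.
      val query: RDD[(Long, Vector)] =
        sc.parallelize(Seq(99L -> Vectors.dense(1.0, 1.0, 0.0, 0.0)))
      model.topicDistributions(query).collect().foreach { case (id, dist) =>
        println(s"doc $id -> $dist")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Whether this actually brings the per-document query time down for your data is something you would have to measure; the sketch only shows the API shape.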
Feynman

On Tue, Sep 15, 2015 at 6:37 AM, Marko Asplund <marko.aspl...@gmail.com> wrote:

> While doing some more testing I noticed that loading the persisted model
> from disk (~2 minutes) as well as querying LDA model topic distributions
> (~4 seconds for one document) are quite slow operations.
>
> Our application is querying LDA model topic distribution (for one doc at a
> time) as part of end-user operation execution flow, so a ~4 second
> execution time is very problematic. Am I using the MLlib LDA API correctly
> or is this just reflecting the current performance characteristics of the
> LDA implementation? My code can be found here:
>
> https://github.com/marko-asplund/tech-protos/blob/master/mllib-lda/src/main/scala/fi/markoa/proto/mllib/LDADemo.scala#L56-L57
>
> For what kinds of use cases are people currently using the LDA
> implementation?
>
> marko