I also wonder what memory limitations it may have compared to the Mahout implementation (with regard to the number of terms, topics, and documents).
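For a rough sense of scale, here is a back-of-the-envelope sketch of how the size of a term-by-topic count table grows with the dictionary. It assumes a dense table with 4-byte counters, which is purely my assumption for illustration; I don't know how either implementation actually stores its counts, and the function names are hypothetical. The dictionary sizes and topic count are the ones mentioned in this thread.

```python
def dense_count_table_bytes(num_terms, num_topics, bytes_per_count=4):
    """Bytes for a dense term-by-topic count table.

    Assumption: one fixed-width counter per (term, topic) pair. Neither
    Yahoo LDA nor Mahout LDA is known (from this thread) to store counts
    this way; a sparse layout would need less.
    """
    return num_terms * num_topics * bytes_per_count


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 ** 2)


# Dictionary sizes and topic count quoted in this thread.
yahoo_run = dense_count_table_bytes(num_terms=100_000, num_topics=1_000)
mahout_run = dense_count_table_bytes(num_terms=30_000, num_topics=1_000)

print(f"100k terms x 1000 topics: {to_mib(yahoo_run):.0f} MiB")
print(f"30k terms x 1000 topics: {to_mib(mahout_run):.0f} MiB")
```

Under those assumptions the table alone is a few hundred megabytes at 100k terms, so the per-document state for 52 M documents is probably the more interesting number, and that depends on the average document length, which I don't have.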

Ian

>So I tried Yahoo LDA on 52 M documents with 1000 topics.
>
>Yahoo LDA with a dictionary of 100k terms does 1 iteration every 30 minutes
>on a single machine using 4 cores.
>
>Mahout LDA using 20 nodes and a dictionary of 30k takes more than a day for
>an iteration and didn't complete (something about output error during the
>reduce step - this may be a CDHbeta3 issue not sure, since reuters clusters
>fine).
>
>Hopefully the ideas from the Yahoo version can be incorporated into the
>Mahout LDA.
>
>On Fri, Jun 10, 2011 at 6:49 AM, Federico Castanedo <[email protected]> wrote:
>
>> Hi all,
>>
>> i got through the referenced paper and seems that besides all the
>> distributed tasks the way the inference for \alpha and \beta
>> is performed was the key element on improved the LDA trained performance.
>> They use SGD for the hyperparameter adjustment of \alpha.
>>
>> bests,
>> Federico
>>
>> 2011/6/10 Jake Mannix <[email protected]>
>>
>> > It's all c++, custom distributed processing, custom distributed
>> > coordination and storage.
>> >
>> > We can certainly try to port over the algorithmic ideas, but the distributed
>> > systems stuff would be a significant departure from our current setup - it's
>> > not a web service and it's not hadoop, and it's not a command line utility -
>> > it's a cluster of long-running processes all intercommunicating. Sounds
>> > awesome, but that's a way's off from where we are now.
>> >
>> > -jake
>> >
>> > On Thu, Jun 9, 2011 at 7:52 PM, Stanley Xu <[email protected]> wrote:
>> >
>> > > Awesome! Guess it would be much faster than then current version in Mahout.
>> > > Is that possible to just use this version in mahout?
>> > >
>> > > On Fri, Jun 10, 2011 at 8:12 AM, <[email protected]> wrote:
>> > >
>> > > > Yahoo released its hadoop code for LDA
>> > > >
>> > > > http://blog.smola.org/post/6359713161/speeding-up-latent-dirichlet-allocation
