Hi all, I went through the referenced paper, and it seems that besides all the distributed machinery, the way inference for \alpha and \beta is performed was the key element in improving the trained LDA's performance. They use SGD to adjust the \alpha hyperparameter.
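To make the idea concrete, here is a minimal sketch (not the Yahoo implementation) of an SGD-style update for a symmetric \alpha, using the digamma gradient of the collapsed per-document likelihood p(z_d | alpha). The names (alpha_sgd_step, n_dk, step) are mine, not from the paper; \beta (the topic-word prior) would get the analogous update from topic-word counts.

    # Sketch only: one stochastic gradient step on a symmetric Dirichlet
    # hyperparameter alpha, from a single document's topic counts n_dk.
    import numpy as np
    from scipy.special import digamma

    def alpha_sgd_step(alpha, n_dk, step=1e-3):
        """SGD step on symmetric alpha given one document's topic counts."""
        K = len(n_dk)       # number of topics
        N = n_dk.sum()      # tokens in the document
        # d/d alpha of log p(z_d | alpha) for a symmetric Dirichlet prior:
        #   log G(K a) - log G(K a + N) + sum_k [log G(n_dk + a) - log G(a)]
        grad = (K * digamma(K * alpha)
                - K * digamma(K * alpha + N)
                + digamma(n_dk + alpha).sum()
                - K * digamma(alpha))
        # Update in log space so alpha stays positive.
        return alpha * np.exp(step * alpha * grad)

    # Toy usage: counts for two documents over K = 4 topics.
    alpha = 0.1
    for n_dk in [np.array([12., 3., 0., 1.]), np.array([0., 8., 7., 2.])]:
        alpha = alpha_sgd_step(alpha, n_dk)
    print(alpha)

The log-space update is just one way to keep alpha positive; a projected update or Minka's fixed-point iteration would work on the same gradient terms.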
bests, Federico

2011/6/10 Jake Mannix <[email protected]>

> It's all c++, custom distributed processing, custom distributed
> coordination and storage.
>
> We can certainly try to port over the algorithmic ideas, but the
> distributed systems stuff would be a significant departure from our
> current setup - it's not a web service and it's not hadoop, and it's not
> a command line utility - it's a cluster of long-running processes all
> intercommunicating. Sounds awesome, but that's a ways off from where we
> are now.
>
>   -jake
>
> On Thu, Jun 9, 2011 at 7:52 PM, Stanley Xu <[email protected]> wrote:
>
> > Awesome! Guess it would be much faster than the current version in
> > Mahout. Is that possible to just use this version in Mahout?
> >
> > On Fri, Jun 10, 2011 at 8:12 AM, <[email protected]> wrote:
> >
> > > Yahoo released its hadoop code for LDA
> > >
> > > http://blog.smola.org/post/6359713161/speeding-up-latent-dirichlet-allocation
