Hey Robin,

  Vowpal Wabbit scales in numDocs by being a streaming system, in
numFeatures by using hashing, and in time by being blazingly fast.
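
  To make the streaming + hashing point concrete, here is a minimal Python
sketch of the general idea. It is not VW's actual code: the names
(hash_features, stream_docs), the CRC32 hash, and the 2**18 table size are
all assumptions picked for illustration.

```python
import zlib
from typing import Dict, Iterable, Iterator

NUM_BITS = 18  # assumed table size of 2**18 slots; illustrative only


def hash_features(tokens: Iterable[str], num_bits: int = NUM_BITS) -> Dict[int, float]:
    """Map tokens to a sparse count vector indexed by hashed slot."""
    mask = (1 << num_bits) - 1
    vec: Dict[int, float] = {}
    for tok in tokens:
        # No dictionary of terms is kept: the slot comes straight from the hash,
        # so the feature space is capped at 2**num_bits (collisions are possible).
        slot = zlib.crc32(tok.encode("utf-8")) & mask
        vec[slot] = vec.get(slot, 0.0) + 1.0
    return vec


def stream_docs(lines: Iterable[str]) -> Iterator[Dict[int, float]]:
    """Yield one hashed document at a time; no state is held across documents."""
    for line in lines:
        yield hash_features(line.split())


docs = ["topic models scale", "hashing bounds the feature space", "topic topic"]
hashed = list(stream_docs(docs))
```

  Memory per document depends only on that document's tokens, and the
width of any per-feature model state is fixed by NUM_BITS rather than by
the vocabulary size.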

   I'm unfortunately just a novice LDA coder, so my attempts at
deciphering VW's LDA impl (to see if there is anything we can learn from it
that we aren't doing yet) have been... slow.

  One thing we could do is write a streaming form of our current MR LDA, and
see at what scale it actually starts to help.

  -jake

On Jan 3, 2011 9:33 PM, "Robin Anil" <[email protected]> wrote:

Jake, take a look at Vowpal Wabbit 5.0. I saw an incremental LDA
implementation there. Might be scalable

On Tue, Jan 4, 2011 at 6:21 AM, Jake Mannix <[email protected]> wrote:
> Hey all,
>
> tl;dr ...
> MAHOUT-458 <https://issues.apache.org/jira/browse/MAHOUT-458> among other
> things, which seems to have been closed even though it was never committed,
> nor was its function...
> Wikipedia<http://markmail.org/message/ua5hckybpkj3stdl>),
> this puts an absolute cap on the size of the possible vocabulary (numTerms *
> numTopics * 8byte...
