Hey Robin, Vowpal Wabbit scales in numDocs by being a streaming system, in numFeatures by using hashing, and in time by being blazingly fast.
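Here is a minimal sketch of the hashing idea, in Java since that's what Mahout is written in. Terms are hashed into a fixed number of buckets, so the size of the feature vector is set by numBuckets rather than by the vocabulary. The class and method names (FeatureHasher, index, vectorize) are hypothetical, and String.hashCode() stands in for whatever hash VW actually uses. This is not VW's implementation.

```java
import java.util.Arrays;

// Hypothetical sketch of the "hashing trick"; not Vowpal Wabbit's actual code.
class FeatureHasher {
    // Map a term to a bucket in [0, numBuckets). String.hashCode() is an
    // illustrative stand-in for a real feature hash (VW uses its own).
    public static int index(String term, int numBuckets) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(term.hashCode(), numBuckets);
    }

    // Term counts in a fixed-size vector, independent of vocabulary size.
    public static double[] vectorize(String[] tokens, int numBuckets) {
        double[] v = new double[numBuckets];
        for (String t : tokens) {
            v[index(t, numBuckets)] += 1.0; // colliding terms simply add together
        }
        return v;
    }

    public static void main(String[] args) {
        String[] doc = "the quick brown fox jumps over the lazy dog".split(" ");
        System.out.println(Arrays.toString(vectorize(doc, 16)));
    }
}
```

Colliding terms just share a bucket. The trade-off is a bounded memory footprint in exchange for some noise when distinct terms land in the same slot.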
I'm unfortunately just a novice LDA coder, so my attempts at deciphering VW's LDA impl (to see whether there is anything we can learn from it that we aren't doing yet) have been... slow. One thing we could do is write a streaming form of our current MR LDA and see at what scale it actually starts to help.

-jake

On Jan 3, 2011 9:33 PM, "Robin Anil" wrote:

Jake, take a look at Vowpal Wabbit 5.0. I saw an incremental LDA implementation there. Might be scalable.

On Tue, Jan 4, 2011 at 6:21 AM, Jake Mannix wrote:
> Hey all,
>
> tl;dr ...
> MAHOUT-458 <https://issues.apache.org/jira/browse/MAHOUT-458> among other
> things, which seems to have been closed even though it was never committed,
> nor was its function...
> Wikipedia <http://markmail.org/message/ua5hckybpkj3stdl>),
> this puts an absolute cap on the size of the possible vocabulary (numTerms
> * numTopics * 8byte...
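The vocabulary cap quoted above is simple arithmetic if we assume the model holds a dense numTerms × numTopics matrix of 8-byte doubles in memory. Below is a rough sketch under that assumption. The class name TopicMatrixMemory is hypothetical and the sizes are made up for illustration. For 1,000,000 terms and 100 topics, the matrix already comes to 800,000,000 bytes, roughly 800 MB.

```java
// Hypothetical back-of-the-envelope check, assuming a dense
// numTerms x numTopics matrix of 8-byte doubles held in memory.
class TopicMatrixMemory {
    public static long bytesNeeded(long numTerms, long numTopics) {
        // Double.BYTES is 8; long arithmetic avoids int overflow for big models.
        return numTerms * numTopics * Double.BYTES;
    }

    public static void main(String[] args) {
        // Illustrative sizes only: 1M terms, 100 topics.
        long bytes = bytesNeeded(1_000_000L, 100L);
        System.out.println(bytes + " bytes (~" + (bytes / 1_000_000L) + " MB)");
    }
}
```

Under this assumption, doubling either numTerms or numTopics doubles the footprint. That is why the vocabulary hits a hard ceiling once the whole matrix has to fit on one machine.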
