I agree here. Perplexity is probably the best measure of whether LDA is still capturing the information it needs.
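
For concreteness, here is a minimal sketch of that measure as defined in section 7.1 of Blei's LDA paper: perplexity on a held-out set is exp(-sum_d log p(w_d) / sum_d N_d), where N_d is the number of tokens in document d. The function name and inputs below are hypothetical and not part of any Mahout API. It assumes you can already get a per-document log-likelihood for held-out documents from whatever LDA implementation is being evaluated.

```python
import math


def perplexity(doc_log_likelihoods, doc_token_counts):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods: natural-log likelihood of each held-out document
        under the trained model (assumed to be supplied by the LDA code).
    doc_token_counts: number of tokens N_d in each of those documents.
    """
    if len(doc_log_likelihoods) != len(doc_token_counts):
        raise ValueError("need one log-likelihood per document")
    total_tokens = sum(doc_token_counts)
    if total_tokens <= 0:
        raise ValueError("held-out set must contain at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


# Sanity check: a model that spreads probability uniformly over a
# vocabulary of size V assigns log(1/V) to every token, so its
# perplexity should come out equal to V.
vocab_size = 1000
token_counts = [50, 120, 30]
uniform_ll = [n * math.log(1.0 / vocab_size) for n in token_counts]
uniform_perplexity = perplexity(uniform_ll, token_counts)
```

Lower is better. To judge the hashed-feature run, one option would be to score both the hashed and unhashed models on the same held-out documents and compare the two numbers.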

On Thu, Feb 3, 2011 at 8:58 AM, Federico Castanedo <[email protected]> wrote:

> Hi,
>
> Joined a bit late this discussion, but, what about the perplexity measure as
> reported on section 7.1. of Blei's LDA paper. it seems to be the metric
> which is commonly used to obtain the best value of "k" (topics) when
> training a LDA model.
>
> bests,
> Federico
>
> 2011/1/4 Jake Mannix <[email protected]>
>
> > Saying we have hashing is different than saying we know what will happen to
> > an algorithm once its running over hashed features (as the continuing work
> > on our Stochastic SVD demonstrates).
> >
> > I can certainly try to run LDA over a hashed vector set, but I'm not sure
> > what criteria for correctness / quality of the topic model I should use if I
> > do.
> >
> > -jake
> >
> > On Jan 4, 2011 7:21 AM, "Robin Anil" <[email protected]> wrote:
> >
> > We already have the second part - the hashing trick. Thanks to Ted, and he
> > has a mechanism to partially reverse engineer the feature as well. You might
> > be able to drop it directly in the job itself or even vectorize and then run
> > LDA.
> >
> > Robin
> >
> > On Tue, Jan 4, 2011 at 8:44 PM, Jake Mannix <[email protected]> wrote:
> >
> > > Hey Robin,
> > Vowp...
> > >
