Re: LDA in Mahout

Ted Dunning Thu, 03 Feb 2011 09:09:11 -0800

I agree here.  Perplexity is probably the best measure of whether LDA is
still capturing the information it needs.
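
For reference, a minimal sketch of the measure, assuming the definition in
section 7.1 of the LDA paper: perplexity on a held-out set is the exponential
of the negative total log-likelihood divided by the total token count. The
function names and the example numbers in the usage note are hypothetical and
not part of Mahout. How the per-document log-likelihoods are obtained from a
trained model is left out.

```python
import math


def perplexity(doc_log_likelihoods, doc_token_counts):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods: natural-log likelihood of each held-out document
    doc_token_counts:    number of tokens N_d in each held-out document
    """
    if len(doc_log_likelihoods) != len(doc_token_counts):
        raise ValueError("need one log-likelihood per document")
    total_tokens = sum(doc_token_counts)
    if total_tokens <= 0:
        raise ValueError("held-out set contains no tokens")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def pick_num_topics(perplexity_by_k):
    """Return the k with the lowest held-out perplexity (lower is better)."""
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

As a sanity check, a model that assigns every token probability 1/V has
perplexity V. The same comparison could be run once on plain term vectors and
once on hashed vectors to see how much hashing costs, for example
`pick_num_topics({10: 1500.0, 20: 1200.0, 50: 1300.0})` selects 20 for those
made-up values.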


On Thu, Feb 3, 2011 at 8:58 AM, Federico Castanedo <[email protected]> wrote:

> Hi,
>
> I'm joining this discussion a bit late, but what about the perplexity
> measure reported in section 7.1 of Blei's LDA paper? It seems to be the
> metric commonly used to choose the best value of "k" (topics) when
> training an LDA model.
>
> bests,
> Federico
>
> 2011/1/4 Jake Mannix <[email protected]>
>
> > Saying we have hashing is different from saying we know what will happen
> > to an algorithm once it's running over hashed features (as the continuing
> > work on our Stochastic SVD demonstrates).
> >
> > I can certainly try to run LDA over a hashed vector set, but I'm not sure
> > what criteria for correctness / quality of the topic model I should use
> > if I do.
> >
> >  -jake
> >
> > On Jan 4, 2011 7:21 AM, "Robin Anil" <[email protected]> wrote:
> >
> > We already have the second part - the hashing trick, thanks to Ted, and
> > he has a mechanism to partially reverse engineer the feature as well. You
> > might be able to drop it directly in the job itself or even vectorize and
> > then run LDA.
> >
> > Robin
> >
> > On Tue, Jan 4, 2011 at 8:44 PM, Jake Mannix <[email protected]> wrote:
> > >
> > Hey Robin, > > Vowp...
> >
>
