Re: LDA and evaluating topic number

Stephen Boesch Thu, 07 Dec 2017 00:16:40 -0800

I have been testing on the 20 Newsgroups dataset, which the Spark docs
themselves reference.  I can confirm that perplexity increases and
likelihood decreases as the number of topics increases, and I am
similarly confused by these results.
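
For anyone who wants to reproduce the comparison, here is a minimal sketch
of a topic-count sweep in PySpark. It is not the code from either of our
experiments. It assumes DataFrames with a "features" column of term-count
vectors (for example from CountVectorizer), and the helper names
per_token_perplexity and sweep_topic_counts are made up for illustration.
As far as I can tell, logLikelihood() is a lower bound summed over the whole
dataset passed in, and logPerplexity() is the negative of that bound divided
by the token count, so the two curves should mirror each other and lower
logPerplexity is better. Scoring on a held-out split rather than the
training data seems the safer comparison.

```python
import math


def per_token_perplexity(log_likelihood, token_count):
    """Convert a corpus-level log likelihood into per-token perplexity."""
    return math.exp(-log_likelihood / token_count)


def sweep_topic_counts(train_df, test_df, topic_counts, max_iter=50, seed=1):
    """Fit one LDA model per k and score each on the held-out test_df.

    Assumes both DataFrames have a "features" column of term-count vectors.
    Returns a list of (k, logLikelihood, logPerplexity) tuples.
    """
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.ml.clustering import LDA

    results = []
    for k in topic_counts:
        lda = LDA(k=k, maxIter=max_iter, seed=seed, featuresCol="features")
        model = lda.fit(train_df)
        results.append(
            (k, model.logLikelihood(test_df), model.logPerplexity(test_df))
        )
    return results
```

A possible way to call it, assuming df is the vectorized corpus:
train_df, test_df = df.randomSplit([0.8, 0.2], seed=1), then
sweep_topic_counts(train_df, test_df, [5, 10, 20, 40]). Whether this changes
the trend we are both seeing is something I have not verified.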


2017-09-28 10:50 GMT-07:00 Cody Buntain <cbunt...@cs.umd.edu>:

> Hi, all!
>
> Is there an example somewhere on using LDA’s logPerplexity()/logLikelihood()
> functions to evaluate topic counts? The existing MLLib LDA examples show
> calling them, but I can’t find any documentation about how to interpret the
> outputs. The graphs of log perplexity and log likelihood aren’t
> consistent with what I expected (perplexity increases and likelihood
> decreases as the number of topics increases, which seems odd to me).
>
> An example of what I’m doing is here:
> http://www.cs.umd.edu/~cbuntain/FindTopicK-pyspark-regex.html
>
> Thanks very much in advance! If I can figure this out, I can post example
> code online, so others can see how this process is done.
>
> -Best regards,
> Cody
> _________________
> Cody Buntain, PhD
> Postdoc, @UMD_CS
> Intelligence Community Postdoctoral Fellow
> cbunt...@cs.umd.edu
> www.cs.umd.edu/~cbuntain
>
>
