The only reasonable quick and dirty test of this sort is to look at the
terms most related to each topic and heuristically assign human tags to the topics.
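A minimal sketch of that inspection step follows. It assumes the trained model gives you a topic-by-term weight matrix as nested lists plus a vocabulary list aligned with its columns. Both inputs and the function name are hypothetical and are not tied to any particular LDA library.

```python
def top_terms(topic_term, vocab, k=10):
    """Return the k highest-weight terms for each topic.

    topic_term: one row per topic, each row a list of term weights
    vocab: term strings aligned with the columns of topic_term
    """
    result = []
    for weights in topic_term:
        # Sort column indices by weight, largest first, and keep the top k.
        ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
        result.append([vocab[j] for j in ranked[:k]])
    return result
```

A person then reads each list and picks the human tag that best fits it.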

Unless...

Perhaps I misunderstood you in the first place.  If you include tags in the
LDA training, then you can use the distance (aka 1 - dot product) in LDA space
between tags as a predictor of how often the tags co-occur.
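Here is one way to make the tag-distance idea concrete. It is a sketch only. It assumes each tag already has a topic vector, for example from training LDA with the tags included as tokens. Vectors are normalized to unit length so that 1 - dot product is the cosine distance. The code then computes the Spearman rank correlation between distance and co-occurrence count over all tag pairs. The function names and input shapes are hypothetical. A strongly negative correlation would mean that tags close together in LDA space really do co-occur often.

```python
import math
from itertools import combinations

def unit(v):
    # Scale a vector to unit length so that 1 - dot is the cosine distance.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def lda_distance(a, b):
    ua, ub = unit(a), unit(b)
    return 1.0 - sum(x * y for x, y in zip(ua, ub))

def ranks(values):
    # 1-based ranks; tied values share the average of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation of the ranks (undefined if either side is constant).
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def distance_vs_cooccurrence(tag_vectors, cooccur):
    """tag_vectors: dict tag -> topic vector.
    cooccur: dict frozenset({tag1, tag2}) -> co-occurrence count (missing = 0).
    """
    dists, counts = [], []
    for t1, t2 in combinations(sorted(tag_vectors), 2):
        dists.append(lda_distance(tag_vectors[t1], tag_vectors[t2]))
        counts.append(cooccur.get(frozenset((t1, t2)), 0))
    return spearman(dists, counts)
```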
Alternatively, you can look at the dot product between each test document and
the tags that are on that test document.  Then you can define AUC as the
probability that tags that are actually present have a higher dot product than
randomly selected tags that are not present.  Higher AUC is good.
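Instead of sampling random tags, you can compute the AUC exactly by comparing every (present, absent) tag pair. This gives the probability that random sampling would only estimate. The sketch below assumes per-document and per-tag topic vectors are available as plain lists. The names are hypothetical. Ties count as half a win, which is the usual AUC convention.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tag_auc(doc_vector, tag_vectors, present_tags):
    """Fraction of (present, absent) tag pairs where the present tag scores higher.

    doc_vector: topic vector for one test document
    tag_vectors: dict tag -> topic vector (tags included in LDA training)
    present_tags: set of tags actually attached to the document
    """
    scores = {t: dot(doc_vector, v) for t, v in tag_vectors.items()}
    pos = [scores[t] for t in present_tags if t in scores]
    neg = [s for t, s in scores.items() if t not in present_tags]
    if not pos or not neg:
        raise ValueError("need at least one present and one absent tag")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))

def mean_auc(doc_vectors, doc_tags, tag_vectors):
    # Average the per-document AUC over all test documents.
    aucs = [tag_auc(d, tag_vectors, t) for d, t in zip(doc_vectors, doc_tags)]
    return sum(aucs) / len(aucs)
```

An AUC of 0.5 is what random scoring would give, so that is the baseline to beat.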

On Thu, Jan 6, 2011 at 1:03 PM, Neal Richter <[email protected]> wrote:

> I did not intend to propose a theoretically sound way to test LDA as an
> extractor/labeler of human tags.  The intent was a simple suggestion: do a
> quick-n-dirty test to see how much LDA-extracted topics overlap with human
> tags on a well-tagged document set.
>
