On Thu, Jan 6, 2011 at 1:33 PM, Neal Richter <[email protected]> wrote:
> Have you looked at transductive learning (an algorithm within
> semi-supervised learning)?

Yes.  Variants on this are widely used in fraud modeling.

One variant is to simply train on a small labeled set and then use the
output of that model to train on a much larger set.  This is classic
transduction.

A second variant is to build the model on the training data and use that
to simply relabel the training data and build a second model.  This is
different from the first case in that the original data is re-used with
the new labels.  This works well when many cases are mis-marked and good
regularization on the first and second models allows obviously whacky
training data to be excluded.  The second model can then be much simpler
since it isn't being distracted by the goofy training examples.  (A rough
sketch of both variants follows at the end of this message.)

> IMO it would be very interesting to see to what degree a bit of human
> labeled data would improve LDA topic extraction.

Definitely interesting.  I have no opinion about whether it would be useful.

> Essentially one can take a large body of unlabeled documents and augment
> with a smaller set of labeled documents.  Under certain conditions the
> addition can greatly boost the accuracy of the assigned labels/topics.

Yes.  Definitely could help.
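
Here is a minimal sketch of the two variants, assuming Python with
scikit-learn.  The synthetic data, the choice of logistic regression, the
regularization strengths, and the 15% noise rate are all illustrative
assumptions, not anything from this thread:

    # Rough sketch of the two transduction variants described above.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_small, y_small = X[:200], y[:200]   # small labeled set
    X_large = X[200:]                     # much larger unlabeled pool

    # Variant 1: classic transduction / self-training.
    # Train on the small labeled set, use that model's output as labels
    # for the much larger set, and train again on the larger set.
    first = LogisticRegression(max_iter=1000).fit(X_small, y_small)
    pseudo_labels = first.predict(X_large)
    second = LogisticRegression(max_iter=1000).fit(X_large, pseudo_labels)

    # Variant 2: relabel the original training data.
    # Simulate mis-marked cases, fit a heavily regularized first model,
    # relabel the same data with its predictions, and fit a simpler
    # second model on the new labels.
    rng = np.random.default_rng(0)
    y_noisy = y_small.copy()
    flip = rng.random(len(y_noisy)) < 0.15    # ~15% mis-marked cases
    y_noisy[flip] = 1 - y_noisy[flip]

    robust = LogisticRegression(C=0.1, max_iter=1000).fit(X_small, y_noisy)
    y_relabel = robust.predict(X_small)       # same data, new labels
    simple = LogisticRegression(C=1.0, max_iter=1000).fit(X_small, y_relabel)

The small C in variant 2's first model is the "good regularization" point:
a strongly regularized model refuses to chase the obviously whacky examples,
so its predictions act as cleaned labels for the simpler second model.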
