On Thu, Jan 6, 2011 at 1:33 PM, Neal Richter <[email protected]> wrote:
> Have you looked at transductive learning (an algorithm within
> semi-supervised learning)?

Yes.  Variants on this are widely used in fraud modeling.

One variant is to simply train on a small labeled set and then use the
output of that model to train on a much larger set.  This is classic
transduction.

A second variant is to build the model on the training data and use that
to simply relabel the training data and build a second model.  This is
different from the first case in that the original data is re-used with
the new labels.  This works well when many cases are mis-marked and good
regularization on the first and second models allows obviously whacky
training data to be excluded.  The second model can then be much simpler
since it isn't being distracted by the goofy training examples.  (A rough
sketch of both variants follows at the end of this message.)

> IMO it would be very interesting to see to what degree a bit of human
> labeled data would improve LDA topic extraction.

Definitely interesting.  I have no opinion about whether it would be useful.

> Essentially one can take a large body of unlabeled documents and augment
> with a smaller set of labeled documents.  Under certain conditions the
> addition can greatly boost the accuracy of the assigned labels/topics.

Yes.  Definitely could help.
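
Here is a minimal sketch of the two variants, assuming Python with
scikit-learn.  The synthetic data, the choice of logistic regression, the
regularization strengths, and the 15% noise rate are all illustrative
assumptions, not anything from this thread:

    # Rough sketch of the two transduction variants described above.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_small, y_small = X[:200], y[:200]   # small labeled set
    X_large = X[200:]                     # much larger unlabeled pool

    # Variant 1: classic transduction / self-training.
    # Train on the small labeled set, use that model's output as labels
    # for the much larger set, and train again on the larger set.
    first = LogisticRegression(max_iter=1000).fit(X_small, y_small)
    pseudo_labels = first.predict(X_large)
    second = LogisticRegression(max_iter=1000).fit(X_large, pseudo_labels)

    # Variant 2: relabel the original training data.
    # Simulate mis-marked cases, fit a heavily regularized first model,
    # relabel the same data with its predictions, and fit a simpler
    # second model on the new labels.
    rng = np.random.default_rng(0)
    y_noisy = y_small.copy()
    flip = rng.random(len(y_noisy)) < 0.15    # ~15% mis-marked cases
    y_noisy[flip] = 1 - y_noisy[flip]

    robust = LogisticRegression(C=0.1, max_iter=1000).fit(X_small, y_noisy)
    y_relabel = robust.predict(X_small)       # same data, new labels
    simple = LogisticRegression(C=1.0, max_iter=1000).fit(X_small, y_relabel)

The small C in variant 2's first model is the "good regularization" point:
a strongly regularized model refuses to chase the obviously whacky examples,
so its predictions act as cleaned labels for the simpler second model.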
