The very question at hand is how to label the data as "relevant" and "not
relevant" results. The question exists because those labels are not given, which is
why I would not call this a supervised problem. That may just be semantics,
but the point I wanted to make is that the reasons why choosing a random
training set is correct for a supervised learning problem are not reasons
to determine the labels randomly from among the given data. A random split is a good
idea if you're doing, say, logistic regression. It's not the best way here.
This also seems to reflect the difference between whatever you want to call
this and your garden-variety supervised learning problem.
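To make the distinction concrete, here is a minimal sketch. It assumes one user's
ratings are held in a plain dict of item -> rating. The function names
(random_relevant, top_rated_relevant, precision_at_k) are hypothetical and not
taken from any library. The first picks "relevant" items at random, the way you
would pick a random training set. The second uses the ratings themselves to decide
which items count as relevant.

```python
import random


def random_relevant(prefs, k, seed=0):
    """Hypothetical: label k items 'relevant' uniformly at random, like a random split."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(prefs), k))


def top_rated_relevant(prefs, k):
    """Hypothetical: label the user's k highest-rated items 'relevant' (ties broken by item id)."""
    ranked = sorted(prefs, key=lambda item: (-prefs[item], item))
    return set(ranked[:k])


def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that are labelled relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


if __name__ == "__main__":
    prefs = {"a": 5.0, "b": 1.0, "c": 4.5, "d": 2.0, "e": 3.0}
    # Random labelling can mark a 1-star item as "relevant".
    print(random_relevant(prefs, 2))
    # Rating-based labelling picks the items the user actually liked most.
    print(top_rated_relevant(prefs, 2))
    # A recommender that ranks by true rating scores perfectly under rating-based labels.
    print(precision_at_k(["a", "c", "e"], top_rated_relevant(prefs, 2), 2))
```

With random labels, the precision a recommender earns depends partly on which items
happened to be drawn. With rating-based labels it measures whether the recommender
surfaces what the user actually preferred.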

On Sat, Feb 16, 2013 at 11:15 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Sean
>
> I think it is still a supervised learning problem in that there is a
> labelled training data set and an unlabeled test data set.
>
> Learning a ranking doesn't change the basic dichotomy between supervised
> and unsupervised.  It just changes the desired figure of merit.
>
