I think you have inadvertently brought up a good point. I have run into
this before: my use case was that I wanted a probability that the input
text matched my samples for one class. Sometimes you just need one. I
ended up using a simple feature generator and a similarity measure. I can
see a use case for a fuzzy scorer against a set of samples for only one
category. I believe that right now in the Doccat, if you only have one
category, you always get a score of 1 for anything you pass in,
regardless of how well it matches any of the samples, simply because it
is the only category, which is really not so good.
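
For what it's worth, the kind of thing I mean can be sketched roughly as
below. This is only an illustration in plain Python, not OpenNLP code and
not necessarily what I actually ran: the bag-of-words feature generator,
the cosine similarity measure, the "best match over all samples" rule, and
the names features, cosine and one_class_score are all assumptions made up
for the example.

```python
import math
import re
from collections import Counter


def features(text):
    """Hypothetical simple feature generator: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def one_class_score(text, samples):
    """Score in [0, 1]: best cosine similarity between text and any sample."""
    query = features(text)
    return max((cosine(query, features(s)) for s in samples), default=0.0)


# Made-up samples for a single category.
samples = ["please reset my password", "i forgot my password"]

print(one_class_score("please reset my password", samples))   # same words as a sample
print(one_class_score("reset password now", samples))         # partial overlap
print(one_class_score("weather forecast tomorrow", samples))  # no shared words
```

Unlike a one-category classifier, a score like this drops when the input
shares little with the samples, so you can put a threshold on it. It is a
similarity score, though, not a calibrated probability.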

On Mon, Oct 27, 2014 at 6:03 PM, Joern Kottmann <[email protected]> wrote:

> On Mon, 2014-10-27 at 19:26 +0000, [email protected]
> wrote:
> > So in other words, for this model, there is just one class (in a more
> > complex example, there would be a number of classes). I trained the
> > model and did some testing, but everything is classified as "MyClass".
>
> The model can only assign the classes it sees in the training data. If
> you only have one class in your training data, then that is the only
> class the model can assign. Actually the model always computes the
> probability for each class, and many applications then just look for the
> best class.
>
> We should probably add a warning to the trainer which says that training
> with only one class doesn't make sense.
>
> I suggest that you try to train with a couple of classes, but at least
> two.
>
> Here are two tips on how to create a model, maybe they are useful.
>
> - Make sure to use a good amount of training data. You probably need a
> few hundred samples to get a model that somehow works.
>
> - And to determine how well the model works you should prepare some test
> data to be able to evaluate on many samples and not just a few hand
> picked ones. This can be done with the evaluation tool.
>
> HTH,
> Jörn
>
>
