Hello All,

I appreciate the advice. I trained a larger model (about 800 samples, 8
categories) and it performed better. Still, if I type absolute nonsense like
"asdfasdfsadf", the evaluation `opennlp Doccat <model>` always returns a
category, so I'd like to be able to programmatically determine a confidence
level. Could I reject all categories in the app if the confidence score is
below a threshold? Is that possible right now?
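
To make the question concrete, here is a rough sketch of the kind of check I
have in mind. It assumes I can get the per-category probabilities as a
double[] (I believe DocumentCategorizerME.categorize(...) returns one, with
getCategory(int) giving the matching names, but I have not verified this
against my version). The class name, method name, and the 0.6 threshold are
made up for illustration.

```java
import java.util.Optional;

public class ConfidenceThreshold {

    // Returns the most probable category, or empty if even the best
    // probability falls below the threshold (i.e. "reject everything").
    // 'probs' is assumed to be the per-category output of the categorizer,
    // aligned index-by-index with 'categories'.
    static Optional<String> bestCategoryAbove(String[] categories,
                                              double[] probs,
                                              double threshold) {
        if (categories.length != probs.length) {
            throw new IllegalArgumentException("categories and probs differ in length");
        }
        int best = -1;
        for (int i = 0; i < probs.length; i++) {
            if (best < 0 || probs[i] > probs[best]) {
                best = i;
            }
        }
        if (best < 0 || probs[best] < threshold) {
            return Optional.empty();
        }
        return Optional.of(categories[best]);
    }

    public static void main(String[] args) {
        // Hypothetical category names and probabilities, not real model output.
        String[] cats = {"A", "B", "C"};
        double[] confident = {0.85, 0.10, 0.05};
        double[] unsure = {0.36, 0.33, 0.31};

        System.out.println(bestCategoryAbove(cats, confident, 0.6)); // Optional[A]
        System.out.println(bestCategoryAbove(cats, unsure, 0.6));    // Optional.empty
    }
}
```

Whether nonsense input actually produces a low top probability is something I
would still have to check on held-out data before picking a threshold.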

Thanks again for your help!

Patrick Baggett
Online Engineer - Search Team
e: [email protected]
p: +1 (214) 202-8964

-----Original Message-----
From: Mark G [mailto:[email protected]]
Sent: Monday, October 27, 2014 7:47 PM
To: [email protected]
Subject: Re: Getting started with OpenNLP

I think you inadvertently bring up a good point. I have run into this before:
my use case was that I wanted a probability that the input text matched my
samples for one class. Sometimes you just need one. I ended up using a simple
feature generator and a similarity measure. I can see a use case for a fuzzy
scorer against a set of samples for only one category. I believe that right
now in the Doccat, if you only have one category, you always get a score of 1
for anything you pass in, regardless of how well it matches any of the
samples, simply because it is the only category, which is really not so good.
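
A minimal sketch of that kind of single-class scorer follows. The specific
choices here (whitespace tokenization, bag-of-words counts, cosine similarity,
taking the best match over all samples) and all names are illustrative
assumptions, not necessarily what was actually used.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleClassScorer {

    // Simple feature generator: lowercase, split on whitespace, count tokens.
    static Map<String, Integer> features(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase().trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity between two token-count vectors, in [0, 1].
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector matches nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Score an input against the samples of a single category:
    // the similarity to the closest sample.
    static double score(String input, String[] samples) {
        Map<String, Integer> f = features(input);
        double best = 0.0;
        for (String s : samples) {
            best = Math.max(best, cosine(f, features(s)));
        }
        return best;
    }
}
```

Unlike a one-category Doccat model, a scorer like this gives 0 to input that
shares no tokens with any sample, so a threshold on the score can reject it.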

On Mon, Oct 27, 2014 at 6:03 PM, Joern Kottmann <[email protected]> wrote:

> On Mon, 2014-10-27 at 19:26 +0000, [email protected]
> wrote:
> > So in other words, for this model, there is just one class (in a
> > more complex example, there would be a number of classes). I trained
> > the model and did some testing, but everything is classified as "MyClass".
>
> The model can only assign the classes it sees in the training data. If
> you only have one class in your training data, then that is the only
> class the model can assign. Actually the model always computes the
> probability for each class, and many applications then just look for
> the best class.
>
> We should probably add a warning to the trainer which says that
> training with only one class doesn't make sense.
>
> I suggest that you try to train with a couple of classes, but at least
> two.
>
> Here are two tips on how to create a model, maybe they are useful.
>
> - Make sure to use a good amount of training data. You probably need a
> few hundred samples to get a model that works reasonably well.
>
> - And to determine how well the model works you should prepare some
> test data to be able to evaluate on many samples and not just a few
> hand picked ones. This can be done with the evaluation tool.
>
> HTH,
> Jörn
>
>
