I think that if nothing in the input matches the model at all, every category 
will end up with the same score.
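
A minimal sketch of the threshold idea from the question below, under a few assumptions: the scores are the per-category probabilities that `DocumentCategorizerME.categorize(...)` returns as a `double[]`, the class and method names here (`ConfidenceThreshold`, `bestCategoryOrReject`, `isNearUniform`) are made up for illustration and are not part of OpenNLP, and the 0.5 cutoff is arbitrary and would need tuning on held-out data.

```java
// Hypothetical helper, not part of OpenNLP: it only post-processes a score array.
final class ConfidenceThreshold {

    /**
     * Returns the index of the highest-scoring category, or -1 when even the
     * best score is below the given threshold (i.e. "reject all categories").
     */
    static int bestCategoryOrReject(double[] scores, double threshold) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                best = i;
            }
        }
        return (best >= 0 && bestScore >= threshold) ? best : -1;
    }

    /**
     * Returns true when all scores are (almost) equal, which is what one would
     * expect to see if nothing in the input matched the model.
     */
    static boolean isNearUniform(double[] scores, double epsilon) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        return scores.length > 0 && (max - min) <= epsilon;
    }

    public static void main(String[] args) {
        // Assumed shape of the output for nonsense input with 8 categories.
        double[] nonsense = {0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125};
        int index = bestCategoryOrReject(nonsense, 0.5);
        System.out.println(index < 0 ? "rejected" : "category index " + index);
        System.out.println("near uniform: " + isNearUniform(nonsense, 1e-6));
    }
}
```

In an application the returned index could be passed to the categorizer's `getCategory(int)` to get the label, and a result of -1 would be treated as "no category".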



> On Oct 28, 2014, at 10:03 AM, <[email protected]> wrote:
> 
> Hello All,
> 
> I appreciate the advice. I did try training a larger model (800ish samples, 8 
> categories) and it performed better. Still, if I type absolute nonsense like 
> "asdfasdfsadf", the evaluation `opennlp Doccat <model>` must return a 
> category -- so I'd like to be able to programmatically determine some 
> confidence level. Perhaps I can reject all categories in the app if the 
> confidence score is below a threshold? Is that possible right now?
> 
> Thanks again for your help!
> 
> Patrick Baggett
> Online Engineer - Search Team
> e: [email protected]
> p: +1 (214) 202-8964
> 
> -----Original Message-----
> From: Mark G [mailto:[email protected]]
> Sent: Monday, October 27, 2014 7:47 PM
> To: [email protected]
> Subject: Re: Getting started with OpenNLP
> 
> I think you bring up a good point inadvertently. I have run into this 
> before: my use case was that I wanted a probability that the input text 
> matched my samples for one class (sometimes you just need one). I ended up 
> using a simple feature generator and a similarity measure. I can see a use 
> case for a fuzzy scorer against a set of samples for only one category. I 
> believe that right now in the Doccat, if you only have one category, you 
> always get a score of 1 for anything you pass in, regardless of how well it 
> matches any of the samples, simply because it's the only one, which is really 
> not so good.
> 
>> On Mon, Oct 27, 2014 at 6:03 PM, Joern Kottmann <[email protected]> wrote:
>> 
>> On Mon, 2014-10-27 at 19:26 +0000, [email protected]
>> wrote:
>>> So in other words, for this model, there is just one class (in a
>>> more complex example, there would be a number of classes). I trained
>>> the model and did some testing, but everything is classified as "MyClass".
>> 
>> The model can only assign the classes it sees in the training data. If
>> you only have one class in your training data, then that is the only
>> class the model can assign. Actually the model always computes the
>> probability for each class, and many applications then just look for
>> the best class.
>> 
>> We should probably add a warning to the trainer which says that
>> training with only one class doesn't make sense.
>> 
>> I suggest that you try to train with a couple of classes, but at least
>> two.
>> 
>> Here are two tips on how to create a model, maybe they are useful.
>> 
>> - Make sure to use a good amount of training data. You probably need a
>> few hundred samples to get a model that works reasonably well.
>> 
>> - To determine how well the model works, you should prepare some
>> test data so that you can evaluate on many samples and not just a few
>> hand-picked ones. This can be done with the evaluation tool.
>> 
>> HTH,
>> Jörn
> 
