The numbers should probably be slightly lower, though that depends somewhat
on the data you are processing.

I know of two ways to get very high performance numbers in that range:
- Use very little data which gets tagged perfectly by the model.
- Run the model over the training data.
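
To rule out the second case, the evaluation data has to be kept apart from the
training data. Here is a minimal Python sketch of holding back part of an
annotated corpus. It is not part of OpenNLP; the name split_corpus and the
fraction used are illustrative choices of mine, and it assumes one annotated
sentence per list entry.

```python
def split_corpus(sentences, eval_fraction=0.1):
    """Return (train, held_out) with no sentence in both parts.

    Assumes one annotated sentence per list entry, in corpus order.
    The split is contiguous: the tail of the corpus is held out.
    """
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to split")
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    # Hold back at least one sentence, but never the whole corpus.
    held = min(len(sentences) - 1, max(1, round(len(sentences) * eval_fraction)))
    cut = len(sentences) - held
    return sentences[:cut], sentences[cut:]


if __name__ == "__main__":
    # Sample sentences in the annotated format quoted later in this thread.
    corpus = [
        "<START:person> Pierre Vinken <END> , 61 years old , will join the board .",
        "Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. .",
        "<START:person> Rudolph Agnew <END> , 55 years old , was named a director .",
    ]
    train, held_out = split_corpus(corpus, eval_fraction=0.34)
    print(len(train), "training sentences,", len(held_out), "held out")
```

Train only on the first part and give only the second part to the evaluator;
scores measured that way are usually lower and more honest.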

HTH,
Jörn

On Sat, 2015-02-28 at 02:34 +0000, Richard Head Jr. wrote:
> > Add -misclassified true
> Very handy
> > To evaluate you need an annotated corpus...
> This was my problem. 
> Now that I can run it I see measurements of 0.99XXXXX, but I noticed that the 
> better models, as determined by my separate unit tests (which check what was 
> actually classified), have lower measurements.
> According to my test cases this is a very good model: 
> Precision: 0.9905921169966114
> Recall: 0.9946277476832162
> F-Measure: 0.9926058304478945
> While this one is not so great:
> Precision: 0.9951354487436962
> Recall: 0.9982540179970453
> F-Measure: 0.9966922939388522
> Am I missing something here? 
> Thanks
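
For reference, the F-measure is the harmonic mean of precision and recall, so
the two results above can be checked by hand. A minimal sketch follows; the
function name is hypothetical, and returning -1.0 when both scores are zero is
my assumption, chosen to match the "F-Measure: -1.0" printed further down the
thread.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall.

    Returns -1.0 when both are zero (assumed convention, matching the
    evaluator output quoted in this thread).
    """
    if precision + recall == 0:
        return -1.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall exactly as reported above.
    good = f_measure(0.9905921169966114, 0.9946277476832162)
    weak = f_measure(0.9951354487436962, 0.9982540179970453)
    print("model judged good by unit tests:", good)
    print("model judged weaker by unit tests:", weak)
    print("difference:", weak - good)
```

The two F-measures differ by well under one percentage point, so on data the
model already tags almost perfectly the score says little about which model
is better.
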
>      On Wednesday, February 25, 2015 11:48 PM, William Colen 
> <[email protected]> wrote:
>    
> 
>  Add -misclassified true to the command to output what was misclassified.
> But I have a guess. To evaluate you need an annotated corpus. Is the file 
> /tmp/db-raw.txt annotated? It should look like this:
> 
> <START:person> Pierre Vinken <END> , 61 years old , will join the board as a 
> nonexecutive director Nov. 29 .
> Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch 
> publishing group .
> <START:person> Rudolph Agnew <END> , 55 years old and former chairman of 
> Consolidated Gold Fields PLC ,
>     was named a director of this British industrial conglomerate .
> 
> Regards,
> William
> 
> 2015-02-26 1:24 GMT-03:00 Richard Head Jr. 
> <[email protected]>:
> 
> > Are you using 1.5.3?
> 
> Yes. 
> 
> > Can you send a small sample?
> I can't send the model. Any other options? What format is the file given to 
> the -data option supposed to be in? 
> Thanks
>      On Friday, February 20, 2015 2:14 PM, William Colen 
> <[email protected]> wrote:
> 
> 
>  Are you using 1.5.3? Can you send a small sample?
> 
> On Monday, February 16, 2015, Richard Head Jr. 
> <[email protected]> wrote:
> 
> I ran the command line evaluator several times on tokenized/untokenized and 
> large/small input but get no results (see below). The model appears to be 
> finding tokens quite well, I'd just like to evaluate *how* well:
> 
> opennlp TokenNameFinderEvaluator  -data some-data.txt -model a-model.bin
> Loading Token Name Finder model ... done (0.111s)
> 
> 
> Average: 104.2 sent/s
> Total: 15 sent
> Runtime: 0.144s
> 
> Precision: 0.0
> Recall: 0.0
> F-Measure: -1.0
> 
> Now on a larger set of data:
> 
> opennlp TokenNameFinderEvaluator -encoding latin1 -data /tmp/db-raw.txt 
> -model a-model.bin
> Loading Token Name Finder model ... done (0.156s)
> current: 364.9 sent/s avg: 364.9 sent/s total: 366 sent
> current: 427.4 sent/s avg: 396.1 sent/s total: 793 sent
> 
> 
> Average: 477.7 sent/s
> Total: 1434 sent
> Runtime: 3.002s
> 
> Precision: 0.0
> Recall: 0.0
> F-Measure: -1.0
> 
> 
> 
> What am I doing wrong?
> 
> 
> Thanks
> 
> 
> 
> --
> William Colen
> 
> 
>    
> 
> 
> 
>    
