The numbers should probably be slightly lower. It depends a bit on the data you are processing.
I know of two ways to get very high performance numbers in that range:
- Use very little data which gets tagged perfectly by the model.
- Run the model over the training data.

HTH,
Jörn

On Sat, 2015-02-28 at 02:34 +0000, Richard Head Jr. wrote:
> > Add -misclassified true
> Very handy
>
> > To evaluate you need an annotated corpus...
> This was my problem.
>
> Now that I can run it I see measurements of 0.99XXXXX, but I noticed that the
> better models (as determined by my separate unit tests, which check what was
> actually classified) have lower measurements.
>
> According to my test cases this is a very good model:
> Precision: 0.9905921169966114
> Recall: 0.9946277476832162
> F-Measure: 0.9926058304478945
>
> While this one is not so great:
> Precision: 0.9951354487436962
> Recall: 0.9982540179970453
> F-Measure: 0.9966922939388522
>
> Am I missing something here?
>
> Thanks
>
> On Wednesday, February 25, 2015 11:48 PM, William Colen wrote:
>
> Add -misclassified true to the command to output what was misclassified. But I
> have a guess. To evaluate you need an annotated corpus. Is the file
> /tmp/db-raw.txt annotated? It should look like this:
>
> <START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
> Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
> <START:person> Rudolph Agnew <END> , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate .
>
> Regards,
> William
>
> 2015-02-26 1:24 GMT-03:00 Richard Head Jr.:
>
> > > Are you using 1.5.3?
> > Yes.
> >
> > > Can you send a small sample?
> > I can't send the model. Any other options? What format is the file given to
> > the -data option supposed to be in?
> >
> > Thanks
> >
> > On Friday, February 20, 2015 2:14 PM, William Colen wrote:
> >
> > Are you using 1.5.3? Can you send a small sample?
> >
> > On Monday, 16 February 2015, Richard Head Jr. wrote:
> >
> > > I ran the command line evaluator several times on tokenized/untokenized and
> > > large/small input but get no results (see below). The model appears to be
> > > finding tokens quite well; I'd just like to evaluate *how* well:
> > >
> > > opennlp TokenNameFinderEvaluator -data some-data.txt -model a-model.bin
> > > Loading Token Name Finder model ... done (0.111s)
> > >
> > > Average: 104.2 sent/s
> > > Total: 15 sent
> > > Runtime: 0.144s
> > >
> > > Precision: 0.0
> > > Recall: 0.0
> > > F-Measure: -1.0
> > >
> > > Now on a larger set of data:
> > >
> > > opennlp TokenNameFinderEvaluator -encoding latin1 -data /tmp/db-raw.txt -model a-model.bin
> > > Loading Token Name Finder model ... done (0.156s)
> > > current: 364.9 sent/s avg: 364.9 sent/s total: 366 sent
> > > current: 427.4 sent/s avg: 396.1 sent/s total: 793 sent
> > >
> > > Average: 477.7 sent/s
> > > Total: 1434 sent
> > > Runtime: 3.002s
> > >
> > > Precision: 0.0
> > > Recall: 0.0
> > > F-Measure: -1.0
> > >
> > > What am I doing wrong?
> > >
> > > Thanks
> >
> > --
> > William Colen
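The 0.0 / 0.0 / -1.0 result in the first message is consistent with William's guess: if the file passed to -data has no <START:type> ... <END> markup, there are no reference names, so nothing the model finds can be counted as correct. The following is a minimal, self-contained Java sketch of that reasoning, not OpenNLP's own evaluator. SpanEval, Span, parseSpans, evaluate and fMeasure are hypothetical names, and the sketch assumes exact-match span scoring with the F-Measure reported as -1 when precision and recall are both 0, which is what the output above suggests.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper (not part of OpenNLP): exact-match span evaluation over
 * whitespace-tokenized sentences annotated as "<START:type> tokens <END>".
 */
public class SpanEval {

    /** A name span covering token indices [start, end), plus its type. */
    public record Span(int start, int end, String type) {}

    /** Extracts the annotated name spans from one sentence; markup tokens are not counted. */
    public static List<Span> parseSpans(String sentence) {
        List<Span> spans = new ArrayList<>();
        int tokenIndex = 0; // index of the next real (non-markup) token
        int start = -1;     // start of the currently open span, -1 if none
        String type = null;
        for (String tok : sentence.trim().split("\\s+")) {
            if (tok.startsWith("<START:") && tok.endsWith(">")) {
                type = tok.substring("<START:".length(), tok.length() - 1);
                start = tokenIndex;
            } else if (tok.equals("<END>") && start >= 0) {
                spans.add(new Span(start, tokenIndex, type));
                start = -1;
            } else {
                tokenIndex++; // only real tokens advance the index
            }
        }
        return spans;
    }

    /** Returns {precision, recall, fMeasure}, counted over all sentences together. */
    public static double[] evaluate(List<List<Span>> reference, List<List<Span>> predicted) {
        long truePositives = 0, referenceCount = 0, predictedCount = 0;
        for (int i = 0; i < reference.size(); i++) {
            Set<Span> gold = new HashSet<>(reference.get(i));
            Set<Span> found = new HashSet<>(predicted.get(i));
            referenceCount += gold.size();
            predictedCount += found.size();
            for (Span s : found) {
                if (gold.contains(s)) truePositives++; // same start, end and type
            }
        }
        double precision = predictedCount == 0 ? 0.0 : (double) truePositives / predictedCount;
        double recall = referenceCount == 0 ? 0.0 : (double) truePositives / referenceCount;
        return new double[] {precision, recall, fMeasure(precision, recall)};
    }

    /** Harmonic mean of precision and recall; -1 is returned as a "no score" sentinel. */
    public static double fMeasure(double precision, double recall) {
        if (precision + recall == 0) {
            return -1;
        }
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // One sentence annotated the way William describes.
        String annotated = "<START:person> Pierre Vinken <END> , 61 years old , will join the board .";
        // The same sentence without markup, like an unannotated -data file.
        String raw = "Pierre Vinken , 61 years old , will join the board .";
        // Hypothetical model output: "Pierre Vinken" tagged as a person.
        List<List<Span>> found = List.of(List.of(new Span(0, 2, "person")));

        double[] a = evaluate(List.of(parseSpans(annotated)), found);
        double[] r = evaluate(List.of(parseSpans(raw)), found);
        System.out.printf(Locale.ROOT, "annotated: P=%.1f R=%.1f F=%.1f%n", a[0], a[1], a[2]); // P=1.0 R=1.0 F=1.0
        System.out.printf(Locale.ROOT, "raw:       P=%.1f R=%.1f F=%.1f%n", r[0], r[1], r[2]); // P=0.0 R=0.0 F=-1.0
    }
}
```

With no reference spans, recall has no denominator and precision is 0 because every detected name is counted as wrong, so the sketch falls through to the -1 sentinel exactly as in the first message. As a consistency check, fMeasure(0.9905921169966114, 0.9946277476832162) comes out at about 0.9926058, matching the first F-Measure Richard reports, so those figures are the ordinary harmonic mean of precision and recall.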
