I'm using OpenNLP 1.5.2. My training data looks like this:

Guacamole Dip: 5 Hass <start:term> Avocados <end>, <start:term>
Jalapeno <end> Puree with <start:term> Salt <end> and <start:term> BHT <end> 
(preservative).

Here's the command I'm using: 

opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data terms.train -model terms.bin

I found a message on this list acknowledging this as a bug that should have 
been fixed in 1.5.1: 
http://www.mail-archive.com/[email protected]/msg00162.html

I should also note that the docs and the above message say that entities should 
be marked using the "<START:xxx> <END>" format. When I use uppercase tags I 
receive the following error: 

Computing event counts...  java.io.IOException: Found unexpected annotation 
while handling a name sequence: meal <END>, ###<START:term>### sugar <END>,
Incorporating indexed data for training...  
Exception in thread "main" java.lang.NullPointerException
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
        at opennlp.maxent.GIS.trainModel(GIS.java:256)
        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
        at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:360)
        at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:426)
        at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:458)
        at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:201)
        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
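
One thing that may be worth ruling out, on the assumption that the trainer splits each line on whitespace and only recognizes a tag when it stands alone as a token: in the sample data "<end>," has a comma attached, and the error text shows "<END>," the same way. The following is a minimal sketch in plain Java (no OpenNLP calls; the class name TagTokenCheck and the method suspiciousTokens are made up for illustration) that lists tokens which contain a tag but are not exactly a tag. The tag patterns are my guess at the accepted shapes, not taken from the OpenNLP source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TagTokenCheck {
    // Assumed tag shapes: <START>, <START:type>, <END> (matched in either case).
    private static final Pattern EXACT_TAG = Pattern.compile(
            "<START(:[^:>\\s]*)?>|<END>", Pattern.CASE_INSENSITIVE);
    private static final Pattern CONTAINS_TAG = Pattern.compile(
            "<(START(:[^:>\\s]*)?|END)>", Pattern.CASE_INSENSITIVE);

    // Returns whitespace-separated tokens that contain a tag glued to other characters.
    public static List<String> suspiciousTokens(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            boolean hasTag = CONTAINS_TAG.matcher(token).find();
            boolean isTag = EXACT_TAG.matcher(token).matches();
            if (hasTag && !isTag) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "5 Hass <START:term> Avocados <END>, <START:term> Jalapeno <END> Puree";
        // Prints [<END>,] because the comma is attached to the first end tag.
        System.out.println(suspiciousTokens(line));
    }
}
```

If a check like this flags anything in terms.train, putting a space between the tag and the punctuation ("<END> ," instead of "<END>,") would be the first thing to try before retraining.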

   
