I'm using OpenNLP 1.5.2. My training data looks like this:

Guacamole Dip: 5 Hass <start:term> Avocados <end>, <start:term> Jalapeno <end> Puree with <start:term> Salt <end> and <start:term> BHT <end> (preservative).
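For comparison, here is a minimal sketch of how I understand a line in this format gets read. It assumes the line is split on whitespace and that a token only counts as a tag if it is exactly "<START:type>" or "<END>" (uppercase, nothing attached). The function parse_name_sample and its error messages are hypothetical and are not OpenNLP's code.

```python
import re

# Assumption: a tag is recognized only when a whole whitespace-separated
# token is exactly "<START>", "<START:type>" or "<END>" (case-sensitive).
START_TAG = re.compile(r"<START(?::([^:>\s]*))?>")
END_TAG = "<END>"


def parse_name_sample(line):
    """Hypothetical helper: return (tokens, spans), where each span is
    (start, end, type) over token indices with an exclusive end."""
    tokens = []
    spans = []
    open_start = None
    open_type = None
    for tok in line.split():
        m = START_TAG.fullmatch(tok)
        if m:
            if open_start is not None:
                # A new <START> before the previous name was closed.
                raise ValueError("unexpected <START> inside a name: " + tok)
            open_start = len(tokens)
            open_type = m.group(1) or "default"
        elif tok == END_TAG:
            if open_start is None:
                raise ValueError("<END> without a matching <START>")
            spans.append((open_start, len(tokens), open_type))
            open_start = None
        else:
            # Anything else, including "<end>" or "<END>,", is a plain token.
            tokens.append(tok)
    if open_start is not None:
        raise ValueError("name opened but never closed")
    return tokens, spans
```

In this sketch, a lowercase line such as "<start:term> Salt <end>" yields no names at all (the tags are just ordinary tokens), and "<START:term> meal <END>, <START:term> sugar <END>" fails, because "<END>," with the comma attached is not the same token as "<END>".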
Here's the command I'm using:

opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data terms.train -model terms.bin

I found a message on this list acknowledging this as a bug that should have been fixed in 1.5.1:
http://www.mail-archive.com/[email protected]/msg00162.html

I should also note that the docs and the message above say that entities should be marked using the "<START:xxx> <END>" format. When I use uppercase tags, I receive the following error:

Computing event counts...
java.io.IOException: Found unexpected annotation while handling a name sequence: meal <END>, ###<START:term>### sugar <END>,
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
    at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
    at opennlp.maxent.GIS.trainModel(GIS.java:256)
    at opennlp.model.TrainUtil.train(TrainUtil.java:182)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:360)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:426)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:458)
    at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:201)
    at opennlp.tools.cmdline.CLI.main(CLI.java:191)
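If the problem is punctuation glued to the tags (the error output shows "<END>," followed by a flagged "<START:term>"), a pre-processing pass along these lines might be worth trying before running the trainer. This is a minimal sketch under two assumptions: that each tag must be its own whitespace-separated token, and that only the uppercase form is recognized. separate_tags is a hypothetical helper, not part of OpenNLP.

```python
import re

# Hypothetical pre-processing step (assumption, not part of OpenNLP):
# match annotation tags in any case, e.g. "<start:term>", "<END>".
TAG = re.compile(r"<START(?::[^:>\s]*)?>|<END>", re.IGNORECASE)


def _normalize(match):
    tag = match.group(0)
    if tag.lower() == "<end>":
        return " <END> "
    # "<start:term>" -> "<START:term>", keeping the type name as written.
    return " <START" + tag[len("<start"):] + " "


def separate_tags(line):
    """Uppercase every tag and surround it with spaces, so punctuation such
    as the comma in "<end>," becomes a separate token; then collapse runs of
    whitespace. Text outside the tags keeps its existing tokenization."""
    spaced = TAG.sub(_normalize, line)
    return " ".join(spaced.split())
```

For example, separate_tags("5 Hass <start:term> Avocados <end>, <start:term> Jalapeno <end> Puree") gives "5 Hass <START:term> Avocados <END> , <START:term> Jalapeno <END> Puree". Only the tags are touched; words like "Dip:" stay as single tokens, so the rest of the text still needs to be tokenized the same way it will be at prediction time.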
