Richard,
The problem is the ',' right after the <END> tag.
<START:term> Avocados <END> , ....
The error occurs because the training data is split on whitespace, so
"<END>," with the ',' butted against it is not an <END> token.
Lowercase tags may seem to work, but then they are not recognized as tags,
so you don't have any annotated tokens and therefore no data to train on.
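If the training file is large, fixing the tags by hand gets tedious. Below is a minimal sketch of a cleanup pass in Python; it does not use any OpenNLP API. The function name normalize_tags is made up for illustration, and the sketch assumes one sentence per line and that collapsing runs of whitespace to a single space is acceptable. It upper-cases the START/END keyword (leaving the type name alone) and puts a space on each side of every tag, so that a ',' can no longer end up glued to <END>.

```python
import re

# Matches <START>, <START:type> or <END> in any letter case, together with
# whatever whitespace (possibly none) surrounds the tag.
TAG = re.compile(r"\s*<(start(?::[^:>\s]*)?|end)>\s*", re.IGNORECASE)


def _fix(match):
    # Split "start:term" into keyword and type; "end" has no type part.
    keyword, sep, name_type = match.group(1).partition(":")
    # Upper-case only the keyword and pad the tag with spaces.
    return f" <{keyword.upper()}{sep}{name_type}> "


def normalize_tags(line):
    """Return the line with every tag upper-cased and whitespace-separated."""
    spaced = TAG.sub(_fix, line)
    # Collapse the extra spaces introduced by the padding above.
    return " ".join(spaced.split())


if __name__ == "__main__":
    sample = "5 Hass <start:term> Avocados <end>, <start:term> Salt <end>."
    print(normalize_tags(sample))
```

Writing the result of normalize_tags for each line to a new file, and training on that file instead, should avoid the "unexpected annotation" error described above, provided every <START:...> in the data has a matching <END>.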
James
On 4/22/2013 8:56 PM, Richard Head Jr. wrote:
Using 1.5.2. My training data looks like this:
Guacamole Dip: 5 Hass <start:term> Avocados <end>, <start:term>
Jalapeno <end> Puree with <start:term> Salt <end> and <start:term> BHT <end>
(preservative).
Here's the command I'm using:
opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data terms.train
-model terms.bin
I found a message on this list acknowledging this as a bug that should have
been fixed in 1.5.1:
http://www.mail-archive.com/[email protected]/msg00162.html
I should also note that the docs and the above message say that entities should be marked using the
"<START:xxx> <END>" format. When I use uppercase tags I receive the following
error:
Computing event counts... java.io.IOException: Found unexpected annotation while handling a name
sequence: meal <END>, ###<START:term>### sugar <END>,
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:182)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:360)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:426)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:458)
at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:201)
at opennlp.tools.cmdline.CLI.main(CLI.java:191)