Hi,
I apologise if the question is trivial, but I'm not experienced with OpenNLP
(and not too confident in my Java skills either).
I'm trying to train a sentence detection model for Zulu. Whether I use the
command line interface or the API, training appears to run, but no model
file is created. Instead I get the following exception [1]:
java.lang.IllegalArgumentException: The maxent model is not compatible with
the sentence detector!
The original data comes from the Ukwabelana corpus [2] in a text file
(US-ASCII), one sentence per line. It is completely stripped of
capitalisation and any kind of punctuation. I automatically added a "." at
the end of every sentence, so that there is some EOS token for the program
to pick up.
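
A minimal sketch of the preprocessing described above, assuming the input is already one lowercase, punctuation-free sentence per line. The class name `EosAppender` and the methods `appendEos` and `prepare` are hypothetical names used only for illustration, and the input lines in `main` are placeholders rather than corpus data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the "add a '.' to every line" step.
final class EosAppender {

    // Appends a "." to a single sentence unless it already ends with one.
    // Blank lines are returned as empty strings so the caller can drop them.
    static String appendEos(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        return trimmed.endsWith(".") ? trimmed : trimmed + ".";
    }

    // Applies appendEos to every line and skips lines that were blank.
    static List<String> prepare(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String sentence = appendEos(line);
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder input, not taken from the Ukwabelana corpus.
        List<String> raw = Arrays.asList(
                "first placeholder sentence",
                "",
                "second placeholder sentence");
        for (String sentence : prepare(raw)) {
            System.out.println(sentence);
        }
    }
}
```

The result is still one sentence per line, with "." as the only punctuation character anywhere in the file.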
I would appreciate any insight as to what is to be done!
Mariya
[1] The whole output is:
Indexing events using cutoff of 5
Computing event counts... done. 29424 events
Indexing... done.
Sorting and merging events... done. Reduced 29424 events to 7830.
Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 7830
Number of Outcomes: 1
Number of Predicates: 1673
...done.
Computing model parameters ...
Performing 100 iterations.
1: ... loglikelihood=0.0 1.0
2: ... loglikelihood=0.0 1.0
Exception in thread "main" java.lang.IllegalArgumentException: The maxent model is not compatible with the sentence detector!
at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
at opennlp.tools.sentdetect.SentenceModel.<init>(SentenceModel.java:64)
at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:285)
at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:296)
at opennlp.tools.cmdline.sentdetect.SentenceDetectorTrainerTool.run(SentenceDetectorTrainerTool.java:111)
at opennlp.tools.cmdline.CLI.main(CLI.java:191)
[2] http://www.cs.bris.ac.uk/Research/MachineLearning/Morphology/resources.jsp#corpus