Model for sentence detection not being created

Mariya Koleva Tue, 19 Jun 2012 07:04:08 -0700

Hi,
I apologise if the question is trivial, but I'm not experienced with OpenNLP
(and not too confident in my Java skills either).

I'm trying to train a sentence detection model for Zulu. Whether I use the
command line interface or the API, training appears to run, but no model
file is created. Instead I get the following exception [1]:
java.lang.IllegalArgumentException: The maxent model is not compatible with
the sentence detector!

The original data comes from the Ukwabelana corpus [2] in a text file
(US-ASCII), one sentence per line. It is completely stripped of
capitalisation and any kind of punctuation. I automatically added a "." at
the end of every sentence, so that there is some EOS token for the program
to pick up.
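
For concreteness, the preprocessing step described above can be sketched
roughly as follows. This is only an illustration, not the code that was
actually run: the class name AppendEosMarker, the method names, and the two
command-line file arguments are hypothetical, and it assumes the input is
US-ASCII with one sentence per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: appends a "." to every sentence line of a corpus file.
public class AppendEosMarker {

    // Returns the trimmed line with a trailing "." added, unless the line
    // is empty or already ends with ".".
    public static String markLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.endsWith(".")) {
            return trimmed;
        }
        return trimmed + ".";
    }

    // Applies markLine to every line and drops lines that are empty.
    public static List<String> markLines(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String marked = markLine(line);
            if (!marked.isEmpty()) {
                out.add(marked);
            }
        }
        return out;
    }

    // Usage (paths are placeholders): java AppendEosMarker <input> <output>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: AppendEosMarker <input> <output>");
            return;
        }
        List<String> lines =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.US_ASCII);
        Files.write(Paths.get(args[1]), markLines(lines),
            StandardCharsets.US_ASCII);
    }
}
```

The output file then has one sentence per line, each ending in ".", which is
the form that was passed to the trainer.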

I would appreciate any insight as to what is to be done!

Mariya

[1] The whole output is:

Indexing events using cutoff of 5

Computing event counts… done. 29424 events
Indexing… done.
Sorting and merging events… done. Reduced 29424 events to 7830.
Done indexing.
Incorporating indexed data for training…
done.

Number of Event Tokens: 7830
Number of Outcomes: 1
Number of Predicates: 1673

…done.

Computing model parameters …
Performing 100 iterations.
1: … loglikelihood=0.0 1.0
2: … loglikelihood=0.0 1.0

Exception in thread "main" java.lang.IllegalArgumentException: The maxent model is not compatible with the sentence detector!
    at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
    at opennlp.tools.sentdetect.SentenceModel.<init>(SentenceModel.java:64)
    at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:285)
    at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:296)
    at opennlp.tools.cmdline.sentdetect.SentenceDetectorTrainerTool.run(SentenceDetectorTrainerTool.java:111)
    at opennlp.tools.cmdline.CLI.main(CLI.java:191)

[2]
http://www.cs.bris.ac.uk/Research/MachineLearning/Morphology/resources.jsp#corpus
