Hello,
the sentence detector disambiguates end-of-sentence characters: for
each candidate character (such as ".") it decides whether it really
ends a sentence. In your case every end-of-sentence character is a
genuine sentence end.
So the trainer sees only one outcome in your entire corpus (note the
"Number of Outcomes: 1" line in your log). To train a sentence
detector model you need both cases, so that it can learn which
candidates are valid sentence ends and which are not.
The training fails on an internal validation check, which should
really report this with a nicer error message.
I suggest not removing the punctuation from your training sentences;
then it should work.
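If you want to check a corpus before training, a rough sketch like the
following may help. It is not what OpenNLP does internally; it only
approximates the idea by counting how often ".", "!" or "?" occurs at
the end of a line versus inside one, assuming one sentence per line.
The class name EosOutcomeCheck, the method countOutcomes and the sample
sentences are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class EosOutcomeCheck {

    // Candidate end-of-sentence characters. Assumption: '.', '!' and '?'.
    private static final String EOS_CHARS = ".!?";

    /**
     * Counts candidate characters in one-sentence-per-line data.
     * Returns {sentenceEnds, nonEnds}: a candidate that is the last
     * character of a line counts as a sentence end, any other
     * candidate counts as a non-end.
     */
    public static int[] countOutcomes(List<String> lines) {
        int sentenceEnds = 0;
        int nonEnds = 0;
        for (String raw : lines) {
            String line = raw.trim();
            for (int i = 0; i < line.length(); i++) {
                if (EOS_CHARS.indexOf(line.charAt(i)) < 0) {
                    continue; // not a candidate character
                }
                if (i == line.length() - 1) {
                    sentenceEnds++;
                } else {
                    nonEnds++;
                }
            }
        }
        return new int[] {sentenceEnds, nonEnds};
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from any real corpus.
        List<String> stripped = Arrays.asList(
                "sentence one.", "sentence two.");
        List<String> punctuated = Arrays.asList(
                "It costs 3.50 today.", "Dr. Smith agreed.");

        int[] a = countOutcomes(stripped);
        int[] b = countOutcomes(punctuated);
        System.out.println("stripped:   ends=" + a[0] + " nonEnds=" + a[1]);
        System.out.println("punctuated: ends=" + b[0] + " nonEnds=" + b[1]);
    }
}
```

If the second count is 0 for your training file, the trainer has only
one kind of example to learn from, which matches the situation
described above.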
HTH,
Jörn
On 06/19/2012 04:03 PM, Mariya Koleva wrote:
Hi,
I apologise if the question is trivial, but I'm not experienced with OpenNLP
(and not too confident in my Java skills either).
I'm trying to train a sentence detection model for Zulu. No matter whether
I'm using the command line interface or the API, it appears to be training
but a model file is not created. I'm getting the following exception [1]:
java.lang.IllegalArgumentException: The maxent model is not compatible with
the sentence detector!
The original data comes from the Ukwabelana corpus [2] in a text file
(US-ASCII), one sentence per line. It has been completely stripped of
capitalisation and any kind of punctuation. I automatically added a "." at
the end of every sentence, so that there is some EOS token for the program
to pick up.
I would appreciate any insight as to what is to be done!
Mariya
[1] The whole output is:
Indexing events using cutoff of 5
Computing event counts... done. 29424 events
Indexing... done.
Sorting and merging events... done. Reduced 29424 events to 7830.
Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 7830
Number of Outcomes: 1
Number of Predicates: 1673
...done.
Computing model parameters ...
Performing 100 iterations.
1: ... loglikelihood=0.0 1.0
2: ... loglikelihood=0.0 1.0
Exception in thread "main" java.lang.IllegalArgumentException: The maxent model is not compatible with the sentence detector!
    at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
    at opennlp.tools.sentdetect.SentenceModel.<init>(SentenceModel.java:64)
    at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:285)
    at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:296)
    at opennlp.tools.cmdline.sentdetect.SentenceDetectorTrainerTool.run(SentenceDetectorTrainerTool.java:111)
    at opennlp.tools.cmdline.CLI.main(CLI.java:191)
[2] http://www.cs.bris.ac.uk/Research/MachineLearning/Morphology/resources.jsp#corpus