Jorn, all,

If I may, I have one more question about retraining the sentence detector. I created a corpus that I want to use for training, but I would rather improve on the existing sentence splitter, so this is what I did to get the initial corpus:
opennlp SentenceDetector en-sent.bin > output.txt

However, although I gave the process 4GB of memory, it has been running for a long time, and the only output I see is:

Loading Sentence Detector model ... done (0.031s)

I expected to see the list of sentences used for training, so that I could merge output.txt with my corpus and retrain the sentence detector. But after more than an hour, it still has not started writing to output.txt. Is this the right way to go about it? If so, is it normal for this to take a long time, or is there something else I need to do?

Thanks
Danica

On 3 September 2013 12:15, Danica Damljanovic wrote:
> Thanks Jorn, that solved the problem!
>
> On 2 September 2013 19:32, Jörn Kottmann wrote:
>
>> The error seems to occur when you don't have enough training data (and
>> train with cutoff 5). Try to train with more data.
>>
>> Jörn
>>
>> On 09/02/2013 07:10 PM, Danica Damljanovic wrote:
>>
>>> Hi everyone
>>>
>>> I am trying to retrain the Sentence Detector; however, I keep getting an
>>> exception. I get the same result from the command line and
>>> programmatically. Below is the command I ran and the output.
>>>
>>> I use apache-opennlp-1.5.3 on Mac OS Lion, and I tried it with this sample
>>> text I found in one of the online tutorials:
>>>
>>> "Being at the polls was just like being at church.
>>> I didn't smell a drop of liquor, and we didn't have a bit of trouble.
>>> The campaign leading to the election was not so quiet.
>>> It was marked by controversy, anonymous midnight phone calls and veiled threats of violence.
>>> During the election campaign, both candidates, Davis and Bush, reportedly received anonymous telephone calls.
>>> Ordinary Williams said he , too , was subjected to anonymous calls soon after he scheduled the election.
>>> Many local citizens feared that there would be irregularities at the polls.
>>> Williams got himself a permit to carry a gun and promised an orderly election.
>>> He attended New York University before switching to Georgetown University in Washington."
>>>
>>> Any hint much appreciated.
>>>
>>> bin/opennlp SentenceDetectorTrainer -encoding UTF-8 -lang en -data en-sent.train -model en-sent.bin
>>> Indexing events using cutoff of 5
>>>
>>> Computing event counts... done. 9 events
>>> Indexing... done.
>>> Sorting and merging events... done. Reduced 9 events to 2.
>>> Done indexing.
>>> Incorporating indexed data for training...
>>> done.
>>> Number of Event Tokens: 2
>>> Number of Outcomes: 1
>>> Number of Predicates: 4
>>> ...done.
>>> Computing model parameters ...
>>> Performing 100 iterations.
>>> 1: ... loglikelihood=0.0 1.0
>>> 2: ... loglikelihood=0.0 1.0
>>> Exception in thread "main" java.lang.IllegalArgumentException: opennlp.tools.util.InvalidFormatException: The maxent model is not compatible with the sentence detector!
>>>     at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:476)
>>>     at opennlp.tools.sentdetect.SentenceModel.<init>(SentenceModel.java:54)
>>>     at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:315)
>>>     at opennlp.tools.cmdline.sentdetect.SentenceDetectorTrainerTool.run(SentenceDetectorTrainerTool.java:88)
>>>     at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>> Caused by: opennlp.tools.util.InvalidFormatException: The maxent model is not compatible with the sentence detector!
>>>     at opennlp.tools.sentdetect.SentenceModel.validateArtifactMap(SentenceModel.java:117)
>>>     at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:474)
>>>     ... 4 more
>>>
>>> Thanks in advance!
>>> Danica
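One way to read Jörn's diagnosis: the trainer log reports 9 events but only 1 outcome. The sentence model appears to require both a "split" and a "no split" outcome. In the sample, every period ends its line, so the trainer only ever sees one kind of event. The helper below is a hypothetical pre-check, not part of OpenNLP, that counts both kinds of candidates in a one-sentence-per-line training file before you train. Its class name, its candidate set `.?!`, and its rule that the last candidate on a line is the split are all assumptions about how the trainer labels events. It is not OpenNLP's own code.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical pre-check (not part of OpenNLP) for a sentence detector
 * training file with one sentence per line. It approximates how candidate
 * events are labelled: the last end-of-sentence character on a line is a
 * "split" candidate, and every earlier one is a "no split" candidate.
 */
public class SentenceTrainingCheck {

    // Assumption: '.', '?' and '!' are the candidate end-of-sentence characters.
    private static final String EOS_CHARS = ".?!";

    /** Returns {splitCandidates, noSplitCandidates} for the given lines. */
    public static int[] countEvents(List<String> lines) {
        int split = 0;
        int noSplit = 0;
        for (String line : lines) {
            int candidates = 0;
            for (int i = 0; i < line.length(); i++) {
                if (EOS_CHARS.indexOf(line.charAt(i)) >= 0) {
                    candidates++;
                }
            }
            if (candidates > 0) {
                split++;                   // last candidate on the line ends the sentence
                noSplit += candidates - 1; // earlier candidates are sentence-internal
            }
        }
        return new int[] {split, noSplit};
    }

    public static void main(String[] args) {
        // The nine training sentences quoted in the thread, one per line.
        List<String> sample = Arrays.asList(
            "Being at the polls was just like being at church.",
            "I didn't smell a drop of liquor, and we didn't have a bit of trouble.",
            "The campaign leading to the election was not so quiet.",
            "It was marked by controversy, anonymous midnight phone calls and veiled threats of violence.",
            "During the election campaign, both candidates, Davis and Bush, reportedly received anonymous telephone calls.",
            "Ordinary Williams said he , too , was subjected to anonymous calls soon after he scheduled the election.",
            "Many local citizens feared that there would be irregularities at the polls.",
            "Williams got himself a permit to carry a gun and promised an orderly election.",
            "He attended New York University before switching to Georgetown University in Washington.");
        int[] counts = countEvents(sample);
        System.out.println("split=" + counts[0] + " noSplit=" + counts[1]); // prints split=9 noSplit=0
        if (counts[0] == 0 || counts[1] == 0) {
            System.out.println("Only one outcome present; add sentences with internal periods (e.g. abbreviations).");
        }
    }
}
```

On the sample, this sketch finds 9 split candidates and 0 no-split candidates, which is consistent with "9 events" and "Number of Outcomes: 1" in the log. Adding sentences that contain abbreviations such as "Mr." should supply no-split events. With a cutoff of 5, however, each feature still has to occur often enough to survive indexing, so a substantially larger corpus, as Jörn suggests, remains the real fix.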
