Jörn: Yes, but that's if I want to train the model using input.txt and then read it again into output.txt.
What I am trying to do is read out the corpus of sentences that were used to create en-sent.bin, so that I can improve it. Any hints on how I can do that? In other words, I am trying to read http://opennlp.sourceforge.net/models-1.5/en-sent.bin into an en-sent.train text file that I can open and update (and then retrain the sentence splitter).

On 3 September 2013 16:19, Jörn Kottmann <[email protected]> wrote:

> On 09/03/2013 04:57 PM, Danica Damljanovic wrote:
>
>> If I may ask one more question about retraining the sentence detector. I
>> created a corpus that I want to use for training, but I would rather
>> improve on the existing sentence splitter, so this is what I did to get
>> the initial corpus:
>>
>> opennlp SentenceDetector en-sent.bin > output.txt
>>
>> However, although I gave the process 4GB of memory, it seems to be running
>> for a while, and the only output I see is:
>>
>> Loading Sentence Detector model ... done (0.031s)
>>
>> What I expect is to see the list of sentences used for training, so that I
>> can merge output.txt with my corpus, and retrain the parser. But after
>> more than one hour, it still did not start to write into output.txt.
>
> You need to provide some input text to the Sentence Detector, otherwise it
> just waits forever for it,
> have a look at the manual for a sample.
>
> Jörn
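
For the merge step mentioned in the thread (combining output.txt with my own corpus before retraining), here is a minimal sketch in Python. It is only an illustration under stated assumptions: both files are plain text with one sentence per line, and the function name merge_sentence_corpora and the file name my-corpus.txt are hypothetical, not part of OpenNLP. It also drops blank lines, so if blank lines mark document boundaries in your training data you would need to keep them.

```python
from pathlib import Path


def merge_sentence_corpora(paths, out_path, encoding="utf-8"):
    """Merge one-sentence-per-line files into a single training file.

    Hypothetical helper (not part of OpenNLP). Assumes each input file
    holds one sentence per line. Blank lines and exact duplicate
    sentences are dropped; first-seen order is preserved.
    """
    seen = set()
    merged = []
    for path in paths:
        for line in Path(path).read_text(encoding=encoding).splitlines():
            sentence = line.strip()
            # Skip blank lines and sentences already collected.
            if sentence and sentence not in seen:
                seen.add(sentence)
                merged.append(sentence)
    # Write one sentence per line, ending with a newline.
    Path(out_path).write_text("\n".join(merged) + "\n", encoding=encoding)
    return merged


if __name__ == "__main__":
    # File names are assumptions taken from, or modelled on, the thread.
    merge_sentence_corpora(["output.txt", "my-corpus.txt"], "en-sent.train")
```

The resulting en-sent.train could then be passed to the sentence detector trainer; the manual has the exact command.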
