Jörn: Yes, but that is only if I want to train the model using input.txt and
then read it back into output.txt.

What I am trying to do is read out the corpus of sentences that was used to
create en-sent.bin, so that I can improve it. Any hints on how I can do that?

In other words, I am trying to read
http://opennlp.sourceforge.net/models-1.5/en-sent.bin into an en-sent.train
text file that I can open and update (and then retrain the sentence
splitter).
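
A quick way to see what the model file actually holds is to list its
entries. The sketch below is only a rough check, and it assumes that
en-sent.bin is an ordinary zip archive (which, as far as I know, is how
OpenNLP packages its models). The function name list_model_entries is my own,
not part of OpenNLP. If the listing shows only a manifest and a serialized
model, then the training sentences are not stored in the file and cannot be
read back out of it.

```python
import zipfile


def list_model_entries(model_path):
    """Return (entry name, uncompressed size in bytes) for each archive entry.

    Assumption: the model file is a plain zip archive. If it is not,
    zipfile raises zipfile.BadZipFile.
    """
    with zipfile.ZipFile(model_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Example use (the path is whatever you downloaded the model to):
#   for name, size in list_model_entries("en-sent.bin"):
#       print(size, name)
```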


On 3 September 2013 16:19, Jörn Kottmann <[email protected]> wrote:

> On 09/03/2013 04:57 PM, Danica Damljanovic wrote:
>
>> If I may ask one more question about retraining the sentence detector. I
>> created a corpus that I want to use for training, but I would rather
>> improve on the existing sentence splitter, so this is what I did to get the
>> initial corpus:
>>
>> opennlp SentenceDetector en-sent.bin > output.txt
>>
>> However, although I gave the process 4GB of memory, it seems to be running
>> for a while, and the only output I see is:
>>
>>
>> Loading Sentence Detector model ... done (0.031s)
>>
>> What I expect is to see the list of sentences used for training, so that I
>> can merge output.txt with my corpus, and retrain the parser. But after more
>> than one hour, it still did not start to write into output.txt.
>>
>
>
> You need to provide some input text to the Sentence Detector; otherwise it
> just waits forever for input. Have a look at the manual for a sample.
>
> Jörn
>
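
For reference, the point above is that the tool reads its text from standard
input, so the shell form would be
opennlp SentenceDetector en-sent.bin < input.txt > output.txt.
The following is a minimal Python sketch of the same idea. It assumes the
opennlp launcher is on the PATH and that en-sent.bin sits in the working
directory; the function name detect_sentences and the DEFAULT_COMMAND constant
are made up for illustration. Note that output.txt will then contain the
sentences detected in input.txt, not the sentences the model was trained on.

```python
import subprocess

# Assumption: `opennlp` is on PATH and en-sent.bin is in the current directory.
DEFAULT_COMMAND = ["opennlp", "SentenceDetector", "en-sent.bin"]


def detect_sentences(input_path, output_path, command=DEFAULT_COMMAND):
    """Run `command` with input_path on stdin and write its stdout to output_path.

    Without a file (or pipe) on stdin, the tool sits waiting for typed input,
    which is why the original invocation appeared to hang.
    """
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(command, stdin=src, stdout=dst, check=True)


# Example use:
#   detect_sentences("input.txt", "output.txt")
```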
