On 02/08/2013 06:46 PM, Surendra wrote:
Hi,
I am a postgraduate student in computer science. I am working on sentence
boundary detection for a local Indian language. Could you please provide the
format of the training file and a sample file like en-sent.train, which would
be helpful for me in creating a model?



The training data used to build the en-sent.bin sentence detector model is not open source. The easiest way to get training data is to take a corpus and extract its sentences. Several corpora are available freely or cheaply, and OpenNLP already supports some of them; have a look at the manual.
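
As a rough illustration of the "extract the sentences" step, here is a minimal Python sketch. It assumes the plain-text layout described in the OpenNLP manual for sentence detector training data (one sentence per line, with an empty line marking a document boundary); please verify this against the manual for your OpenNLP version. The function name write_sentence_train_file, the output file name and the sample sentences are hypothetical, and the sketch assumes your corpus already gives you the sentences of each document as a list of strings.

```python
from pathlib import Path
from typing import Iterable, List


def write_sentence_train_file(documents: Iterable[List[str]], out_path: str) -> int:
    """Write sentences one per line, with a blank line between documents.

    `documents` is an iterable of documents, each a list of sentence strings
    (assumed to be already segmented in the source corpus).
    Returns the number of sentences written.
    """
    lines: List[str] = []
    count = 0
    for sentences in documents:
        # Collapse internal whitespace so every sentence stays on one line.
        cleaned = [" ".join(s.split()) for s in sentences]
        cleaned = [s for s in cleaned if s]
        if not cleaned:
            continue  # skip documents with no usable sentences
        if lines:
            lines.append("")  # empty line = document boundary (assumed format)
        lines.extend(cleaned)
        count += len(cleaned)
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return count


if __name__ == "__main__":
    # Placeholder sentences; replace with sentences from your own corpus.
    sample_corpus = [
        ["This is the first sentence.", "This is the second one."],
        ["A new document starts here."],
    ]
    written = write_sentence_train_file(sample_corpus, "my-sent.train")
    print(written, "sentences written")
```

The resulting file can then be used as input for the sentence detector training described in the manual. Writing it as UTF-8 is a choice made here so that non-Latin scripts survive unchanged; the same encoding has to be given when training.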

Jörn
