Thank you, James. I only work with English. I don't think training is the way to go; instead, I am working from sentence structure, which I think is a much more intelligent approach than some sort of statistical method.

On Sat, Dec 26, 2015 at 9:00 PM, James Kosin <[email protected]> wrote:

> Carlos,
>
> It is possible to train a sentence detector to separate sentences;
> however, you will have to provide your own training set. For the training
> set you wouldn't have any punctuation and each sentence would be on a
> separate line.
> Be warned, you will need a lot of training data in this case due to the
> absence of the punctuation.
>
> The harder part will be getting a model to add the proper punctuation. In
> English we have the keywords of: How, When, Where, Who, What... to help
> determine questions. Other languages use other keys to denote questions,
> statements, and expressions in a sentence.
>
> Hopefully, you don't have to work with English; because, most cases it
> isn't easy to determine sentence boundaries based on the grammar or word
> composition alone. English is very bad about that.
>
> Good Luck, it sounds like you have an interesting problem.
>
> James Kosin
>
>
> On 12/25/2015 1:15 AM, Carlos A wrote:
>
>> Hello all,
>>
>> Is there any better way to separate sentences, that have NO punctuation,
>> with OpenNLP?
>>
>> The sentence parser will not work in some cases.
>>
>> In other words, I would like to be able to separate phrases, do some sort
>> of Sentence Boundary Segmentation Disambiguation on text that are
>> transcripts which have no punctuation. And then, once sentences are
>> separated, add the punctuation with a period or a question mark if the
>> sentence starts as a question.
>>
>> Something like using the chunker so that I can determine the sentences
>> based on their NP VP, NP VP NP, and so on.
>>
>> Thank you.
>>
>> C.
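
For the punctuation step discussed in this thread (a question mark when a sentence starts as a question, a period otherwise), a minimal sketch might look like the following. It assumes the sentences have already been separated by some other means, and it uses only the five keywords named above. The class name TranscriptPunctuator and the method punctuate are hypothetical names made up for illustration; nothing here is part of OpenNLP.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not an OpenNLP class: appends end punctuation to
// a sentence that has already been separated from its neighbours.
final class TranscriptPunctuator {

    // Assumption: only the question keywords named in the thread.
    private static final Set<String> QUESTION_CUES =
            Set.of("how", "when", "where", "who", "what");

    // Returns the trimmed sentence followed by '?' if its first word is a
    // question keyword, or by '.' otherwise. Empty input is returned as "".
    static String punctuate(String sentence) {
        String s = sentence.trim();
        if (s.isEmpty()) {
            return s;
        }
        // Compare the first whitespace-delimited word, ignoring case.
        String firstWord = s.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
        char mark = QUESTION_CUES.contains(firstWord) ? '?' : '.';
        return s + mark;
    }

    public static void main(String[] args) {
        System.out.println(punctuate("where is the meeting room"));
        System.out.println(punctuate("the meeting starts at noon"));
    }
}
```

A first-word check like this is easy to fool: "when I arrived it was raining" would get a question mark, and "is the room free" would get a period. That is the kind of ambiguity in English that James warns about, so the keyword test is probably best treated as a starting point to combine with the chunker's NP/VP output.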
