Good luck. The only issue I'd worry about is that English sentence structure can be very complex, and even with a well-trained chunker or POS-based parser you may still end up with a large number of false-positive sentence boundaries.

On 12/27/2015 1:27 PM, Carlos A wrote:
Thank you, James. I only work with English. I don't think training is the
way to go; instead, I am going by sentence structure, which seems to me much
more intelligent than using some sort of statistical method.

On Sat, Dec 26, 2015 at 9:00 PM, James Kosin wrote:

Carlos,

It is possible to train a sentence detector to separate sentences;
however, you will have to provide your own training set.  For the training
set you wouldn't have any punctuation and each sentence would be on a
separate line.
Be warned, you will need a lot of training data in this case due to the
absence of the punctuation.
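
A minimal sketch of how such a training file could be prepared, assuming you
start from text that is already punctuated and split into sentences. The
class and method names (TrainingLines, stripPunctuation, toTrainingLines) are
hypothetical and use only the Java standard library, not the OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;

final class TrainingLines {
    // Remove common punctuation so a line resembles unpunctuated transcript text.
    // Apostrophes are kept so contractions such as "don't" survive.
    static String stripPunctuation(String sentence) {
        return sentence
                .replaceAll("[.?!,;:\"()]", " ") // replace punctuation marks with spaces
                .replaceAll("\\s+", " ")         // collapse runs of whitespace
                .trim();
    }

    // One cleaned sentence per entry; entries that become empty are skipped.
    static List<String> toTrainingLines(List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (String s : sentences) {
            String cleaned = stripPunctuation(s);
            if (!cleaned.isEmpty()) {
                lines.add(cleaned);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("How are you today?", "I am fine, thank you.");
        // Print one sentence per line, the layout described above.
        for (String line : toTrainingLines(sentences)) {
            System.out.println(line);
        }
    }
}
```

Whether to also lowercase the text depends on whether your transcripts keep
capitalization; the sketch leaves case untouched.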

The harder part will be getting a model to add the proper punctuation.  In
English we have keywords such as How, When, Where, Who, What... to help
identify questions.  Other languages use other cues to mark questions,
statements, and exclamations in a sentence.
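
As a rough illustration of the keyword idea, here is a minimal sketch that
appends a question mark when a sentence starts with one of the words listed
above and a period otherwise. The class name QuestionPunctuator and the
keyword set are assumptions for illustration only; this is a plain
first-word check, not a trained model and not part of OpenNLP.

```java
import java.util.Locale;
import java.util.Set;

final class QuestionPunctuator {
    // Illustrative keyword list taken from the words named above; not exhaustive.
    private static final Set<String> QUESTION_WORDS =
            Set.of("how", "when", "where", "who", "what");

    // Append '?' if the first word is a question keyword, '.' otherwise.
    static String punctuate(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return trimmed; // nothing to punctuate
        }
        String firstWord = trimmed.split("\\s+")[0].toLowerCase(Locale.ROOT);
        char mark = QUESTION_WORDS.contains(firstWord) ? '?' : '.';
        return trimmed + mark;
    }

    public static void main(String[] args) {
        System.out.println(punctuate("where is the station"));
        System.out.println(punctuate("i am fine thank you"));
    }
}
```

A first-word check like this mislabels statements such as "when I arrived it
was raining" and misses questions that open with an auxiliary verb ("is",
"do", "can"), so treat it as a starting point only.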

Hopefully you don't have to work with English, because in most cases it
isn't easy to determine sentence boundaries from the grammar or word
composition alone.  English is very bad about that.

Good luck; it sounds like you have an interesting problem.

James Kosin


On 12/25/2015 1:15 AM, Carlos A wrote:

Hello all,

Is there a better way to separate sentences that have NO punctuation
with OpenNLP?

The sentence parser will not work in some cases.

In other words, I would like to be able to separate phrases, that is, do
some sort of sentence boundary disambiguation on transcript text that has
no punctuation. Then, once the sentences are separated, I would add the
punctuation: a period, or a question mark if the sentence starts as a
question.

Something like using the chunker so that I can determine the sentences
based on chunk patterns such as NP VP, NP VP NP, and so on.
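
A minimal sketch of that idea, assuming the chunker output has already been
collapsed to one label per chunk ("NP", "VP", ...). The class
ChunkPatternSplitter and its rule are hypothetical: a new sentence is assumed
to start at an NP that is directly followed by a VP, once the current
sentence already contains a VP. It uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkPatternSplitter {
    // Return the chunk indices at which a new sentence is assumed to start.
    static List<Integer> sentenceStarts(List<String> chunks) {
        List<Integer> starts = new ArrayList<>();
        boolean seenVerb = false; // has the current sentence had a VP yet?
        for (int i = 0; i < chunks.size(); i++) {
            boolean isNp = chunks.get(i).equals("NP");
            boolean nextIsVp = i + 1 < chunks.size() && chunks.get(i + 1).equals("VP");
            // The first chunk always opens a sentence; later, an NP followed by
            // a VP opens a new one only if the current sentence already has a VP.
            if (i == 0 || (isNp && nextIsVp && seenVerb)) {
                starts.add(i);
                seenVerb = false;
            }
            if (chunks.get(i).equals("VP")) {
                seenVerb = true;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // "the dog chased the cat / the cat ran away" as NP VP NP NP VP
        System.out.println(sentenceStarts(List.of("NP", "VP", "NP", "NP", "VP")));
    }
}
```

A rule like this will split embedded clauses incorrectly: "I think the cat
ran away" chunks as NP VP NP VP and would be cut before "the cat".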

Thank you.

C.


