On 01/25/2013 12:42 PM, Sergey Serebryakov wrote:
I have a question about how the sentence-splitting algorithm works.
I thought that splitting could only happen at one of the EOS characters
('.', '!' and '?'), but I found that the OpenNLP Sentence Splitter also
splits text at other characters. For instance, the text "No. 6
(A/54/6/Rev.1) vols" is split into the sentences "No. 6 (A/54/6/Rev.1)" and
"vols". Is this the correct behaviour with respect to the trained model, or am
I missing something? Thank you.

That's correct. The split is still triggered by an EOS character, the '.' in
"Rev.1", but there is an optimization that keeps "(A/54/6/Rev.1)" together.
Without it, you might expect the result to be "No. 6 (A/54/6/Rev." and "1) vols".
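To make the effect of that optimization concrete, here is a minimal sketch in Python. It is not OpenNLP's code: it assumes the optimization works by moving a detected break forward to the next whitespace, so the token containing the EOS character is never cut in half. The names `toy_is_boundary`, `split_sentences` and the `extend_to_whitespace` parameter are hypothetical, and `toy_is_boundary` is a crude stand-in (a tiny made-up abbreviation list) for the trained model's decision.

```python
EOS_CHARS = {".", "!", "?"}
TOY_ABBREVIATIONS = {"No"}  # hypothetical; a real model learns this from data


def toy_is_boundary(text: str, i: int) -> bool:
    """Hypothetical stand-in for the trained model: accept every EOS
    character as a break unless the word right before it is a known
    abbreviation."""
    start = i
    while start > 0 and text[start - 1].isalpha():
        start -= 1
    return text[start:i] not in TOY_ABBREVIATIONS


def split_sentences(text, is_boundary=toy_is_boundary, extend_to_whitespace=True):
    """Split `text` at EOS characters accepted by `is_boundary`.

    If `extend_to_whitespace` (a hypothetical switch, not an OpenNLP
    option) is True, each break is moved forward to the next whitespace
    so the surrounding token, e.g. "(A/54/6/Rev.1)", stays in one piece.
    """
    sentences = []
    start = 0
    i = 0
    while i < len(text):
        if text[i] in EOS_CHARS and is_boundary(text, i):
            end = i + 1
            if extend_to_whitespace:
                # Skip to the end of the current non-whitespace token.
                while end < len(text) and not text[end].isspace():
                    end += 1
            sentence = text[start:end].strip()
            if sentence:
                sentences.append(sentence)
            start = i = end
            continue
        i += 1
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences
```

With the default setting, `split_sentences("No. 6 (A/54/6/Rev.1) vols")` keeps the parenthesized reference whole; passing `extend_to_whitespace=False` reproduces the split at "Rev." | "1) vols" described above.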

As far as I know, it's possible to disable that. I would have to look at the code to see exactly how it works, but I suspect there is a flag somewhere.

Jörn
