On 01/25/2013 12:42 PM, Sergey Serebryakov wrote:
I have a question about how the sentence-splitting algorithm works.
I thought that splitting could only happen at one of the EOS characters
('.', '!' and '?'), but I found that the OpenNLP Sentence Splitter also
splits text at other characters. For instance, the text "No. 6
(A/54/6/Rev.1) vols" is split into the sentences "No. 6 (A/54/6/Rev.1)"
and "vols". Is this the correct behaviour for the trained model, or am
I missing something? Thank you.
That's correct. There is an optimization that keeps "(A/54/6/Rev.1)"
together; without it, you might expect the split to be
"No. 6 (A/54/6/Rev." and "1) vols".
As far as I know it's possible to disable that. I would have to look at
the code to see exactly how it works, but I suspect there is a flag
somewhere.
Jörn
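
To illustrate the behaviour described in the reply, here is a minimal
Java sketch. It is not OpenNLP's code: the class TokenEndSplitSketch, its
methods, and the boolean parameter useTokenEnd are hypothetical names
chosen for this example, and the "accepted" EOS position is supplied by
hand instead of coming from a trained model. It only shows how moving a
cut from the EOS character to the end of the surrounding token yields
the two splits discussed above.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenEndSplitSketch {

    /** Index of the first whitespace at or after 'from', or text.length() if none. */
    static int firstWhitespace(String text, int from) {
        int i = from;
        while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        return i;
    }

    /**
     * Splits 'text' at the given EOS character positions (assumed here to be
     * the ones a model accepted as sentence ends). When useTokenEnd is true,
     * each cut is moved right to the end of the token containing the EOS
     * character; otherwise the cut falls directly after the EOS character.
     */
    static List<String> split(String text, int[] acceptedEos, boolean useTokenEnd) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int eos : acceptedEos) {
            int cut = useTokenEnd ? firstWhitespace(text, eos + 1) : eos + 1;
            if (cut <= start) {
                continue; // this position was already covered by the previous cut
            }
            String sentence = text.substring(start, cut).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
            start = cut;
        }
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "No. 6 (A/54/6/Rev.1) vols";
        int eos = text.indexOf("Rev.") + 3; // position of the '.' after "Rev"

        // Cut moved to the end of the token: "(A/54/6/Rev.1)" stays together.
        System.out.println(split(text, new int[] {eos}, true));
        // Cut directly after the '.': the token is broken in two.
        System.out.println(split(text, new int[] {eos}, false));
    }
}
```

With the flag on, the sketch returns "No. 6 (A/54/6/Rev.1)" and "vols";
with it off, it returns "No. 6 (A/54/6/Rev." and "1) vols". In both cases
the decision is made at the '.' after "Rev"; only the cut position moves.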