Riccardo,

You can tune your sentence detector using a custom context generator.
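
Here is a rough, stdlib-only Java sketch of what such a context generator does. For each candidate end-of-sentence character, it returns string features that the maximum entropy model learns from. The class name ToyContextGenerator and all feature names are hypothetical. The getContext(CharSequence, int) shape is assumed to mirror OpenNLP's SDContextGenerator, but this class does not implement that interface, so check the real signature in the factory classes referenced in this thread. The two extra features target the cases Riccardo raises: runs such as "..." and paired dashes around an aside.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy feature extractor, NOT OpenNLP's SDContextGenerator.
 * For a candidate end-of-sentence character it returns string features
 * that a maximum entropy classifier could learn from.
 */
public class ToyContextGenerator {

    /** Maps a character to a feature value, collapsing whitespace to "WS". */
    private static String charFeature(char c) {
        return Character.isWhitespace(c) ? "WS" : String.valueOf(c);
    }

    public String[] getContext(CharSequence text, int position) {
        List<String> features = new ArrayList<>();
        char eos = text.charAt(position);
        features.add("eos=" + eos);

        // Neighbouring characters, with markers at the edges of the text.
        features.add("prev=" + (position > 0 ? charFeature(text.charAt(position - 1)) : "BOS"));
        features.add("next=" + (position + 1 < text.length() ? charFeature(text.charAt(position + 1)) : "EOS"));

        // Multi-character terminators such as "...": mark candidates that are
        // followed by the same character, so a model can learn to split only
        // after the last one.
        if (position + 1 < text.length() && text.charAt(position + 1) == eos) {
            features.add("runContinues");
        }

        // Paired dashes ("some text - speech - other text"): count earlier
        // dashes on the same line; an odd count means this dash closes an aside.
        if (eos == '-') {
            int dashesBefore = 0;
            for (int i = position - 1; i >= 0 && text.charAt(i) != '\n'; i--) {
                if (text.charAt(i) == '-') {
                    dashesBefore++;
                }
            }
            features.add(dashesBefore % 2 == 0 ? "dash=opens" : "dash=closes");
        }
        return features.toArray(new String[0]);
    }

    public static void main(String[] args) {
        ToyContextGenerator gen = new ToyContextGenerator();
        String text = "some text - speech - other text";
        // eos=- prev=WS next=WS dash=opens
        System.out.println(String.join(" ", gen.getContext(text, text.indexOf('-'))));
        // eos=- prev=WS next=WS dash=closes
        System.out.println(String.join(" ", gen.getContext(text, text.lastIndexOf('-'))));
    }
}
```

Features like these only help if the training data contains enough examples of each pattern. The model still decides which candidates are real sentence boundaries.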

Take a look at DummySentenceDetectorFactory.java and SentenceDetectorFactoryTest.java at
http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/java/opennlp/tools/sentdetect/

If you prefer a concrete example, take a look at an implementation I did for another project:
https://github.com/cogroo/cogroo4/tree/master/cogroo-nlp/src/main/java/org/cogroo/tools/sentdetect

William

On Tue, Mar 26, 2013 at 9:52 AM, Riccardo Tasso <[email protected]> wrote:

> Thank you Jörn, in fact the results improved a lot:
> Precision: 0.5325131810193322
> Recall: 0.4745497259201253
> F-Measure: 0.5018633540372671
>
> I guess the splitter could give better results if it were able to detect
> parenthetical structures such as:
> some text - speech - other text
> which in my dataset is split as:
> some text
> - speech -
> other text
> Is it possible?
>
> Another improvement would be detecting sentence-ending symbols longer
> than one character, for example "...".
>
> Can you tell me more about the following parameters?
>
> - iterations
> - cutoff
>
> Is there any guideline on how to tune them?
>
> Cheers,
> Riccardo
>
>
> 2013/3/26 Jörn Kottmann <[email protected]>
>
> > On 03/26/2013 08:40 AM, Riccardo Tasso wrote:
> >
> >> Is the Sentence Detector also able to split on non-dot characters? In
> >> my case other characters can also delimit the end of a segment, such
> >> as colon (:), dash (-), and various kinds of quotation marks (", `, ', ...).
> >
> > The Sentence Detector can only split on end-of-sentence characters. By
> > default these are . ! ? but with 1.5.3 you can set your own custom set
> > during training. There is a command line argument for it on the
> > Sentence Detector Trainer; have a look at the help.
> >
> > If you don't want to compile it yourself, use the 1.5.3 RC2, which we
> > are currently testing.
> >
> > Jörn
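
Jörn's point that the detector can only split on end-of-sentence characters can be pictured as a candidate scan that runs before the model makes any decision: only offsets holding one of the configured characters are ever considered. This is a hypothetical stdlib-only illustration; the class EosCandidateScanner is not part of OpenNLP. With the default set . ! ? the colon is never a candidate, while a custom set that adds ':', '-' and '"' makes it one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration: finds positions that could end a sentence. */
public class EosCandidateScanner {
    private final Set<Character> eosChars;

    public EosCandidateScanner(Set<Character> eosChars) {
        this.eosChars = eosChars;
    }

    /** Returns the offset of every character in the configured end-of-sentence set. */
    public List<Integer> candidatePositions(CharSequence text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (eosChars.contains(text.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "He said: wait. Then he left!";
        EosCandidateScanner defaults = new EosCandidateScanner(Set.of('.', '!', '?'));
        EosCandidateScanner custom = new EosCandidateScanner(Set.of('.', '!', '?', ':', '-', '"'));
        System.out.println(defaults.candidatePositions(text)); // [13, 27]
        System.out.println(custom.candidatePositions(text));   // [7, 13, 27]
    }
}
```

In OpenNLP 1.5.3 the custom set is configured at training time through the Sentence Detector Trainer's command line argument mentioned above. The exact flag name is listed in the trainer's help.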

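As background for Riccardo's question about iterations and cutoff: in OpenNLP's maxent training, iterations is the number of training iterations used to estimate the model weights, and cutoff is the minimum number of times a feature must occur in the training data to be kept. The hypothetical CutoffDemo below only illustrates the cutoff filter on made-up feature contexts; it is not OpenNLP's data indexer. Raising the cutoff from 1 to 2 drops the rare eos=: feature. This suggests that with a small dataset and rare custom end-of-sentence characters, a high cutoff may discard exactly the evidence those characters need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a feature-count cutoff, not OpenNLP code. */
public class CutoffDemo {

    /** Keeps only the features that occur at least {@code cutoff} times, sorted. */
    public static List<String> applyCutoff(List<String[]> events, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] context : events) {
            for (String feature : context) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= cutoff) {
                kept.add(entry.getKey());
            }
        }
        Collections.sort(kept);
        return kept;
    }

    public static void main(String[] args) {
        // Made-up contexts for four candidate positions.
        List<String[]> events = List.of(
                new String[] {"eos=.", "next=WS"},
                new String[] {"eos=.", "next=WS"},
                new String[] {"eos=:", "next=WS"},
                new String[] {"eos=.", "next=\""});
        System.out.println(applyCutoff(events, 1)); // [eos=., eos=:, next=", next=WS]
        System.out.println(applyCutoff(events, 2)); // [eos=., next=WS]
    }
}
```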