Thank you Jörn, in fact the results improved a lot:

Precision: 0.5325131810193322
Recall: 0.4745497259201253
F-Measure: 0.5018633540372671

I guess the splitter could get better results if it were able to detect parenthetic structures such as:

some text - speech - other text

which in my dataset is split as:

some text - speech - other text

Is it possible?

Another optimization would be to detect end-of-sentence symbols longer than one character, for example "...".

Can you tell me more about the following parameters?
- iterations
- cutoff

Is there any guideline on how to tune them?

Cheers,
Riccardo

2013/3/26 Jörn Kottmann <[email protected]>

> On 03/26/2013 08:40 AM, Riccardo Tasso wrote:
>
>> Is the Sentence Detector able to split also on non dot characters? In my
>> case there should be also other characters delimiting the end of a
>> segment, such as: colon (:), dash (-), various kinds of quotation marks
>> (", `, ', ...).
>
> The Sentence Detector can only split on end-of-sentence characters; by
> default these are . ! ? but with 1.5.3 you can set them during training
> to your custom set. There is a command line argument for it on the
> Sentence Detector Trainer, have a look at the help.
>
> If you don't want to compile it yourself, use the 1.5.3 RC2 which we are
> currently testing.
>
> Jörn
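
The quoted reply says the detector only considers splitting at a configurable set of end-of-sentence characters. A minimal Python sketch of that idea follows. It is not OpenNLP code: as far as I understand, the real detector asks a trained model whether each candidate character actually ends a sentence, while this toy version splits at every candidate. The function name `split_on_eos` and its `eos_chars` parameter are made up for illustration. It also treats a run of consecutive end-of-sentence characters as one boundary, which is one possible way to handle a symbol like "...".

```python
def split_on_eos(text, eos_chars=".!?"):
    """Naively split text after every run of end-of-sentence characters.

    Hypothetical helper for illustration only; a trained sentence detector
    would decide per candidate instead of always splitting.
    """
    sentences = []
    start = 0
    i = 0
    while i < len(text):
        if text[i] in eos_chars:
            # Consume the whole run so "..." closes one segment, not three.
            while i + 1 < len(text) and text[i + 1] in eos_chars:
                i += 1
            segment = text[start:i + 1].strip()
            if segment:
                sentences.append(segment)
            start = i + 1
        i += 1
    # Keep any trailing text that has no closing end-of-sentence character.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    # Default set: only . ! ? are candidates.
    print(split_on_eos("Wait... Really? Yes."))
    # Custom set: colon and dash also count as delimiters.
    print(split_on_eos("He said: go - now.", eos_chars=".!?:-"))
```

With the default set, a colon or dash never produces a split. Adding them to `eos_chars` makes every occurrence a boundary, which is why a trained model is needed to tell a real delimiter from, say, a hyphen inside a word.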
