Hello,

the sentence detector only considers EOS characters as potential sentence boundaries. It should not be difficult to extend or modify it so that locations detected by user code are used for the split decision.
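To make the idea of user-supplied split locations concrete, here is a minimal, self-contained sketch in Python. It is not OpenNLP code: the function names `candidate_boundaries` and `split_at`, and the `decide` callback that stands in for the trained model's decision, are hypothetical and only illustrate the two-step approach (collect candidate positions, then decide at each one). Markers longer than one character, such as "...", are matched before shorter ones.

```python
def candidate_boundaries(text, markers=(".", "!", "?")):
    """Return the end offset of every marker occurrence in text.

    Longer markers are tried first, so "..." counts as one
    candidate instead of three separate "." candidates.
    """
    ordered = sorted(markers, key=len, reverse=True)
    positions = []
    i = 0
    while i < len(text):
        for marker in ordered:
            if text.startswith(marker, i):
                i += len(marker)
                positions.append(i)
                break
        else:
            # No marker starts here, move on by one character.
            i += 1
    return positions


def split_at(text, positions, decide=lambda text, pos: True):
    """Split text at each candidate position accepted by decide().

    decide is a hypothetical stand-in for the model's split decision;
    by default every candidate is accepted.
    """
    segments = []
    start = 0
    for pos in positions:
        if decide(text, pos):
            segment = text[start:pos].strip()
            if segment:
                segments.append(segment)
            start = pos
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

For example, `split_at(text, candidate_boundaries(text, (".", "!", "?", ":", "...")))` treats colons and ellipses as candidates as well, and passing a different `decide` function lets user code veto individual candidates.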
The iterations parameter specifies the maximum number of iterations for an iterative machine learning algorithm, and cutoff removes features which did not occur at least n times in the training data.

Jörn

On 03/26/2013 01:52 PM, Riccardo Tasso wrote:
Thank you Jörn, in fact the results improved a lot:

Precision: 0.5325131810193322
Recall: 0.4745497259201253
F-Measure: 0.5018633540372671

I guess the splitter could get better results if it were able to detect parenthetical structures such as:

some text - speech - other text

which in my dataset is split as:

some text - speech - other text

Is that possible?

Another optimization would be to detect end-of-sentence symbols longer than one character, for example "...".

Can you tell me more about the following parameters?
- iterations
- cutoff

Is there any guideline on how to tune them?

Cheers,
Riccardo

2013/3/26 Jörn Kottmann

On 03/26/2013 08:40 AM, Riccardo Tasso wrote:

Is the Sentence Detector able to split also on non-dot characters? In my case there should be also other characters delimiting the end of a segment, such as: colon (:), dash (-), various kinds of quotation marks (", `, ', ...).

The Sentence Detector can only split on end-of-sentence characters. By default these are . ! ? but with 1.5.3 you can set them to a custom set during training; there is a command line argument for it on the Sentence Detector Trainer, have a look at the help. If you don't want to compile it yourself, use the 1.5.3 RC2, which we are currently testing.

Jörn
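
As a rough illustration of the cutoff parameter explained at the top of this thread (features that did not occur at least n times in the training data are removed), here is a small Python sketch. It is not the OpenNLP implementation: the function name `apply_cutoff` and the feature strings in the usage note are made up for the example, and each training event is assumed to be a plain list of feature strings.

```python
from collections import Counter


def apply_cutoff(events, cutoff):
    """Drop every feature seen fewer than `cutoff` times across all events.

    events: list of feature lists, one list per training event
            (an assumed, simplified representation).
    """
    counts = Counter(feature for features in events for feature in features)
    kept = {feature for feature, n in counts.items() if n >= cutoff}
    return [[f for f in features if f in kept] for features in events]
```

With `cutoff=2`, a feature such as `"next=lower"` that appears in only one event is dropped, while features shared by two or more events are kept. A higher cutoff gives a smaller model, at the risk of discarding rare but informative features.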
