Thank you Jörn, in fact the results improved a lot:
Precision: 0.5325131810193322
Recall: 0.4745497259201253
F-Measure: 0.5018633540372671
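
As a sanity check, the F-Measure above is consistent with the harmonic mean
of the reported precision and recall. A minimal sketch (the function name
f_measure is my own, not part of the OpenNLP API):

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2.0 * precision * recall / (precision + recall)


# Values copied from the evaluation output above.
precision = 0.5325131810193322
recall = 0.4745497259201253
f1 = f_measure(precision, recall)
print(f1)
```

The printed value should agree with the reported F-Measure up to
floating-point rounding.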

I guess the splitter could achieve better results if it were able to detect
parenthetical structures such as:
some text - speech - other text
which in my dataset is split as:
some text
- speech -
other text
Is it possible?
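
To make the idea concrete, here is a rough sketch of the segmentation I have
in mind, written as a plain regular expression outside of OpenNLP. The
pattern and the name split_dash_asides are hypothetical, and the sketch
assumes the aside is delimited by dashes with whitespace on both sides:

```python
import re
from typing import List

# Hypothetical pattern: an aside of the form "- ... -" surrounded by spaces.
# Dashes inside words (e.g. "well-known") are not matched because the
# pattern requires whitespace before the opening dash.
ASIDE = re.compile(r"\s+(-\s+[^-]+?\s+-)\s+")


def split_dash_asides(text: str) -> List[str]:
    """Split text so that each '- aside -' becomes its own segment."""
    # re.split keeps the captured aside as a separate list element.
    parts = ASIDE.split(text)
    return [p.strip() for p in parts if p.strip()]


print(split_dash_asides("some text - speech - other text"))
# ['some text', '- speech -', 'other text']
```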

Another improvement would be the ability to detect end-of-sentence symbols
longer than one character, for example "...".
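
As an illustration only, a regex baseline that treats "..." as a single
terminator could look like the sketch below. This is not how the OpenNLP
Sentence Detector works; the names TERMINATOR and split_sentences are made
up for the example, and it assumes a terminator is followed by whitespace:

```python
import re
from typing import List

# Try the three-character terminator first, then single characters.
TERMINATOR = re.compile(r"(\.\.\.|[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split on '...', '.', '!' or '?' when followed by whitespace."""
    sentences: List[str] = []
    start = 0
    for match in TERMINATOR.finditer(text):
        # Keep the terminator with its sentence, drop the whitespace.
        sentences.append(text[start:match.end(1)])
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Wait... Really? Yes."))
# ['Wait...', 'Really?', 'Yes.']
```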

Can you tell me more about the following parameters?

   - iterations
   - cutoff

Is there any guideline on how to tune them?

Cheers,
Riccardo



2013/3/26 Jörn Kottmann <[email protected]>

> On 03/26/2013 08:40 AM, Riccardo Tasso wrote:
>
>> Is the Sentence Detector able to split also on non dot characters? In my
>> case there should be also other characters delimiting the end of a
>> segment,
>> such as: colon (:), dash (-), various kind of quotation marks (", `, ',
>> ...).
>>
>
> The Sentence Detector can only split on end-of-sentence characters, by
> default these
> are . ! ? but with 1.5.3 you can set them during training to your custom
> set, there is
> a command line argument for it on the Sentence Detector Trainer, have a
> look at the help.
>
> If you don't want to compile yourself use the 1.5.3 RC2 which we are
> currently testing.
>
> Jörn
>
>
>
