Hello,

The sentence detector only considers EOS (end-of-sentence) characters as
potential sentence boundaries. It should not be difficult to extend or modify
it so that locations detected by user code are used for the split decision.
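To make that idea concrete, here is a minimal sketch in plain Java. It is not OpenNLP code, and every name in it is hypothetical. User code proposes candidate offsets: here `dashPairCandidates` looks for a pair of " - " separators, as in the dash example in the quoted message below. A predicate stands in for the trained model's split decision.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class UserBoundarySplit {

    /**
     * Hypothetical user code: for each pair of " - " separators, proposes a boundary
     * just before the opening dash and one right after the closing dash.
     */
    static List<Integer> dashPairCandidates(String text) {
        List<Integer> positions = new ArrayList<>();
        int open = text.indexOf(" - ");
        while (open >= 0) {
            int close = text.indexOf(" - ", open + 3);
            if (close < 0) {
                break; // unmatched dash: propose nothing for it
            }
            positions.add(open);      // offset of the space before the opening dash
            positions.add(close + 1); // offset of the closing dash itself
            open = text.indexOf(" - ", close + 3);
        }
        return positions;
    }

    /** Splits text after every candidate offset that the decision predicate accepts. */
    static List<String> split(String text, List<Integer> candidates, IntPredicate isBoundary) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int pos : candidates) {
            if (pos < start || !isBoundary.test(pos)) {
                continue; // skip out-of-order or rejected candidates
            }
            String s = text.substring(start, pos + 1).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
            start = pos + 1;
        }
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "some text - speech - other text";
        // Accept-all placeholder; a trained model would score each candidate instead.
        for (String s : split(text, dashPairCandidates(text), pos -> true)) {
            System.out.println(s);
        }
    }
}
```

Running `main` prints `some text`, `- speech -` and `other text` on separate lines. In a real detector, the accept-all predicate would be replaced by the model's probability for each candidate.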

The iterations parameter specifies the maximum number of iterations for an
iterative machine learning algorithm, and the cutoff removes features which did
not occur at least n times in the training data.
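As an illustration of the cutoff only, here is a small self-contained sketch. It is not OpenNLP's implementation: it counts how often each feature occurs across all training events and keeps those seen at least `cutoff` times. The feature strings are made up for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CutoffFilter {

    /** Returns the features that occur at least {@code cutoff} times across all events. */
    static Set<String> keepFrequentFeatures(List<List<String>> events, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> features : events) {
            for (String f : features) {
                counts.merge(f, 1, Integer::sum); // count every occurrence
            }
        }
        Set<String> kept = new HashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey()); // frequent enough to survive the cutoff
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical features for three training events.
        List<List<String>> events = List.of(
                List.of("prev=Dr", "eos=."),
                List.of("prev=end", "eos=."),
                List.of("prev=Dr", "eos=!"));
        // With cutoff 2, only prev=Dr and eos=. survive (set order is unspecified).
        System.out.println(keepFrequentFeatures(events, 2));
    }
}
```

A higher cutoff discards more rare features. This shrinks the model and can reduce overfitting, but it may also drop rare features that are genuinely useful.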

Jörn

On 03/26/2013 01:52 PM, Riccardo Tasso wrote:
Thank you, Jörn. In fact, the results improved a lot:
Precision: 0.5325131810193322
Recall: 0.4745497259201253
F-Measure: 0.5018633540372671

I guess the splitter could achieve better results if it were able to detect
parenthetical structures such as:
some text - speech - other text
which in my dataset is split as:
some text
- speech -
other text
Is it possible?

Another optimization would be detecting sentence-ending symbols that are
longer than one character, for example "...".
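In a simplified model where every EOS character is a candidate boundary, "..." produces three candidates. One possible tweak is to collapse each run of consecutive EOS characters into a single candidate at the run's last character. The sketch below assumes that approach. It is not OpenNLP's scanner, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RunAwareScanner {

    /** Returns one candidate offset per run of consecutive EOS characters: the run's last char. */
    static List<Integer> candidates(String text, Set<Character> eosChars) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (!eosChars.contains(text.charAt(i))) {
                continue;
            }
            // Advance while the next character continues the same run, e.g. "..." or "?!".
            while (i + 1 < text.length() && eosChars.contains(text.charAt(i + 1))) {
                i++;
            }
            positions.add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(candidates("Wait... what?! Yes.", Set.of('.', '!', '?')));
        // prints [6, 13, 18]
    }
}
```

Collapsing runs means the model has to decide only once per "..." or "?!", instead of once for each character in the run.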

Can you tell me more about the following parameters?

    - iterations
    - cutoff

Is there any guideline on how to tune them?

Cheers,
Riccardo



2013/3/26 Jörn Kottmann

On 03/26/2013 08:40 AM, Riccardo Tasso wrote:

Is the Sentence Detector also able to split on characters other than the dot?
In my case, other characters can also delimit the end of a segment, such as
the colon (:), the dash (-), and various kinds of quotation marks (", `, ',
...).

The Sentence Detector can only split on end-of-sentence characters. By
default these are . ! and ?, but with 1.5.3 you can set them to your own
custom set during training. There is a command line argument for this on the
Sentence Detector Trainer; have a look at the help.

If you don't want to compile it yourself, use 1.5.3 RC2, which we are
currently testing.

Jörn



