Re: Sentence detection evaluation methodology

Christopher Kotfila Tue, 06 Aug 2013 05:39:41 -0700

Jim - FooBar(); writes:

> On 05/08/13 19:17, Christopher Kotfila wrote:
>> Just so I'm clear,  true positives require correctly identifying the 
>> beginning and end of the sentence,  and any model that misidentifies one 
>> sentence in fact misidentifies two because sentences are assumed to
>> appear in a serial fashion. The model won't have a chance to correctly 
>> identify the beginning of a new sentence until the end of the second 
>> sentence.
>
> well, this is an implementation issue...as I said before these metrics 
> are more general than say 'sentence-detection'. Therefore, when you say 
> for instance precision of 75% it means that your model identified three 
> quarters of all the individual sentences in your test-set, as  
> individual sentences. In other words the predictions were 75% accurate. 
> Your question is more on the practical side...well, if you implement a 
> sentence-detection model using regular-expressions (and you certainly 
> could) there is no reason to identify start & end...all you care is the 
> split point and the start/end is assumed to be before/after that.  
> Similarly, with a ML model your features might have nothing to do with 
> the 'end-of-sentence' but rather you might put emphasis in 
> 'start-of-sentence' where the features are usually richer.
>
>> If this is the case then sentences must be compared in an unordered way. The 
>> second sentence in my model cannot be compared to the second sentence in the 
>> ground truth because the first sentence of my model may have subsumed the 
>> second (or more) sentences. This is different (or is it?) than assuming 
>> there is a sentence token and trying to correctly classify the set of 
>> sentence tokens as they appear in the stream of tokens that make up the text.
>
> again that depends on your implementation...


Right, the focus of my question is on how OpenNLP implements sentence 
detection evaluation. It's a brass-tacks question: I am trying to re-implement 
the evaluation in Python. Maybe this belongs on devel? 

>
>> But just so I'm clear, the OpenNLP sentence detection module tests on exact 
>> matches (start and end of sentences) yes?
>
> if you mean the maxent model, then no! It doesn't operate based on exact 
> matches like say regex...the maxent sentence-detection module involves a 
> pre-trained probabilistic classifier (aka maximum-entropy). It gives 
> predictions by examining certain features in the text.

No, not the maxent model. What I meant was whether the evaluation of precision 
and recall is based on exact matches of both the start and end of a sentence, 
rather than on correctly identifying only the start or only the end. 
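
To make the exact-match reading concrete, here is a minimal Python sketch. It
is only an illustration of that interpretation, not OpenNLP's actual code: the
function name evaluate_spans and the character offsets are hypothetical, and
sentences are assumed to be represented as (start, end) pairs.

```python
def evaluate_spans(predicted, gold):
    """Exact-match precision, recall and F1 over (start, end) spans.

    A predicted span counts as a true positive only if both its start
    and its end agree with a span in the reference (gold) annotation.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical offsets: the reference has three sentences, and the
# prediction missed the first boundary, merging sentences one and two.
gold = [(0, 10), (11, 25), (26, 40)]
predicted = [(0, 25), (26, 40)]

precision, recall, f1 = evaluate_spans(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this reading a single missed boundary costs one false positive (the
merged span) and two false negatives (the two reference sentences it covers).
Because spans are compared as sets, the order in which sentences appear does
not matter.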

>
> Jim

Thanks again, Jim, for taking the time!
/Chris
