Re: Sentence detection evaluation methodology

Christopher Kotfila Mon, 05 Aug 2013 12:12:11 -0700

Jim <[email protected]> writes:

Hi Jim,

Just so I'm clear, a true positive requires correctly identifying both the 
beginning and the end of a sentence, and any model that misidentifies one 
sentence in fact misidentifies two, because sentences are assumed to appear 
in a serial fashion. The model won't have a chance to correctly identify the 
beginning of a new sentence until the end of the second sentence.
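
To make that reading concrete, here is a minimal sketch of exact-span 
scoring as I understand it. It is not taken from the OpenNLP source; the 
function name span_prf and the character offsets are made up for 
illustration.

```python
def span_prf(reference, predicted):
    """Precision, recall and F1 over exact (start, end) span matches."""
    ref, pred = set(reference), set(predicted)
    # A true positive is a predicted span whose start AND end both match.
    tp = len(ref & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical character offsets for three reference sentences.
reference = [(0, 10), (11, 25), (26, 40)]
# The model misses the boundary at offset 10 and merges the first two.
predicted = [(0, 25), (26, 40)]

precision, recall, f1 = span_prf(reference, predicted)
print(precision, recall, f1)
```

Under these assumptions one missed boundary loses two reference sentences 
(recall 1/3) and also yields one wrong predicted sentence (precision 1/2).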

If this is the case then sentences must be compared in an unordered way. The 
second sentence in my model cannot be compared to the second sentence in the 
ground truth because the first sentence of my model may have subsumed the 
second (or more) sentences. This is different (or is it?) from assuming there 
is a sentence token and trying to correctly classify the set of sentence tokens 
as they appear in the stream of tokens that make up the text.
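
For contrast, a rough sketch of the boundary-classification view, where 
each sentence-ending position is scored on its own. Again this is only an 
illustration under my own assumptions: boundary_prf and the offsets are 
hypothetical, not anything from OpenNLP.

```python
def boundary_prf(reference_ends, predicted_ends):
    """Precision and recall over individual sentence-end positions."""
    ref, pred = set(reference_ends), set(predicted_ends)
    # A true positive is a predicted boundary that is also a reference one.
    tp = len(ref & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical end offsets of three reference sentences.
reference_ends = [10, 25, 40]
# The model misses the boundary at offset 10 but finds the other two.
predicted_ends = [25, 40]

precision, recall = boundary_prf(reference_ends, predicted_ends)
print(precision, recall)
```

Here the single missed boundary costs only one unit of recall (2/3) and 
precision stays at 1.0, so the two views can give different numbers for 
the same model output.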

But just so I'm clear, the OpenNLP sentence detection module tests on exact 
matches (start and end of sentences), yes?

Thanks again for your time!
/Chris

> Hi Chris,
>
> precision (P) and recall (R) are well defined evaluation metrics and
> apply to various statistical evaluations including
> sentence-detection... but there is nothing special about
> sentence-detection. If you understand what P & R mean in a NER or a
> POS-tagging context, then it is the same thing for
> sentence-detection...
>
> for example say you have a predictive model M. You train it on some
> data X and you test it on some data Y.
>
> -P is concerned with 'what proportion of the retrieved data are
> true positives' (i.e. were correctly classified as relevant). In
> sentence-detection, that would translate to 'how many of the
> recognised sentences are actually correct?'
>
> -R is concerned with 'what proportion of all the relevant data has
> been retrieved'.  In sentence-detection this translates to 'out of all
> the correct sentences, how many did the model retrieve?'
>
> I've always found the picture in [1] quite helpful
>
> [1] https://en.wikipedia.org/wiki/Precision_and_recall
>
> HTH,
>
> Jim
>
>
> On 05/08/13 12:29, Christopher Kotfila wrote:
>> Good morning!
>>
>> I'm trying to get a better sense of how precision and recall are calculated
>> for the sentence detection module. The manual online does not seem to have
>> a thorough discussion of the topic, and while I've begun looking through the
>> source I am not an experienced Java programmer and so am having some
>> difficulty divining the theory behind the numbers.  Citations welcome!
>>
>> Thanks!
>> Chris
>>