Jim <[email protected]> writes:
Hi Jim,
Just so I'm clear, true positives require correctly identifying the
beginning and end of the sentence, and any model that misidentifies one
sentence in fact misidentifies two because sentences are assumed to appear
in a serial fashion. The model won't have a chance to correctly identify the
beginning of a new sentence until the end of the second sentence.
If this is the case then sentences must be compared in an unordered way. The
second sentence in my model cannot be compared to the second sentence in the
ground truth because the first sentence of my model may have subsumed the
second (or more) sentences. This is different (or is it?) than assuming there
is a sentence token and trying to correctly classify the set of sentence tokens
as they appear in the stream of tokens that make up the text.
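To make the question concrete, here is a minimal sketch of the two readings as I understand them. The function names (span_prf, boundary_prf) and the character offsets are my own invention for illustration; this is not OpenNLP's code and I am not claiming it matches what the evaluator does.

```python
def prf(gold_items, predicted_items):
    """Precision and recall over two unordered collections of items."""
    gold_set, pred_set = set(gold_items), set(predicted_items)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    return precision, recall


def span_prf(gold, predicted):
    """Exact match: a sentence counts only if both start and end agree."""
    return prf(gold, predicted)


def boundary_prf(gold, predicted):
    """Token-style reading: score only the sentence-end offsets."""
    return prf([end for _, end in gold], [end for _, end in predicted])


# Hypothetical (start, end) character offsets for a three-sentence text.
gold = [(0, 10), (11, 25), (26, 40)]
# The model misses the boundary at 10, so its first sentence
# subsumes the first two ground-truth sentences.
predicted = [(0, 25), (26, 40)]

span_p, span_r = span_prf(gold, predicted)
bound_p, bound_r = boundary_prf(gold, predicted)
print("exact spans:", span_p, span_r)
print("boundaries: ", bound_p, bound_r)
```

With these made-up offsets the one missed boundary costs two ground-truth sentences under exact matching, but only one boundary under the token-style reading, which is the difference I am asking about.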
But just so I'm clear, the OpenNLP sentence detection module tests on exact
matches (start and end of sentences), yes?
Thanks again for your time!
/Chris
> Hi Chris,
>
> precision (P) and recall (R) are well defined evaluation metrics and
> apply to various statistical evaluations including
> sentence-detection... but there is nothing special about
> sentence-detection. If you understand what P & R mean in a NER or a
> POS-tagging context, then it is the same thing for
> sentence-detection...
>
> For example, say you have a predictive model M. You train it on some
> data X and you test it on some data Y.
>
> -P is concerned with 'what proportion of the retrieved data are
> true positives' (i.e. were correctly classified as relevant). In
> sentence-detection, that would translate to 'how many of the
> recognised sentences are actually correct?'
>
> -R is concerned with 'what proportion of all the relevant data has
> been retrieved'. In sentence-detection this translates to 'out of all
> the correct sentences, how many did the model retrieve?'
>
> I've always found the picture in [1] quite helpful
>
> [1] https://en.wikipedia.org/wiki/Precision_and_recall
>
> HTH,
>
> Jim
>
>
> On 05/08/13 12:29, Christopher Kotfila wrote:
>> Good morning!
>>
>> I'm trying to get a better sense of how precision and recall are calculated
>> for the sentence detection module. The manual online does not seem to have
>> a thorough discussion of the topic, and while I've begun looking through the
>> source I am not an experienced Java programmer and so am having some
>> difficulty divining the theory behind the numbers. Citations welcome!
>>
>> Thanks!
>> Chris
>>