Jim - FooBar(); writes:

> On 05/08/13 19:17, Christopher Kotfila wrote:
>> Just so I'm clear, true positives require correctly identifying the
>> beginning and end of the sentence, and any model that misidentifies one
>> sentence in fact misidentifies two because sentences are assumed to be
>> appear in a serial fashion. The model won't have a chance to correctly
>> identify the beginning of a new sentence until the end of the second
>> sentence.
>
> well, this is an implementation issue...as I said before these metrics
> are more general than say 'sentence-detection'. Therefore, when you say
> for instance precision of 75% it means that your model identified three
> quarters of all the individual sentences in your test-set, as
> individual sentences. In other words the predictions were 75% accurate.
> Your question is more on the practical side...well, if you implement a
> sentence-detection model using regular-expressions (and you certainly
> could) there is no reason to identify start & end...all you care is the
> split point and the start/end is assumed to be before/after that.
> Similarly, with a ML model your features might have nothing to do with
> the 'end-of-sentence' but rather you might put emphasis in
> 'start-of-sentence' where the features are usually richer.
>
>> If this is the case then sentences must be compared in an unordered way, The
>> second sentence in my model cannot be compared to the second sentence in the
>> ground truth because the first sentence of my model may have subsumed the
>> second (or more) sentences. This is different (or is it?) than assuming
>> there is a sentence token and trying to correctly classify the set of
>> sentence tokens as they appear in the stream of tokens that make up the text.
>
> again that depends on your implementation...

Right, the focus of my question is on how OpenNLP implements sentence
detection evaluation. It's a brass-tacks question: I am trying to
re-implement the evaluation in Python. Maybe this belongs on devel?

>> But just so i'm clear, the OpenNLP sentence detection module tests on exact
>> matches (start and end of sentences) yes?
>
> if you mean the maxent model, then no! It doesn't operate based on exact
> matches like say regex...the maxent sentence-detection module involves a
> pre-trained probabilistic classifier (aka maximum-entropy). It gives
> predictions by examining certain features in the text.

No, not the maxent model. What I meant was that the evaluation of
precision and recall is based on exact matches of both the start and the
end of each sentence, rather than on correctly identifying only the
start or only the end.

> Jim

Thanks again, Jim, for taking the time!

/Chris
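
P.S. To make the question concrete, here is a minimal sketch of the kind of
evaluation I have in mind. It assumes sentences are represented as
(start, end) character offsets and that a prediction counts as a true
positive only when both offsets match the ground truth exactly. That
assumption is precisely what I am asking about, so this is not taken from
OpenNLP's evaluator; the name `span_prf` and the toy text are mine.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_prf(predicted: Iterable[Span], reference: Iterable[Span]):
    """Precision, recall and F1 under exact (start, end) span matching.

    Spans are compared as unordered sets, so the i-th predicted sentence
    is never lined up against the i-th reference sentence.
    """
    pred = set(predicted)
    ref = set(reference)
    true_positives = len(pred & ref)  # both offsets must agree
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy text: "A b. C d. E f." has three sentences.
reference = [(0, 4), (5, 9), (10, 14)]
# Hypothetical model output that misses the first boundary, so its
# first "sentence" swallows the second one.
predicted = [(0, 9), (10, 14)]

precision, recall, f1 = span_prf(predicted, reference)
print(precision, recall, f1)
```

In this toy case a single missed boundary leaves two of the three
reference sentences unmatched, while the third is still scored correctly
because the comparison is by offsets and not by position in the sequence.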
