Re: Sentence detection evaluation methodology

Jim - FooBar(); Mon, 05 Aug 2013 11:55:08 -0700

On 05/08/13 19:17, Christopher Kotfila wrote:

Just so I'm clear, true positives require correctly identifying the beginning 
and end of the sentence, and any model that misidentifies one sentence in fact 
misidentifies two because sentences are assumed to appear in a serial 
fashion. The model won't have a chance to correctly identify the beginning of a 
new sentence until the end of the second sentence.

well, this is an implementation issue... As I said before, these metrics are more general than, say, 'sentence-detection'. Therefore, when you report for instance a precision of 75%, it means that three quarters of the sentences your model predicted really are individual sentences in your test-set. In other words, the predictions were 75% accurate. (The share of test-set sentences that the model managed to find is recall, which is a separate number.)

Your question is more on the practical side... well, if you implement a sentence-detection model using regular expressions (and you certainly could), there is no reason to identify start & end: all you care about is the split point, and the start/end is assumed to be before/after that. Similarly, with an ML model your features might have nothing to do with the 'end-of-sentence'; you might instead put the emphasis on the 'start-of-sentence', where the features are usually richer.
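To make the split-point idea concrete, here is a minimal Python sketch. It is only an illustration: the boundary pattern and the helper names `split_points` and `spans_from_splits` are made up for this example, and real text (abbreviations, decimals, quotes) would need a much more careful pattern.

```python
import re

# Hypothetical boundary pattern: whitespace that follows '.', '!' or '?'.
# This is an assumption for illustration, not a robust sentence splitter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_points(text):
    """Return the character offsets at which a new sentence starts."""
    return [m.end() for m in BOUNDARY.finditer(text)]


def spans_from_splits(text, points):
    """Derive (start, end) spans: each sentence runs from one split point to the next."""
    starts = [0] + points
    ends = points + [len(text)]
    return [(s, e) for s, e in zip(starts, ends) if text[s:e].strip()]


text = "It rained. We stayed in! Did you?"
points = split_points(text)
spans = spans_from_splits(text, points)
sentences = [text[s:e].strip() for s, e in spans]
```

The detector itself only ever finds split points; the start and end of each sentence fall out of the neighbouring split points afterwards.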

If this is the case then sentences must be compared in an unordered way. The 
second sentence in my model cannot be compared to the second sentence in the 
ground truth because the first sentence of my model may have subsumed the 
second (or more) sentences. This is different (or is it?) than assuming there 
is a sentence token and trying to correctly classify the set of sentence tokens 
as they appear in the stream of tokens that make up the text.


again that depends on your implementation...
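One implementation-neutral way to do the unordered comparison is to treat every sentence as a (start, end) character span and compare the two sets of spans. The sketch below assumes exactly that; the function name `evaluate` and the offsets are invented for the example, and it is not meant to describe how any particular evaluator is written.

```python
def evaluate(predicted, gold):
    """Compare sentences as unordered sets of (start, end) character spans."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # spans where both start and end match
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Made-up offsets: four gold sentences.
gold = [(0, 10), (11, 24), (25, 33), (34, 50)]
# The model missed the boundary at offset 10, so its first sentence
# swallows the first two gold sentences.
predicted = [(0, 24), (25, 33), (34, 50)]

precision, recall, f1 = evaluate(predicted, gold)
```

With these numbers the single missed boundary costs two gold sentences on the recall side (2 of 4 found) and one wrong prediction on the precision side (2 of 3 correct), and the later sentences still match because the comparison does not depend on position in the list.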

But just so I'm clear, the OpenNLP sentence detection module tests on exact 
matches (start and end of sentences), yes?

if you mean the maxent model, then no! It doesn't operate based on exact matches like, say, regex... the maxent sentence-detection module involves a pre-trained probabilistic classifier (aka maximum-entropy). It gives predictions by examining certain features in the text.
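To give a rough feel for what "examining certain features" could look like, here is a small Python sketch that builds a feature dictionary for each candidate end-of-sentence character. The feature names and the function `boundary_features` are hypothetical and chosen for illustration only; they are not OpenNLP's actual feature set, and the classifier that would turn them into a probability is left out.

```python
def boundary_features(text, i):
    """Build illustrative features for the candidate boundary character text[i]."""
    before = text[:i].split()
    after = text[i + 1:].split()
    prev_tok = before[-1] if before else ""
    next_tok = after[0] if after else ""
    return {
        "char": text[i],
        "prev_token": prev_tok.lower(),
        "prev_is_short": len(prev_tok) <= 3,  # hints at an abbreviation
        "next_is_capitalized": next_tok[:1].isupper(),
        "followed_by_space": text[i + 1:i + 2].isspace(),
    }


text = "Dr. Smith arrived. He was late."
# Every '.', '!' or '?' is a candidate; a trained model would score each one.
candidates = [i for i, ch in enumerate(text) if ch in ".!?"]
features = [boundary_features(text, i) for i in candidates]
```

Note that the period after "Dr" and the period after "arrived" look the same to a regex, but they produce different feature values, which is what a probabilistic classifier can learn from.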

Jim
