Chris,
> > AFAIK though it isn't possible to place a cap on the FuzzyOCR score. I
> > don't want to, but I detune it purely to reduce the likelihood of
> > something hitting my discard threshold by OCR alone.
>
> If you consider this feature so important, then I could implement a
> max_score feature that caps the score done by word recognition. This is
> easy to implement.
>
> Or should it rather be a cap to all FuzzyOcr rules, including the others
> like malformed file etc?

For me a cap on the total score from FuzzyOcr was mandatory.
It was unacceptable that FuzzyOcr alone could exceed the threshold,
which typically happened when many similar FuzzyOcr hits piled up.
I kept patching previous versions with:

--- FuzzyOcr.pm.ori Sun Jan 7 13:05:08 2007
+++ FuzzyOcr.pm Tue Jan 9 15:09:24 2007
@@ -927,4 +927,5 @@
infolog($debuginfo) unless ($conf->{focr_enable_image_hashing} == 3);
}
+ $score = 5 if $score > 5; # !!! Mark
for my $set ( 0 .. 3 ) {
$pms->{conf}->{scoreset}->[$set]->{"FUZZY_OCR"} = $score;
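
If the cap were made configurable rather than hard-coded to 5, the clamp
could look roughly like the sketch below. This is only an illustration:
cap_score and the focr_max_score option name are hypothetical and are not
part of FuzzyOcr as I know it.

```perl
use strict;
use warnings;

# Hypothetical helper: clamp a total FuzzyOcr score to an optional maximum.
# $max_score stands in for an assumed config value such as
# $conf->{focr_max_score}; no such option exists in the patched version.
sub cap_score {
    my ($score, $max_score) = @_;
    return $score unless defined $max_score;    # no cap configured
    return $score > $max_score ? $max_score : $score;
}

# In place of the hard-coded line in the patch, something like:
#   $score = cap_score($score, $conf->{focr_max_score});
```

Leaving the option undefined keeps the current uncapped behaviour, so
nobody who doesn't want a cap would be affected.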
Mark