Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs8.sourceforge.net:/tmp/cvs-serv7317
Modified Files:
ImageStripper.py
Log Message:
Generate token when no text is detected.
Index: ImageStripper.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/ImageStripper.py,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** ImageStripper.py 6 Nov 2006 14:50:30 -0000 1.10
--- ImageStripper.py 2 Dec 2006 22:09:25 -0000 1.11
***************
*** 192,198 ****
  ocr.close()
  ctokens = set()
! nlines = len(ctext.strip().split("\n"))
! if nlines:
!     ctokens.add("image-text-lines:%d" % int(log2(nlines)))
  self.cache[fhash] = (ctext, ctokens)
  textbits.append(ctext)
--- 192,204 ----
  ocr.close()
  ctokens = set()
! if not ctext.strip():
!     # Lots of spam now contains images in which it is
!     # difficult or impossible (using ocrad) to find any
!     # text. Make a note of that.
!     ctokens.add("image-text:no text found")
! else:
!     nlines = len(ctext.strip().split("\n"))
!     if nlines:
!         ctokens.add("image-text-lines:%d" % int(log2(nlines)))
  self.cache[fhash] = (ctext, ctokens)
  textbits.append(ctext)
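
The effect of the change above can be tried in isolation with a minimal
sketch. It is not the code in ImageStripper.py: the function name
image_text_tokens is hypothetical, and math.log2 stands in for whatever
log2 helper the module imports. It also shows why the old code could not
tell an empty OCR result from a one-line one: "".split("\n") returns [""],
so nlines was 1 and the token was "image-text-lines:0" in both cases.

```python
from math import log2  # assumption: stands in for the module's own log2 helper


def image_text_tokens(ctext):
    """Return the tokens derived from OCR output `ctext` (hypothetical helper)."""
    ctokens = set()
    stripped = ctext.strip()
    if not stripped:
        # OCR found nothing readable; record that fact as its own token.
        ctokens.add("image-text:no text found")
    else:
        # Bucket the line count by powers of two: 1 -> 0, 2-3 -> 1, 4-7 -> 2, ...
        nlines = len(stripped.split("\n"))
        ctokens.add("image-text-lines:%d" % int(log2(nlines)))
    return ctokens


if __name__ == "__main__":
    print(image_text_tokens(""))
    print(image_text_tokens("buy now"))
    print(image_text_tokens("a\nb\nc\nd\ne"))
```

With this split, an image with no recognizable text and an image with a
single line of text yield different tokens, so the classifier can score
them separately.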