Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs8.sourceforge.net:/tmp/cvs-serv7317

Modified Files:
        ImageStripper.py 
Log Message:
Generate token when no text is detected.


Index: ImageStripper.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/ImageStripper.py,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** ImageStripper.py    6 Nov 2006 14:50:30 -0000       1.10
--- ImageStripper.py    2 Dec 2006 22:09:25 -0000       1.11
***************
*** 192,198 ****
                  ocr.close()
                  ctokens = set()
!                 nlines = len(ctext.strip().split("\n"))
!                 if nlines:
!                     ctokens.add("image-text-lines:%d" % int(log2(nlines)))
                  self.cache[fhash] = (ctext, ctokens)
              textbits.append(ctext)
--- 192,204 ----
                  ocr.close()
                  ctokens = set()
!                 if not ctext.strip():
!                     # Lots of spam now contains images in which it is
!                     # difficult or impossible (using ocrad) to find any
!                     # text.  Make a note of that.
!                     ctokens.add("image-text:no text found")
!                 else:
!                     nlines = len(ctext.strip().split("\n"))
!                     if nlines:
!                         ctokens.add("image-text-lines:%d" % int(log2(nlines)))
                  self.cache[fhash] = (ctext, ctokens)
              textbits.append(ctext)
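
For readers skimming the hunk, here is a standalone sketch of the per-image token logic as it reads after revision 1.11. It is not the module's code. The function name `image_text_tokens` is hypothetical, and it uses Python's `math.log2` in place of the `log2` helper that ImageStripper.py relies on (an assumption that both compute a base-2 logarithm). It also shows why the change matters. Before 1.11, blank OCR output gave `"".split("\n") == [""]`, so `nlines` was 1 and the token was `image-text-lines:0`, the same token produced by an image with one line of text.

```python
import math


def image_text_tokens(ctext):
    """Hypothetical sketch of the revision 1.11 token logic for one OCR result."""
    ctokens = set()
    if not ctext.strip():
        # OCR (ocrad in SpamBayes) found no text at all: emit a dedicated marker
        # token instead of letting blank output look like "one line of text".
        ctokens.add("image-text:no text found")
    else:
        nlines = len(ctext.strip().split("\n"))
        # Bucket the line count on a log2 scale: 1 line -> 0, 2-3 -> 1, 4-7 -> 2, ...
        # (The original's "if nlines:" guard is always true here, since the
        # text is non-empty, so this sketch omits it.)
        ctokens.add("image-text-lines:%d" % int(math.log2(nlines)))
    return ctokens


if __name__ == "__main__":
    print(image_text_tokens("   \n  "))      # {'image-text:no text found'}
    print(image_text_tokens("one line"))     # {'image-text-lines:0'}
    print(image_text_tokens("a\nb\nc\nd"))   # {'image-text-lines:2'}
```

Using a single fixed token lets the classifier learn on its own how often "unreadable" images occur in spam versus ham, without mixing that evidence into the one-line bucket.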

_______________________________________________
Spambayes-checkins mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-checkins
