Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22883/spambayes

Modified Files:
        tokenizer.py 
Log Message:
Fix [ 1175439 ] UnicodeEncodeError raised for bogus Content-Type header

If the Content-Type header is badly malformed, a UnicodeEncodeError could be
raised in the tokenizer, which would stop classification.  If the exception is
raised, catch it and yield a 'charset:invalid_unicode' token instead.

Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.34
retrieving revision 1.35
diff -C2 -d -r1.34 -r1.35
*** tokenizer.py        21 Jan 2005 04:41:40 -0000      1.34
--- tokenizer.py        3 Apr 2005 23:30:54 -0000       1.35
***************
*** 828,834 ****
          yield 'content-type/type:' + x.lower()
  
!     for x in msg.get_charsets(None):
!         if x is not None:
!             yield 'charset:' + x.lower()
  
      x = msg.get('content-disposition')
--- 828,840 ----
          yield 'content-type/type:' + x.lower()
  
!     try:
!         for x in msg.get_charsets(None):
!             if x is not None:
!                 yield 'charset:' + x.lower()
!     except UnicodeEncodeError:
!         # Bad messages can cause an exception here.
!         # See [ 1175439 ] UnicodeEncodeError raised for bogus Content-Type
!         #                 header
!         yield 'charset:invalid_unicode'
  
      x = msg.get('content-disposition')
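
For readers who want to try the pattern outside the tokenizer, here is a
minimal, self-contained sketch of the same idea. It is not the SpamBayes code
itself: charset_tokens, BogusMessage and GoodMessage are hypothetical names
introduced only for illustration, and BogusMessage merely simulates the
failure by raising UnicodeEncodeError from get_charsets(), since current
versions of the email package may not raise it for the same input.

```python
def charset_tokens(msg):
    """Yield 'charset:' tokens, or a marker token if the header is bogus."""
    try:
        for x in msg.get_charsets(None):
            if x is not None:
                yield 'charset:' + x.lower()
    except UnicodeEncodeError:
        # Same fallback as the patch: emit a token instead of propagating.
        yield 'charset:invalid_unicode'


class BogusMessage:
    """Hypothetical stand-in for a message with an unencodable charset."""

    def get_charsets(self, failobj=None):
        # Encoding non-ASCII text as ASCII raises UnicodeEncodeError,
        # which simulates the failure described in the log message.
        return ['caf\xe9'.encode('ascii')]


class GoodMessage:
    """Hypothetical stand-in for a well-formed multipart message."""

    def get_charsets(self, failobj=None):
        return ['ISO-8859-1', None, 'UTF-8']


bogus_tokens = list(charset_tokens(BogusMessage()))
good_tokens = list(charset_tokens(GoodMessage()))
print(bogus_tokens)
print(good_tokens)
```

Because the try block wraps the whole loop, any tokens yielded before the
exception are kept and the marker token is appended after them. In this
sketch the exception is raised before the first yield, so the marker is the
only token produced for the bogus message.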

_______________________________________________
Spambayes-checkins mailing list
http://mail.python.org/mailman/listinfo/spambayes-checkins
