Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22883/spambayes
Modified Files:
tokenizer.py
Log Message:
Fix [ 1175439 ] UnicodeEncodeError raised for bogus Content-Type header
If the Content-Type header is particularly malformed, a UnicodeEncodeError
could be raised in the tokenizer, which would stop classification. Catch the
exception if it is raised and yield the token 'charset:invalid_unicode' instead.
Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.34
retrieving revision 1.35
diff -C2 -d -r1.34 -r1.35
*** tokenizer.py 21 Jan 2005 04:41:40 -0000 1.34
--- tokenizer.py 3 Apr 2005 23:30:54 -0000 1.35
***************
*** 828,834 ****
yield 'content-type/type:' + x.lower()
! for x in msg.get_charsets(None):
! if x is not None:
! yield 'charset:' + x.lower()
x = msg.get('content-disposition')
--- 828,840 ----
yield 'content-type/type:' + x.lower()
! try:
! for x in msg.get_charsets(None):
! if x is not None:
! yield 'charset:' + x.lower()
! except UnicodeEncodeError:
! # Bad messages can cause an exception here.
! # See [ 1175439 ] UnicodeEncodeError raised for bogus Content-Type
! # header
! yield 'charset:invalid_unicode'
x = msg.get('content-disposition')
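The change wraps the charset loop of a token generator in try/except, so a failure inside msg.get_charsets() becomes a token rather than an exception that aborts classification. The sketch below reproduces that pattern in isolation. FakeMessage and charset_tokens are hypothetical names introduced for illustration: FakeMessage only mimics the get_charsets(failobj) call used in the diff and is not the real email.message.Message. The broken case raises UnicodeEncodeError directly, which stands in for the failure a bogus Content-Type header triggered.

```python
class FakeMessage:
    """Hypothetical stand-in for an email message (not email.message.Message)."""

    def __init__(self, charsets, broken=False):
        self._charsets = charsets
        self._broken = broken

    def get_charsets(self, failobj=None):
        if self._broken:
            # Simulate the error a bogus Content-Type header could cause.
            raise UnicodeEncodeError('ascii', '\u00e9', 0, 1,
                                     'ordinal not in range(128)')
        # Parts without a charset report failobj, as in the diff's call.
        return [failobj if c is None else c for c in self._charsets]


def charset_tokens(msg):
    """Yield charset tokens, mirroring the patched loop in tokenizer.py."""
    try:
        for x in msg.get_charsets(None):
            if x is not None:
                yield 'charset:' + x.lower()
    except UnicodeEncodeError:
        # Turn the failure into a token instead of stopping classification.
        yield 'charset:invalid_unicode'


if __name__ == '__main__':
    good = FakeMessage(['UTF-8', None, 'ISO-8859-1'])
    bad = FakeMessage([], broken=True)
    print(list(charset_tokens(good)))  # ['charset:utf-8', 'charset:iso-8859-1']
    print(list(charset_tokens(bad)))   # ['charset:invalid_unicode']
```

One consequence of this design: because the try wraps the whole loop, any charset tokens yielded before the exception are kept, and 'charset:invalid_unicode' is appended after them. The malformed header itself thus becomes a clue the classifier can learn from.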
_______________________________________________
Spambayes-checkins mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-checkins