Re: Unicode Error - TMDA-CGI 0.13 pending list

Jim Ramsay Tue, 04 May 2004 09:01:22 -0700

Samuel Hill wrote:

Reading the message in pico shows the same thing: no \xa0 character at
all.
The way you see it below is the way it is; there is no character like
that at all in the message itself.
I have attached the message, though.
If I place exactly what is in this text file (the attachment) in my
pending queue, it dumps.

The text file does indeed have a funny \xa0 character in the Subject. Reading it in Vim shows a funny blue '| ' character right where I'd expect the \xa0 to be. Also, running 'od -c 1083676255.82872.msg | grep 0001440' shows the character as octal 240 (which is 0xA0).
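For anyone without od handy, roughly the same check can be sketched in Python 3. This is not part of tmda or tmda-cgi, and the function names are made up for illustration; it simply lists every byte above 0x7F together with its offset, which is how a stray 0xA0 in a Subject line shows up.

```python
def find_high_bytes(data: bytes) -> list:
    """Return (offset, byte) pairs for every byte outside 7-bit ASCII."""
    # Iterating over a bytes object in Python 3 yields ints.
    return [(offset, b) for offset, b in enumerate(data) if b > 0x7F]


def report_high_bytes(path: str) -> None:
    """Print each non-ASCII byte in a file, with its offset in octal like od."""
    with open(path, "rb") as fh:
        for offset, b in find_high_bytes(fh.read()):
            print(f"offset {offset:07o}: byte \\{b:03o} (0x{b:02X})")
```

Usage would be something like report_high_bytes("1083676255.82872.msg"); a non-breaking space is reported as byte \240 (0xA0).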

I believe that the \xa0 is actually coming from Unicode.py or something
right before it.

Sorry, I'll have to disagree here... the character is in the Subject line of the message you attached to your last post. It's just that most other email apps and/or text editors are smarter about it than tmda-cgi :)

There seems to be a piece of substitution code in Unicode.py, maybe meant
to keep tmda-cgi from dumping on certain characters when it reads a
message, but it is doing harm instead. So is tmda-cgi trying to do a
substitution? I know it is not tmda itself, because the message sitting
in pending is fine; the problem only appears when tmda-cgi reads it.
For example:
import re

AltChar = re.compile("[\x80-\xFF]")

def Iso8859(Str):
  # Xlate() is defined elsewhere in Unicode.py
  RetVal = u""
  while 1:
    Match = AltChar.search(Str)
    if Match:
      RetVal += Str[:Match.start()] + Xlate(Match.group(0))
      Str = Str[Match.end():]
    else:
      break
  RetVal += Str
  return RetVal
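To see what that loop does in isolation, here is a minimal Python 3 sketch of the same search-and-translate pattern. It is not the tmda-cgi source: xlate() is a hypothetical stand-in for Xlate() that simply decodes the matched byte as Latin-1, and the input is taken as bytes rather than a Python 2 str.

```python
import re

# Matches any single byte in the upper half of the 8-bit range.
ALT_CHAR = re.compile(rb"[\x80-\xff]")


def xlate(byte_seq: bytes) -> str:
    # Hypothetical stand-in for tmda-cgi's Xlate(): treat the byte as Latin-1.
    return byte_seq.decode("latin-1")


def iso8859(data: bytes) -> str:
    result = ""
    while True:
        match = ALT_CHAR.search(data)
        if not match:
            break
        # Everything before the first high byte is 7-bit, so ASCII decoding is safe.
        result += data[:match.start()].decode("ascii") + xlate(match.group(0))
        data = data[match.end():]
    # No high bytes remain in the tail.
    return result + data.decode("ascii")
```

Because every high byte is routed through the translation function, this path never raises on 0xA0; only the other decoder branch can.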

This is only called if the character set requested is 'iso-8859-1' or 'us-ascii'... According to your traceback, the character set requested is 'us_ascii', so this is never called. That character set is requested because the email itself says, lower down:

Content-Type: text/plain;
        charset="US_ASCII"
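A quick way to confirm the mismatch is to parse just those headers with Python's standard email package. This is a Python 3 sketch, not tmda-cgi's actual code path (the header text and body are illustrative); it only shows that the charset name arrives as 'us_ascii', which the old tuple does not contain, and that decoding a 0xA0 byte with that name fails.

```python
import email

RAW = (
    "Subject: test\n"
    "Content-Type: text/plain;\n"
    '        charset="US_ASCII"\n'
    "\n"
    "body\n"
)

msg = email.message_from_string(RAW)
# get_content_charset() lowercases the parameter value.
charset = msg.get_content_charset()
print(charset)

# The old membership test does not recognise this spelling.
print(charset in ("iso-8859-1", "us-ascii"))

# Python accepts "us_ascii" as an alias of its ascii codec, which rejects 0xA0.
try:
    b"Re:\xa0Hello".decode(charset)
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```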

I have to apologise: I lied. I actually found time to do this and have fixed it in CVS now, so please try the following patch and let me know whether it works for you:

--- start here ---
diff -u -r1.6 -r1.7
--- Unicode.py  18 Feb 2004 15:10:48 -0000      1.6
+++ Unicode.py  4 May 2004 15:45:18 -0000       1.7
@@ -72,7 +72,7 @@
   CharSet = CS.input_charset

   # Find appropriate decoder
-  if CharSet in ("iso-8859-1", "us-ascii"):
+  if CharSet in ("iso-8859-1", "us-ascii", "us_ascii" ):
     Decoder = Iso8859
   else:
     try:
--- end here ---
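The patch adds the one missing spelling. A possible alternative, offered only as a sketch and not what went into CVS, is to let Python 3's codec registry canonicalise the name first, so that other aliases such as 'US_ASCII', 'latin1' or 'ISO_8859-1' also reach the Iso8859 decoder. The helper wants_iso8859_decoder() is hypothetical, and the .name attribute on the lookup result does not exist in the Python 2 releases tmda-cgi targeted at the time.

```python
import codecs

# Canonical Python codec names for US-ASCII and ISO-8859-1.
ISO8859_CODECS = {"ascii", "iso8859-1"}


def wants_iso8859_decoder(charset: str) -> bool:
    """Return True if *charset* is any alias of US-ASCII or ISO-8859-1."""
    try:
        # codecs.lookup() normalises case and punctuation and resolves aliases.
        canonical = codecs.lookup(charset).name
    except LookupError:
        # Unknown charset name: let the caller fall back to its other path.
        return False
    return canonical in ISO8859_CODECS
```

With this, the condition in the patch would become a single call instead of a growing list of spellings.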

Thanks for helping me track down this bug!

--
Jim Ramsay
"Me fail English?  That's unpossible!"

_____________________________________________
tmda-users mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-users
