According to Giuseppe Bonelli:
> sorry if this is not zope specific, but can someone please explain
> to me the following behaviour when trying to convert an iso-8859-1 string
> read from a file to an utf-8 encoded one?
>
> s='\x93test\x94' #an iso-8859-1 string
>                  #\x93 and \x94 are left and right
>                  #double quotation marks,
>                  #as seen in a browser set to iso-8859-1

\x93 and \x94 are *not* iso-8859-1 quotation marks. ISO 8859-1 has no
printable characters in the range \x80-\x9f. See for example

  http://en.wikipedia.org/wiki/ISO_8859-1

Instead they seem to be from one of the Windows-125X (X=0,1,...) codepages:

  http://www.microsoft.com/globaldev/reference/sbcs/1250.mspx

> ss=unicode(s,'iso-8859-1').encode('utf-8')
> gives
> ss='\xc2\x93test\xc2\x94'
> which is wrong (as seen in a browser set to utf-8)!

The iso-8859-1 codec maps those two bytes to the control characters
U+0093 and U+0094, which is why the result does not show quotation marks.
But:

>>> unicode(s,'cp1250').encode('utf-8')
'\xe2\x80\x9ctest\xe2\x80\x9d'

is right.

> Do I have to explicitly replace all characters above \x7F ?

No, you have to use the right encodings ;-)

\wlang{}

-- 
[EMAIL PROTECTED]                          Fax: +43/1/31336/9207
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria
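
For reference, the same comparison can be sketched in Python 3, where the file contents arrive as bytes and unicode(s, enc) becomes s.decode(enc). This is only an illustration: the variable names are made up, and cp1252 is an assumption (it is the Western European Windows code page, while the session above used cp1250). The two code pages agree on \x93 and \x94.

```python
# Python 3 sketch; `raw` stands in for the bytes read from the file.
raw = b"\x93test\x94"

# Decoding as ISO 8859-1 never fails: every byte maps to the code point
# with the same number, so 0x93 and 0x94 become C1 control characters.
as_latin1 = raw.decode("iso-8859-1")
wrong_utf8 = as_latin1.encode("utf-8")

# Decoding as a Windows code page (cp1252 is assumed here) yields the
# curly double quotes U+201C and U+201D instead.
as_cp1252 = raw.decode("cp1252")
right_utf8 = as_cp1252.encode("utf-8")

print(repr(wrong_utf8))
print(repr(right_utf8))
```

The point is that decoding with iso-8859-1 never raises an error, so a wrong guess goes unnoticed until the text is displayed. Nothing above \x7F has to be replaced by hand once the source encoding is named correctly.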