Re: [Tutor] Unicode strings

Kent Johnson Fri, 22 Aug 2008 12:05:28 -0700

On Fri, Aug 22, 2008 at 2:23 PM, eShopping
<[EMAIL PROTECTED]> wrote:
> Hi
>
> I am trying to read in non-ASCII data from file using Unicode, with this
> test app:
>
> vocab=[("abends","in the evening"),
> ("die Auff\xFCrung","performance (of a play)"),
> ("der Au\xDFenhandel","foreign trade")


The \x escapes are interpreted by the Python parser; they are not part
of the string. In other words, the string contains the actual latin-1
byte values (0xFC, 0xDF), not a backslash followed by "x".
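
To make that concrete, here is a minimal sketch (the variable names are
just for illustration; the b"" and u"" prefixes are only there so the
same lines run on both Python 2.6+ and Python 3.3+):

```python
# A literal typed in Python source: the parser turns \xFC into one byte.
literal = b"Auff\xFCrung"

# "Auff" (4) + one byte 0xFC + "rung" (4) = 9 bytes, not 12.
print(len(literal))

# The fifth byte really is 0xFC, i.e. u-umlaut in latin-1.
print(literal[4:5] == b"\xfc")

# Decoding as latin-1 therefore gives the expected text.
text = literal.decode("latin-1")
print(text == u"Auff\u00fcrung")
```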

> The data in the file"eng_ger.txt" is listed below.  When I parse the data
> from the list, I get the correct text displayed but when reading it from
> file, the encoding into unicode does not occur.  I would be really grateful
> if someone could explain why the string-> unicode conversion works with
> lists but not with files!
>
> Thanks in advance
>
> Alun Griffiths
>
> Contents of "eng_ger.txt"
>
> abends,in the evening
> die Auff\xFCrung,performance (of a play)
> der Au\xDFenhandel,foreign trade

Here the Python parser never sees the data, so the \x escapes are not
interpreted: the file contains a literal backslash, an "x" and two hex
digits rather than latin-1 characters.
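
You can check this by looking at the raw bytes. The sketch below is only
an illustration: it writes one line of the data to a throwaway temp file
(standing in for eng_ger.txt) and reads it back in binary mode.

```python
import os
import tempfile

# One line of the data exactly as listed in the post. The raw literal (br"")
# keeps the backslash, the "x" and the two hex digits as four separate bytes,
# which is what a text editor would have saved.
line_on_disk = br"die Auff\xFCrung,performance (of a play)"

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(line_on_disk + b"\n")
    with open(path, "rb") as f:
        raw = f.readline().rstrip(b"\n")
finally:
    os.remove(path)

word = raw.split(b",")[0]

# "die " (4) + "Auff" (4) + the 4-byte escape + "rung" (4) = 16 bytes.
print(len(word))

# The file holds a backslash followed by "x", and no 0xFC byte at all.
print(b"\\x" in word)
print(b"\xfc" in word)
```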

Two options:
- Create the file with actual latin-1 characters
- Use the special 'string-escape' codec to interpret the data from the
file, e.g.
  print "   ",words[0],unicode(words[0].decode('string-escape'),"latin1")
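
For comparison, here is a rough sketch of both options side by side. It
is not the original poster's code, and it makes one substitution: the
'string-escape' codec is not available on Python 3, so the sketch uses
the 'unicode_escape' codec instead, which for plain \xNN escapes like
these gives the same result as string-escape followed by latin-1. Note
that 'unicode_escape' also interprets \u and \N escapes, so it is not an
exact replacement in general.

```python
# Option 1: the file was saved with real latin-1 characters,
# so u-umlaut is the single byte 0xFC and a plain decode is enough.
latin1_line = b"die Auff\xfcrung,performance (of a play)"
option1 = latin1_line.decode("latin-1")

# Option 2: the file holds literal backslash escapes (raw literal here),
# so the escapes have to be interpreted while decoding.
escaped_line = br"die Auff\xFCrung,performance (of a play)"
option2 = escaped_line.decode("unicode_escape")

# Both routes end up with the same text.
print(option1 == option2)

# Split into (German, English) as in the vocab list from the post.
german, english = option2.split(u",", 1)
print(german == u"die Auff\u00fcrung")
```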

Kent
_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
