On Fri, Aug 22, 2008 at 2:23 PM, eShopping <[EMAIL PROTECTED]> wrote:
> Hi
>
> I am trying to read in non-ASCII data from file using Unicode, with this
> test app:
>
> vocab=[("abends","in the evening"),
>        ("die Auff\xFCrung","performance (of a play)"),
>        ("der Au\xDFenhandel","foreign trade")

The \x escapes are interpreted by the Python parser; they are not part of
the string. In other words, the string contains actual latin-1 byte codes.

> The data in the file "eng_ger.txt" is listed below. When I parse the data
> from the list, I get the correct text displayed but when reading it from
> file, the encoding into unicode does not occur. I would be really grateful
> if someone could explain why the string -> unicode conversion works with
> lists but not with files!
>
> Thanks in advance
>
> Alun Griffiths
>
> Contents of "eng_ger.txt"
>
> abends,in the evening
> die Auff\xFCrung,performance (of a play)
> der Au\xDFenhandel,foreign trade

Here, the Python parser is not interpreting the \x escapes, so the file
contains the literal characters backslash, x and two hex digits rather than
single latin-1 characters. Two options:
- Create the file with actual latin-1 characters
- Use the special 'string-escape' codec to interpret the data from the
  file, e.g.

    print " ",words[0],unicode(words[0].decode('string-escape'),"latin1")

Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
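
To make the difference between the two cases concrete, here is a minimal sketch. It is written for Python 3, which is an assumption on my part: the code in this thread is Python 2, and the 'string-escape' codec does not exist in Python 3, so the sketch uses 'unicode_escape' as the nearest stand-in. The helper name interpret_escapes and the sample bytes are hypothetical and only illustrate the idea.

```python
# Sketch in Python 3 syntax (an assumption; the thread itself uses Python 2).

# In source code the parser turns \xFC into a single character before the
# program runs, so the escape itself never ends up in the string.
from_source = "die Auff\xFCrung"

# A line read from eng_ger.txt holds a literal backslash, 'x', 'F', 'C'.
# The raw bytes literal below stands in for such a line (hypothetical data).
from_file = rb"die Auff\xFCrung"


def interpret_escapes(raw: bytes) -> str:
    """Turn literal \\xNN sequences in raw bytes into the characters they name.

    'unicode_escape' is used here as the Python 3 stand-in for Python 2's
    'string-escape'; bytes that are not part of an escape are read as latin-1.
    """
    return raw.decode("unicode_escape")


decoded = interpret_escapes(from_file)

# The file version is longer because each escape is spelled out as four bytes.
print(len(from_source), len(from_file))
# After decoding, both routes give the same text.
print(decoded == from_source)
```

Note that 'unicode_escape' interprets every backslash sequence it finds (for example \n or \t), so this approach is only reasonable if backslashes in the data file are always meant as escapes.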