"Ian Sparks" <[EMAIL PROTECTED]> writes:
> This is probably stupid and/or misguided but supposing I'm passed a
> byte-string value that I want to be unicode, this is what I do. I'm
> sure I'm missing something very important.
Perhaps you need to read one of the good Python Unicode tutorials,
such
ianaré wrote:
> maybe a bit off topic, but how does one find the console's encoding
> from within python?
>
In [1]: import sys
In [3]: sys.stdout.encoding
Out[3]: 'cp437'
In [4]: sys.stdin.encoding
Out[4]: 'cp437'
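
A script version of the same lookup might look like the sketch below. It
assumes Python 3; console_encoding is a made-up helper name, and the fallback
to locale.getpreferredencoding() is my own addition for streams that report no
encoding (for example when output is replaced by an in-memory object).

```python
import locale
import sys


def console_encoding(stream=None):
    """Return the encoding a text stream reports, with a locale fallback.

    Hypothetical helper, not part of the standard library.
    """
    if stream is None:
        stream = sys.stdout
    # The attribute can be missing or None for replaced or in-memory streams.
    encoding = getattr(stream, "encoding", None)
    return encoding or locale.getpreferredencoding(False)


print(console_encoding(sys.stdout))
print(console_encoding(sys.stdin))
```

The values printed depend on the machine and on how the script is run, so
treat them as a hint about the console rather than a guarantee.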
Kent
--
http://mail.python.org/mailman/listinfo/python-list
The most important thing that you are missing is that you need to know
the encoding used for the 8-bit-character string. Let's guess that it's
Latin1.
Then all you have to do is use the unicode() builtin function, or the
string decode method.
# >>> s = 'Jos\xe9'
# >>> s
# 'Jos\xe9'
# >>> u = unico
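
For anyone trying this on Python 3, here is a rough equivalent of the two
routes mentioned above, under the same guess that the bytes are Latin-1. In
Python 3 the 8-bit string is a bytes object and str(raw, encoding) plays the
role of the old unicode() builtin; the variable names are only illustrative.

```python
raw = b"Jos\xe9"              # four bytes; 0xE9 is e-acute in Latin-1

text = raw.decode("latin-1")  # the decode method
same = str(raw, "latin-1")    # constructor form, Python 3's analogue of unicode()

print(text == same == "Jos\u00e9")  # True

# Guessing another 8-bit encoding does not raise an error here,
# it just produces different text, which is why the encoding must be known.
wrong = raw.decode("cp437")
print(wrong == text)  # False
```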
maybe a bit off topic, but how does one find the console's encoding
from within python?
First of all, if you run this at the console, find out which encoding your
console uses. Mine is an English Windows XP console, which uses 'cp437'.
C:\>chcp
Active code page: 437
Then
>>> s = "José"
>>> u = u"Jos\u00e9" # same thing in unicode escape
>>> s.decode('cp437') == u # use encoding that
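
To see why the console's code page matters, here is a small sketch (Python 3
assumed, names illustrative) that encodes the same text under cp437 and under
Latin-1. The accented letter ends up as a different byte in each, so bytes
typed at a cp437 console have to be decoded as cp437.

```python
text = "Jos\u00e9"

cp437_bytes = text.encode("cp437")
latin1_bytes = text.encode("latin-1")

print(cp437_bytes)   # b'Jos\x82'
print(latin1_bytes)  # b'Jos\xe9'

# Each byte string decodes back to the same text only with its own encoding.
assert cp437_bytes.decode("cp437") == text
assert latin1_bytes.decode("latin-1") == text
```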
This is probably stupid and/or misguided but supposing I'm passed a byte-string
value that I want to be unicode, this is what I do. I'm sure I'm missing
something very important.
Short version :
>>> s = "José" #Start with non-unicode string
>>> unicoded = eval("u'%s'" % "José")
Long version :