Re: [IronPython] Django, __unicode__, and #20366

Dino Viehland Thu, 11 Feb 2010 11:07:07 -0800

Vernon wrote:
> You need the 'byte' class for Python 3 anyway. Implement it now.


Done!  Assuming you mean bytes, it's in 2.6 already.  Now if everyone would 
upgrade their code to use b'' :)

> A small sample...
>
> <code x.py>
> import sys
> u = u'1234\u00f6'
> s = '1234'
> x = str(s)
> print type(x), repr(x)
> x = unicode(s)
> print type(x), repr(x)
> try:
>    x = unicode(u)
>    print type(x), repr(x)
> except:
>    print 'Error=',sys.exc_info()[0]
> try:
>    x = str(u)
>    print type(x), repr(x)
> except:
>    print 'Error=',sys.exc_info()[0]
> </code>
> --------------------
>
> The results...
>
> >c:\python26\python.exe x.py
> <type 'str'> '1234'
> <type 'unicode'> u'1234'
> <type 'unicode'> u'1234\xf6'
> Error= <type 'exceptions.UnicodeEncodeError'>
>
> >"c:\program files\Ironpython 2.6\ipy.exe" x.py
> <type 'str'> '1234'
> <type 'str'> '1234'
> Error= <type 'exceptions.UnicodeDecodeError'>
> Error= <type 'exceptions.UnicodeDecodeError'>
>
> >copy x.py x3.py
> >2to3 -w x3.py
> >c:\python31\python.exe x3.py
> <class 'str'> '1234'
> <class 'str'> '1234'
> <class 'str'> '1234ö'
> <class 'str'> '1234ö'
> ------------------------------
> One would think that IronPython should produce the same output as Python 3 -- 
> since 'str' and 'unicode' are the same thing in both dialects. In particular, 
> the exception when 'converting' unicode to unicode is just plain wrong.
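
For reference, the UnicodeEncodeError in the CPython 2.6 run comes from str(u) 
implicitly encoding with the default codec, which is normally ASCII.  A minimal 
sketch of the equivalent explicit step, assuming that default (the names 
outcome and encoded are only illustrative, and it needs 2.6+ or 3.3+ for the 
u'' literal):

```python
u = u'1234\u00f6'

# On CPython 2.x, str(u) behaves like u.encode('ascii') when the default
# encoding has not been changed, so the non-ASCII character cannot be encoded.
try:
    u.encode('ascii')
    outcome = 'ok'
except UnicodeEncodeError:
    outcome = 'UnicodeEncodeError'

# Naming a codec that can represent the character avoids the error.
encoded = u.encode('utf-8')
```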


I'm not going to argue the exception isn't wrong.  But saying IronPython should 
output the same thing as an entirely different script isn't right either.  
After running 2to3 the script looks like this for me:

import sys
u = '1234\u00f6'
s = '1234'
x = str(s)
print(type(x), repr(x))
x = str(s)
print(type(x), repr(x))
try:
    x = str(u)
    print(type(x), repr(x))
except:
    print('Error=',sys.exc_info()[0])
try:
    x = str(u)
    print(type(x), repr(x))
except:
    print('Error=',sys.exc_info()[0])

You can argue whether or not 2to3 did the right thing here: it has completely 
dropped the distinction between str and unicode.  In reality, if this was a 
script written for Python 2.5 and above, your usage of str here is ambiguous.  
If this script was written for 2.6 and above, then it's clear you want strings 
and not bytes, because you'd have used bytes/bytearray/b'' to indicate bytes.  
The problem is that there's still lots of code which runs on 2.5+ and won't be 
using bytes/bytearray/b'', but really is dealing with bytes and not strings.
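
As a rough sketch of what "using bytes/bytearray/b'' to indicate bytes" could 
look like in a 2.6+ script: the helper to_text below is hypothetical (it is not 
part of IronPython or Django), and UTF-8 is just an assumed encoding.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode only values explicitly marked as bytes."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    # Anything else is assumed to be text already and is returned unchanged.
    return value

payload = b'1234\xc3\xb6'   # bytes, marked with the b'' literal
label = u'1234\u00f6'       # text

decoded = to_text(payload)
unchanged = to_text(label)
```

With the intent spelled out like this, nothing has to guess whether a given 
value is meant as bytes or as text.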

The fact is this is going to be broken unless we were to make str be a distinct 
type from unicode; then there'd be no ambiguity and we wouldn't have to guess.  
But that's a massive change which propagates through the entire IronPython 
code base and involves tons of breaking changes.  I've looked at doing this 
before, and it spreads everywhere and there's lots of new ugliness.  We could 
look at doing it again, but making that massive change and then switching to 
3k and changing it all back doesn't seem very productive.



_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
