Re: [IronPython] Issue about string.upper and string.lower

Dino Viehland Wed, 17 Dec 2008 20:15:29 -0800

Our tests revealed no issues with making this change.  The change 
is in our 2.1 branch internally, so it should show up externally really soon.  I 
suspect we'll discuss backporting it to 2.0.1 at one of our team meetings, and 
unless we hear about issues that it causes, I personally would be 
pro-backporting.

From: users-boun...@lists.ironpython.com 
[mailto:users-boun...@lists.ironpython.com] On Behalf Of Dino Viehland
Sent: Monday, December 15, 2008 11:13 PM
To: glenn.k.jones+...@gmail.com; Discussion of IronPython
Subject: Re: [IronPython] Issue about string.upper and string.lower

I actually looked at this not too long ago and I think your proposal of 
calling the Invariant functions is the correct solution.  I was looking at a 
few things: this bug (http://bugs.python.org/issue1528802), the 3.0 decimal.py 
module, and also just using Turkish I at the command prompt.  If you follow the 
comments, the bug says:

"String upper and lower conversion are locale dependent and
implemented by the underlying libc, whereas Unicode
upper/lower conversion is not and only depends on the
Unicode character database."

That's a pretty clear statement that we shouldn't be using the current locale 
for our upper/lower string conversions.  It wouldn't surprise me if that breaks 
something somewhere, because we won't be doing locale-dependent conversions on 
what someone expects to be a str rather than a unicode object, but in this case 
I think it'd be better to be consistent with the Unicode side of Python, as 
that's our future.

As for the decimal module, it doesn't change from 2.x to 3.0.  So .upper() 
apparently doesn't have this problem when CPython switches to Unicode strings.  
Or at least no one's hit it, and if they do I think the resolution would be 
the same as for 1528802.

Finally at the command prompt I could never get CPython to do a 
culture-sensitive operation.  I hadn't fully convinced myself on that part 
though because I hadn't yet escalated to a Turkish install of the OS running 
IronPython.
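
The command-prompt check described above can be sketched roughly as follows. 
This is only an illustration, assuming CPython 3 and assuming a Turkish locale 
named "tr_TR.UTF-8" is installed (the name varies by platform); the helper 
upper_under_locale is hypothetical and not part of any library.

```python
import locale

def upper_under_locale(text, locale_name):
    """Return text.upper() evaluated while locale_name is active.

    Returns None when the requested locale is not installed.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        return None  # assumed locale name is not available on this machine
    try:
        return text.upper()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore

# "tr_TR.UTF-8" is an assumption; adjust it to a locale your system provides.
print(upper_under_locale("i", "tr_TR.UTF-8"))
```

On CPython 3, where str is Unicode, the result should be a plain 'I' whenever 
the locale is available, which matches the behaviour quoted from the bug.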

But I'm still pretty confident we're at fault and we should change our 
lower/upper implementation.  Obviously the change is easy but I'll do a full 
test pass to see if it breaks anything.

I also think calling ToUpper to get non-Pythonic results is easy enough (I 
actually think it's kind of better this way - it saves typing out the framework 
friendly ToUpperInvariant :)).

From: users-boun...@lists.ironpython.com 
[mailto:users-boun...@lists.ironpython.com] On Behalf Of Glenn Jones
Sent: Friday, December 12, 2008 7:27 AM
To: Discussion of IronPython
Subject: [IronPython] Issue about string.upper and string.lower

Hello everybody,

We ran across 
http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=13629 (Turkish 
collation issues) today while trying to port Resolver One to IronPython 2.0.

This is in a test that tries to import decimal while in the Turkish locale (it 
was actually reported by a user!). It does an .upper on a string with an 'i', 
and that doesn't give the expected results.
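
A minimal sketch of the failure mode, written for CPython 3. The function 
turkish_upper is a hypothetical stand-in that emulates culture-sensitive 
Turkish uppercasing; it is not an IronPython or .NET API, and the check shown 
is only an example of the kind of comparison Python code tends to make, not 
the actual code in decimal.

```python
# Hypothetical helper: emulates culture-sensitive (Turkish) uppercasing.
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

def turkish_upper(text):
    # Turkish casing maps 'i' to the dotted capital I instead of 'I'.
    return text.replace("i", DOTTED_CAPITAL_I).upper()

def is_infinity_literal(text, upper=str.upper):
    # Case-insensitive check that assumes 'i' uppercases to 'I'.
    return upper(text) in ("INF", "INFINITY")

print(is_infinity_literal("inf"))                 # True with invariant casing
print(is_infinity_literal("inf", turkish_upper))  # False with Turkish casing
```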

This is a very big issue because all the Python code out there expects string 
transformations to be locale-independent, and there may be strange bugs in 
strange places.

Is mapping .upper and .lower to ToUpperInvariant and ToLowerInvariant an 
acceptable solution? People who want to do locale-dependent transformations 
can always use the .NET-specific ToUpper/ToLower.

We can work around decimal being unimportable by hacking it, but clearly 
this is not a general solution. We will report other modules that might fail 
because of this as we find them.

Thanks,
Glenn and Orestis

