[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Christian Heimes
Christian Heimes added the comment: In the meantime you can use PyICU https://pypi.python.org/pypi/PyICU for locale-aware transformations:
    >>> from icu import UnicodeString, Locale
    >>> tr = Locale("TR")
    >>> s = UnicodeString("KADIN")
    >>> print(unicode(s.toLower(tr)))
    kadın
    >>> unicode(s.toLower

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 20.02.2013 15:58, Benjamin Peterson wrote:
>
> Benjamin Peterson added the comment:
>
> The "locale" module does not affect Unicode operations. That's C locale; I'm
> talking about concept of Unicode locale, which Python doesn't currently know
> anythi

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Benjamin Peterson
Benjamin Peterson added the comment: The "locale" module does not affect Unicode operations. That's C locale; I'm talking about concept of Unicode locale, which Python doesn't currently know anything about. I agree it would be useful to customize the locale of various unicode operations. That
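
The distinction between the C locale and Unicode casing can be checked directly. The sketch below tries to switch `LC_CTYPE` to a Turkish locale and shows that `str.lower()` still applies the generic mappings. The locale name `tr_TR.UTF-8` is an assumption (it may not be installed, in which case the switch is simply skipped), and the helper `lower_under_locale` is hypothetical.

```python
import locale


def lower_under_locale(text: str, locale_name: str) -> str:
    """Hypothetical helper: lowercase `text` after trying to switch LC_CTYPE.

    It only demonstrates that str.lower() does not consult the C locale.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, locale_name)
        except locale.Error:
            pass  # locale not installed here; str.lower() ignores it anyway
        return text.lower()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting


print(ascii(lower_under_locale("KADIN", "tr_TR.UTF-8")))   # 'kadin', not 'kad\u0131n'
print(ascii(lower_under_locale("\u0130", "tr_TR.UTF-8")))  # 'i\u0307', not 'i'
```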

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> "İ" is not "i\u0307". That's a different letter. "i\u0307" is 'i with
> combining dot above'. However, "İ" is "\u0130" (Latin Capital Letter
> I with Dot Above).
Did you actually read my message? You can reconcile the two using unicodedata.normalize().
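
A minimal sketch of the reconciliation Antoine suggests, assuming Python 3.3 or later (full case mappings): U+0130 has the canonical decomposition I + U+0307, so NFC normalization composes the round-tripped `'KI\u0307TAP'` back into the original `'K\u0130TAP'`. The variable names are illustrative only.

```python
import unicodedata

word = "K\u0130TAP"                 # 'KİTAP' with U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
round_trip = word.lower().upper()    # lower() gives 'ki\u0307tap'; upper() keeps the U+0307

print(ascii(round_trip))             # 'KI\u0307TAP'

# NFC composes I + U+0307 COMBINING DOT ABOVE back into the single code point U+0130
reconciled = unicodedata.normalize("NFC", round_trip)
print(reconciled == word)            # True
```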

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Firat Ozgul added the comment: Apparently, what Python did wrong in the past was somewhat good for Turkish Python developers! This means Turkish developers now have one more problem to solve. Bad.

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread R. David Murray
R. David Murray added the comment: Yes, earlier in that file is the generic translation:
    # Preserve canonical equivalence for I with dot. Turkic is handled below.
    0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
You see that Python is following the standard, here. Agreed ab
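
What "preserve canonical equivalence" buys can be checked with `unicodedata`: the precomposed U+0130 and its decomposition I + U+0307 lowercase to canonically equivalent strings under the generic rule, whereas the Turkic mapping to a plain `i` would not. This is an illustrative sketch (Python 3.3+ assumed); the helper `nfd` is just a local shorthand.

```python
import unicodedata


def nfd(text: str) -> str:
    """Canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", text)


precomposed = "\u0130"          # İ as one code point
decomposed = nfd(precomposed)   # 'I\u0307': I + COMBINING DOT ABOVE

# Generic mapping from the quoted line: 0130 -> 0069 0307
print(ascii(precomposed.lower()))                             # 'i\u0307'

# Both canonically equivalent spellings lowercase to equivalent strings
print(nfd(precomposed.lower()) == nfd(decomposed.lower()))    # True

# A plain 0130 -> 0069 mapping would break this outside Turkish/Azeri text,
# because 'i' and 'i\u0307' are not canonically equivalent.
print(nfd("i") == nfd(decomposed.lower()))                    # False
```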

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Firat Ozgul added the comment: Whatever the behavior of Python is in 'generic' terms, I believe, we should be able to do locale-dependent uppercasing-lowercasing, which we cannot do at the moment.

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Firat Ozgul added the comment: Even if you set Turkish locale, the output is still "generic". Furthermore, does "canonical equivalence" really dictate that 'Latin Capital Letter I with Dot Above' should be mapped to 'I With Combining Dot Above' in lowercase? Note: 'Uppercase Dotted i' only ex

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Benjamin Peterson
Benjamin Peterson added the comment: Notice the lines you pulled have "tr" and "az" at the end of them meaning they only apply for Turkish and Azeri. Since the lower() method has no idea whether the user intends to be in a Turkish or Azeri locale or not, we just have to use the generic lowerin
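
Benjamin's point is that the `tr`/`az` rules are conditional on a language that `str.lower()` is never told about. One way to picture the missing piece is a function that takes the language explicitly and applies the Turkic overrides only when asked. The sketch below is hypothetical (`lower_for_language` is not a Python API) and simplified: it ignores other combining marks that may sit between `I` and U+0307.

```python
from typing import Optional

TURKIC_LANGUAGES = {"tr", "az"}   # the condition tags used in SpecialCasing.txt


def lower_for_language(text: str, lang: Optional[str] = None) -> str:
    """Hypothetical helper: apply the tr/az lowercase rules only when asked to."""
    if lang not in TURKIC_LANGUAGES:
        return text.lower()  # no language given: generic mappings, as str.lower() does
    # Simplified Turkic overrides:
    #   I + U+0307 -> i   (drop the dot above in the sequence I + dot above)
    #   U+0130     -> i
    #   I          -> U+0131 LATIN SMALL LETTER DOTLESS I
    text = text.replace("I\u0307", "i").replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


print(ascii(lower_for_language("\u0130")))         # 'i\u0307'
print(ascii(lower_for_language("\u0130", "tr")))   # 'i'
print(ascii(lower_for_language("KADIN", "tr")))    # 'kad\u0131n'
```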

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Firat Ozgul added the comment: Excerpt from http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
    # Turkish and Azeri
    # I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
    # The following rules handle those cases.
    0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABO
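
Each data line of SpecialCasing.txt has the fields `<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>`, with code points written as space-separated hex, and the language tag (`tr`, `az`) is part of the optional condition list. A minimal parser sketch, with hypothetical names, makes the difference between the generic line and the Turkish line visible; it does not evaluate context conditions such as `After_I`.

```python
from typing import NamedTuple, Optional, Tuple


class CasingRule(NamedTuple):
    code: str
    lower: str
    title: str
    upper: str
    conditions: Tuple[str, ...]   # e.g. ('tr',) or ('tr', 'Not_Before_Dot'); () if unconditional


def _decode(field: str) -> str:
    """Turn space-separated hex code points into a string ('' for an empty field)."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_special_casing_line(line: str) -> Optional[CasingRule]:
    """Parse one SpecialCasing.txt line; return None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    code, lower, title, upper = fields[:4]
    conditions = tuple(fields[4].split()) if len(fields) > 4 else ()
    return CasingRule(_decode(code), _decode(lower), _decode(title), _decode(upper), conditions)


generic = parse_special_casing_line("0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE")
turkish = parse_special_casing_line("0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE")
print(ascii(generic.lower), generic.conditions)   # 'i\u0307' ()
print(ascii(turkish.lower), turkish.conditions)   # 'i' ('tr',)
```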

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Changes by Firat Ozgul:
resolution: invalid ->
status: closed -> open

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread R. David Murray
R. David Murray added the comment: Ah, you are right, I did not decode it to see what the actual characters were. That does contradict what I said, but I'm way out of my depth on unicode at this point, so we'll have to wait for someone more expert to weigh in.

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Firat Ozgul added the comment: ascii("KİTAP".lower().upper()) should return "K\u0130TAP". Yes, Python 3.2 loses information, but Python 3.3 inserts faulty information, which, I think, is much worse than losing information.
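
For Turkish text specifically, the round trip Firat expects can be obtained by applying the Turkish case pairs (I/ı and İ/i) before falling back to the generic methods. This is a simplified sketch with hypothetical helper names, assuming precomposed (NFC) input; it is not what `str.lower()`/`str.upper()` do.

```python
def turkish_lower(text: str) -> str:
    """Hypothetical helper: Turkish I/İ pairs first, then generic lower()."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def turkish_upper(text: str) -> str:
    """Hypothetical helper: i -> İ (U+0130) first; dotless ı already uppercases to I."""
    return text.replace("i", "\u0130").upper()


word = "K\u0130TAP"                                 # 'KİTAP'
print(ascii(word.lower().upper()))                  # 'KI\u0307TAP' (generic rules)
print(ascii(turkish_upper(turkish_lower(word))))    # 'K\u0130TAP'
```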

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Firat Ozgul added the comment: Don't you think that there is a problem here?
    >>> "KİTAP".lower().upper()
    'KİTAP'
    >>> ascii("KİTAP".lower().upper())
    "'KI\\u0307TAP'"
"İ" is not "i\u0307". That's a different letter. "i\u0307" is 'i with combining dot above'. However, "İ" is "\u0130" (Latin Capita

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Antoine Pitrou
Antoine Pitrou added the comment: Yes, I think 3.3 is correct here. I think it was Benjamin who fixed/improved the behaviour of casing methods. Compare 3.3:
    >>> "ß".upper()
    'SS'
with 3.2:
    >>> "ß".upper()
    'ß'
Also, 3.2 loses information:
    >>> "KİTAP".lower().upper()
    'KITAP'
    >>> ascii("KİTAP".
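
The comparison above comes from full case mappings: some characters map to more than one code point, so the result can be longer than the input. A small helper (hypothetical name) prints the mappings in ASCII form, which avoids the display ambiguity between 'İ' and 'I\u0307'; Python 3.3 or later is assumed.

```python
def describe_case_mapping(ch: str) -> str:
    """Return an ASCII-only summary of a string's full upper- and lowercase mappings."""
    return f"{ascii(ch)}: upper={ascii(ch.upper())}, lower={ascii(ch.lower())}"


for ch in ("\u00df", "\u0130"):   # ß and İ
    print(describe_case_mapping(ch))
# '\xdf': upper='SS', lower='\xdf'
# '\u0130': upper='\u0130', lower='i\u0307'
```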

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Firat Ozgul added the comment: r.david.murray: '(...) because in 3.3 "\u0130".lower().upper() == "\u0130"'
Do you mean in Python 3.3 "\u0130".lower() returns "\u0130"? If you are saying so, this is not the case, because in Python 3.3::
    >>> '\u0130'.lower()
    'i\u0307'

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread R. David Murray
R. David Murray added the comment: Right, and the unicode consortium says that that weird thing 3.3 is doing is the "canonical" lowercasing, and this is the case exactly because in 3.3 "\u0130".lower().upper() == "\u0130". Which is why I asked Ezio if we ever came up with a way to do lower/upp

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
Firat Ozgul added the comment: In Python, things like lowercasing-uppercasing and sorting were always problematic with regard to Turkish language. For instance, whatever the locale is, you cannot lowercase the word 'KADIN' (woman) in Turkish correctly::
    >>> "KADIN".lower()
    'kadin'
    ..
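
Sorting runs into the same problem: plain `sorted()` compares code points, so 'ç' (U+00E7) and 'ı' (U+0131) land after 'z'. The sketch below is a deliberately simplified, hypothetical sort key that ranks letters by their position in the 29-letter Turkish alphabet. Real Turkish collation (accents, ties, case ordering) is more involved and is what ICU, which PyICU wraps, provides.

```python
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"   # 29 letters, lowercase
_RANK = {letter: i for i, letter in enumerate(TURKISH_ALPHABET)}


def turkish_sort_key(word: str) -> tuple:
    """Hypothetical sort key: rank letters by their position in the Turkish alphabet.

    Letters outside the alphabet sort after it, by code point.
    """
    # Apply the Turkish I/İ case pairs before the generic lower()
    lowered = word.replace("I", "\u0131").replace("\u0130", "i").lower()
    return tuple(_RANK.get(ch, len(TURKISH_ALPHABET) + ord(ch)) for ch in lowered)


words = ["dal", "çay", "cam", "iyi", "ılık"]
print(sorted(words))                         # ['cam', 'dal', 'iyi', 'çay', 'ılık']
print(sorted(words, key=turkish_sort_key))   # ['cam', 'çay', 'dal', 'ılık', 'iyi']
```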

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread R. David Murray
R. David Murray added the comment: I thought this would just be a difference in the unicode database, but that appears not to be the case. Ezio, this is related to the infamous Turkic dotless lower case i problem (see, eg, http://mail.python.org/pipermail/python-bugs-list/2005-October/030686.

[issue17252] Latin Capital Letter I with Dot Above

2013-02-20 Thread Firat Ozgul
New submission from Firat Ozgul: The lower() method of strings gives different output for 'Latin Capital Letter I with Dot Above' on Python 3.2 and Python 3.3.
On Python 3.2 (Windows XP):
    >>> "\u0130".lower()
    'i' #this is correct
On Python 3.3 (Windows XP):
    >>> "\u0130".lower()
    'i\u0307' #this