Re: break unichr instead of fix ord?

2009-08-30 Thread Martin v. Löwis
To reiterate, I am not advocating for any change. I simply want to understand if there is a good reason for limiting the use of unichr/ord on narrow builds to a subset of the unicode characters that Python otherwise supports. So far, it seems not and that unichr/ord is a poster child for

Re: break unichr instead of fix ord?

2009-08-30 Thread Nobody
On Sun, 30 Aug 2009 06:54:21 +0200, Dieter Maurer wrote: What you propose would break the property unichr(i) always returns a string of length one, if it returns anything at all. But getting a ValueError in some builds (and not in others) is rather worse than getting unicode strings of

Re: break unichr instead of fix ord?

2009-08-29 Thread rurpy
On 08/28/2009 02:12 AM, Martin v. Löwis wrote: [I reordered the quotes from your previous post to try and get the responses in a more coherent order. No intent to take anything out of context...] Nothing else in the PEP seems remotely relevant. [to providing justification for the behavior of

Re: break unichr instead of fix ord?

2009-08-29 Thread Steven D'Aprano
On Sat, 29 Aug 2009 07:38:51 -0700, rurpy wrote: Then, the next question is why is it implemented that way, to which the answer is because the PEP says so. Not at all a satisfying answer unless one believes in PEPal infallibility. :-) Not at all. You don't have to believe that PEPs are

Re: break unichr instead of fix ord?

2009-08-29 Thread rurpy
On 08/29/2009 12:06 PM, Steven D'Aprano wrote: [...] The reasons for the current behavior so far: 1. What you propose would break the property unichr(i) always returns a string of length one, if it returns anything at all. Yes. And i don't see the problem with that. Why is that

Re: break unichr instead of fix ord?

2009-08-29 Thread rurpy
On 08/29/2009 01:43 PM, Vlastimil Brom wrote: 2009/8/29 ru...@yahoo.com: On 08/28/2009 02:12 AM, Martin v. Löwis wrote: So far, it seems not and that unichr/ord is a poster child for purity beats practicality. -- http://mail.python.org/mailman/listinfo/python-list As Mark

Re: break unichr instead of fix ord?

2009-08-29 Thread Dieter Maurer
Martin v. Löwis mar...@v.loewis.de writes on Fri, 28 Aug 2009 10:12:34 +0200: The PEP says: * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a length-one string. * unichr(i) for 2**16 <= i <= TOPCHAR will return a length-one string on wide Python builds. On

Re: break unichr instead of fix ord?

2009-08-28 Thread Martin v. Löwis
The PEP says: * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a length-one string. * unichr(i) for 2**16 <= i <= TOPCHAR will return a length-one string on wide Python builds. On narrow builds it will raise ValueError. and * ord() is always the
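The two unichr() rules quoted from the PEP can be modeled in a few lines. This is only a sketch under stated assumptions: the function name unichr_per_pep and the wide_build flag are hypothetical, TOPCHAR is assumed to be 0x10FFFF, and Python 3's chr() stands in for unichr().

```python
TOPCHAR = 0x10FFFF  # assumption: the quoted text does not spell out the value


def unichr_per_pep(i, wide_build):
    """Model the quoted rules; hypothetical helper, not CPython's code."""
    if not 0 <= i <= TOPCHAR:
        raise ValueError("argument outside 0..TOPCHAR")
    if i >= 2**16 and not wide_build:
        # Narrow builds refuse anything that would need a surrogate pair.
        raise ValueError("narrow build: argument not below 2**16")
    # In every remaining case the result is a length-one string.
    return chr(i)


print(len(unichr_per_pep(0x41, wide_build=False)))   # BMP character, any build
print(len(unichr_per_pep(65600, wide_build=True)))   # non-BMP, wide build
try:
    unichr_per_pep(65600, wide_build=False)          # non-BMP, narrow build
except ValueError as exc:
    print("ValueError:", exc)
```

The model makes the trade-off in the thread visible: the length-one guarantee holds on both kinds of build, at the cost of an exception that only narrow builds raise.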

Re: break unichr instead of fix ord?

2009-08-27 Thread rurpy
On 08/26/2009 11:51 PM, Martin v. Löwis wrote: [...] But regardless, the significant question is, what is the reason for having ord() (and unichr) not work for surrogate pairs and thus not usable with a large number of unicode characters that Python otherwise supports? See PEP

Re: break unichr instead of fix ord?

2009-08-26 Thread Vlastimil Brom
2009/8/25 ru...@yahoo.com: In Python 2.5 on Windows I could do [*1]: # Create a unicode character outside of the BMP. a = u'\U00010040' # On Windows it is represented as a surrogate pair. len(a) 2 a[0],a[1] (u'\ud800', u'\udc40') # Create the same character with the unichr()

Re: break unichr instead of fix ord?

2009-08-26 Thread Martin v. Löwis
In Python 2.5 on Windows I could do [*1]: a = unichr (65600) a[0],a[1] (u'\ud800', u'\udc40') I can't reproduce that. My copy of Python on Windows gives Traceback (most recent call last): File "<pyshell#0>", line 1, in <module> unichr(65600) ValueError: unichr() arg not in

Re: break unichr instead of fix ord?

2009-08-26 Thread rurpy
On 08/26/2009 03:10 PM, Martin v. Löwis wrote: In Python 2.5 on Windows I could do [*1]: a = unichr (65600) a[0],a[1] (u'\ud800', u'\udc40') I can't reproduce that. My copy of Python on Windows gives Traceback (most recent call last): File "<pyshell#0>", line 1,

Re: break unichr instead of fix ord?

2009-08-26 Thread rurpy
On Aug 25, 9:53 pm, Mark Tolonen metolone+gm...@gmail.com wrote: ru...@yahoo.com wrote in message news:2ad21a79-4a6c-42a7-8923-beb304bb5...@v20g2000yqm.googlegroups.com... In Python 2.5 on Windows I could do [*1]:  # Create a unicode character outside of the BMP.   a = u'\U00010040'

Re: break unichr instead of fix ord?

2009-08-26 Thread rurpy
On Aug 26, 2:05 am, Vlastimil Brom vlastimil.b...@gmail.com wrote: [...] Hi, I'm not sure about the exact reasons for this behaviour on narrow builds either (maybe the consistency of the input/output data to exactly one character?). However, if I need these functions for higher unicode

Re: break unichr instead of fix ord?

2009-08-26 Thread Vlastimil Brom
2009/8/27 ru...@yahoo.com: On Aug 26, 2:05 am, Vlastimil Brom vlastimil.b...@gmail.com wrote: [...] ... However, if I need these functions for higher unicode planes, the following rather hackish replacements seem to work. I presume, there might be smarter ways of dealing with this, but
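A minimal sketch of what replacements of this kind could look like. It is not the code from the message above, which is truncated here. The names wide_unichr and wide_ord are hypothetical, and the sketch is written for Python 3, where chr() plays the role of unichr() and a str may hold lone surrogate code points, so the narrow-build pair representation is emulated explicitly.

```python
def wide_unichr(code_point):
    """Return one code unit, or a surrogate pair for non-BMP code points.

    Hypothetical helper that mimics how a narrow build stores such
    characters; it is not part of any Python release.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if code_point < 0x10000:
        return chr(code_point)
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return chr(high) + chr(low)


def wide_ord(text):
    """Inverse of wide_unichr: accept one code unit or one surrogate pair."""
    if len(text) == 2:
        high, low = ord(text[0]), ord(text[1])
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
        raise TypeError("expected a single character or a surrogate pair")
    return ord(text)  # ord() itself raises TypeError unless len(text) == 1


pair = wide_unichr(65600)
print([hex(ord(unit)) for unit in pair])  # ['0xd800', '0xdc40']
print(wide_ord(pair))                     # 65600
```

The result for 65600 matches the pair (u'\ud800', u'\udc40') reported elsewhere in the thread. Note that wide_unichr deliberately gives up the length-one property that the thread debates.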

Re: break unichr instead of fix ord?

2009-08-26 Thread Steven D'Aprano
On Wed, 26 Aug 2009 16:27:33 -0700, rurpy wrote: But regardless, the significant question is, what is the reason for having ord() (and unichr) not work for surrogate pairs and thus not usable with a large number of unicode characters that Python otherwise supports? I'm no expert on Unicode,

Re: break unichr instead of fix ord?

2009-08-26 Thread rurpy
On 08/26/2009 08:52 PM, Steven D'Aprano wrote: On Wed, 26 Aug 2009 16:27:33 -0700, rurpy wrote: But regardless, the significant question is, what is the reason for having ord() (and unichr) not work for surrogate pairs and thus not usable with a large number of unicode characters that

Re: break unichr instead of fix ord?

2009-08-26 Thread Martin v. Löwis
My apologies for the red herring. I was working from a comment in my replacement ord() function. I dug up an old copy of Python 2.4.3 and could not reproduce it there either so I have no explanation for the comment (which I wrote). Python 2.3 maybe? No. The behavior you observed would

break unichr instead of fix ord?

2009-08-25 Thread rurpy
In Python 2.5 on Windows I could do [*1]: # Create a unicode character outside of the BMP. a = u'\U00010040' # On Windows it is represented as a surrogate pair. len(a) 2 a[0],a[1] (u'\ud800', u'\udc40') # Create the same character with the unichr() function. a = unichr
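The pair reported above can be checked on any build by asking the UTF-16 codec for the code units directly. A small sketch, assuming Python 3; the helper name utf16_code_units is hypothetical.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    # Every code unit is one unsigned 16-bit value.
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


units = utf16_code_units("\U00010040")
print(len(units))               # 2
print([hex(u) for u in units])  # ['0xd800', '0xdc40']
```

A narrow build exposes these two units as separate string elements, which is why len(a) is 2 in the session above.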

Re: break unichr instead of fix ord?

2009-08-25 Thread Jan Kaliszewski
25-08-2009 at 21:45:49 ru...@yahoo.com wrote: In Python 2.5 on Windows I could do [*1]: # Create a unicode character outside of the BMP. a = u'\U00010040' # On Windows it is represented as a surrogate pair. [snip] On Python 2.6, unichr() was fixed (using the word loosely) so that it

Re: break unichr instead of fix ord?

2009-08-25 Thread Christian Heimes
Jan Kaliszewski wrote: Are you sure you couldn't have a UCS-4-compiled Python distro for Windows?? :-O Nope, Windows requires UCS-2 builds. Christian

Re: break unichr instead of fix ord?

2009-08-25 Thread Mark Tolonen
ru...@yahoo.com wrote in message news:2ad21a79-4a6c-42a7-8923-beb304bb5...@v20g2000yqm.googlegroups.com... In Python 2.5 on Windows I could do [*1]: # Create a unicode character outside of the BMP. a = u'\U00010040' # On Windows it is represented as a surrogate pair. len(a) 2