To reiterate, I am not advocating for any change. I
simply want to understand if there is a good reason
for limiting the use of unichr/ord on narrow builds to
a subset of the unicode characters that Python otherwise
supports. So far, it seems not and that unichr/ord
is a poster child for
On Sun, 30 Aug 2009 06:54:21 +0200, Dieter Maurer wrote:
What you propose would break the property unichr(i) always returns
a string of length one, if it returns anything at all.
But getting a ValueError in some builds (and not in others)
is rather worse than getting unicode strings of
On 08/28/2009 02:12 AM, Martin v. Löwis wrote:
[I reordered the quotes from your previous post to try
and get the responses in a more coherent order. No
intent to take anything out of context...]
Nothing else in the PEP seems remotely relevant.
[to providing justification for the behavior of
On Sat, 29 Aug 2009 07:38:51 -0700, rurpy wrote:
Then, the next question is why is it implemented that way, to which
the answer is because the PEP says so.
Not at all a satisfying answer unless one believes in PEPal
infallibility. :-)
Not at all. You don't have to believe that PEPs are
On 08/29/2009 12:06 PM, Steven D'Aprano wrote:
[...]
The reasons for the current behavior so far:
1.
What you propose would break the property unichr(i) always returns a
string of length one, if it returns anything at all.
Yes. And I don't see the problem with that. Why is that
On 08/29/2009 01:43 PM, Vlastimil Brom wrote:
2009/8/29 ru...@yahoo.com:
On 08/28/2009 02:12 AM, Martin v. Löwis wrote:
So far, it seems not and that unichr/ord
is a poster child for purity beats practicality.
--
http://mail.python.org/mailman/listinfo/python-list
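For readers following the thread, the mapping between a code point above U+FFFF and its UTF-16 surrogate pair is plain arithmetic. The sketch below is only an illustration of that arithmetic, not the behavior of unichr/ord in any Python release; the function names pair_from_code_point and code_point_from_pair are hypothetical and do not appear in the thread.

```python
def pair_from_code_point(cp):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary planes")
    offset = cp - 0x10000
    # Top 10 bits select the high surrogate, bottom 10 bits the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def code_point_from_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# 65600 is the value used in the thread (U+10040).
high, low = pair_from_code_point(65600)
print(hex(high), hex(low))
print(code_point_from_pair(high, low))
```

A surrogate-aware unichr or ord on a narrow build would have to perform this conversion, which is why its result could be a string of length two.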
As Mark
Martin v. Löwis mar...@v.loewis.de writes on Fri, 28 Aug 2009 10:12:34
+0200:
The PEP says:
* unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
length-one string.
* unichr(i) for 2**16 <= i <= TOPCHAR will return a
length-one string on wide Python builds. On
The PEP says:
* unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
length-one string.
* unichr(i) for 2**16 <= i <= TOPCHAR will return a
length-one string on wide Python builds. On narrow
builds it will raise ValueError.
and
* ord() is always the
On 08/26/2009 11:51 PM, Martin v. Löwis wrote:
[...]
But regardless, the significant question is, what is
the reason for having ord() (and unichr) not work for
surrogate pairs and thus not usable with a large number
of unicode characters that Python otherwise supports?
See PEP
2009/8/25 ru...@yahoo.com:
In Python 2.5 on Windows I could do [*1]:
# Create a unicode character outside of the BMP.
a = u'\U00010040'
# On Windows it is represented as a surrogate pair.
len(a)
2
a[0],a[1]
(u'\ud800', u'\udc40')
# Create the same character with the unichr()
In Python 2.5 on Windows I could do [*1]:
a = unichr (65600)
a[0],a[1]
(u'\ud800', u'\udc40')
I can't reproduce that. My copy of Python on Windows gives
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
unichr(65600)
ValueError: unichr() arg not in
On 08/26/2009 03:10 PM, Martin v. Löwis wrote:
In Python 2.5 on Windows I could do [*1]:
a = unichr (65600)
a[0],a[1]
(u'\ud800', u'\udc40')
I can't reproduce that. My copy of Python on Windows gives
Traceback (most recent call last):
File "<pyshell#0>", line 1,
On Aug 25, 9:53 pm, Mark Tolonen metolone+gm...@gmail.com wrote:
ru...@yahoo.com wrote in message
news:2ad21a79-4a6c-42a7-8923-beb304bb5...@v20g2000yqm.googlegroups.com...
In Python 2.5 on Windows I could do [*1]:
# Create a unicode character outside of the BMP.
a = u'\U00010040'
On Aug 26, 2:05 am, Vlastimil Brom vlastimil.b...@gmail.com wrote:
[...]
Hi,
I'm not sure about the exact reasons for this behaviour on narrow
builds either (maybe the consistency of the input/output data to
exactly one character?).
However, if I need these functions for higher unicode
2009/8/27 ru...@yahoo.com:
On Aug 26, 2:05 am, Vlastimil Brom vlastimil.b...@gmail.com wrote:
[...]
...
However, if I need these functions for higher unicode planes, the
following rather hackish replacements seem to work. I presume, there
might be smarter ways of dealing with this, but
On Wed, 26 Aug 2009 16:27:33 -0700, rurpy wrote:
But regardless, the significant question is, what is the reason for
having ord() (and unichr) not work for surrogate pairs and thus not
usable with a large number of unicode characters that Python otherwise
supports?
I'm no expert on Unicode,
On 08/26/2009 08:52 PM, Steven D'Aprano wrote:
On Wed, 26 Aug 2009 16:27:33 -0700, rurpy wrote:
But regardless, the significant question is, what is the reason for
having ord() (and unichr) not work for surrogate pairs and thus not
usable with a large number of unicode characters that
My apologies for the red herring. I was working from
a comment in my replacement ord() function. I dug up
an old copy of Python 2.4.3 and could not reproduce it
there either so I have no explanation for the comment
(which I wrote). Python 2.3 maybe?
No. The behavior you observed would
In Python 2.5 on Windows I could do [*1]:
# Create a unicode character outside of the BMP.
a = u'\U00010040'
# On Windows it is represented as a surrogate pair.
len(a)
2
a[0],a[1]
(u'\ud800', u'\udc40')
# Create the same character with the unichr() function.
a = unichr
25-08-2009 o 21:45:49 ru...@yahoo.com wrote:
In Python 2.5 on Windows I could do [*1]:
# Create a unicode character outside of the BMP.
a = u'\U00010040'
# On Windows it is represented as a surrogate pair.
[snip]
On Python 2.6, unichr() was fixed (using the word
loosely) so that it
Jan Kaliszewski wrote:
Are you sure, you couldn't have UCS-4-compiled Python distro
for Windows?? :-O
Nope, Windows requires UCS-2 builds.
Christian
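Whether a given interpreter is a narrow (UCS-2) or wide (UCS-4) build can be checked at runtime through sys.maxunicode. The helper name is_narrow_build below is hypothetical, and this is only a small sketch; note that Python 3.3 and later always report 0x10FFFF, so the distinction only matters on older interpreters.

```python
import sys


def is_narrow_build():
    """Return True when sys.maxunicode shows 16-bit code units (narrow build)."""
    return sys.maxunicode == 0xFFFF


print("narrow build" if is_narrow_build() else "wide build")
```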
ru...@yahoo.com wrote in message
news:2ad21a79-4a6c-42a7-8923-beb304bb5...@v20g2000yqm.googlegroups.com...
In Python 2.5 on Windows I could do [*1]:
# Create a unicode character outside of the BMP.
a = u'\U00010040'
# On Windows it is represented as a surrogate pair.
len(a)
2
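The length of 2 reported above reflects the number of UTF-16 code units, not the number of characters. One way to see the same two code units on any build, sketched here under the assumption that encoding to UTF-16 is an acceptable stand-in for the narrow build's internal storage, is to encode the string and unpack the bytes:

```python
import struct

text = u'\U00010040'
utf16 = text.encode('utf-16-be')  # big-endian, no byte-order mark
# Each UTF-16 code unit is one unsigned 16-bit integer.
units = struct.unpack('>%dH' % (len(utf16) // 2), utf16)
print([hex(u) for u in units])  # ['0xd800', '0xdc40']
```

The two values match the a[0], a[1] pair shown in the original post.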