Changes by STINNER Victor victor.stin...@haypocalc.com:
--
status: open -> closed
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3297
___
STINNER Victor victor.stin...@haypocalc.com added the comment:
We are too close to the final 2.7 release; it's too late to backport. As I
wrote, this feature is not important and there are many workarounds, so we don't
need to backport to 3.1. Close the issue: use Python 3.2 if you want a
Changes by Terry J. Reedy tjre...@udel.edu:
--
versions: -Python 2.4, Python 2.5, Python 3.0
___
STINNER Victor victor.stin...@haypocalc.com added the comment:
@benjamin.peterson: Do you plan to port r75928 to 2.7 and 3.1? If not, can you
close this issue?
I think that this issue's priority is minor because few people write
non-BMP characters directly in Python files (maybe only one, Ezio
Changes by Benjamin Peterson benja...@python.org:
--
dependencies: +UnicodeEncodeError - I can't even see license
___
Benjamin Peterson benja...@python.org added the comment:
Committed Adam's patch in r75928.
--
___
Amaury Forgeot d'Arc amaur...@gmail.com added the comment:
This last point is already tracked by issue5127.
--
nosy: +amaury.forgeotdarc
___
Adam Olsen rha...@gmail.com added the comment:
Patch, which uses UTF-32-BE as indicated in my last comment. Test included.
--
keywords: +patch
Added file: http://bugs.python.org/file15043/py3k-nonBMP-literal.diff
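
To illustrate why UTF-32-BE is the safer intermediate form here, the following is a minimal Python 3 sketch, not the ast.c code from the patch. The helper name escape_non_ascii is hypothetical, and the sketch assumes the input contains no backslashes or lone surrogates. Each non-ASCII character is turned into a single eight-digit escape taken from its UTF-32-BE bytes, whereas the UTF-16-BE form of the same character is two surrogate code units.

```python
def escape_non_ascii(text):
    # Hypothetical helper, not the ast.c code: rewrite each non-ASCII
    # character as a \UXXXXXXXX escape built from its UTF-32-BE bytes.
    pieces = []
    for ch in text:
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            # Four big-endian bytes hold the whole code point, so the
            # escape never contains a surrogate pair.
            raw = ch.encode("utf-32-be")
            pieces.append("\\U" + raw.hex().upper())
    return "".join(pieces)


source = "a\U00010123b"
escaped = escape_non_ascii(source)
# The escaped text is pure ASCII, so it can be evaluated back into a string.
roundtrip = escaped.encode("ascii").decode("unicode_escape")
# For comparison: the same character as UTF-16-BE code units (hex).
utf16_units = "\U00010123".encode("utf-16-be").hex()
print(escaped)
print(roundtrip == source, utf16_units)
```

The round trip gives back the original three-character string, while the UTF-16-BE bytes d800dd23 are exactly the surrogate pair that shows up in the broken repr reported elsewhere in this thread.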
___
Adam Olsen rha...@gmail.com added the comment:
With some further prodding I've noticed that although the test behaves
as expected in the py3k branch (fails on UTF-32 builds before the
patch), it doesn't fail using Python 3.0. I'm guessing there are
interactions with compile() vs import and the
Adam Olsen rha...@gmail.com added the comment:
Looks like the failure mode has changed here, presumably due to issue
#3672 patches. It now always fails, even after loading from a .pyc.
This is using py3k via bzr, which reports itself as 3.2a0
$ rm unicodetest.pyc
$ ./python -c 'import
Adam Olsen rha...@gmail.com added the comment:
I've traced the biggest problem to decode_unicode in ast.c. It
needs to convert everything into a form of escapes so that it becomes pure
ASCII, which is then evaluated back into a unicode object.
Unfortunately, it uses UTF-16-BE to do so,
Changes by Lino Mastrodomenico l.mastrodomen...@gmail.com:
--
nosy: +l.mastrodomenico
___
Changes by Jakub Wilk uba...@users.sf.net:
--
nosy: +jwilk
___
Changes by hippietrail hippytr...@gmail.com:
--
nosy: +hippietrail
___
Adam Olsen [EMAIL PROTECTED] added the comment:
Marc, I don't understand what you're saying. UTF-16's surrogates are
not optional. Unicode 2.0 and later require them, and Python is
supposed to support it.
Likewise, UCS-4 originally allowed a much larger range of code points,
but it no longer
Adam Olsen [EMAIL PROTECTED] added the comment:
I've got another report open about the codecs not properly reporting
errors relating to surrogates: issue 3672
___
Marc-Andre Lemburg [EMAIL PROTECTED] added the comment:
On 2008-08-29 23:33, Terry J. Reedy wrote:
> Terry J. Reedy [EMAIL PROTECTED] added the comment:
>> Just to clarify: Python can be built as UCS2 or UCS4 build (not UTF-16
>> vs. UTF-32)
> I recently read most of the Unicode 5 standard and as
Terry J. Reedy [EMAIL PROTECTED] added the comment:
> Just to clarify: Python can be built as UCS2 or UCS4 build (not UTF-16
> vs. UTF-32)
I recently read most of the Unicode 5 standard and as near as I could
tell it no longer uses the term UCS, if it ever did. Chapter 3 has only
the following 3
Benjamin Peterson [EMAIL PROTECTED] added the comment:
Ping.
--
nosy: +benjamin.peterson
___
Changes by Antoine Pitrou [EMAIL PROTECTED]:
--
priority: -> critical
versions: +Python 2.6
___
Marc-Andre Lemburg [EMAIL PROTECTED] added the comment:
Adam, I do know what I'm talking about: I was the lead designer of the
Unicode integration you find in Python and implemented most of it.
What you see as repr() of a Unicode object is the result of applying a
codec to the internal
Adam Olsen [EMAIL PROTECTED] added the comment:
Marc, perhaps Unicode has refined their definitions since you last looked?
Valid UTF-8 *cannot* contain surrogates[1]. If it does, you have
CESU-8[2][3], not UTF-8.
So there are two bugs: first, the UTF-8 codec should refuse to load
surrogates.
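
The distinction between well-formed UTF-8 and the CESU-8-style bytes can be seen in a short sketch. This assumes a current Python 3 interpreter, where the strict UTF-8 codec rejects surrogates and the "surrogatepass" error handler lets them through; it does not reproduce the Python 2.x behaviour discussed in this thread.

```python
# Two separate surrogate code points, as produced by the broken path.
pair = "\ud800\udd23"
# The single supplementary-plane character they are meant to represent.
char = "\U00010123"

try:
    pair.encode("utf-8")
    strict_accepts_surrogates = True
except UnicodeEncodeError:
    # Strict UTF-8 refuses to encode surrogate code points.
    strict_accepts_surrogates = False

# Encoding each surrogate on its own gives six bytes (CESU-8 style).
cesu_like = pair.encode("utf-8", "surrogatepass")
# Well-formed UTF-8 for U+10123 is a single four-byte sequence.
proper = char.encode("utf-8")

print(strict_accepts_surrogates)
print(cesu_like.hex(), proper.hex())
```

The six-byte form eda080edb4a3 and the four-byte form f09084a3 describe the same intended character, but only the second is valid UTF-8.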
Adam Olsen [EMAIL PROTECTED] added the comment:
Err, to clarify, the parse/compile/whatever stages are producing broken
UTF-32 (surrogates are ill-formed there too), and that gets transformed
into CESU-8 when the .pyc is saved.
___
Ezio Melotti [EMAIL PROTECTED] added the comment:
On my Linux box sys.maxunicode == 1114111 and len(u'\U00010123') == 1,
so it should be a UTF-32 build.
On Windows, by contrast, sys.maxunicode == 65535 and len(u'\U00010123') == 2,
so it should be a UTF-16 build.
The problem seems then related to
Adam Olsen [EMAIL PROTECTED] added the comment:
Simpler way to reproduce this (on Linux):
$ rm unicodetest.pyc
$
$ python -c 'import unicodetest'
Result: False
Len: 2 1
Repr: u'\ud800\udd23' u'\U00010123'
$
$ python -c 'import unicodetest'
Result: True
Len: 1 1
Repr: u'\U00010123'
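
The two reprs above are related by the standard UTF-16 surrogate arithmetic. As a minimal sketch (the function names are illustrative only and do not come from the interpreter source), this is how U+10123 maps to the pair \ud800\udd23 and back:

```python
def to_surrogate_pair(code_point):
    # Only supplementary-plane code points are split into a pair.
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    # Inverse mapping: recombine the two 10-bit halves.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10123)
print(hex(high), hex(low))
print(hex(from_surrogate_pair(high, low)))
```

A length of 2 in the first run therefore means the string holds the two surrogate halves as separate code points, while a length of 1 in the second run means it holds the single code point U+10123.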
Marc-Andre Lemburg [EMAIL PROTECTED] added the comment:
Just to clarify: Python can be built as UCS2 or UCS4 build (not UTF-16
vs. UTF-32).
The conversions done from the literal escaped representation to the
internal format are done using the unicode-escape and raw-unicode-escape
codecs.
PYC
Adam Olsen [EMAIL PROTECTED] added the comment:
No, the configure options are wrong - we do use UTF-16 and UTF-32.
Although modern UCS-4 has been restricted down to the range of UTF-32
(it used to be larger!), UCS-2 still doesn't support the supplementary
planes (i.e. no surrogates).
If it
New submission from Ezio Melotti [EMAIL PROTECTED]:
Problem: when you have Unicode characters with a code point greater than
U+FFFF written directly in the source file (that is, not in the form
u'\UXXXXXXXX' but as normal chars in a u'' string) the interpreter uses
surrogate pairs for