[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-03-18 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Committed to py3k, r70452. Since this is partway between a bugfix and a new feature, I suggest that it's not worth merging it to 3.0 (or 2.6). It should be backported to 2.7, however; I'll do this after verifying that the py3k buildbots

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-03-18 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Backported to the trunk in r70454. Thanks, all! -- resolution: -> fixed status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4474

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-03-01 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file13167/unicode_fromwidechar_surrogate-6.patch

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-28 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Good catch! Added defined(SIZEOF_WCHAR_T) to the testcapi code as well, and removed the change to PC/pyconfig.h, since we don't need it any more... Added file: http://bugs.python.org/file13210/unicode_fromwidechar_surrogate-7.patch

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-26 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file13166/unicode_fromwidechar_surrogate-5.patch

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-26 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file12890/unicode_fromwidechar_surrogate-4.patch

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-26 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: "add defined(SIZEOF_WCHAR_T) check" I don't understand why SIZEOF_WCHAR_T could be unset, but patch version 6 only checks defined(SIZEOF_WCHAR_T) in unicodeobject.c, not in _testcapimodule.c (#if SIZEOF_WCHAR_T == 4).

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-24 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Updated Victor's patch:
- applies cleanly against newly whitespace-normalized unicodeobject.c
- renamed USE_WCHAR_SURROGATE to CONVERT_WCHAR_TO_SURROGATES
- add defined(SIZEOF_WCHAR_T) check
I find the patched version of

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: On 2009-02-24 20:39, Mark Dickinson wrote: Mark Dickinson dicki...@gmail.com added the comment: Updated Victor's patch: - applies cleanly against newly whitespace-normalized unicodeobject.c - renamed USE_WCHAR_SURROGATE to

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-24 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: "It would be better to have a single #ifdef #else #endif" Yes, of course it would. :)

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-24 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: New patch, with two separate versions of PyUnicode_FromWideChar. Added file: http://bugs.python.org/file13167/unicode_fromwidechar_surrogate-6.patch
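
As a rough illustration of the "two separate versions" layout discussed in this thread, the sketch below selects one of two conversion functions with a single #if/#else/#endif. It is not the code from the attached patch: the function name wide_to_utf16, the use of the standard WCHAR_MAX macro as the compile-time test, and the plain uint16_t output buffer are all assumptions made for the example. The caller is assumed to supply an output buffer of at least 2 * n units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

#if WCHAR_MAX > 0xFFFF
/* Version for a wide (e.g. 32-bit) wchar_t: code points above U+FFFF
   are split into a UTF-16 surrogate pair.
   Hypothetical helper, not the CPython implementation. */
static size_t wide_to_utf16(const wchar_t *w, size_t n, uint16_t *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = (uint32_t)w[i];
        assert(cp <= 0x10FFFF);          /* valid Unicode range only */
        if (cp > 0xFFFF) {
            cp -= 0x10000;
            out[len++] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
            out[len++] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
        } else {
            out[len++] = (uint16_t)cp;
        }
    }
    return len;
}
#else
/* Version for a 16-bit wchar_t: every unit already fits, copy as-is. */
static size_t wide_to_utf16(const wchar_t *w, size_t n, uint16_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint16_t)w[i];
    return n;
}
#endif
```

Keeping the preprocessor test outside the function bodies means each version reads as ordinary code, which appears to be the readability point raised in the review.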

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: On 2009-02-24 21:50, Mark Dickinson wrote: Mark Dickinson dicki...@gmail.com added the comment: New patch, with two separate versions of PyUnicode_FromWideChar. Thanks, much better :-)

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-29 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: For lemburg, updated patch:
- Move USE_WCHAR_SURROGATE define outside PyUnicode_FromWideChar() (and indent the defines, sorry)
- Add #define SIZEOF_WCHAR_T 2 to PC/pyconfig.h
Added file:

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-29 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file12822/unicode_fromwidechar_surrogate-3.patch

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-27 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: On 2009-01-26 17:56, STINNER Victor wrote: STINNER Victor victor.stin...@haypocalc.com added the comment: @marketdickinson, @lemburg: ping! I updated the patch, does it look better? Yes, but there are a few things that still need

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-26 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: @marketdickinson, @lemburg: ping! I updated the patch, does it look better?

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-20 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: "Also note that on platforms with 16-bit wchar_t, the comparison (0xffff < *w) will always be false, so an additional check for (Py_UNICODE_SIZE > 2) is needed." Yes, but the right test is (SIZEOF_WCHAR_T > 2). I wrote a new test:

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-20 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file12776/unicode_fromwidechar_surrogate-2.patch

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-19 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: On 2009-01-18 22:59, Mark Dickinson wrote: Mark Dickinson dicki...@gmail.com added the comment: Looks good to me. I'm not in a position to test with 16-bit wchar_t, but I can't see why anything would go wrong. I think we can take

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-18 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Looks good to me. I'm not in a position to test with 16-bit wchar_t, but I can't see why anything would go wrong. I think we can take our chances: check this in and watch the buildbots for signs of trouble. Some minor whitespace issues in

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Thanks for the patch, Victor! Looks pretty good at first glance, except that it seems that the UTF-32 to UTF-16 translation is skipped if HAVE_USABLE_WCHAR_T is defined. Is that deliberate? A test would be good, too.

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: "Looks pretty good at first glance, except that it seems that the UTF-32 to UTF-16 translation is skipped if HAVE_USABLE_WCHAR_T is defined. Is that deliberate?" #ifdef HAVE_USABLE_WCHAR_T memcpy(unicode->str, w, size *

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: I understand this code as: sizeof(wchar_t) == sizeof(Py_UNICODE). If I misunderstood the code, it's a heap overflow :-) Yep, sorry. You're right. A test would be good, too. PyUnicode_FromWideChar() is not a public API. Should I write

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: On 2009-01-17 14:00, STINNER Victor wrote: A test would be good, too. PyUnicode_FromWideChar() is not a public API. Should I write a function in _testcapi? It is a public C API. Regardless of this aspect, we should always add tests for

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: On 2009-01-17 14:00, STINNER Victor wrote: STINNER Victor victor.stin...@haypocalc.com added the comment: Looks pretty good at first glance, except that it seems that the UTF-32 to UTF-16 translation is skipped if HAVE_USABLE_WCHAR_T is

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Updated patch including a test in the _testcapi module: create two PyUnicode objects, one from a wide string (PyUnicode_FromWideChar) and another from UTF-8 (PyUnicode_FromString), and compare their values. The patch is still for the py3k branch and can

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: I ran my test on py3k on Linux with a 32-bit wchar_t:
- 16-bit Py_UNICODE: the test fails without the PyUnicode_FromWideChar() patch
- 32-bit Py_UNICODE: the test passes without the patch, so the issue only affects 16-bit Py_UNICODE
Can

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@haypocalc.com: Removed file: http://bugs.python.org/file12773/unicode_fromwidechar_surrogate.patch

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-17 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: (with the full patch, all tests pass with 16 or 32 bits Py_UNICODE)

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Patch fixing PyUnicode_FromWideChar() for UCS-2 builds: create surrogates for characters > U+FFFF, like PyUnicode_FromOrdinal() does. -- keywords: +patch Added file:
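
For readers unfamiliar with surrogate creation, the following is a minimal sketch of the standard UTF-16 encoding of a single code point. The helper name encode_utf16 is hypothetical and the code is not taken from the patch; it only shows the arithmetic that a fix of this kind has to perform for characters outside the BMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);              /* valid Unicode range only */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;           /* BMP: one unit, unchanged */
        return 1;
    }
    cp -= 0x10000;                       /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1D11E becomes the pair 0xD834 0xDD1E, while a BMP character such as U+0041 stays a single unit.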

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-01-16 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Note: I wrote my patch against py3k r68646.

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2008-12-01 Thread Mark Dickinson
Mark Dickinson [EMAIL PROTECTED] added the comment: Just to be clear, the defect in PyUnicode_FromWideChar is present both in Python 2.x and Python 3.x. The problem with command-line arguments only occurs in Python 3.x, since 2.x doesn't use PyUnicode_FromWideChar in converting arguments. I

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2008-12-01 Thread Marc-Andre Lemburg
Marc-Andre Lemburg [EMAIL PROTECTED] added the comment: This is due to the function downcasting the wchar_t values to Py_UNICODE, which is a 2-byte value if you build Python as UCS2 version on Unix. Most Unixes ship with UCS4 builds, so you don't see the problem there. Mac OS X ships with a

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2008-12-01 Thread Roumen Petrov
Roumen Petrov [EMAIL PROTECTED] added the comment: Marc-Andre explained it all. For the record, my version is from trunk, and Python is built with default options. Since the system Tcl limits UTF-8 to 3 bytes, Python is built for UCS-2. In the report, the output from Python shows character 010d (UCS-2). May

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2008-12-01 Thread STINNER Victor
Changes by STINNER Victor [EMAIL PROTECTED]: -- nosy: +haypo

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2008-11-30 Thread Mark Dickinson
New submission from Mark Dickinson [EMAIL PROTECTED]: On systems (Linux, OS X) where sizeof(wchar_t) is 4 and wchar_t arrays are usually encoded as UTF-32, it looks as though PyUnicode_FromWideChar simply truncates the 32-bit characters to 16 bits, thus giving incorrect results for characters
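
The truncation described in this report can be reproduced in isolation. The sketch below is only an illustration, assuming a 32-bit code unit being narrowed to 16 bits; the helper name truncate_to_16 is hypothetical and does not appear in CPython.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mimics a plain downcast of a UTF-32 code unit
   to a 16-bit code unit, which silently discards the upper 16 bits. */
static uint16_t truncate_to_16(uint32_t cp)
{
    assert(cp <= 0x10FFFF);   /* valid Unicode range only */
    return (uint16_t)cp;
}
```

A BMP character such as U+20AC survives the cast unchanged, but U+1D11E comes out as 0xD11E, which is a different and unrelated character.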

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2008-11-30 Thread Martin v. Löwis
Changes by Martin v. Löwis [EMAIL PROTECTED]: -- versions: +Python 2.6, Python 2.7, Python 3.0

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2008-11-30 Thread Mark Dickinson
Mark Dickinson [EMAIL PROTECTED] added the comment: "it is fine on linux" Interesting. Which version of Python is that? And is Py_UNICODE 2 bytes or 4 bytes for that build of Python?