[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-25 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: The latest patch (issue7649v4.diff) checks if the char is ASCII or non-ASCII and then, if the char is ASCII, it converts it directly to Unicode, otherwise it tries to decode it using the default encoding, raising a UnicodeDecodeError if

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Amaury Forgeot d'Arc wrote: Amaury Forgeot d'Arc amaur...@gmail.com added the comment: Could you please check for chars above 0x7f first and then use PyUnicode_Decode() instead of the PyUnicode_FromStringAndSize() API I concur:

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: "But why is it necessary to check for chars above 0x7f? The Python default encoding has to be ASCII compatible," Yes, but it is not necessarily as strict. For example, after I manage to set the default encoding to latin-1, u'%s' %

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Amaury Forgeot d'Arc wrote: Amaury Forgeot d'Arc amaur...@gmail.com added the comment: But why is it necessary to check for chars above 0x7f? The Python default encoding has to be ASCII compatible, Yes, but it is not necessarily as

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: I was trying to decode mainly to get a UnicodeDecodeError automatically. If I check if the char is not in the ASCII range (i.e. > 0x7F) I think I'd have to set the error message for the UnicodeDecodeError manually and possibly duplicate it

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: Marc-André's remark was that if char < 0x80 (the vast majority), it's not necessary to call any decode function: just copy the byte. Other cases (error, or non-default encoding) may use a slower path.
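
The fast path suggested in this comment can be sketched in Python 3 terms. This is only a model of the idea, not the C code from any of the patches: the name `decode_single_byte` and its `default_encoding` parameter are assumptions made for illustration, and a Python 3 `bytes` object stands in for the Python 2 `str` argument.

```python
def decode_single_byte(char, default_encoding="ascii"):
    """Model of a fast/slow path for a one-byte string (hypothetical helper)."""
    if len(char) != 1:
        raise TypeError("%c requires a single char")
    code = char[0]  # indexing bytes yields an int in Python 3
    if code < 0x80:
        # Fast path: an ASCII byte maps directly to the same code point,
        # so no codec call is needed.
        return chr(code)
    # Slow path: let the codec decide. It raises UnicodeDecodeError for
    # bytes that the default encoding cannot decode.
    return char.decode(default_encoding)
```

With the default `"ascii"` setting every byte at or above 0x80 ends in a `UnicodeDecodeError`, while a permissive default such as `"latin-1"` would decode it on the slow path.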

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Ok, adding a fast path shouldn't make the code more complicated, so I'll include it in the patch, thanks for the feedback.

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: I'm working on a similar issue for int.__format__('c'). What's not clear to me is why this doesn't work the same as chr(i). That is, shouldn't chr(i) == ('%c' % i) hold for i in range(256)? And if that's so, why not just copy chr's

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: Of course. Sorry about that. But then why not copy unichr()? It calls PyUnicode_FromOrdinal. I guess my real question is: Should '%c' % i be identical to chr(i) and should u'%c' % i be identical to unichr(i)? And by identical I mean return

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: At least from my point of view, the difference between ints and chars is: * u'%c' % 0xB5 means create a Unicode char with codepoint 0xB5, i.e. the Unicode char µ U+00B5 MICRO SIGN; * u'%c' % '\xB5' means create a Unicode char converting the

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Just to clarify: u'%c' % some_int should behave like unichr(some_int); u'%c' % some_chr should behave like u'%s' % some_chr.
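
The two rules in this comment can be modelled in Python 3 as a rough sketch. The function name `format_percent_c` and the `default_encoding` parameter are hypothetical; Python 3 `chr()` plays the role of Python 2 `unichr()`, and `bytes` plays the role of the Python 2 byte string.

```python
def format_percent_c(arg, default_encoding="ascii"):
    """Model of the intended Python 2 result of u'%c' % arg (hypothetical helper)."""
    if isinstance(arg, int):
        # Integer argument: treated as a code point, like unichr(some_int).
        return chr(arg)
    if isinstance(arg, bytes):
        if len(arg) != 1:
            raise TypeError("%c requires int or char")
        # Byte-string argument: decoded with the default encoding,
        # like u'%s' % some_chr.
        return arg.decode(default_encoding)
    raise TypeError("%c requires int or char")
```

Under this model `format_percent_c(0xB5)` gives U+00B5, while `format_percent_c(b"\xb5")` fails with `UnicodeDecodeError` as long as the default encoding is ASCII.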

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg m...@egenix.com added the comment: Ezio Melotti wrote: Just to clarify: u'%c' % some_int should behave like unichr(some_int); u'%c' % some_chr should behave like u'%s' % some_chr. That's correct. I guess that in practice u'%c' % some_chr is not all that common. Otherwise,

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-23 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Here's a new patch that uses PyUnicode_FromStringAndSize and PyUnicode_AS_UNICODE(s)[0]. I also added some comments and tested it and it seems to work fine. -- Added file: http://bugs.python.org/file16339/issue7649v3.diff

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-23 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: The patch looks good to me, in that it implements your desired functionality well. I haven't been following the issue closely enough to know whether or not this new functionality is the right way to go. (I'm not saying it's not, just that I

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-23 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Committed: r78392 (r78394). -- resolution: -> fixed status: open -> closed

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-22 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: New patch (issue7649v2.diff) with refleak fixed and improved unittests. -- keywords: +needs review Added file: http://bugs.python.org/file16314/issue7649v2.diff

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-22 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: Could you add a comment on why you're calling PyUnicode_FromString and then throwing away the result? I believe it's so you'll get the same error checking as PyUnicode_FromString, but it's sufficiently tricky that I think it deserves a comment.

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-02-22 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Indeed some comments would be helpful, I'll add them. I also tried already to reuse 's' and extract the first char using unicode_getitem, but it returns a PyObject and anyway it's probably more expensive/complicated than calling

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-21 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: @Ezio: Your patch leaks a reference: PyUnicode_FromString(...) is not destroyed (Py_DECREF) on success.

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-07 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: If we allow it to work on 2.7 the code will break: 1) when ported to Py3, where mixing bytes strings and unicode is not allowed; 2) when used on Py2.7, where the behavior is broken; 3) when converted to str.format, where only ints are

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-07 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: There's no perfect answer. And since we've gotten this far without anyone caring, and 2.7 is basically the end of life for this issue, perhaps doing nothing is the best course. Any change we make will affect code that runs in both 2.6 and 2.7,

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-07 Thread Florent Xicluna
Florent Xicluna la...@yahoo.fr added the comment: Tested on 2.5...
~ $ python2.5
Python 2.5.2 (r252:60911, Jan 4 2009, 21:59:32)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u'%c' % '\x80'
u'\Uff80'
On 2.7 (trunk) I get same behaviour as

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-07 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: First patch that makes u'%c' % '\x80' raise a UnicodeDecodeError. I could reproduce the problem on Linux 32/64bit, Windows 32bit and Python from 2.4 to 2.7. The '\Uff80' is returned by wide builds. -- Added file:

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-06 Thread Ezio Melotti
New submission from Ezio Melotti ezio.melo...@gmail.com: On Py2.x u'%c' % char returns the wrong result if char is in range '\x80'-'\xFF':
>>> u'%c' % '\x7f'
u'\x7f'  # correct
>>> u'%c' % '\x80'
u'\uff80'  # broken
>>> u'%c' % '\xFF'
u'\uffff'  # broken
This is what happens with %s and 0x80: u'%s' %

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-06 Thread Eric Smith
Changes by Eric Smith e...@trueblade.com: -- nosy: +eric.smith

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-06 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: This looks like a signed vs. unsigned char problem. What platform are you on?

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-06 Thread Eric Smith
Changes by Eric Smith e...@trueblade.com: -- nosy: +doerwalter

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-06 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: The problem is the cast char => Py_UNICODE in the formatchar() function. char may be unsigned, but most of the time it's signed. A signed => unsigned cast extends the sign. Example: (unsigned short)(signed char)0x80 gives 0xFF80. Attached
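
The sign extension described in this comment can be reproduced with plain integer arithmetic. The sketch below is a Python model of the C conversion rules, not code from the patch; the helper name `cast_char_to_unsigned` and its parameters are assumptions made for illustration.

```python
def cast_char_to_unsigned(byte_value, signed_char=True, width_bits=16):
    """Model a C cast from char to an unsigned integer of width_bits (hypothetical helper)."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("byte_value must fit in one byte")
    if signed_char and byte_value >= 0x80:
        # A signed char holding the bit pattern 0x80..0xFF is negative: -128..-1.
        byte_value -= 0x100
    # Converting a negative value to an unsigned type wraps modulo
    # 2**width_bits, which fills the upper bits with ones.
    return byte_value & ((1 << width_bits) - 1)


print(hex(cast_char_to_unsigned(0x80)))                     # signed char, 16-bit target
print(hex(cast_char_to_unsigned(0x80, signed_char=False)))  # unsigned char, 16-bit target
```

With a 16-bit target the signed case yields 0xFF80, matching the example in the comment, while going through an unsigned char keeps the value at 0x80.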

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-06 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: The problem is specific to Python 2.x. With Python3, %c expects one unicode character (e.g. "a"). My patch fixes the char => Py_UNICODE conversion, but raising an error is maybe better to be consistent with u"%s" % "\x80" (and prepare the

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-06 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: Shouldn't it work the same as it does for integers?
>>> u'%c' % 0x7f
u'\x7f'
>>> u'%c' % '\x7f'
u'\x7f'
>>> u'%c' % 0x80
u'\x80'
>>> u'%c' % '\x80'
u'\uff80'
That would imply to me it shouldn't be an error, it should just return u'\x80'.

[issue7649] u'%c' % char broken for chars in range '\x80'-'\xFF'

2010-01-06 Thread Eric Smith
Changes by Eric Smith e...@trueblade.com: -- keywords: +easy priority: high -> normal stage: test needed -> patch review