Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-07 Thread Nick Coghlan
On Wed, Mar 7, 2012 at 9:39 PM, wrote: > Ah. I think the array module should maintain compatibility with Python 3.2, > i.e. "u" should continue to denote Py_UNICODE, i.e. 7fa098f6dc6a should be > reverted. > > It may be that the 'u' code is not particularly useful, but AFAICT, it never > was usef

Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-07 Thread Nick Coghlan
On Wed, Mar 7, 2012 at 8:50 PM, Stefan Krah wrote: > *If* the arrays that Victor mentioned give one character per array location, > then memoryview(str) could be used for zero-copy slicing etc. A slight tangent, but it's worth trying to stick to the "code point" term when talking about what Unico
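Nick's point about one array location per code point can be illustrated with a minimal sketch. str does not export the buffer protocol, so this assumes the text is first encoded to native-endian UTF-32 (one fixed 4-byte unit per code point) and that the platform's C unsigned int ('I') is 4 bytes. The helper name ucs4_view is hypothetical. After that single encode, memoryview slices share the same bytes instead of copying them.

```python
import sys

# Native-endian UTF-32 codec name, so the bytes match memoryview's native 'I' layout.
UTF32_NATIVE = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"


def ucs4_view(s: str) -> memoryview:
    """Return a read-only view with one 32-bit unsigned item per code point of s.

    str itself does not export the buffer protocol, so this makes one copy
    (the encode); every slice taken from the returned view afterwards is zero-copy.
    """
    # 'surrogatepass' lets lone surrogates (valid str code points) through unchanged.
    buf = s.encode(UTF32_NATIVE, "surrogatepass")
    view = memoryview(buf).cast("I")
    assert view.itemsize == 4, "this sketch assumes a 4-byte C unsigned int"
    return view


text = "na\u00efve \U0001F600!"
view = ucs4_view(text)

# One array location per code point: indexes line up with the str.
assert len(view) == len(text)
assert all(view[i] == ord(text[i]) for i in range(len(text)))

# Slicing a memoryview creates a new view on the same bytes, not a copy.
sub = view[2:7]
print([hex(cp) for cp in sub])
```

The copy happens once, at encode time; any number of sub-views can then be taken without touching the data again.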

Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-07 Thread martin
The main reason why I raised the issue is this: If Python-3.3 is shipped with 'u' -> UCS4 in the array module and *then* someone figures out that the above format codes are a great idea, we'd be stuck with yet another format code incompatibility. Ah. I think the array module should maintain comp

Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-07 Thread Stefan Krah
"Martin v. Löwis" wrote: > > I think it would be nice for Python3.3 to implement the PEP-3118 > > suggestion: > > > > 'c' -> UCS1 > > > > 'u' -> UCS2 > > > > 'w' -> UCS4 > > What is the use case for these format codes? Unfortunately I've only worked with UTF-8 so far and I'm not too familiar

Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-06 Thread Nick Coghlan
On Wed, Mar 7, 2012 at 4:15 AM, Stefan Krah wrote: > Victor Stinner wrote: >> A Unicode string is an array of code points. Another approach is to >> expose such string as an array of uint8/uint16/uint32 integers. I >> don't know if you expect to get a character / a substring when you >> read the b

Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-06 Thread Martin v. Löwis
> I think it would be nice for Python3.3 to implement the PEP-3118 > suggestion: > > 'c' -> UCS1 > > 'u' -> UCS2 > > 'w' -> UCS4 What is the use case for these format codes? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mai

Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-06 Thread Stefan Krah
Victor Stinner wrote: > > 'c' -> UCS1 > > 'u' -> UCS2 > > 'w' -> UCS4 > > A Unicode string is an array of code points. Another approach is to > expose such string as an array of uint8/uint16/uint32 integers. I > don't know if you expect to get a character / a substring when you > read the buffer o

Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-06 Thread Victor Stinner
> In the array module the 'u' specifier previously meant "2-bytes, on wide > builds 4-bytes". Currently in 3.3 the 'u' specifier is mapped to UCS4. > > I think it would be nice for Python3.3 to implement the PEP-3118 > suggestion: > > 'c' -> UCS1 > > 'u' -> UCS2 > > 'w' -> UCS4 A Unicode string is
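Victor's alternative, exposing the string as an array of uint8/uint16/uint32 integers, might look roughly like the following sketch. It picks the narrowest width that fits the largest code point, in the spirit of PEP 393's 1/2/4-byte kinds, but it builds a copy through the standard codecs rather than reading CPython's internal storage. The name code_unit_view is hypothetical, and native 'B'/'H'/'I' are assumed to be 1, 2 and 4 bytes.

```python
import sys

_ENDIAN = "le" if sys.byteorder == "little" else "be"

# (max code point, bytes per item, memoryview format, codec), narrowest first.
# Mirrors the three PEP 393 widths: 1, 2 or 4 bytes per code point.
_KINDS = [
    (0xFF, 1, "B", "latin-1"),
    (0xFFFF, 2, "H", f"utf-16-{_ENDIAN}"),
    (0x10FFFF, 4, "I", f"utf-32-{_ENDIAN}"),
]


def code_unit_view(s: str) -> memoryview:
    """Expose s as a flat array of unsigned integers, one per code point.

    The item width is the narrowest of 1, 2 or 4 bytes that fits max(ord(c)).
    """
    widest = max(map(ord, s), default=0)
    for limit, size, fmt, codec in _KINDS:
        if widest <= limit:
            # Below 0x10000 UTF-16 needs no surrogate pairs, so it is plain UCS2.
            # 'surrogatepass' keeps lone surrogates in s as single 16/32-bit units.
            data = s.encode(codec, "surrogatepass")
            view = memoryview(data).cast(fmt)
            assert view.itemsize == size, "assumes native B/H/I are 1/2/4 bytes"
            return view
    raise ValueError("code point out of range")  # unreachable for a valid str
```

Whether a consumer wants integers like these or one-character strings when reading the buffer is exactly the open question in the thread; this sketch only covers the integer view.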

[Python-Dev] PEP-393/PEP-3118: unicode format specifiers

2012-03-06 Thread Stefan Krah
Hello, In the array module the 'u' specifier previously meant "2-bytes, on wide builds 4-bytes". Currently in 3.3 the 'u' specifier is mapped to UCS4. I think it would be nice for Python3.3 to implement the PEP-3118 suggestion: 'c' -> UCS1 'u' -> UCS2 'w' -> UCS4 Actually we could even add '
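To make the proposed 'c' -> UCS1, 'u' -> UCS2, 'w' -> UCS4 mapping concrete, here is a sketch of a consumer that decodes a buffer tagged with one of those codes. These text meanings are only the proposal under discussion: the sketch translates each code to a native unsigned integer code that memoryview.cast() accepts (assumed to be 1, 2 and 4 bytes), and decode_text_buffer is a hypothetical helper. Each item is treated as exactly one code point, so 'u' here means UCS2, not UTF-16.

```python
import array

# Proposed PEP-3118 text codes from the thread, mapped to native unsigned
# integer codes that memoryview.cast() accepts (assumes B/H/I = 1/2/4 bytes).
PROPOSED = {"c": "B", "u": "H", "w": "I"}


def decode_text_buffer(buf, code: str) -> str:
    """Turn a buffer of fixed-width code units into a str.

    code is one of the proposed 'c' (UCS1), 'u' (UCS2) or 'w' (UCS4) codes.
    Every item becomes exactly one code point, so surrogate pairs in a
    UCS2 buffer are NOT combined the way a UTF-16 decoder would.
    """
    try:
        fmt = PROPOSED[code]
    except KeyError:
        raise ValueError(f"unknown text format code {code!r}") from None
    # Go through 'B' first: memoryview only casts between formats via bytes.
    units = memoryview(buf).cast("B").cast(fmt)
    return "".join(map(chr, units))


# Usage: a UCS2 buffer built with the array module.
ucs2 = array.array("H", [0x48, 0x20AC])
print(decode_text_buffer(ucs2, "u"))
```

Values above 0x10FFFF in a 'w' buffer are not valid code points, so chr() raises ValueError for them.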