Re: [Python-3000] string C API

2006-09-15 Thread Nick Coghlan
Martin v. Löwis wrote: > Nick Coghlan wrote: >> Only the first such call on a given string, though - the idea is to use >> lazy decoding, not to avoid decoding altogether. Most manipulations >> (len, indexing, slicing, concatenation, etc) would require decoding to >> at least UCS-2 (or perhaps UC

Re: [Python-3000] string C API

2006-09-15 Thread Jim Jewett
On 9/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote: > Martin v. Löwis wrote: > > Nick Coghlan wrote: > >> Only the first such call on a given string, though - the idea is to use > >> lazy decoding, not to avoid decoding altogether. Most manipulations > >> (len, indexing, slicing, concatenation, e

Re: [Python-3000] string C API

2006-09-15 Thread Nick Coghlan
Jim Jewett wrote: >> > ISTM that raising the exception lazily (which seems to be necessary) >> > would be very confusing. > >> Yeah, it appears it would be necessary to at least *scan* the string >> when it >> was first created in order to ensure it can be decoded without errors >> later on. >
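
The scan-on-creation idea discussed above can be sketched roughly as follows. This is only an illustration in Python, not the proposed C API: the class name `LazyText`, its attributes, and the `decode_count` counter are all hypothetical, and the eager "scan" is approximated by a throwaway decode because pure Python has no validate-only call.

```python
class LazyText:
    """Hypothetical wrapper: validate bytes eagerly, decode lazily."""

    def __init__(self, raw: bytes, encoding: str = "utf-8"):
        # Eager scan (assumption: approximated by decoding and discarding
        # the result) so that a bad byte sequence raises here, at creation,
        # rather than at some later, confusing point.
        raw.decode(encoding)
        self._raw = raw
        self._encoding = encoding
        self._decoded = None      # filled in on first use
        self.decode_count = 0     # illustration only: counts real decodes

    def _text(self) -> str:
        # Only the first call pays for decoding; later calls reuse the result.
        if self._decoded is None:
            self._decoded = self._raw.decode(self._encoding)
            self.decode_count += 1
        return self._decoded

    def __len__(self) -> int:
        return len(self._text())

    def __getitem__(self, index):
        return self._text()[index]
```

With this shape, `LazyText(b"\xff\xfe")` fails immediately with `UnicodeDecodeError`, while a valid string is not decoded until `len()` or indexing is first used.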

Re: [Python-3000] string C API

2006-09-15 Thread Jason Orendorff
On 9/15/06, Jim Jewett <[EMAIL PROTECTED]> wrote: > There should be only one reference to a string until it is constructed, > and after that, its data should be immutable. Recoding that results > in different bytes should not be in-place. Either it returns a new > string (no problem) or it doesn't c

Re: [Python-3000] string C API

2006-09-15 Thread Paul Prescod
On 9/15/06, Jason Orendorff <[EMAIL PROTECTED]> wrote: > I'm sure this will happen to the same degree that it's become a > standard recipe in Java and C# (both of which lack polymorphic > whatzits). Which is to say, not at all. I think Jason's point is key. This is probably premature optimization and shoul

Re: [Python-3000] string C API

2006-09-15 Thread Jim Jewett
On 9/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote: > Jim Jewett wrote: > >> ... would be necessary to at least *scan* the string when it > >> was first created in order to ensure it can be decoded without errors > > What happens today with strings? I think the answer is: > > "Nothing. > >

Re: [Python-3000] string C API

2006-09-15 Thread Josiah Carlson
"Jim Jewett" <[EMAIL PROTECTED]> wrote: > Interning may get awkward if multiple encodings are allowed within a > program, regardless of whether they're allowed for single strings. It > might make sense to intern only strings that are in the same encoding > as the source code. (Or whose values ar

Re: [Python-3000] string C API

2006-09-15 Thread Josiah Carlson
"Jason Orendorff" <[EMAIL PROTECTED]> wrote: > > On 9/15/06, Jim Jewett <[EMAIL PROTECTED]> wrote: > > There should be only one reference to a string until it is constructed, > > and after that, its data should be immutable. Recoding that results > > in different bytes should not be in-place. Eith

Re: [Python-3000] string C API

2006-09-15 Thread Antoine Pitrou
On Friday 15 September 2006 at 10:48 -0700, Josiah Carlson wrote: > This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4: You could replace "latin-1" with "one-byte system encoding chosen at interpreter startup depending on locale". There are lots of 8-bit encodings other tha

Re: [Python-3000] string C API

2006-09-15 Thread Marcin 'Qrczak' Kowalczyk
Antoine Pitrou <[EMAIL PROTECTED]> writes: >> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4: > > You could replace "latin-1" with "one-byte system encoding chosen at > interpreter startup depending on locale". Latin-1 has the advantage of being trivially decodable to a se

Re: [Python-3000] string C API

2006-09-15 Thread Paul Prescod
On 9/15/06, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > On Friday 15 September 2006 at 10:48 -0700, Josiah Carlson wrote: > > This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4: > You could replace "latin-1" with "one-byte system encoding chosen at interpreter startup depending on

Re: [Python-3000] string C API

2006-09-15 Thread Josiah Carlson
"Paul Prescod" <[EMAIL PROTECTED]> wrote: [snip] > The result seems obvious to me...8-bit-fixed encodings are a terrible idea > and need to just go away. Let's not build them into Python's core on the > basis of a minor and fleeting performance improvement. Variable-width encodings make many oper

Re: [Python-3000] string C API

2006-09-15 Thread Jim Jewett
On 9/15/06, Josiah Carlson <[EMAIL PROTECTED]> wrote: > > "Jim Jewett" <[EMAIL PROTECTED]> wrote: > > Interning may get awkward if multiple encodings are allowed within a > > program, regardless of whether they're allowed for single strings. It > > might make sense to intern only strings that are

Re: [Python-3000] string C API

2006-09-15 Thread Josiah Carlson
"Jim Jewett" <[EMAIL PROTECTED]> wrote: > On 9/15/06, Josiah Carlson <[EMAIL PROTECTED]> wrote: > > "Jim Jewett" <[EMAIL PROTECTED]> wrote: > > > Interning may get awkward if multiple encodings are allowed within a > > > program, regardless of whether they're allowed for single strings. It > > >

Re: [Python-3000] UTF-16

2006-09-15 Thread Andrew Clover
On 2006-09-01, Paul Prescod wrote: > I cannot understand why a user should be forced to choose between 16 and 32 > bit strings AT BUILD TIME. I strongly agree. This has been troublesome for many, not just people trying to install binary libs, but also Python code that does actually need to know

Re: [Python-3000] string C API

2006-09-15 Thread Greg Ewing
Josiah Carlson wrote: > Because all text objects are internally > represented in its minimal 'encoding', equal text objects will always be > in the same encoding. That places a burden on all creators of strings to ensure that they are in the minimal format, which could be inconvenient for some ope
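
The "minimal encoding" rule under discussion can be sketched as below. This is a rough Python illustration under stated assumptions, not the proposed implementation: `minimal_width` and `pack` are hypothetical names, the three candidate widths follow the Latin-1 / UCS-2 / UCS-4 choice mentioned in the thread, and UCS-2 is approximated with Python's `utf-16-le` codec, which produces the same bytes for BMP text without surrogates.

```python
def minimal_width(text: str) -> int:
    """Return 1, 2 or 4: bytes per character of the narrowest fixed-width
    form that can hold every code point in `text` (hypothetical helper)."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1   # fits Latin-1
    if highest < 0x10000:
        return 2   # fits UCS-2
    return 4       # needs UCS-4


def pack(text: str) -> tuple[int, bytes]:
    """Return (bytes_per_char, data) in the minimal fixed-width form."""
    width = minimal_width(text)
    # Assumption: utf-16-le stands in for UCS-2; valid because the width
    # check above guarantees no code point beyond the BMP reaches it.
    codec = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}[width]
    return width, text.encode(codec)
```

Because the width is a pure function of the content, equal texts always end up with the same representation. The cost raised in the reply is visible too: any operation that can drop the widest character, such as slicing, has to re-run the scan (`pack("a€"[:1])` comes back one byte wide even though `pack("a€")` is two bytes wide).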

Re: [Python-3000] string C API

2006-09-15 Thread Nick Coghlan
Jim Jewett wrote: > On 9/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote: >> If you're reading text and you *know* it is ASCII data, then you can >> just set >> the encoding to latin-1 > > Only if latin-1 is a valid encoding for the internal implementation. I think the possible internal encodings

Re: [Python-3000] string C API

2006-09-15 Thread Nick Coghlan
Antoine Pitrou wrote: > On Friday 15 September 2006 at 10:48 -0700, Josiah Carlson wrote: >> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4: > > You could replace "latin-1" with "one-byte system encoding chosen at > interpreter startup depending on locale". > There are

Re: [Python-3000] string C API

2006-09-15 Thread Ronald Oussoren
On Sep 15, 2006, at 7:04 PM, Jim Jewett wrote: > On 9/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote: > > Jim Jewett wrote: > > >> ... would be necessary to at least *scan* the string when it > > >> was first created in order to ensure it can be decoded without errors > > > What happens today with strings? I th

Re: [Python-3000] string C API

2006-09-15 Thread Martin v. Löwis
Nick Coghlan wrote: > That way the internal representation of a string would only need to grow > one extra field (the one saying how many bytes there are per character), > and the internal state would remain immutable. You could play tricks with ob_size to save this field: - ob_size < 0: 8-bit