Re: [Python-3000] string C API

2006-09-16 Thread Josiah Carlson
Greg Ewing <[EMAIL PROTECTED]> wrote: > > Josiah Carlson wrote: > > Because all text objects are internally > > represented in its minimal 'encoding', equal text objects will always be > > in the same encoding. > > That places a burden on all creators of strings to ensure > that they are in the

Re: [Python-3000] string C API

2006-09-16 Thread Marcin 'Qrczak' Kowalczyk
Greg Ewing <[EMAIL PROTECTED]> writes: > That places a burden on all creators of strings to ensure > that they are in the minimal format, which could be > inconvenient for some operations, e.g. taking a substring > could require making an extra pass to re-code the data. Yes, but taking a substrin

Re: [Python-3000] string C API

2006-09-16 Thread Marcin 'Qrczak' Kowalczyk
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> You could play tricks with ob_size to save this field:
>
> - ob_size < 0: 8-bit data; length is abs(ob_size)
> - ob_size > 0, (ob_size & 1)==0: 16-bit data, length is ob_size/2
> - ob_size > 0, (ob_size & 1)==1: 32-bit data, length is ob_size/2

I wo

Re: [Python-3000] string C API

2006-09-16 Thread Martin v. Löwis
Josiah Carlson schrieb: >> That places a burden on all creators of strings to ensure >> that they are in the minimal format, which could be >> inconvenient for some operations, e.g. taking a substring >> could require making an extra pass to re-code the data. > > If Martin says it's not a big deal

Re: [Python-3000] string C API

2006-09-16 Thread Martin v. Löwis
Nick Coghlan schrieb: > The choice of latin-1 is deliberate and non-arbitrary. The reason for the > choice is that the ordinals 0-255 in latin-1 map to the Unicode code points > 0-255: That's true, but that this makes a good choice for a special case doesn't follow. Instead, frequency of occurre
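As a small illustration of the property quoted above (Latin-1 ordinals 0-255 coincide with Unicode code points 0-255), widening a Latin-1 buffer needs no lookup table, only zero-extension. This is a minimal sketch; the function name `latin1_to_ucs4` is hypothetical and does not come from the thread or from any Python API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen Latin-1 bytes to 32-bit code points.
 * Because each Latin-1 byte value equals its Unicode code point,
 * the conversion is a plain zero-extending copy. */
void latin1_to_ucs4(const uint8_t *src, size_t len, uint32_t *dst)
{
    assert(len == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];  /* byte value == code point */
}
```

Whether that property alone justifies a dedicated 8-bit representation is exactly what the message above disputes; the sketch only shows why the conversion itself is cheap.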

Re: [Python-3000] string C API

2006-09-16 Thread Martin v. Löwis
Marcin 'Qrczak' Kowalczyk schrieb:
>> You could play tricks with ob_size to save this field:
>>
>> - ob_size < 0: 8-bit data; length is abs(ob_size)
>> - ob_size > 0, (ob_size & 1)==0: 16-bit data, length is ob_size/2
>> - ob_size > 0, (ob_size & 1)==1: 32-bit data, length is ob_size/2
>
> I wonde
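The ob_size encoding quoted above can be sketched roughly as follows. The type `str_layout` and the functions `pack_ob_size` and `unpack_ob_size` are hypothetical names for illustration, and `ptrdiff_t` stands in for whatever signed size type the object header uses. The quoted rules do not say what ob_size == 0 means; this sketch assumes it is the empty string stored as 8-bit data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded view of a packed ob_size value. */
typedef struct {
    int width;         /* bytes per character: 1, 2 or 4 */
    ptrdiff_t length;  /* number of characters */
} str_layout;

/* Pack (width, length) into one signed size, following the quoted rules:
 * negative -> 8-bit, positive even -> 16-bit, positive odd -> 32-bit. */
ptrdiff_t pack_ob_size(int width, ptrdiff_t length)
{
    assert(length >= 0);
    assert(width == 1 || width == 2 || width == 4);
    if (width == 1)
        return -length;
    /* Assumption: wide strings are never empty, so the result is > 0. */
    assert(length > 0);
    return width == 2 ? 2 * length : 2 * length + 1;
}

/* Recover (width, length) from a packed value. */
str_layout unpack_ob_size(ptrdiff_t ob_size)
{
    str_layout r;
    if (ob_size <= 0) {            /* 0 treated as empty 8-bit (assumption) */
        r.width = 1;
        r.length = -ob_size;
    } else {
        r.width = (ob_size & 1) ? 4 : 2;
        r.length = ob_size / 2;    /* integer division drops the tag bit */
    }
    return r;
}
```

Under these rules a five-character string would carry ob_size -5, 10 or 11 depending on its width, so every length query pays for a sign test and possibly a shift.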

Re: [Python-3000] string C API

2006-09-16 Thread Nick Coghlan
Martin v. Löwis wrote: > Nick Coghlan schrieb: >> The choice of latin-1 is deliberate and non-arbitrary. The reason for the >> choice is that the ordinals 0-255 in latin-1 map to the Unicode code points >> 0-255: > > That's true, but that this makes a good choice for a special case > doesn't fol

Re: [Python-3000] string C API

2006-09-16 Thread Martin v. Löwis
Nick Coghlan schrieb: > If an 8-bit encoding other than latin-1 is used for the internal buffer, > then every comparison operation would have to decode the string to > Unicode in order to compare code points. > > It seems much simpler to me to ensure that what is stored internally is > *always* th

Re: [Python-3000] string C API

2006-09-16 Thread Josiah Carlson
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > > Nick Coghlan schrieb: > > If an 8-bit encoding other than latin-1 is used for the internal buffer, > > then every comparison operation would have to decode the string to > > Unicode in order to compare code points. > > > > It seems much simpler to

Re: [Python-3000] string C API

2006-09-16 Thread Marcin 'Qrczak' Kowalczyk
"Martin v. Löwis" <[EMAIL PROTECTED]> writes: > Just try implementing comparison some time. You can end up implementing > the same algorithm six times at least, once for each pair (1,1), (1,2), > (1,4), (2,2), (2,4), (4,4). If the algorithm isn't symmetric (i.e. > you can't reduce (2,1) to (1,2)),

Re: [Python-3000] string C API

2006-09-16 Thread Greg Ewing
Martin v. Löwis wrote:
> Just try implementing comparison some time. You can end up implementing
> the same algorithm six times at least, once for each pair (1,1), (1,2),
> (1,4), (2,2), (2,4), (4,4).

#define UnicodeStringComparisonFunction(TYPE1, TYPE2) \
    /* code to implement it here */

Unico
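One way the macro idea above might be fleshed out is shown below. This is only a sketch under stated assumptions: the macro name `DEFINE_COMPARE` and the generated names `compare_1_1` through `compare_4_4` are hypothetical, the 8-bit data is taken to be Latin-1, and every 16-bit unit is taken to be a whole code point (no surrogate pairs), so that element values compare as code points across widths.

```c
#include <stddef.h>
#include <stdint.h>

/* Generate one comparison function per (TYPE1, TYPE2) pair.
 * Each element is widened to 32 bits before comparing, so the same
 * loop body serves every combination of character widths. */
#define DEFINE_COMPARE(NAME, TYPE1, TYPE2)                          \
    int NAME(const TYPE1 *a, size_t alen,                           \
             const TYPE2 *b, size_t blen)                           \
    {                                                               \
        size_t n = alen < blen ? alen : blen;                       \
        for (size_t i = 0; i < n; i++) {                            \
            uint32_t ca = a[i], cb = b[i];                          \
            if (ca != cb)                                           \
                return ca < cb ? -1 : 1;                            \
        }                                                           \
        /* Common prefix is equal: the shorter string sorts first. */ \
        return (alen > blen) - (alen < blen);                       \
    }

/* The six pairs listed in the quoted message. */
DEFINE_COMPARE(compare_1_1, uint8_t,  uint8_t)
DEFINE_COMPARE(compare_1_2, uint8_t,  uint16_t)
DEFINE_COMPARE(compare_1_4, uint8_t,  uint32_t)
DEFINE_COMPARE(compare_2_2, uint16_t, uint16_t)
DEFINE_COMPARE(compare_2_4, uint16_t, uint32_t)
DEFINE_COMPARE(compare_4_4, uint32_t, uint32_t)
```

A reversed pair such as (2,1) can reuse compare_1_2 by swapping the arguments and negating the result, which works here because this comparison is symmetric. The macro removes the source duplication, but the compiler still emits six separate functions.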