Martin v. Löwis wrote:
> Nick Coghlan wrote:
>> Only the first such call on a given string, though - the idea is to use
>> lazy decoding, not to avoid decoding altogether. Most manipulations
>> (len, indexing, slicing, concatenation, etc) would require decoding to
>> at least UCS-2 (or perhaps UC
On 9/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
> > Nick Coghlan wrote:
> >> Only the first such call on a given string, though - the idea is to use
> >> lazy decoding, not to avoid decoding altogether. Most manipulations
> >> (len, indexing, slicing, concatenation, e
Jim Jewett wrote:
>> > ISTM that raising the exception lazily (which seems to be necessary)
>> > would be very confusing.
>
>> Yeah, it appears it would be necessary to at least *scan* the string
>> when it was first created in order to ensure it can be decoded without
>> errors later on.
>
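
The scan-at-creation idea quoted above might look roughly like the sketch below. `LazyText` is a hypothetical name, not a real CPython type, and the "scan" is simulated by decoding once and discarding the result, which a real implementation would presumably replace with a cheaper validity check. The point is only the timing: malformed input fails in the constructor, while the decoded form is built on first use.

```python
class LazyText:
    """Hypothetical lazily decoded string; not an actual CPython type."""

    def __init__(self, raw: bytes, encoding: str = "utf-8") -> None:
        # Eager scan (assumption: simulated by a throwaway decode), so that
        # malformed input raises UnicodeDecodeError here, not at a later call.
        raw.decode(encoding)
        self._raw = raw
        self._encoding = encoding
        self._decoded = None  # filled in by the first operation that needs it

    def _text(self) -> str:
        # Lazy step: decode once, then reuse the cached result.
        if self._decoded is None:
            self._decoded = self._raw.decode(self._encoding)
        return self._decoded

    def __len__(self) -> int:
        return len(self._text())

    def __getitem__(self, index):
        return self._text()[index]


text = LazyText("naïve".encode("utf-8"))  # validated, not yet decoded
length = len(text)                        # first use triggers the decode
```

With this shape, `LazyText(b"\xff\xfe")` raises at construction time, which avoids the confusing lazily raised exception discussed in the quoted text.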
On 9/15/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> There should be only one reference to a string until it is constructed,
> and after that, its data should be immutable. Recoding that results
> in different bytes should not be in-place. Either it returns a new
> string (no problem) or it doesn't c
On 9/15/06, Jason Orendorff <[EMAIL PROTECTED]> wrote:
I'm sure this will happen to the same degree that it's become a standard recipe in Java and C# (both of which lack polymorphic whatzits). Which is to say, not at all.
I think Jason's point is key. This is probably premature optimization and shoul
On 9/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Jim Jewett wrote:
> >> ... would be necessary to at least *scan* the string when it
> >> was first created in order to ensure it can be decoded without errors
> > What happens today with strings? I think the answer is:
> > "Nothing.
> >
"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> Interning may get awkward if multiple encodings are allowed within a
> program, regardless of whether they're allowed for single strings. It
> might make sense to intern only strings that are in the same encoding
> as the source code. (Or whose values ar
"Jason Orendorff" <[EMAIL PROTECTED]> wrote:
>
> On 9/15/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > There should be only one reference to a string until it is constructed,
> > and after that, its data should be immutable. Recoding that results
> > in different bytes should not be in-place. Eith
On Friday, 15 September 2006 at 10:48 -0700, Josiah Carlson wrote:
> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4:
You could replace "latin-1" with "one-byte system encoding chosen at
interpreter startup depending on locale".
There are lots of 8-bit encodings other tha
Antoine Pitrou <[EMAIL PROTECTED]> writes:
>> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4:
>
> You could replace "latin-1" with "one-byte system encoding chosen at
> interpreter startup depending on locale".
Latin-1 has the advantage of being trivially decodable to a se
On 9/15/06, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
On Friday, 15 September 2006 at 10:48 -0700, Josiah Carlson wrote:
> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4:
You could replace "latin-1" with "one-byte system encoding chosen at
interpreter startup depending on
"Paul Prescod" <[EMAIL PROTECTED]> wrote:
[snip]
> The result seems obvious to me...8-bit-fixed encodings are a terrible idea
> and need to just go away. Let's not build them into Python's core on the
> basis of a minor and fleeting performance improvement.
Variable-width encodings make many oper
On 9/15/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
>
> "Jim Jewett" <[EMAIL PROTECTED]> wrote:
> > Interning may get awkward if multiple encodings are allowed within a
> > program, regardless of whether they're allowed for single strings. It
> > might make sense to intern only strings that are
"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> On 9/15/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > "Jim Jewett" <[EMAIL PROTECTED]> wrote:
> > > Interning may get awkward if multiple encodings are allowed within a
> > > program, regardless of whether they're allowed for single strings. It
> > >
On 2006-09-01, Paul Prescod wrote:
> I cannot understand why a user should be forced to choose between 16 and 32
> bit strings AT BUILD TIME.
I strongly agree. This has been troublesome for many, not just people
trying to install binary libs, but also Python code that does actually
need to know
Josiah Carlson wrote:
> Because all text objects are internally
> represented in their minimal 'encoding', equal text objects will always be
> in the same encoding.
That places a burden on all creators of strings to ensure
that they are in the minimal format, which could be
inconvenient for some ope
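
As a rough illustration of the "minimal encoding" rule under discussion, the sketch below picks the narrowest fixed width (1, 2, or 4 bytes per character, standing in for Latin-1, UCS-2, and UCS-4) that can hold every code point in a string. The function names are hypothetical and the little-endian byte layout is an assumption made for the example; nothing here reflects an actual interpreter internal.

```python
def minimal_width(text: str) -> int:
    """Return the bytes per character (1, 2, or 4) needed to store text."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:      # fits the Latin-1 range
        return 1
    if highest < 0x10000:    # fits the UCS-2 range
        return 2
    return 4                 # needs UCS-4


def to_fixed_width(text: str):
    """Return (width, payload) with every character stored in width bytes."""
    width = minimal_width(text)
    # Assumption for this sketch: little-endian code units.
    payload = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, payload


width, payload = to_fixed_width("abc")
```

Because the width is derived from the content alone, two equal strings always get the same width and the same payload. The cost raised in the reply is visible too: every creator of a string has to run something like `minimal_width` before storing it.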
Jim Jewett wrote:
> On 9/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> If you're reading text and you *know* it is ASCII data, then you can
>> just set the encoding to latin-1
>
> Only if latin-1 is a valid encoding for the internal implementation.
I think the possible internal encodings
Antoine Pitrou wrote:
> On Friday, 15 September 2006 at 10:48 -0700, Josiah Carlson wrote:
>> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4:
>
> You could replace "latin-1" with "one-byte system encoding chosen at
> interpreter startup depending on locale".
> There are
On Sep 15, 2006, at 7:04 PM, Jim Jewett wrote:
> On 9/15/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> > Jim Jewett wrote:
> > >> ... would be necessary to at least *scan* the string when it
> > >> was first created in order to ensure it can be decoded without
> > >> errors
> > > What happens today with strings? I th
Nick Coghlan wrote:
> That way the internal representation of a string would only need to grow
> one extra field (the one saying how many bytes there are per character),
> and the internal state would remain immutable.
You could play tricks with ob_size to save this field:
- ob_size < 0: 8-bit