Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-31 Thread François Pinard
[Greg Ewing] >> All development is done in house by French people. All documentation, >> external or internal, comments, identifier and function names, >> everything is in French. > There's nothing stopping you from creating your own Frenchified > version of Python that lets you use all the c

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-31 Thread Greg Ewing
François Pinard wrote: > All development is done in house by French people. All documentation, > external or internal, comments, identifier and function names, > everything is in French. There's nothing stopping you from creating your own Frenchified version of Python that lets you use all the

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-31 Thread Steve Holden
Adam Olsen wrote: > On 10/30/05, François Pinard <[EMAIL PROTECTED]> wrote: > >>All development is done in house by French people. All documentation, >>external or internal, comments, identifier and function names, >>everything is in French. Some of the developers here have had a long >>programm

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-30 Thread Adam Olsen
On 10/30/05, François Pinard <[EMAIL PROTECTED]> wrote: > All development is done in house by French people. All documentation, > external or internal, comments, identifier and function names, > everything is in French. Some of the developers here have had a long > programming life, while they on

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-30 Thread François Pinard
[Martin von Löwis] > My canonical example is François Pinard, who keeps requesting it, > saying that local people were surprised they couldn't use accented > characters in Python. Perhaps that's because he actually is Quebecian > :-) I presume I should comment a bit on this. People here are

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread Bengt Richter
At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote: >Bengt Richter wrote: >> Please bear with me for a few paragraphs ;-) > >Please note that source code encoding doesn't really have >anything to do with the way the interpreter executes the >program - it's merely a way to tell the parser how to >conver

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Neil Hodgson
M.-A. Lemburg: > You mean a slice that slices out the next ? Yes. > This sounds a lot like you'd want iterators for the various > index types. Should be possible to implement on top of the > proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc. Iterators may be helpful, but can a

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Guido van Rossum wrote: > Yes but why? What does this invariant do for him? I don't know about this person, but there are a few things that don't work properly in UTF-16 mode: - the Unicode character database fails to look up things. u"\U0001D670".isupper() gives false, but should give true
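As a hedged illustration of the UTF-16 issue described in this message, the sketch below assumes a modern Python 3 interpreter (3.3 or later), not the Python 2.4 builds under discussion. The helper name `utf16_code_units` is hypothetical and exists only for this example.

```python
import unicodedata


def utf16_code_units(text):
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


ch = "\U0001D670"  # MATHEMATICAL MONOSPACE CAPITAL A, outside the BMP

code_points = len(ch)              # number of code points in the string
code_units = utf16_code_units(ch)  # UTF-16 needs a surrogate pair here
category = unicodedata.category(ch)  # general category from the database
```

On a 16-bit build, the character database would see two surrogates rather than one character, which is why a lookup such as `isupper()` could fail there.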

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Bill Janssen <[EMAIL PROTECTED]> wrote: > I think he was more interested in the invariant Martin proposed, that > > len("\U0001") > > should always be the same and should always be 1. Yes but why? What does this invariant do for him?

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Bill Janssen
I think he was more interested in the invariant Martin proposed, that len("\U0001") should always be the same and should always be 1. Bill

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Bill Janssen wrote: > I just got mail this morning from a researcher who wants exactly what > Martin described, and wondered why the default MacPython 2.4.2 didn't > provide it by default. :-) If all he wants is to represent Deseret, he can do so in a 16-bit Unicode type, too: Python supports UTF-

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Bengt Richter wrote: > At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote: > >>Bengt Richter wrote: >> >>>Please bear with me for a few paragraphs ;-) >> >>Please note that source code encoding doesn't really have >>anything to do with the way the interpreter executes the >>program - it's merely a way

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Neil Hodgson wrote: > M.-A. Lemburg: > > >>Unicode has the concept of combining code points, e.g. you can >>store an "é" (e with an accent) as "e" + "'". Now if you slice >>off the accent, you'll break the character that you encoded >>using combining code points. >>... >>next_(u, index) -> int

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen
Guido writes: > Oh, I don't doubt that they want it. But often they don't *need* it, > and the higher-level goal they are trying to accomplish can be dealt > with better in a different way. (Sort of my response to people asking > for static typing in Python as well. :-) I suppose that's true. But

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Greg Ewing
Guido van Rossum wrote: > Python's slice-and-dice model pretty much ensures that indexing is > common. Almost everything is ultimately represented as indices: regex > search results have the index in the API, find()/index() return > indices, many operations take a start and/or end index. Maybe th

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Greg Ewing
Guido van Rossum wrote: > I think the API should reflect the representation *to some extent*, > namely it shouldn't claim to have operations that are typically > thought of as O(1) that can only be implemented as O(n). Maybe a compromise could be reached by using a btree of chunks or something, s

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum
On 10/24/05, Bill Janssen <[EMAIL PROTECTED]> wrote: > > > - yet others think: "I want all of Unicode, with proper, efficient > > >indexing, so I want four bytes per char". > > > > I doubt the last one though. Probably they really don't want efficient > > indexing, they want to perform higher-l

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen
> > - yet others think: "I want all of Unicode, with proper, efficient > >indexing, so I want four bytes per char". > > I doubt the last one though. Probably they really don't want efficient > indexing, they want to perform higher-level operations that currently > are only possible using effic

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson
M.-A. Lemburg: > Unicode has the concept of combining code points, e.g. you can > store an "é" (e with an accent) as "e" + "'". Now if you slice > off the accent, you'll break the character that you encoded > using combining code points. > ... > next_(u, index) -> integer > > Returns th
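The combining-code-point problem quoted above can be sketched as follows. This is a minimal illustration that assumes Python 3 and uses U+0301 (COMBINING ACUTE ACCENT) for the accent the message writes as "'". The helper `starts_combining` is hypothetical and is not one of the APIs proposed in the thread.

```python
import unicodedata

decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

# Slicing by code point index cuts the accent off the decomposed form.
broken = decomposed[:1]


def starts_combining(text, index):
    """Return True if text[index] is a combining mark."""
    return unicodedata.combining(text[index]) != 0
```

A slicing routine could call a check like `starts_combining` to avoid cutting between a base character and its accent. Full grapheme handling needs more than this single test.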

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis
Antoine Pitrou wrote: >>There are many design alternatives: > > Wouldn't it be simpler to use: > - one-byte representation if every character <= 0xFF > - two-byte representation if every character <= 0xFFFF > - four-byte representation otherwise As I said: there are many alternatives. This one ha
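The width-selection rule quoted above might look like the sketch below. It assumes the two-byte bound is 0xFFFF, the largest value two bytes can hold. The function name `bytes_per_char` is hypothetical; nothing here is an actual interpreter API.

```python
def bytes_per_char(text):
    """Pick the narrowest fixed width that can hold every code point."""
    highest = max(map(ord, text), default=0)  # 0 for the empty string
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4
```

Because the width is fixed per string, indexing stays O(1) in every case. The cost is a scan for the highest code point whenever a string is created.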

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum
On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > Guido van Rossum wrote: > > Changing the APIs would be much work, although perhaps not impossible > > for Python 3000. For example, Raymond Hettinger's partition() API > > doesn't refer to indices at all, and can replace many uses of find()

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis
Guido van Rossum wrote: > Changing the APIs would be much work, although perhaps not impossible > for Python 3000. For example, Raymond Hettinger's partition() API > doesn't refer to indices at all, and can replace many uses of find() > or index(). I think Neil's proposal is not to make them go awa
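To make the point about partition() concrete, here is a small comparison, assuming the `str.partition` method as it later shipped in Python. Both function names are hypothetical.

```python
def split_header(line):
    """Split 'Name: value' with partition(), never touching an index."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("no ':' separator in %r" % line)
    return name.strip(), value.strip()


def split_header_with_find(line):
    """Same result via find() and slicing, shown for comparison."""
    pos = line.find(":")
    if pos < 0:
        raise ValueError("no ':' separator in %r" % line)
    return line[:pos].strip(), line[pos + 1:].strip()
```

The first version never exposes an integer position, so it would keep working even if the string representation made indexing expensive.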

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum
On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > Indeed. My guess is that indexing is more common than you think, > especially when iterating over the string. Of course, iteration > could also operate on UTF-8, if you introduced string iterator > objects. Python's slice-and-dice model p

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Antoine Pitrou
> There are many design alternatives: one option would be to support > *three* internal representations in a single type, generating the > others from the one operation existing as needed. The default, initial > representation might be UTF-8, with UCS-4 only being generated when > indexing occurs,

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis
M.-A. Lemburg wrote: > There seems to be a general misunderstanding here: even if you > have UCS4 storage, it is still possible to slice a Unicode > string in a way which makes rendering it correctly. [impossible?] > Unicode has the concept of

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis
Neil Hodgson wrote: >For Windows, the code will get a little uglier, needing to perform > an allocation/encoding and deallocation more often than at present but > I don't think there will be a speed degradation as Windows is > currently performing a conversion from 8 bit to UTF-16 inside many >

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen
> Python should allow strings to > contain any Unicode character and should be indexable yielding > characters rather than half characters. Therefore Python strings > should appear to be UTF-32. +1. Bill

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen
> >I'm thinking about making all character strings Unicode (possibly with > >different internal representations a la NSString in Apple's Objective > >C) and introduce a separate mutable bytes array data type. But I could > >use some validation or feedback on this idea from actual > >practitioners.

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread M.-A. Lemburg
Bengt Richter wrote: > Please bear with me for a few paragraphs ;-) Please note that source code encoding doesn't really have anything to do with the way the interpreter executes the program - it's merely a way to tell the parser how to convert string literals (currently on the Unicode ones) into

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread M.-A. Lemburg
Neil Hodgson wrote: > Guido van Rossum: > > >>Folks, please focus on what Python 3000 should do. >> >>I'm thinking about making all character strings Unicode (possibly with >>different internal representations a la NSString in Apple's Objective >>C) and introduce a separate mutable bytes array da

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson
Martin v. Löwis: > That's very tricky. If you have multiple implementations, you make > usage at the C API difficult. If you make it either UTF-8 or UTF-32, > you make PythonWin difficult. If you make it UTF-16, you make indexing > difficult. For Windows, the code will get a little uglier, nee

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Martin v. Löwis
Phillip J. Eby wrote: > I'm tempted to say it would be even better if there was a command line > option that could be used to force all binary opens to result in bytes, and > require all text opens to specify an encoding. For Python 3000? -1. There shouldn't be command line switches that have th

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Martin v. Löwis
Neil Hodgson wrote: >I'd like to more tightly define Unicode strings for Python 3000. > Currently, Unicode strings may be implemented with either 2 byte > (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to > contain any Unicode character and should be indexable yielding > chara

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Neil Hodgson
Guido van Rossum: > Folks, please focus on what Python 3000 should do. > > I'm thinking about making all character strings Unicode (possibly with > different internal representations a la NSString in Apple's Objective > C) and introduce a separate mutable bytes array data type. But I could > use s

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Phillip J. Eby
At 06:06 PM 10/23/2005 -0700, Guido van Rossum wrote: >Folks, please focus on what Python 3000 should do. > >I'm thinking about making all character strings Unicode (possibly with >different internal representations a la NSString in Apple's Objective >C) and introduce a separate mutable bytes array

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Bob Ippolito
On Oct 23, 2005, at 6:06 PM, Guido van Rossum wrote: > Folks, please focus on what Python 3000 should do. > > I'm thinking about making all character strings Unicode (possibly with > different internal representations a la NSString in Apple's Objective > C) and introduce a separate mutable bytes a

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Guido van Rossum
Folks, please focus on what Python 3000 should do. I'm thinking about making all character strings Unicode (possibly with different internal representations a la NSString in Apple's Objective C) and introduce a separate mutable bytes array data type. But I could use some validation or feedback on

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Stephan Richter
On Sunday 23 October 2005 18:10, Jason Orendorff wrote: > -1 on keeping the source encoding of string literals.  Python should > definitely decode them at compile time. > > -1 on decoding implicitly "as needed".  This causes decoding to happen > late, in unpredictable places.  Decodes can fail; the

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Bob Ippolito
On Oct 23, 2005, at 3:10 PM, Jason Orendorff wrote: > -1 on decoding implicitly "as needed". This causes decoding to happen > late, in unpredictable places. Decodes can fail; they should happen > as early and as close to the data source as possible. That's not necessarily true... Some codecs c

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Jason Orendorff
-1 on keeping the source encoding of string literals. Python should definitely decode them at compile time. -1 on decoding implicitly "as needed". This causes decoding to happen late, in unpredictable places. Decodes can fail; they should happen as early and as close to the data source as possi
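The advice to decode as early and as close to the data source as possible can be sketched like this, assuming Python 3 `bytes` and `str`. The helper `read_text` and its default encoding are hypothetical choices for the example.

```python
def read_text(raw, encoding="utf-8"):
    """Decode bytes at the input boundary so failures surface immediately."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Report the problem where the data entered the program.
        raise ValueError("input is not valid %s: %s" % (encoding, exc)) from exc


good = read_text(b"Fran\xc3\xa7ois")  # valid UTF-8 decodes to text
```

Invalid bytes such as `b"\xff"` fail inside `read_text`, at the boundary, instead of at some later operation far from the data source.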

[Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-22 Thread Bengt Richter
Please bear with me for a few paragraphs ;-) One aspect of str-type strings is the efficiency afforded when all the encoding really is ascii. If the internal encoding were e.g. fixed utf-16le for strings, maybe with today's computers it would still be efficient enough for most actual string purp

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-16 Thread Martin v. Löwis
Martin Blais wrote: >>Yes. setdefaultencoding() is removed from sys by site.py. To get it >>again you must reload sys. > > > Thanks. Actually, I should take the opportunity to advise people that setdefaultencoding doesn't really work. With the default default encoding, strings and Unicode object

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-16 Thread Martin Blais
On 10/15/05, Reinhold Birkenfeld <[EMAIL PROTECTED]> wrote: > Martin Blais wrote: > > On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote: > >> Martin Blais <[EMAIL PROTECTED]> writes: > >> > >> > How hard would that be to implement? > >> > >> import sys > >> reload(sys) > >> sys.setdefaultencodin

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-15 Thread Reinhold Birkenfeld
Martin Blais wrote: > On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote: >> Martin Blais <[EMAIL PROTECTED]> writes: >> >> > How hard would that be to implement? >> >> import sys >> reload(sys) >> sys.setdefaultencoding('undefined') > > Hmmm any particular reason for the call to reload() here?

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-14 Thread Martin Blais
On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote: > Martin Blais <[EMAIL PROTECTED]> writes: > > > How hard would that be to implement? > > import sys > reload(sys) > sys.setdefaultencoding('undefined') Hmmm any particular reason for the call to reload() here?

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Josiah Carlson wrote: > > > and isn't pure ASCII. > > > > How can you be sure that something that is /semantically textual/ will > > always remain "pure ASCII" ? That's contradictory, unless your software > > never goes out of the anglo-saxon world (and even...). > > Non-unicode text input widgets

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Josiah Carlson
Antoine Pitrou <[EMAIL PROTECTED]> wrote: > > Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit : > > Antoine Pitrou wrote: > > > > > A good rule of thumb is to convert to unicode everything that is > > > semantically textual > > > > and isn't pure ASCII. > > How can you be sure th

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Jim Fulton wrote: > I would argue that it's evil to change the default encoding > in the first place, except in this case to disable implicit > encoding or decoding. absolutely. unfortunately, all attempts to add such information to the sys module documentation seem to have failed... (last time

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton
M.-A. Lemburg wrote: > Michael Hudson wrote: > >>Martin Blais <[EMAIL PROTECTED]> writes: >> >> >> >>>What if we could completely disable the implicit conversions between >>>unicode and str? In other words, if you would ALWAYS be forced to >>>call either .encode() or .decode() to convert between

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton
Martin Blais wrote: > Hi. > > Like a lot of people (or so I hear in the blogosphere...), I've been > experiencing some friction in my code with unicode conversion > problems. Even when being super extra careful with the types of str's > or unicode objects that my variables can contain, there is a

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Martin Blais
On 10/3/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote: > > > > I'm not sure it's a sensible default. > > Me neither, especially since this would make it impossible > to write polymorphic code - e.g. ', '.join(list) wouldn't > work anymore if list contains Unicode; ditto for u', '.join(list) > with lis

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit : > Antoine Pitrou wrote: > > > A good rule of thumb is to convert to unicode everything that is > > semantically textual > > and isn't pure ASCII. How can you be sure that something that is /semantically textual/ will always remain

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: > A good rule of thumb is to convert to unicode everything that is > semantically textual and isn't pure ASCII. (anyone who is tempted to argue otherwise should benchmark their applications, both speed- and memorywise, and be prepared to come up with very strong arguments

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 02:09 -0400, Martin Blais a écrit : > > What if we could completely disable the implicit conversions between > unicode and str? This would be very annoying when dealing with some modules or libraries where the type (str / unicode) returned by a function depends on the c

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread M.-A. Lemburg
Michael Hudson wrote: > Martin Blais <[EMAIL PROTECTED]> writes: > > >>What if we could completely disable the implicit conversions between >>unicode and str? In other words, if you would ALWAYS be forced to >>call either .encode() or .decode() to convert between one and the >>other... wouldn't

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Michael Hudson
Martin Blais <[EMAIL PROTECTED]> writes: > What if we could completely disable the implicit conversions between > unicode and str? In other words, if you would ALWAYS be forced to > call either .encode() or .decode() to convert between one and the > other... wouldn't that help a lot deal with tha

[Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-02 Thread Martin Blais
Hi. Like a lot of people (or so I hear in the blogosphere...), I've been experiencing some friction in my code with unicode conversion problems. Even when being super extra careful with the types of str's or unicode objects that my variables can contain, there is always some case or oversight whe
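For readers on a current interpreter, the behaviour asked for in this post can be observed in Python 3, where text and bytes never convert implicitly. The sketch below assumes Python 3 and does not show how Python 2.4 behaved. The variable names are illustrative only.

```python
text = "caf\u00e9"
data = text.encode("utf-8")  # explicit str -> bytes conversion

try:
    mixed = text + data  # mixing the two types is refused
except TypeError:
    mixed = None

round_trip = data.decode("utf-8")  # explicit bytes -> str conversion
```

Every conversion names its encoding, so a mismatch shows up as an error at the point of mixing, not as a surprising result later.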