[Greg Ewing]
>> All development is done in house by French people. All documentation,
>> external or internal, comments, identifier and function names,
>> everything is in French.
> There's nothing stopping you from creating your own Frenchified
> version of Python that lets you use all the c
François Pinard wrote:
> All development is done in house by French people. All documentation,
> external or internal, comments, identifier and function names,
> everything is in French.
There's nothing stopping you from creating your own
Frenchified version of Python that lets you use all
the
Adam Olsen wrote:
> On 10/30/05, François Pinard <[EMAIL PROTECTED]> wrote:
>
>>All development is done in house by French people. All documentation,
>>external or internal, comments, identifier and function names,
>>everything is in French. Some of the developers here have had a long
>>programm
On 10/30/05, François Pinard <[EMAIL PROTECTED]> wrote:
> All development is done in house by French people. All documentation,
> external or internal, comments, identifier and function names,
> everything is in French. Some of the developers here have had a long
> programming life, while they on
[Martin von Löwis]
> My canonical example is François Pinard, who keeps requesting it,
> saying that local people were surprised they couldn't use accented
> characters in Python. Perhaps that's because he actually is Quebecian
> :-)
I presume I should comment a bit on this.
People here are
At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
>Bengt Richter wrote:
>> Please bear with me for a few paragraphs ;-)
>
>Please note that source code encoding doesn't really have
>anything to do with the way the interpreter executes the
>program - it's merely a way to tell the parser how to
>conver
M.-A. Lemburg:
> You mean a slice that slices out the next ?
Yes.
> This sounds a lot like you'd want iterators for the various
> index types. Should be possible to implement on top of the
> proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.
Iterators may be helpful, but can a
Guido van Rossum wrote:
> Yes but why? What does this invariant do for him?
I don't know about this person, but there are a few things that
don't work properly in UTF-16 mode:
- the Unicode character database fails to lookup things.
u"\U0001D670".isupper() gives false, but should give true
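For comparison, a minimal sketch of the same check on a current interpreter. It assumes Python 3.3 or later, where there is no separate UTF-16 build mode; the variable names are illustrative only. It shows the invariant under discussion: one code point outside the BMP has length 1 and is found in the character database, even though its UTF-16 encoding needs two code units.

```python
import unicodedata

# U+1D670 lies outside the Basic Multilingual Plane.
ch = "\U0001D670"

length = len(ch)                      # counted in code points
is_upper = ch.isupper()               # looked up in the Unicode character database
category = unicodedata.category(ch)   # general category of the code point

# Encoded as UTF-16, the same character needs a surrogate pair (two 16-bit units).
utf16_units = len(ch.encode("utf-16-le")) // 2

print(length, is_upper, category, utf16_units)
```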
On 10/25/05, Bill Janssen <[EMAIL PROTECTED]> wrote:
> I think he was more interested in the invariant Martin proposed, that
>
> len("\U0001")
>
> should always be the same and should always be 1.
Yes but why? What does this invariant do for him?
--
--Guido van Rossum (home page: http://www.
I think he was more interested in the invariant Martin proposed, that
len("\U0001")
should always be the same and should always be 1.
Bill
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubsc
Bill Janssen wrote:
> I just got mail this morning from a researcher who wants exactly what
> Martin described, and wondered why the default MacPython 2.4.2 didn't
> provide it by default. :-)
If all he wants is to represent Deseret, he can do so in a 16-bit
Unicode type, too: Python supports UTF-
Bengt Richter wrote:
> At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
>
>>Bengt Richter wrote:
>>
>>>Please bear with me for a few paragraphs ;-)
>>
>>Please note that source code encoding doesn't really have
>>anything to do with the way the interpreter executes the
>>program - it's merely a way
Neil Hodgson wrote:
> M.-A. Lemburg:
>
>
>>Unicode has the concept of combining code points, e.g. you can
>>store an "é" (e with an accent) as "e" + "'". Now if you slice
>>off the accent, you'll break the character that you encoded
>>using combining code points.
>>...
>>next_(u, index) -> int
Guido writes:
> Oh, I don't doubt that they want it. But often they don't *need* it,
> and the higher-level goal they are trying to accomplish can be dealt
> with better in a different way. (Sort of my response to people asking
> for static typing in Python as well. :-)
I suppose that's true. But
Guido van Rossum wrote:
> Python's slice-and-dice model pretty much ensures that indexing is
> common. Almost everything is ultimately represented as indices: regex
> search results have the index in the API, find()/index() return
> indices, many operations take a start and/or end index.
Maybe th
Guido van Rossum wrote:
> I think the API should reflect the representation *to some extent*,
> namely it shouldn't claim to have operations that are typically
> thought of as O(1) that can only be implemented as O(n).
Maybe a compromise could be reached by using a
btree of chunks or something, s
On 10/24/05, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > > - yet others think: "I want all of Unicode, with proper, efficient
> > >indexing, so I want four bytes per char".
> >
> > I doubt the last one though. Probably they really don't want efficient
> > indexing, they want to perform higher-l
> > - yet others think: "I want all of Unicode, with proper, efficient
> >indexing, so I want four bytes per char".
>
> I doubt the last one though. Probably they really don't want efficient
> indexing, they want to perform higher-level operations that currently
> are only possible using effic
M.-A. Lemburg:
> Unicode has the concept of combining code points, e.g. you can
> store an "é" (e with an accent) as "e" + "'". Now if you slice
> off the accent, you'll break the character that you encoded
> using combining code points.
> ...
> next_(u, index) -> integer
>
> Returns th
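The slicing problem described above can be sketched in Python 3. The helper `next_boundary` is hypothetical and is not the `next_` API from the proposal, whose definition is cut off here. It only skips combining marks as reported by `unicodedata.combining`, which is a simplification; full grapheme segmentation needs the rules of Unicode Standard Annex #29.

```python
import unicodedata

def next_boundary(u, index):
    """Return the index just past the character starting at `index`,
    including any combining marks that follow it (hypothetical helper)."""
    end = index + 1
    while end < len(u) and unicodedata.combining(u[end]) != 0:
        end += 1
    return end

# "e" + COMBINING ACUTE ACCENT, then "t", then a precomposed "é".
s = "e\u0301t\u00e9"

broken = s[:1]                      # slices the accent off the "e"
whole = s[:next_boundary(s, 0)]     # keeps base character and accent together
```

Slicing at a plain index is still O(1); finding the boundary costs a short scan from that index.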
Antoine Pitrou wrote:
>>There are many design alternatives:
>
> Wouldn't it be simpler to use:
> - one-byte representation if every character <= 0xFF
> - two-byte representation if every character <= 0x
> - four-byte representation otherwise
As I said: there are many alternatives. This one ha
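A rough sketch of the width selection Antoine describes, in Python 3. The upper bound for the two-byte case is cut off in the quoted message; 0xFFFF is assumed here, and `element_width` is a hypothetical name used only for illustration.

```python
def element_width(text):
    """Return the bytes per element needed for the widest code point in text."""
    widest = max(map(ord, text), default=0)
    if widest <= 0xFF:
        return 1
    if widest <= 0xFFFF:   # assumed bound; truncated in the quoted message
        return 2
    return 4

print(element_width("abc"), element_width("\u20ac"), element_width("\U0001D670"))
```

Every element of a string has the same width under this scheme, so indexing stays O(1); the cost is a scan at creation time to find the widest character.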
On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Changing the APIs would be much work, although perhaps not impossible
> > for Python 3000. For example, Raymond Hettinger's partition() API
> > doesn't refer to indices at all, and can replace many uses of find()
Guido van Rossum wrote:
> Changing the APIs would be much work, although perhaps not impossible
> for Python 3000. For example, Raymond Hettinger's partition() API
> doesn't refer to indices at all, and can replace many uses of find()
> or index().
I think Neil's proposal is not to make them go awa
On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Indeed. My guess is that indexing is more common than you think,
> especially when iterating over the string. Of course, iteration
> could also operate on UTF-8, if you introduced string iterator
> objects.
Python's slice-and-dice model p
> There are many design alternatives: one option would be to support
> *three* internal representations in a single type, generating the
> others from the one operation existing as needed. The default, initial
> representation might be UTF-8, with UCS-4 only being generated when
> indexing occurs,
M.-A. Lemburg wrote:
> There seems to be a general misunderstanding here: even if you
> have UCS4 storage, it is still possible to slice a Unicode
> string in a way which makes rendering it correctly.
[impossible?]
> Unicode has the concept of
Neil Hodgson wrote:
>For Windows, the code will get a little uglier, needing to perform
> an allocation/encoding and deallocation more often than at present but
> I don't think there will be a speed degradation as Windows is
> currently performing a conversion from 8 bit to UTF-16 inside many
>
> Python should allow strings to
> contain any Unicode character and should be indexable yielding
> characters rather than half characters. Therefore Python strings
> should appear to be UTF-32.
+1.
Bill
> >I'm thinking about making all character strings Unicode (possibly with
> >different internal representations a la NSString in Apple's Objective
> >C) and introduce a separate mutable bytes array data type. But I could
> >use some validation or feedback on this idea from actual
> >practitioners.
Bengt Richter wrote:
> Please bear with me for a few paragraphs ;-)
Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the parser how to
convert string literals (currently only the Unicode ones)
into
Neil Hodgson wrote:
> Guido van Rossum:
>
>
>>Folks, please focus on what Python 3000 should do.
>>
>>I'm thinking about making all character strings Unicode (possibly with
>>different internal representations a la NSString in Apple's Objective
>>C) and introduce a separate mutable bytes array da
Martin v. Löwis:
> That's very tricky. If you have multiple implementations, you make
> usage at the C API difficult. If you make it either UTF-8 or UTF-32,
> you make PythonWin difficult. If you make it UTF-16, you make indexing
> difficult.
For Windows, the code will get a little uglier, nee
Phillip J. Eby wrote:
> I'm tempted to say it would be even better if there was a command line
> option that could be used to force all binary opens to result in bytes, and
> require all text opens to specify an encoding.
For Python 3000? -1. There shouldn't be command line switches that have
th
Neil Hodgson wrote:
>I'd like to more tightly define Unicode strings for Python 3000.
> Currently, Unicode strings may be implemented with either 2 byte
> (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
> contain any Unicode character and should be indexable yielding
> chara
Guido van Rossum:
> Folks, please focus on what Python 3000 should do.
>
> I'm thinking about making all character strings Unicode (possibly with
> different internal representations a la NSString in Apple's Objective
> C) and introduce a separate mutable bytes array data type. But I could
> use s
At 06:06 PM 10/23/2005 -0700, Guido van Rossum wrote:
>Folks, please focus on what Python 3000 should do.
>
>I'm thinking about making all character strings Unicode (possibly with
>different internal representations a la NSString in Apple's Objective
>C) and introduce a separate mutable bytes array
On Oct 23, 2005, at 6:06 PM, Guido van Rossum wrote:
> Folks, please focus on what Python 3000 should do.
>
> I'm thinking about making all character strings Unicode (possibly with
> different internal representations a la NSString in Apple's Objective
> C) and introduce a separate mutable bytes a
Folks, please focus on what Python 3000 should do.
I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array data type. But I could
use some validation or feedback on
On Sunday 23 October 2005 18:10, Jason Orendorff wrote:
> -1 on keeping the source encoding of string literals. Python should
> definitely decode them at compile time.
>
> -1 on decoding implicitly "as needed". This causes decoding to happen
> late, in unpredictable places. Decodes can fail; the
On Oct 23, 2005, at 3:10 PM, Jason Orendorff wrote:
> -1 on decoding implicitly "as needed". This causes decoding to happen
> late, in unpredictable places. Decodes can fail; they should happen
> as early and as close to the data source as possible.
That's not necessarily true... Some codecs c
-1 on keeping the source encoding of string literals. Python should
definitely decode them at compile time.
-1 on decoding implicitly "as needed". This causes decoding to happen
late, in unpredictable places. Decodes can fail; they should happen
as early and as close to the data source as possi
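A small Python 3 sketch of "decode early, close to the data source". The function `read_text` and the sample bytes are illustrative assumptions, not part of the proposal. The point is that a bad byte sequence raises where the data enters the program rather than at some later, unrelated operation.

```python
def read_text(raw, encoding="utf-8"):
    """Decode bytes as soon as they arrive; bad input raises here."""
    return raw.decode(encoding)

good = read_text("Fran\u00e7ois".encode("utf-8"))

try:
    read_text(b"Fran\xe7ois")   # Latin-1 bytes, not valid UTF-8
    failed_early = False
except UnicodeDecodeError:
    failed_early = True
```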
Please bear with me for a few paragraphs ;-)
One aspect of str-type strings is the efficiency afforded when all the
encoding really is ascii. If the internal encoding were e.g. fixed utf-16le
for strings, maybe with today's computers it would still be efficient enough
for most actual string purp
Martin Blais wrote:
>>Yes. setdefaultencoding() is removed from sys by site.py. To get it
>>again you must reload sys.
>
>
> Thanks.
Actually, I should take the opportunity to advise people that
setdefaultencoding doesn't really work. With the default default
encoding, strings and Unicode object
On 10/15/05, Reinhold Birkenfeld <[EMAIL PROTECTED]> wrote:
> Martin Blais wrote:
> > On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote:
> >> Martin Blais <[EMAIL PROTECTED]> writes:
> >>
> >> > How hard would that be to implement?
> >>
> >> import sys
> >> reload(sys)
> >> sys.setdefaultencodin
Martin Blais wrote:
> On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote:
>> Martin Blais <[EMAIL PROTECTED]> writes:
>>
>> > How hard would that be to implement?
>>
>> import sys
>> reload(sys)
>> sys.setdefaultencoding('undefined')
>
> Hmmm any particular reason for the call to reload() here?
On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote:
> Martin Blais <[EMAIL PROTECTED]> writes:
>
> > How hard would that be to implement?
>
> import sys
> reload(sys)
> sys.setdefaultencoding('undefined')
Hmmm any particular reason for the call to reload() here?
Josiah Carlson wrote:
> > > and isn't pure ASCII.
> >
> > How can you be sure that something that is /semantically textual/ will
> > always remain "pure ASCII" ? That's contradictory, unless your software
> > never goes out of the anglo-saxon world (and even...).
>
> Non-unicode text input widgets
Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
> On Monday 03 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> > Antoine Pitrou wrote:
> >
> > > A good rule of thumb is to convert to unicode everything that is
> > > semantically textual
> >
> > and isn't pure ASCII.
>
> How can you be sure th
Jim Fulton wrote:
> I would argue that it's evil to change the default encoding
> in the first place, except in this case to disable implicit
> encoding or decoding.
absolutely. unfortunately, all attempts to add such information to the
sys module documentation seem to have failed...
(last time
M.-A. Lemburg wrote:
> Michael Hudson wrote:
>
>>Martin Blais <[EMAIL PROTECTED]> writes:
>>
>>
>>
>>>What if we could completely disable the implicit conversions between
>>>unicode and str? In other words, if you would ALWAYS be forced to
>>>call either .encode() or .decode() to convert between
Martin Blais wrote:
> Hi.
>
> Like a lot of people (or so I hear in the blogosphere...), I've been
> experiencing some friction in my code with unicode conversion
> problems. Even when being super extra careful with the types of str's
> or unicode objects that my variables can contain, there is a
On 10/3/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> >
> > I'm not sure it's a sensible default.
>
> Me neither, especially since this would make it impossible
> to write polymorphic code - e.g. ', '.join(list) wouldn't
> work anymore if list contains Unicode; ditto for u', '.join(list)
> with lis
On Monday 03 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
> > A good rule of thumb is to convert to unicode everything that is
> > semantically textual
>
> and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain
Antoine Pitrou wrote:
> A good rule of thumb is to convert to unicode everything that is
> semantically textual
and isn't pure ASCII.
(anyone who is tempted to argue otherwise should benchmark their
applications, both speed- and memorywise, and be prepared to come
up with very strong arguments
On Monday 03 October 2005 at 02:09 -0400, Martin Blais wrote:
>
> What if we could completely disable the implicit conversions between
> unicode and str?
This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
c
Michael Hudson wrote:
> Martin Blais <[EMAIL PROTECTED]> writes:
>
>
>>What if we could completely disable the implicit conversions between
>>unicode and str? In other words, if you would ALWAYS be forced to
>>call either .encode() or .decode() to convert between one and the
>>other... wouldn't
Martin Blais <[EMAIL PROTECTED]> writes:
> What if we could completely disable the implicit conversions between
> unicode and str? In other words, if you would ALWAYS be forced to
> call either .encode() or .decode() to convert between one and the
> other... wouldn't that help a lot deal with tha
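For reference, the behaviour asked for here is what Python 3 later adopted: text and bytes never convert implicitly. A minimal sketch under that assumption (Python 3, illustrative variable names):

```python
text = "caf\u00e9"
data = text.encode("utf-8")        # explicit text -> bytes

try:
    text + data                    # mixing the two types is refused
    mixed_ok = True
except TypeError:
    mixed_ok = False

round_trip = data.decode("utf-8") == text   # explicit bytes -> text
```

Because the failure is a TypeError at the point of mixing, it does not depend on whether the data happens to be pure ASCII.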
Hi.
Like a lot of people (or so I hear in the blogosphere...), I've been
experiencing some friction in my code with unicode conversion
problems. Even when being super extra careful with the types of str's
or unicode objects that my variables can contain, there is always some
case or oversight whe
57 matches