Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread James Y Knight
On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote: > Antoine Pitrou wrote: > > If I have an unicode string containing legal characters greater than 0x7F, and I pass it to a function which converts it to str, the conversion fails. >>> >>> so? if it does that, it's not unico

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread skip
Antoine> If an stdlib function returns an 8-bit string containing Antoine> non-ascii data, then this string used in unicode context incurs Antoine> an implicit conversion, which fails. Such strings should be converted to Unicode at the point where they enter the application. That's

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-03 Thread jepler
As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very slow compared to encoding or decoding with utf-8. Here I'm working with 53k of data instead of 53 megs. (Note: this is a laptop, so it's possible that thermal or battery management features affected these numbers a bit,

Re: [Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators

2005-10-03 Thread Christopher Armstrong
On 10/4/05, Piet Delport <[EMAIL PROTECTED]> wrote: > One system that could benefit from this change is Christopher Armstrong's > defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to > use enhanced generators. The resulting code is much cleaner than before, and > closer to

[Python-Dev] Unicode charmap decoders slow

2005-10-03 Thread Tony Nelson
Is there a faster way to transcode from 8-bit chars (charmaps) to utf-8 than going through unicode()? I'm writing a small card-file program. As a test, I use a 53 MB MBox file, in mac-roman encoding. My program reads and parses the file into messages in about 3 to 5 seconds (Wow! Go Python!), but
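
For readers following along, the transcoding path being asked about can be sketched as below. This is a minimal illustration in present-day Python 3, not the Python 2.4 code from the thread; the helper name transcode and the sample string are assumptions made for the example, and it says nothing about the speed problem itself.

```python
def transcode(data: bytes, source_encoding: str = "mac_roman") -> bytes:
    """Decode charmap-encoded bytes to text, then re-encode the text as UTF-8.

    'transcode' is a hypothetical helper name used only for this sketch.
    """
    return data.decode(source_encoding).encode("utf-8")


# Build a small mac-roman sample and convert it.
sample = "caf\u00e9".encode("mac_roman")
utf8 = transcode(sample)
```

The intermediate text object is the step the poster would like to avoid; the sketch only shows that the round trip is two codec calls.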

[Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators

2005-10-03 Thread Piet Delport
PEP 255 ("Simple Generators") closes with: > Q. Then why not allow an expression on "return" too? > > A. Perhaps we will someday. In Icon, "return expr" means both "I'm >done", and "but I have one final useful value to return too, and >this is it". At the start, and in the absence of com
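
As a rough illustration of what "return expr" in a generator can look like, here is a sketch using the behaviour of later Python versions (3.3 and up, where the returned value travels on StopIteration.value). It is not necessarily the exact semantics proposed in this message; the names averager and run are made up for the example.

```python
def averager():
    """Receive numbers via send(); return their mean when sent None."""
    total = 0
    count = 0
    while True:
        value = yield
        if value is None:
            break
        total += value
        count += 1
    # The final useful value: delivered to the caller on StopIteration.
    return total / count if count else 0.0


def run(values):
    """Drive the generator by hand and collect its return value."""
    gen = averager()
    next(gen)  # advance to the first yield
    for v in values:
        gen.send(v)
    try:
        gen.send(None)  # ask the generator to finish
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("generator did not finish")


result = run([1, 2, 3])
```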

Re: [Python-Dev] 64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit linux problem)

2005-10-03 Thread Viren Shah
Phillip J. Eby wrote: > At 12:14 PM 9/29/2005 -0400, Viren Shah wrote: > >> File "/root/svn-install-apps/setuptools-0.6a4/pkg_resources.py", >> line 949, in _get >> return self.loader.get_data(path) >> OverflowError: signed integer is greater than maximum > > > Interesting. That looks li

Re: [Python-Dev] 64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit linux problem)

2005-10-03 Thread Viren Shah
Phillip J. Eby wrote: > At 09:49 AM 9/29/2005 -0400, Viren Shah wrote: > >> [I sent this earlier without being a subscriber and it was sent to the >> moderation queue so I'm resending it after subscribing] >> >> Hi, >> I'm running a 64-bit Fedora Core 3 with python 2.3.4. I'm trying to >> inst

Re: [Python-Dev] bytes type

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 17:42 -0700, Guido van Rossum a écrit : > I don't see a use case for replace. Agreed. > Alternatively, you could always specify Latin-1 as the encoding and > convert it that way -- I don't think there's any input that can cause > Latin-1 decoding to fail. You seem to

Re: [Python-Dev] bytes type

2005-10-03 Thread Guido van Rossum
This would presumably support the (read-only part of the) buffer API so search would be covered. I don't see a use case for replace. Alternatively, you could always specify Latin-1 as the encoding and convert it that way -- I don't think there's any input that can cause Latin-1 decoding to fail.
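
The Latin-1 remark can be checked with a short sketch. It is written for present-day Python 3, where bytes and text are already separate types; the variable names are illustrative only.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Latin-1 maps each byte to the code point with the same number,
# so decoding cannot fail for any input.
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]

# Encoding back with Latin-1 restores the original bytes exactly.
round_trip = text.encode("latin-1")
```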

Re: [Python-Dev] bytes type

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 14:02 -0700, Guido van Rossum a écrit : > On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > > Could the "bytes" type be just the same as the current "str" type but > > without the implicit unicode conversion ? Or am I missing some desired > > functionality ? > > No

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread M.-A. Lemburg
Martin Blais wrote: > On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > If that's how things were designed, then Python's entire standard library (not to mention third-party libraries) is not "unicode safe" - to quote your own words - since many functions may return 8-bit strings >

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Phillip J. Eby
At 05:15 PM 10/3/2005 -0400, Jason Orendorff wrote: >Phillip J. Eby writes: > > You didn't offer any reasons why this would be useful and/or good. > >It makes it dramatically easier to write Python classes that correctly >support 'with'. I don't see any simple way to do this under PEP 343; >the on

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Martin Blais
On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > > > > If that's how things were designed, then Python's entire standard > > > library (not to mention third-party libraries) is not "unicode safe" - > > > to quote your own words - since many functions may return 8-bit strings > > > containing n

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Martin v. Löwis
Antoine Pitrou wrote: > To which you apparently didn't read my answer, that is: > you can never be sure that a variable containing something which > is /semantically/ textual (*) will never contain anything other than > ASCII text. That is simply not true. There are variables that is semantically

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Jason Orendorff
Phillip J. Eby writes: > You didn't offer any reasons why this would be useful and/or good. It makes it dramatically easier to write Python classes that correctly support 'with'. I don't see any simple way to do this under PEP 343; the only sane thing to do is write a separate @contextmanager gen

Re: [Python-Dev] bytes type

2005-10-03 Thread Guido van Rossum
On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote: > Could the "bytes" type be just the same as the current "str" type but > without the implicit unicode conversion ? Or am I missing some desired > functionality ? No. It will be a mutable array of bytes. It will intentionally resemble strings a

Re: [Python-Dev] bytes type

2005-10-03 Thread Antoine Pitrou
> Presumably in Python 3.0, opening a file in "text" mode will require an > encoding to be specified, and opening it in "binary" mode will cause it to > produce or consume byte arrays, not strings. This should apply to sockets > too, and really any I/O facility, including GUI frameworks, DBAPI

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Phillip J. Eby
At 10:38 PM 10/3/2005 +0200, Antoine Pitrou wrote: >To which you apparently didn't read my answer, that is: >you can never be sure that a variable containing something which >is /semantically/ textual (*) will never contain anything other than >ASCII text. For example raw_input() won't tell you tha

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread M.-A. Lemburg
Martin v. Löwis wrote: > M.-A. Lemburg wrote: > >>Is the added complexity needed to support not having Unicode support >>compiled into Python really worth it ? > > If there are volunteers willing to maintain it, and the other volunteers > are not affected: certainly. No objections there. I only

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Antoine Pitrou
> > If that's how things were designed, then Python's entire standard > > library (not to mention third-party libraries) is not "unicode safe" - > > to quote your own words - since many functions may return 8-bit strings > > containing non-ascii characters. > > huh? first you talk about functions

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread Martin v. Löwis
M.-A. Lemburg wrote: > Is the added complexity needed to support not having Unicode support > compiled into Python really worth it ? If there are volunteers willing to maintain it, and the other volunteers are not affected: certainly. > I know that Martin introduced this feature a long time ago,

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: > > > If I have an unicode string containing legal characters greater than > > > 0x7F, and I pass it to a function which converts it to str, the > > > conversion fails. > > > > so? if it does that, it's not unicode safe. > [...] > > what's that has to do with > > my argument

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Antoine Pitrou
Hi, Le lundi 03 octobre 2005 à 20:37 +0200, Fredrik Lundh a écrit : > > If I have an unicode string containing legal characters greater than > > 0x7F, and I pass it to a function which converts it to str, the > > conversion fails. > > so? if it does that, it's not unicode safe. [...] > what's

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Phillip J. Eby
At 07:02 PM 10/3/2005 +0100, Michael Hudson wrote: >"Phillip J. Eby" <[EMAIL PROTECTED]> writes: > > > Since the PEP is accepted and has patches for both its implementation > and a > > good part of its documentation, a major change like this would certainly > > need a better rationale. > >Though g

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: > > Under the default encoding (and quite a few other encodings), that's true > > for > > plain ascii strings and Unicode strings. > > If I have an unicode string containing legal characters greater than > 0x7F, and I pass it to a function which converts it to str, the > con
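
The failure described in the quoted text can be reproduced in a small sketch. It uses present-day Python 3, where the conversion is an explicit encode() call rather than the implicit Python 2 conversion the thread is about, so treat it as an analogy and not a reproduction of the 2005 behaviour.

```python
text = "caf\u00e9"  # contains one character above 0x7F

# Converting with the ASCII codec fails at the first non-ASCII character.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the offending character
else:
    failed_at = None

# Naming an encoding that covers the character succeeds.
encoded = text.encode("utf-8")
```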

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Guido van Rossum
For the record, I very much want PEPs 342 and 343 implemented. I haven't had the time to look at the patch and don't expect to find the time any time soon, but it's not for lack of desire to see this feature implemented. I don't like Jason's __with__ proposal and even less like his idea to drop __

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Michael Hudson
"Phillip J. Eby" <[EMAIL PROTECTED]> writes: > Since the PEP is accepted and has patches for both its implementation and a > good part of its documentation, a major change like this would certainly > need a better rationale. Though given the amount of interest said patch has attracted (none at

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread Antoine Pitrou
Hi, Josiah: > > How can you be sure that something that is /semantically textual/ will > > always remain "pure ASCII" ? That's contradictory, unless your software > > never goes out of the anglo-saxon world (and even...). > > Non-unicode text input widgets. You didn't understand my statement. I

Re: [Python-Dev] PEP 343 and __with__

2005-10-03 Thread Phillip J. Eby
At 12:37 PM 10/3/2005 -0400, Jason Orendorff wrote: >I'm -1 on PEP 343. It seems ...complex. And even with all the >complexity, I *still* won't be able to type > > with self.lock: ... > >which I submit is perfectly reasonable, clean, and clear. Which is why it's proposed to add __enter__/__e

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Josiah Carlson wrote: > > > and isn't pure ASCII. > > > > How can you be sure that something that is /semantically textual/ will > > always remain "pure ASCII" ? That's contradictory, unless your software > > never goes out of the anglo-saxon world (and even...). > > Non-unicode text input widgets

[Python-Dev] PEP 343 and __with__

2005-10-03 Thread Jason Orendorff
I'm -1 on PEP 343. It seems ...complex. And even with all the complexity, I *still* won't be able to type with self.lock: ... which I submit is perfectly reasonable, clean, and clear. Instead I have to type with locking(self.lock): ... where locking() is apparently either a new built
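
For comparison, the spelling asked for here does work in the Python that eventually shipped: from 2.5 on, threading.Lock objects can be used directly in a with statement. The sketch below assumes a present-day interpreter, and the Counter class is a made-up example, not code from the thread.

```python
import threading


class Counter:
    """Hypothetical example class guarding a value with a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0

    def increment(self):
        # The lock is acquired on entry and released on exit,
        # even if the body raises.
        with self.lock:
            self.value += 1
            return self.value


counter = Counter()
first = counter.increment()
second = counter.increment()
```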

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Josiah Carlson
Antoine Pitrou <[EMAIL PROTECTED]> wrote: > > Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit : > > Antoine Pitrou wrote: > > > > > A good rule of thumb is to convert to unicode everything that is > > > semantically textual > > > > and isn't pure ASCII. > > How can you be sure th

[Python-Dev] Proposal for 2.5: Returning values from PEP 342 enhanced generators

2005-10-03 Thread Piet Delport
PEP 255 ("Simple Generators") closes with: > Q. Then why not allow an expression on "return" too? > > A. Perhaps we will someday. In Icon, "return expr" means both "I'm >done", and "but I have one final useful value to return too, and >this is it". At the start, and in the absence of com

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Jim Fulton wrote: > I would argue that it's evil to change the default encoding > in the first place, except in this case to disable implicit > encoding or decoding. absolutely. unfortunately, all attempts to add such information to the sys module documentation seem to have failed... (last time

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton
M.-A. Lemburg wrote: > Michael Hudson wrote: > >>Martin Blais <[EMAIL PROTECTED]> writes: >> >> >> >>>What if we could completely disable the implicit conversions between >>>unicode and str? In other words, if you would ALWAYS be forced to >>>call either .encode() or .decode() to convert between

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton
Martin Blais wrote: > Hi. > > Like a lot of people (or so I hear in the blogosphere...), I've been > experiencing some friction in my code with unicode conversion > problems. Even when being super extra careful with the types of str's > or unicode objects that my variables can contain, there is a

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: > > > A good rule of thumb is to convert to unicode everything that is > > > semantically textual > > > > and isn't pure ASCII. > > How can you be sure that something that is /semantically textual/ will > always remain "pure ASCII" ? "is" != "will always remain" __

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Martin Blais
On 10/3/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote: > > > > I'm not sure it's a sensible default. > > Me neither, especially since this would make it impossible > to write polymorphic code - e.g. ', '.join(list) wouldn't > work anymore if list contains Unicode; ditto for u', '.join(list) > with lis

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit : > Antoine Pitrou wrote: > > > A good rule of thumb is to convert to unicode everything that is > > semantically textual > > and isn't pure ASCII. How can you be sure that something that is /semantically textual/ will always remain

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: > A good rule of thumb is to convert to unicode everything that is > semantically textual and isn't pure ASCII. (anyone who are tempted to argue otherwise should benchmark their applications, both speed- and memorywise, and be prepared to come up with very strong arguments

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 02:09 -0400, Martin Blais a écrit : > > What if we could completely disable the implicit conversions between > unicode and str? This would be very annoying when dealing with some modules or libraries where the type (str / unicode) returned by a function depends on the c

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread M.-A. Lemburg
Reinhold Birkenfeld wrote: > Martin v. Löwis wrote: >>>Whether we think it should be supported depends >>on who "we" is, as with all these minor features: some think it is >>a waste of time, some think it should be supported if reasonably >>possible, and some think this a conditio sine qua non. It

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread M.-A. Lemburg
Michael Hudson wrote: > Martin Blais <[EMAIL PROTECTED]> writes: > > >>What if we could completely disable the implicit conversions between >>unicode and str? In other words, if you would ALWAYS be forced to >>call either .encode() or .decode() to convert between one and the >>other... wouldn't

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Michael Hudson
Martin Blais <[EMAIL PROTECTED]> writes: > What if we could completely disable the implicit conversions between > unicode and str? In other words, if you would ALWAYS be forced to > call either .encode() or .decode() to convert between one and the > other... wouldn't that help a lot deal with tha

Re: [Python-Dev] Tests and unicode

2005-10-03 Thread Reinhold Birkenfeld
Martin v. Löwis wrote: > Reinhold Birkenfeld wrote: >> One problem is that no Unicode escapes can be used since compiling >> the file raises ValueErrors for them. Such strings would have to >> be produced using unichr(). > > You mean, in Unicode literals? There are various approaches, depending >