On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
>
>>>> If I have a unicode string containing legal characters greater than
>>>> 0x7F, and I pass it to a function which converts it to str, the
>>>> conversion fails.
>>>
>>> so? if it does that, it's not unico
Antoine> If an stdlib function returns an 8-bit string containing
Antoine> non-ascii data, then this string used in unicode context incurs
Antoine> an implicit conversion, which fails.
Such strings should be converted to Unicode at the point where they enter
the application. That's
As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very
slow compared to encoding or decoding with utf-8. Here I'm working with 53k of
data instead of 53 megs. (Note: this is a laptop, so it's possible that
thermal or battery management features affected these numbers a bit,
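
A rough way to reproduce this kind of comparison is a small timeit sketch. It is written for Python 3, where bytes.decode() takes the place of unicode(). The 53 KB payload is synthetic and the helper name time_decode is hypothetical, so treat the output only as a relative indication; absolute numbers will differ by machine and interpreter version.

```python
import timeit

# Synthetic 53 KB payload; pure ASCII so it is valid in all three encodings.
SAMPLE = (b"The quick brown fox jumps over the lazy dog.\n" * 1300)[:53 * 1024]


def time_decode(encoding, number=200):
    """Return the total seconds spent decoding SAMPLE `number` times."""
    return timeit.timeit(lambda: SAMPLE.decode(encoding), number=number)


if __name__ == "__main__":
    for name in ("utf-8", "iso8859-1", "mac-roman"):
        print(f"{name:>10}: {time_decode(name):.4f} s")
```

Running each codec several times and keeping the minimum would reduce noise from thermal or power management.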
On 10/4/05, Piet Delport <[EMAIL PROTECTED]> wrote:
> One system that could benefit from this change is Christopher Armstrong's
> defgen.py[1] for Twisted, which he recently reincarnated (as newdefgen.py) to
> use enhanced generators. The resulting code is much cleaner than before, and
> closer to
Is there a faster way to transcode from 8-bit chars (charmaps) to utf-8
than going through unicode()?
I'm writing a small card-file program. As a test, I use a 53 MB MBox file,
in mac-roman encoding. My program reads and parses the file into messages
in about 3 to 5 seconds (Wow! Go Python!), but
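
For reference, the round trip being asked about looks roughly like the sketch below. It uses Python 3 spelling (bytes.decode() rather than unicode()), and the function name transcode_to_utf8 is an assumption made for illustration, not something from the original message.

```python
def transcode_to_utf8(data: bytes, source_encoding: str = "mac-roman") -> bytes:
    """Decode 8-bit charmap data to text, then re-encode the text as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Byte 0x8E is "é" in Mac Roman; in UTF-8 it becomes the two bytes C3 A9.
sample = b"caf\x8e"
converted = transcode_to_utf8(sample)
```

The intermediate text object is what the poster hopes to avoid; this sketch only shows the baseline being measured.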
PEP 255 ("Simple Generators") closes with:
> Q. Then why not allow an expression on "return" too?
>
> A. Perhaps we will someday. In Icon, "return expr" means both "I'm
>done", and "but I have one final useful value to return too, and
>this is it". At the start, and in the absence of com
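
For context, later Python versions (3.3 and up, via PEP 380) did allow an expression on "return" inside a generator; the value travels on the StopIteration exception. A minimal sketch, with the hypothetical helper names countdown and drain:

```python
def countdown(n):
    """Yield n, n-1, ..., 1, then return one final value."""
    while n > 0:
        yield n
        n -= 1
    return "done"  # surfaces as StopIteration.value


def drain(gen):
    """Collect the yielded items and the generator's return value."""
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            return items, stop.value
```

A plain for loop discards the return value, which is why drain() calls next() by hand.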
Phillip J. Eby wrote:
> At 12:14 PM 9/29/2005 -0400, Viren Shah wrote:
>
>> File "/root/svn-install-apps/setuptools-0.6a4/pkg_resources.py",
>> line 949, in _get
>> return self.loader.get_data(path)
>> OverflowError: signed integer is greater than maximum
>
>
> Interesting. That looks li
Phillip J. Eby wrote:
> At 09:49 AM 9/29/2005 -0400, Viren Shah wrote:
>
>> [I sent this earlier without being a subscriber and it was sent to the
>> moderation queue so I'm resending it after subscribing]
>>
>> Hi,
>> I'm running a 64-bit Fedora Core 3 with python 2.3.4. I'm trying to
>> inst
On Monday, 3 October 2005 at 17:42 -0700, Guido van Rossum wrote:
> I don't see a use case for replace.
Agreed.
> Alternatively, you could always specify Latin-1 as the encoding and
> convert it that way -- I don't think there's any input that can cause
> Latin-1 decoding to fail.
You seem to
This would presumably support the (read-only part of the) buffer API so
search would be covered.
I don't see a use case for replace.
Alternatively, you could always specify Latin-1 as the encoding and
convert it that way -- I don't think there's any input that can cause
Latin-1 decoding to fail.
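
The Latin-1 claim can be checked directly: every byte value 0 to 255 maps to the code point with the same number. The sketch below uses Python 3 syntax, and the helper name is_valid_utf8 is hypothetical; it is included only to contrast with a codec that can reject input.

```python
# Every possible byte value decodes under Latin-1, and the mapping round-trips.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Decoding succeeding does not mean the result is the intended text; it only means no exception is raised.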
On Monday, 3 October 2005 at 14:02 -0700, Guido van Rossum wrote:
> On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> > Could the "bytes" type be just the same as the current "str" type but
> > without the implicit unicode conversion ? Or am I missing some desired
> > functionality ?
>
> No
Martin Blais wrote:
> On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
If that's how things were designed, then Python's entire standard
library (not to mention third-party libraries) is not "unicode safe" -
to quote your own words - since many functions may return 8-bit strings
>
At 05:15 PM 10/3/2005 -0400, Jason Orendorff wrote:
>Phillip J. Eby writes:
> > You didn't offer any reasons why this would be useful and/or good.
>
>It makes it dramatically easier to write Python classes that correctly
>support 'with'. I don't see any simple way to do this under PEP 343;
>the on
On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
> > > If that's how things were designed, then Python's entire standard
> > > library (not to mention third-party libraries) is not "unicode safe" -
> > > to quote your own words - since many functions may return 8-bit strings
> > > containing n
Antoine Pitrou wrote:
> To which you apparently didn't read my answer, that is:
> you can never be sure that a variable containing something which
> is /semantically/ textual (*) will never contain anything other than
> ASCII text.
That is simply not true. There are variables that are semantically
Phillip J. Eby writes:
> You didn't offer any reasons why this would be useful and/or good.
It makes it dramatically easier to write Python classes that correctly
support 'with'. I don't see any simple way to do this under PEP 343;
the only sane thing to do is write a separate @contextmanager
gen
On 10/3/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Could the "bytes" type be just the same as the current "str" type but
> without the implicit unicode conversion ? Or am I missing some desired
> functionality ?
No. It will be a mutable array of bytes. It will intentionally
resemble strings a
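
In Python 3 as eventually released, the mutable array of bytes is spelled bytearray, while bytes itself is immutable. A short sketch of the difference (variable names are illustrative only):

```python
# bytearray supports in-place item assignment and growth.
buf = bytearray(b"hello")
buf[0] = ord("J")
buf.extend(b"!")

# bytes does not: item assignment raises TypeError.
frozen = b"hello"
try:
    frozen[0] = ord("J")
    bytes_are_mutable = True
except TypeError:
    bytes_are_mutable = False
```

Indexing either type yields an int rather than a one-character string.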
> Presumably in Python 3.0, opening a file in "text" mode will require an
> encoding to be specified, and opening it in "binary" mode will cause it to
> produce or consume byte arrays, not strings. This should apply to sockets
> too, and really any I/O facility, including GUI frameworks, DBAPI
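
Python 3 ended up close to this for files: binary mode produces and consumes bytes, and text mode decodes with an encoding (optional in practice, since a default is used when none is given, but stating it explicitly avoids surprises). A minimal sketch, with the hypothetical helper name roundtrip:

```python
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text with an explicit encoding; read it back as bytes and as str."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: str in, encoded bytes on disk.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        # Binary mode: raw bytes out, no decoding.
        with open(path, "rb") as f:
            raw = f.read()
        # Text mode again: bytes on disk, str out.
        with open(path, "r", encoding=encoding, newline="") as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded
```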
At 10:38 PM 10/3/2005 +0200, Antoine Pitrou wrote:
>To which you apparently didn't read my answer, that is:
>you can never be sure that a variable containing something which
>is /semantically/ textual (*) will never contain anything other than
>ASCII text. For example raw_input() won't tell you tha
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>>Is the added complexity needed to support not having Unicode support
>>compiled into Python really worth it ?
>
> If there are volunteers willing to maintain it, and the other volunteers
> are not affected: certainly.
No objections there. I only
> > If that's how things were designed, then Python's entire standard
> > library (not to mention third-party libraries) is not "unicode safe" -
> > to quote your own words - since many functions may return 8-bit strings
> > containing non-ascii characters.
>
> huh? first you talk about functions
M.-A. Lemburg wrote:
> Is the added complexity needed to support not having Unicode support
> compiled into Python really worth it ?
If there are volunteers willing to maintain it, and the other volunteers
are not affected: certainly.
> I know that Martin introduced this feature a long time ago,
Antoine Pitrou wrote:
> > > If I have a unicode string containing legal characters greater than
> > > 0x7F, and I pass it to a function which converts it to str, the
> > > conversion fails.
> >
> > so? if it does that, it's not unicode safe.
> [...]
> > what's that has to do with
> > my argument
Hi,
On Monday, 3 October 2005 at 20:37 +0200, Fredrik Lundh wrote:
> > If I have a unicode string containing legal characters greater than
> > 0x7F, and I pass it to a function which converts it to str, the
> > conversion fails.
>
> so? if it does that, it's not unicode safe.
[...]
> what's
At 07:02 PM 10/3/2005 +0100, Michael Hudson wrote:
>"Phillip J. Eby" <[EMAIL PROTECTED]> writes:
>
> > Since the PEP is accepted and has patches for both its implementation and a
> > good part of its documentation, a major change like this would certainly
> > need a better rationale.
>
>Though g
Antoine Pitrou wrote:
> > Under the default encoding (and quite a few other encodings), that's true
> > for plain ascii strings and Unicode strings.
>
> If I have a unicode string containing legal characters greater than
> 0x7F, and I pass it to a function which converts it to str, the
> con
For the record, I very much want PEPs 342 and 343 implemented. I
haven't had the time to look at the patch and don't expect to find the
time any time soon, but it's not for lack of desire to see this
feature implemented.
I don't like Jason's __with__ proposal and even less like his idea to
drop __
"Phillip J. Eby" <[EMAIL PROTECTED]> writes:
> Since the PEP is accepted and has patches for both its implementation and a
> good part of its documentation, a major change like this would certainly
> need a better rationale.
Though given the amount of interest said patch has attracted (none at
Hi,
Josiah:
> > How can you be sure that something that is /semantically textual/ will
> > always remain "pure ASCII" ? That's contradictory, unless your software
> > never goes out of the anglo-saxon world (and even...).
>
> Non-unicode text input widgets.
You didn't understand my statement.
I
At 12:37 PM 10/3/2005 -0400, Jason Orendorff wrote:
>I'm -1 on PEP 343. It seems ...complex. And even with all the
>complexity, I *still* won't be able to type
>
> with self.lock: ...
>
>which I submit is perfectly reasonable, clean, and clear.
Which is why it's proposed to add __enter__/__e
Josiah Carlson wrote:
> > > and isn't pure ASCII.
> >
> > How can you be sure that something that is /semantically textual/ will
> > always remain "pure ASCII" ? That's contradictory, unless your software
> > never goes out of the anglo-saxon world (and even...).
>
> Non-unicode text input widgets
I'm -1 on PEP 343. It seems ...complex. And even with all the
complexity, I *still* won't be able to type
with self.lock: ...
which I submit is perfectly reasonable, clean, and clear. Instead I
have to type
with locking(self.lock): ...
where locking() is apparently either a new built
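
A locking() helper of the kind described could be sketched as below. The name locking comes from the message; the implementation, and the Counter class used to exercise it, are assumptions for illustration. Note that in Python as later released, threading.Lock supports the with statement directly, so "with self.lock:" works without any wrapper.

```python
import threading
from contextlib import contextmanager


@contextmanager
def locking(lock):
    """Hold `lock` for the duration of the with-block."""
    lock.acquire()
    try:
        yield lock
    finally:
        # Runs on normal exit and when the block raises.
        lock.release()


class Counter:
    """Hypothetical example class guarding a value with a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0

    def increment(self):
        with locking(self.lock):
            self.value += 1
        return self.value
```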
Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
> On Monday, 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> > Antoine Pitrou wrote:
> >
> > > A good rule of thumb is to convert to unicode everything that is
> > > semantically textual
> >
> > and isn't pure ASCII.
>
> How can you be sure th
Jim Fulton wrote:
> I would argue that it's evil to change the default encoding
> in the first place, except in this case to disable implicit
> encoding or decoding.
absolutely. unfortunately, all attempts to add such information to the
sys module documentation seem to have failed...
(last time
M.-A. Lemburg wrote:
> Michael Hudson wrote:
>
>>Martin Blais <[EMAIL PROTECTED]> writes:
>>
>>
>>
>>>What if we could completely disable the implicit conversions between
>>>unicode and str? In other words, if you would ALWAYS be forced to
>>>call either .encode() or .decode() to convert between
Martin Blais wrote:
> Hi.
>
> Like a lot of people (or so I hear in the blogosphere...), I've been
> experiencing some friction in my code with unicode conversion
> problems. Even when being super extra careful with the types of str's
> or unicode objects that my variables can contain, there is a
Antoine Pitrou wrote:
> > > A good rule of thumb is to convert to unicode everything that is
> > > semantically textual
> >
> > and isn't pure ASCII.
>
> How can you be sure that something that is /semantically textual/ will
> always remain "pure ASCII" ?
"is" != "will always remain"
On 10/3/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> >
> > I'm not sure it's a sensible default.
>
> Me neither, especially since this would make it impossible
> to write polymorphic code - e.g. ', '.join(list) wouldn't
> work anymore if list contains Unicode; ditto for u', '.join(list)
> with lis
On Monday, 3 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
> > A good rule of thumb is to convert to unicode everything that is
> > semantically textual
>
> and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain
Antoine Pitrou wrote:
> A good rule of thumb is to convert to unicode everything that is
> semantically textual
and isn't pure ASCII.
(anyone who is tempted to argue otherwise should benchmark their
applications, both speed- and memorywise, and be prepared to come
up with very strong arguments
On Monday, 3 October 2005 at 02:09 -0400, Martin Blais wrote:
>
> What if we could completely disable the implicit conversions between
> unicode and str?
This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
c
Reinhold Birkenfeld wrote:
> Martin v. Löwis wrote:
>>Whether we think it should be supported depends
>>on who "we" is, as with all these minor features: some think it is
>>a waste of time, some think it should be supported if reasonably
>>possible, and some think this a conditio sine qua non. It
Michael Hudson wrote:
> Martin Blais <[EMAIL PROTECTED]> writes:
>
>
>>What if we could completely disable the implicit conversions between
>>unicode and str? In other words, if you would ALWAYS be forced to
>>call either .encode() or .decode() to convert between one and the
>>other... wouldn't
Martin Blais <[EMAIL PROTECTED]> writes:
> What if we could completely disable the implicit conversions between
> unicode and str? In other words, if you would ALWAYS be forced to
> call either .encode() or .decode() to convert between one and the
> other... wouldn't that help a lot deal with tha
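
Python 3 later adopted essentially this behaviour: text (str) and binary data (bytes) never convert implicitly, and mixing them raises TypeError. A short sketch of what that looks like in practice (the function name mixing_raises is hypothetical):

```python
def mixing_raises():
    """Return True if concatenating str and bytes raises TypeError."""
    try:
        "abc" + b"def"
    except TypeError:
        return True
    return False


# Conversions must be spelled out, with the encoding named.
text = "naïve"
data = text.encode("utf-8")
restored = data.decode("utf-8")

# Encoding non-ASCII text as ASCII still fails, but only when asked for.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```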
Martin v. Löwis wrote:
> Reinhold Birkenfeld wrote:
>> One problem is that no Unicode escapes can be used since compiling
>> the file raises ValueErrors for them. Such strings would have to
>> be produced using unichr().
>
> You mean, in Unicode literals? There are various approaches, depending
>