Wow .. a lot of replies today!
On Thu, Aug 7, 2008 at 2:09 AM, Martin v. Löwis [EMAIL PROTECTED] wrote:
It hasn't been given priority: There are currently 606 patches in the
tracker, many fixing bugs of some sort. It's not clear (to me, at least)
why this should be given priority over all the
FWIW, the rest of this discussion is now happening in the tracker:
http://bugs.python.org/issue3300. We could really use some feedback
from Python users in Asian countries.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev
* Bill Janssen wrote:
I'm far less concerned about
the decision with regards to unquote_to_bytes/quote_from_bytes, as
those are new features which can wait.
Forgive me, but those are the *old* features, which must be there.
This whole discussion circles too much, I think. Maybe it should be pepped?
The issue isn't circular. It's been patched and tested, then a whole lot of
people agreed including Guido. Then you and Bill wanted the bytes
functionality back. So I wrote that in there too, and Bill at least said that
* Matt Giuca wrote:
This whole discussion circles too much, I think. Maybe it should be
pepped?
The issue isn't circular. It's been patched and tested, then a whole lot
of people agreed including Guido. Then you and Bill wanted the bytes
functionality back. So I wrote that in there too,
There are a lot of quotes around. Including "After the most recent flurry of
discussion I've lost track of what's the right thing to do."
But I don't talk for other people.
OK .. let me compose myself a little. Sorry I went ahead and assumed this
was closed.
It's just frustrating to me that
André Malo wrote:
* Matt Giuca wrote:
We've reached, to quote Guido, as close as consensus as we can get on
this issue.
There are a lot of quotes around. Including "After the most recent flurry of
discussion I've lost track of what's the right thing to do."
But I don't talk for other people.
I suggest we continue this discussion, if at all, on the bug-tracker,
where there's code, and more participants.
http://bugs.python.org/issue3300
I've now posted my idea of how quote/unquote should work in py3K, there.
Bill
Nobody's been
assigned to look at it and it hasn't been given a priority, even though
we all agree it's a bug (though we disagree on how to fix it).
This I can explain (I think). Nobody is assigned to look: we usually
don't do assignments of bugs or patches, except when there is a specific
Martin v. Löwis martin at v.loewis.de writes:
URLs are just not made for non-ASCII characters.
Perhaps they are not, but every non-English wiki (just to take a simple, generic
example) potentially contains non-ASCII URLs.
e.g. http://fr.wikipedia.org/wiki/%C3%89l%C3%A9phant
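As a rough illustration of the point (using `urllib.parse` from released Python 3, which is an assumption here and not code from this thread), the escapes in that URL are simply the UTF-8 bytes of the page title:

```python
from urllib.parse import quote, unquote

# Path segment taken from the Wikipedia URL quoted above.
escaped = "%C3%89l%C3%A9phant"

# Decoding the escaped octets as UTF-8 recovers the page title.
title = unquote(escaped, encoding="utf-8")
print(title)

# Quoting the title again as UTF-8 round-trips to the same escapes.
assert quote(title, encoding="utf-8") == escaped
```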
Implement IRIs if you want non-ASCII characters; the rules are much clearer
for these.
I think most people would expect something which works with the current World
Wide Web rather than a rigorous implementation of a specific RFC. Implementing
RFCs is fine but it does not magically
On 2008-08-06 18:55, Antoine Pitrou wrote:
Martin v. Löwis martin at v.loewis.de writes:
URLs are just not made for non-ASCII characters.
Perhaps they are not, but every non-English wiki (just to take a simple, generic
example) potentially contains non-ASCII URLs.
e.g.
On Wed, Aug 6, 2008 at 9:09 AM, Martin v. Löwis [EMAIL PROTECTED] wrote:
Nobody's been
assigned to look at it and it hasn't been given a priority, even though
we all agree it's a bug (though we disagree on how to fix it).
This I can explain (I think). Nobody is assigned to look: we usually
Has anyone had time to look at the patch for this issue? It got a lot of
support about a week ago, but nobody has replied since then, and the patch
still hasn't been assigned to anybody or given a priority.
I hope I've complied with all the patch submission procedures. Please let me
know if there
After the most recent flurry of discussion I've lost track of what's
the right thing to do. I also believe it was said it should wait until
2.7/3.0, so there's no hurry (in fact there's no way to check it -- we
don't have branches for those versions yet).
On Tue, Aug 5, 2008 at 5:47 AM, Matt
After the most recent flurry of discussion I've lost track of what's
the right thing to do. I also believe it was said it should wait until
2.7/3.0, so there's no hurry (in fact there's no way to check it -- we
don't have branches for those versions yet).
I assume you mean 2.7/3.1.
I've
I'm far less concerned about
the decision with regards to unquote_to_bytes/quote_from_bytes, as those are
new features which can wait.
Forgive me, but those are the *old* features, which must be there.
Bill
Matt Giuca writes:
OK, for all the people who say URI encoding does not encode characters: yes
it does. This is not an encoding for binary data, it's an encoding for
character data, but it's unspecified how the strings map to octets before
being percent-encoded.
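A small sketch of that ambiguity, assuming the `encoding` parameter that `urllib.parse.quote` and `unquote` have in released Python 3 (not necessarily the API under discussion in this thread): the same character yields different escapes depending on which octet mapping is chosen.

```python
from urllib.parse import quote, unquote

ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The same character maps to one octet in Latin-1 and two in UTF-8.
as_latin1 = quote(ch, encoding="latin-1")
as_utf8 = quote(ch, encoding="utf-8")
print(as_latin1, as_utf8)

# Decoding with the matching charset recovers the character;
# decoding with the other one silently yields different text.
assert unquote(as_utf8, encoding="utf-8") == ch
assert unquote(as_utf8, encoding="latin-1") != ch
```

The receiver cannot tell from the escaped string alone which mapping was used, which is why the default matters.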
In other words, it's an
Guido says:
Actually, we'd need to look at the various other APIs in Py3k before we can
decide whether these should be considered taking or returning bytes or text.
It looks like all other APIs in the Py3k version of urllib treat URLs as
text.
Yes, as I said in the bug tracker,
Of course, it's un-Pythonic to enforce pedantry, and we pedants can
use a string-string encoder correctly.
Sure. All I was asking was that we not break the existing usage of
the standard library unquote by producing a string by *assuming* a
UTF-8 encoded string is what's in those
Also see http://en.wikipedia.org/wiki/Percent-encoding.
Bill
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Bill Janssen writes:
A quoting function that accepts bytes *must* have an encoding
argument.
Huh? What would it use it for?
Ah, you're right. I was thinking in terms of an URI builder, where the
quoter would do any required conversion (eg, if the bytes represented
a string in
Alright, I've uploaded the new patch which adds the two requested
bytes-oriented functions, as well as accompanying docs and tests.
http://bugs.python.org/issue3300
http://bugs.python.org/file11009/parse.py.patch6
I'd rather have two pairs of functions, so that those who want to give
the readers
Bill wrote:
I'm not sure that's sufficient review, though I agree it's necessary.
The major consumers of quote/unquote are not in the Python standard
library.
I figured that Python 3.0 is designed to fix things, with the breaking
third-party code being an acceptable side-effect of that. So
quote_from_bytes = quote
So either name can be used on either input type, with the idea being that
you should use quote on a str, and quote_from_bytes on a bytes. Is this a
good idea or should it be rewritten so each function permits only one input
type?
so you can use quote_from_bytes on strings?
Yes, currently.
I assumed Guido meant it was okay to have quote accept string/byte input and
have a function that was redundant but limited in what it accepted (i.e.
quote_from_bytes accepts only bytes)
I suppose your implementation doesn't
Hi folks,
This issue got some attention a few weeks back but it seems to have
fallen quiet, and I haven't had a good chance to sit down and reply
again till now.
As I've said before this is a serious issue which will affect a great
deal of code. However it's obviously not as clear-cut as I
Arg! Damnit, why do my replies get split off from the main thread?
Sorry about any confusion this may be causing.
On Thu, Jul 31, 2008 at 12:11:40AM +1000, Matt Giuca wrote:
2. Default to UTF-8.
In favour: Matt Giuca, Brett Cannon, Jeroen Ruigrok van der Werven
Count me too: +1. Most sites I use these days use UTF-8 for URL
encoding. Examples:
Wikipedia:
2008/7/30 Matt Giuca [EMAIL PROTECTED]:
2. Default to UTF-8.
In favour: Matt Giuca, Brett Cannon, Jeroen Ruigrok van der Werven
Pros: Fully working and tested solution is implemented; recommended by
RFC 3986 for all future schemes; recommended by W3C for use with HTML;
UTF-8 used by all
Facundo Batista facundobatista at gmail.com writes:
2008/7/30 Matt Giuca matt.giuca at gmail.com:
2. Default to UTF-8.
In favour: Matt Giuca, Brett Cannon, Jeroen Ruigrok van der Werven
Pros: Fully working and tested solution is implemented; recommended by
RFC 3986 for all future
[I was pretty busy these days, so sorry for jumping in late again]
* Matt Giuca wrote:
1. Leave it as it is. quote is Latin-1 if range(0,256), fallback to
UTF-8. unquote is Latin-1.
In favour: Anybody who doesn't reply to this thread
Pros: Already implemented; some existing code depends
On Wed, Jul 30, 2008 at 8:09 AM, André Malo [EMAIL PROTECTED] wrote:
I'm actually in favour of encoding bytes only back and forth. A useful
extension would be *another* function which wraps quote/unquote and encodes
and decodes characters.
I'd reverse this. By all means, add a new pair of
For unquote, I think it will break a lot and surprise everyone. I
think that while this may be purely the best option, it's pretty
silly.
I don't mind being silly to do the right thing. Happens to me a lot :-).
Bill
On Wed, Jul 30, 2008 at 8:09 AM, André Malo [EMAIL PROTECTED] wrote:
I'm actually in favour of encoding bytes only back and forth. A useful
extension would be *another* function which wraps quote/unquote and encodes
and decodes characters.
I'd reverse this. By all means, add a new
On Wed, Jul 30, 2008 at 9:52 AM, Bill Janssen [EMAIL PROTECTED] wrote:
On Wed, Jul 30, 2008 at 8:09 AM, André Malo [EMAIL PROTECTED] wrote:
I'm actually in favour of encoding bytes only back and forth. A useful
extension would be *another* function which wraps quote/unquote and encodes
Actually (as I pointed out before) the existing functions are not
string-in/string-out. They are something-in and bytes-out.
Sorry, this is wrong. quote is clearly bytes-in and string-out.
unquote is clearly string-in and bytes-out.
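That bytes-oriented view can be sketched with the two function names mentioned elsewhere in this thread, `quote_from_bytes` and `unquote_to_bytes`, as they exist in `urllib.parse` in released Python 3. The sample bytes are made up for illustration.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Arbitrary octets, including one (0xFF) that is not valid UTF-8.
raw = bytes([0xC3, 0x89, 0x20, 0xFF])

# bytes in, ASCII-only str out
escaped = quote_from_bytes(raw)
print(escaped)

# str in, bytes out: the original octets come back unchanged
assert unquote_to_bytes(escaped) == raw
```

No character encoding is involved in either direction, so the round trip works for any byte sequence.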
The whole point of quote is to take an arbitrary sequence
It looks like all other APIs in the Py3k version of
urllib treat URLs as text.
The URL is text, a string of ASCII characters. We're just talking
about urllib.quote() and urllib.unquote(), which are there to support
the text-ization of binary values, and the de-text-ization.
I think that
On Wed, Jul 30, 2008 at 10:33 AM, Bill Janssen [EMAIL PROTECTED] wrote:
It looks like all other APIs in the Py3k version of
urllib treat URLs as text.
The URL is text, a string of ASCII characters. We're just talking
about urllib.quote() and urllib.unquote(), which are there to support
the
(Aside: I dislike functions that have a different return type based on
the value of a parameter.)
I wanted to stay out of the whole discussion as it's largely over my head...
But I did want to express support for this idea which I think almost rises
to the level of a standard... I see more
unquote() -- takes string, produces bytes or string
If optional encoding parameter is specified, decodes bytes with
that encoding and returns string. Otherwise, returns bytes.
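The proposed behaviour could look roughly like the sketch below. The helper name `unquote_proposed` is hypothetical, and it is built on `urllib.parse.unquote_to_bytes` from released Python 3 purely for illustration; it is not the patch discussed here.

```python
from urllib.parse import unquote_to_bytes


def unquote_proposed(s, encoding=None, errors="strict"):
    """Hypothetical sketch: return bytes unless an encoding is given."""
    raw = unquote_to_bytes(s)
    if encoding is None:
        # No charset supplied: hand back the raw octets.
        return raw
    # Charset supplied: decode the octets and return text.
    return raw.decode(encoding, errors)


print(unquote_proposed("%C3%A9"))           # bytes
print(unquote_proposed("%C3%A9", "utf-8"))  # str
```

Note that the return type depends on an argument, which is the design trade-off debated in this thread.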
The default of returning bytes will break almost all uses. Most code
will use the unquoted result
On Wed, Jul 30, 2008 at 12:49 PM, Bill Janssen [EMAIL PROTECTED] wrote:
unquote() -- takes string, produces bytes or string
If optional encoding parameter is specified, decodes bytes with
that encoding and returns string. Otherwise, returns bytes.
The default of returning bytes
I think this is as close as consensus as we can get on this issue. Can
whoever wrote the patch adjust the patch to this outcome? (I think the
only change is to remove the encoding arguments and make separate
functions for bytes.)
This is 2.7/3.1 only, right? I'm looking at the bales of code
Con: URI encoding does not encode characters.
OK, for all the people who say URI encoding does not encode characters: yes
it does. This is not an encoding for binary data, it's an encoding for
character data, but it's unspecified how the strings map to octets before
being percent-encoded. From
On Wed, Jul 30, 2008 at 8:49 PM, Matt Giuca [EMAIL PROTECTED] wrote:
Con: URI encoding does not encode characters.
OK, for all the people who say URI encoding does not encode characters: yes
it does. This is not an encoding for binary data, it's an encoding for
character data, but it's
Clearly the unquote is str->bytes, <snip> You can't pass a Unicode string back
as the result of unquote *without* passing in an encoding specifier,
because the character set is application-specific.
So for unquote you're suggesting that it always return a bytes object
UNLESS an encoding is
* Matt Giuca wrote:
This POV is way too browser-centric...
This is but one example. Note that I found web forms to be the least
clear-cut example of choosing an encoding. Most of the time applications
seem to be using UTF-8, and all the standards I have read are moving
towards specifying
Ah there may be some confusion here. We're only dealing with str->str
transformations (which in Python 3 means Unicode strings). You can't put a
bytes in or get a bytes out of either of these functions. I suggested a
quote_raw and unquote_raw function which would let you do this.
Ah, well,
On Mon, Jul 14, 2008 at 4:54 AM, André Malo [EMAIL PROTECTED] wrote:
Ahem. The HTTP standard does ;-)
Really? Can you include a quotation please? The HTTP standard talks a lot
about ISO-8859-1 (Latin-1) in terms of actually raw encoded bytes, but not
in terms of URI percent-encoding (a
Hi all,
My first post to the list. In fact, first time Python hacker, long-time
Python user though. (Melbourne, Australia).
Some of you may have seen for the past week or so my bug report on Roundup,
http://bugs.python.org/issue3300
I've spent a heap of effort on this patch now so I'd really
On Sat, Jul 12, 2008 at 10:27 AM, Matt Giuca [EMAIL PROTECTED] wrote:
Hi all,
My first post to the list. In fact, first time Python hacker, long-time
Python user though. (Melbourne, Australia).
Welcome!
Some of you may have seen for the past week or so my bug report on Roundup,
Basically, urllib.quote and unquote seem not to have been updated since
Python 2.5, and because of this they implicitly perform Latin-1 encoding and
decoding (with respect to percent-encoded characters). I think they should
default to UTF-8 for a number of reasons, including that's what other
-On [20080712 19:27], Matt Giuca ([EMAIL PROTECTED]) wrote:
Basically, urllib.quote and unquote seem not to have been updated since Python
2.5, and because of this they implicitly perform Latin-1 encoding and decoding
(with respect to percent-encoded characters). I think they should default to
Very nice, I had this somewhere on my todo list to work on. I'm very much
in favour, especially since it synchronizes us with the RFCs (for all I
remember reading about it last time).
I still think that it doesn't. The RFCs haven't changed, and can't
change for compatibility reasons. The
Thanks for all the replies, and making me feel welcome :)
If what you are saying is true, then it can probably go in as a bug
fix (unless someone else knows something about Latin-1 on the Net that
makes this not true).
Well from what I've seen, the only time Latin-1 naturally appears on the
* Matt Giuca wrote:
Well from what I've seen, the only time Latin-1 naturally appears on the
net is when you have a web page in Latin-1 (either explicit or inferred;
and note that a browser like Firefox will infer Latin-1 if it sees only
ASCII characters) with a form in it. Submitting the
This POV is way too browser-centric...
This is but one example. Note that I found web forms to be the least
clear-cut example of choosing an encoding. Most of the time applications
seem to be using UTF-8, and all the standards I have read are moving towards
specifying UTF-8 (from being