Alan Kennedy <a...@xhaus.com> wrote:
> Hi Bill,
>
> [Bill]
> > I think the controlling reference here is RFC 3875.
>
> I think the controlling references are RFC 2616, RFC 2396 and RFC 3987.

I see what you're saying, but it's darn near impossible, as a practical
matter, to get any guidance on encoding matters from those. The question
is where those names come from, and they come from CGI, and that is
(practically speaking) defined these days by RFC 3875, as much as
anything.

> I think the question is "are people using IRIs in the wild"? If so,
> then we must decide how do we best deal with the problems of
> recognising iso-8859-1+rfc2037 versus utf-8, or whatever
> server-configured encoding the user has chosen.

See http://bugs.python.org/issue3300, where we went around and around
that question. The answer seems to be, yes. There are lots of useful
fragments in that discussion, for instance:

``For the authority (server name) portion of a URI, RFC 3986 is pretty
clear that UTF-8 must be used for non-ASCII values (assuming, for a
moment, that IDNA addresses are not Punycode encoded already). For the
path portion of URIs, a large-ish proportion of them are, indeed, UTF-8
encoded because that has been the de facto standard in Web browsers for
a number of years now. For the query and fragment parts, however, the
encoding is determined by context and often depends on the encoding of
some page that contains the form from which the data is taken. Thus, a
large number of URIs contain non-UTF-8 percent-encoded octets.''

Bill
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
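
To make the quoted point concrete, here is a rough sketch (Python 3, standard library only) of what a server ends up doing when it cannot know which encoding produced the percent-encoded octets: try UTF-8 first, then fall back to a configured encoding. The helper name decode_component and the iso-8859-1 default are assumptions for illustration only; nothing in RFC 3875 or the issue3300 thread prescribes this heuristic.

```python
from urllib.parse import unquote_to_bytes


def decode_component(raw, fallback="iso-8859-1"):
    """Percent-decode one URI component and guess its text encoding.

    Hypothetical helper: returns (text, encoding_used). UTF-8 is tried
    first because it is the de facto choice for paths; the fallback
    stands in for whatever encoding the server has been configured with.
    """
    # Work on raw octets first: percent-decoding itself is encoding-neutral.
    octets = unquote_to_bytes(raw)
    try:
        return octets.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the configured fallback encoding.
        return octets.decode(fallback), fallback


# The same word, percent-encoded by a UTF-8 page and by a Latin-1 page.
from_utf8_page = decode_component("caf%C3%A9")
from_latin1_page = decode_component("caf%E9")
```

Both calls recover the same text, but only by guessing. Some Latin-1 byte sequences are also valid UTF-8, so the guess can be wrong, which is why the encoding question does not go away.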