Re: [Zope-dev] Non-ASCII characters in URLs
Dieter Maurer wrote:
> Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
>> ...
>>> Almost surely, Alexander wants to ask why Zope does not allow
>>> non-ASCII characters in ids.
>>>
>>> And, in fact, there are only two reasons:
>>>
>>>  * laziness of the Zope developers:
>>>
>>>    without the restriction to ASCII characters,
>>>    careful quoting (and unquoting) is necessary
>>>    in order to adhere to RFC 2396 (the modern URI syntax specification)
>>
>> This is becoming increasingly painful
>
> I will soon have a patch against Zope 2.11b1 which gets rid
> of this restriction. If there is consensus, I can add it to the Zope repository.

+1 from my side. Saves me the work of cleaning up my own dirty patch :-))

___
Zope-Dev maillist - Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Non-ASCII characters in URLs
Wichert Akkerman wrote at 2008-4-7 20:45 +0200:
> ...
>> Almost surely, Alexander wants to ask why Zope does not allow
>> non-ASCII characters in ids.
>>
>> And, in fact, there are only two reasons:
>>
>>  * laziness of the Zope developers:
>>
>>    without the restriction to ASCII characters,
>>    careful quoting (and unquoting) is necessary
>>    in order to adhere to RFC 2396 (the modern URI syntax specification)
>
>This is becoming increasingly painful

I will soon have a patch against Zope 2.11b1 which gets rid
of this restriction. If there is consensus, I can add it to the Zope repository.

> ...
>>  * there is no way to specify the encoding used for non-ASCII characters.
>>
>>    HTML 4 suggests converting non-ASCII characters first to
>>    UTF-8 and then URL-escaping the result,
>>    but most HTTP clients do not follow this suggestion.
>>    Instead, they use the charset found on the page
>>    that caused them to construct the URI.
>>
>>    I have observed that MS WebDAV, for some WebDAV commands,
>>    transfers the URL as given and, for some other
>>    commands, recodes it into UTF-8.
>>
>>    Thus, supporting non-ASCII ids may occasionally cause
>>    surprises.
>
>You mean non-ASCII URIs, not non-ASCII ids here I suspect. Somehow I'm
>not surprised those are painful :(

No, I mean non-ASCII ids. They lead to URIs with some escaped characters,
and MS WebDAV, for some commands, unescapes the URIs, interprets them
in some default charset ("windows-1252" in our case), recodes them
in UTF-8, escapes them again and then uses them in the commands.
Examples are the COPY and MOVE commands.

If an object has a non-ASCII character in its id, say "tüv",
its URL may look like "http:.../t%FCv". Used in a "COPY" or "MOVE",
it is however represented as "http:.../t%C3%BCv".

-- 
Dieter
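The round trip described in this message can be reproduced with Python's standard `urllib.parse`. This is only a sketch of the observed behaviour, not of any actual MS WebDAV code: the function name `webdav_recode` is hypothetical, and the default charset is assumed to be windows-1252 as in the case reported here.

```python
from urllib.parse import quote, unquote


def webdav_recode(escaped_path, assumed_charset="windows-1252"):
    """Unescape, interpret in the assumed charset, then re-escape as UTF-8.

    Hypothetical helper that mimics the recoding reported for COPY/MOVE.
    """
    text = unquote(escaped_path, encoding=assumed_charset)
    return quote(text, encoding="utf-8")


# The id "tüv" escaped with a single-byte charset (0xFC is "ü" in Latin-1).
original = quote("tüv", encoding="latin-1")
# The same id after the unescape / recode / escape round trip.
recoded = webdav_recode(original)

print(original)  # t%FCv
print(recoded)   # t%C3%BCv
```

A server that stored the id under the first spelling will not find the object when a client asks for the second one, unless it normalises both to the same text before the lookup.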
Re: [Zope-dev] Non-ASCII characters in URLs
Previously Dieter Maurer wrote:
> Martijn Pieters wrote at 2008-4-7 10:39 +0200:
> >On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi <[EMAIL PROTECTED]> wrote:
> >> Is there a good technical explanation for why Zope doesn't allow non-ASCII
> >> characters in URLs?
> >
> >Because URLs don't allow non-ASCII characters?
>
> Almost surely, Alexander wants to ask why Zope does not allow
> non-ASCII characters in ids.
>
> And, in fact, there are only two reasons:
>
>  * laziness of the Zope developers:
>
>    without the restriction to ASCII characters,
>    careful quoting (and unquoting) is necessary
>    in order to adhere to RFC 2396 (the modern URI syntax specification)

This is becoming increasingly painful: it means we can't really use
Active Directory's ObjectGUID as a userid, and it breaks with LDAP DNs
that contain non-ASCII characters (all too common). I really wish Zope
IDs were either binary strings or unicode strings.

>  * there is no way to specify the encoding used for non-ASCII characters.
>
>    HTML 4 suggests converting non-ASCII characters first to
>    UTF-8 and then URL-escaping the result,
>    but most HTTP clients do not follow this suggestion.
>    Instead, they use the charset found on the page
>    that caused them to construct the URI.
>
>    I have observed that MS WebDAV, for some WebDAV commands,
>    transfers the URL as given and, for some other
>    commands, recodes it into UTF-8.
>
>    Thus, supporting non-ASCII ids may occasionally cause
>    surprises.

You mean non-ASCII URIs, not non-ASCII ids here I suspect. Somehow I'm
not surprised those are painful :(

Wichert.

-- 
Wichert Akkerman <[EMAIL PROTECTED]>    It is simple to make things.
http://www.wiggy.net/                   It is hard to make things simple.
Re: [Zope-dev] Non-ASCII characters in URLs
Martijn Pieters wrote at 2008-4-7 10:39 +0200:
>On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi <[EMAIL PROTECTED]> wrote:
>> Is there a good technical explanation for why Zope doesn't allow non-ASCII
>> characters in URLs?
>
>Because URLs don't allow non-ASCII characters?

Almost surely, Alexander wants to ask why Zope does not allow
non-ASCII characters in ids.

And, in fact, there are only two reasons:

 * laziness of the Zope developers:

   without the restriction to ASCII characters,
   careful quoting (and unquoting) is necessary
   in order to adhere to RFC 2396 (the modern URI syntax specification)

 * there is no way to specify the encoding used for non-ASCII characters.

   HTML 4 suggests converting non-ASCII characters first to
   UTF-8 and then URL-escaping the result,
   but most HTTP clients do not follow this suggestion.
   Instead, they use the charset found on the page
   that caused them to construct the URI.

   I have observed that MS WebDAV, for some WebDAV commands,
   transfers the URL as given and, for some other
   commands, recodes it into UTF-8.

   Thus, supporting non-ASCII ids may occasionally cause
   surprises.

-- 
Dieter
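Because the escaped octets carry no charset label, a server that accepted non-ASCII ids would have to guess how to decode them. A minimal sketch of one possible heuristic follows; it is not something Zope is known to do. It tries UTF-8 first and falls back to a configured single-byte charset. The names `decode_path_segment` and `fallback` are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def decode_path_segment(segment, fallback="latin-1"):
    """Guess the text behind a percent-escaped path segment.

    Hypothetical helper: strict UTF-8 is tried first because bytes
    produced by a single-byte charset rarely form valid UTF-8.
    """
    raw = unquote_to_bytes(segment)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


print(decode_path_segment("t%C3%BCv"))  # client that escaped UTF-8 bytes
print(decode_path_segment("t%FCv"))     # client that used the page charset
```

The guess can still be wrong: a byte sequence from the fallback charset that happens to be valid UTF-8 is decoded as UTF-8, which is one way the "surprises" mentioned above can arise.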
Re: [Zope-dev] Non-ASCII characters in URLs
----- Original Message -----
From: "Martijn Pieters" <[EMAIL PROTECTED]>
To: "Alexander Limi" <[EMAIL PROTECTED]>
Cc:
Sent: Monday, April 07, 2008 4:39 AM
Subject: Re: [Zope-dev] Non-ASCII characters in URLs

> On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi <[EMAIL PROTECTED]> wrote:
>> Is there a good technical explanation for why Zope doesn't allow non-ASCII
>> characters in URLs?
>
> Because URLs don't allow non-ASCII characters?
>
>> I'd like to be able to let URLs work like this example from Wikipedia:
>>
>> http://ja.wikipedia.org/wiki/メインページ
>
> Your browser translates that into
> http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
>
>> Is there a fundamental reason (i.e. Python objects can only be ASCII) or
>> are these simply bugs that need to be fixed?
>
> RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt) doesn't allow non-ASCII
> characters in URLs:
>
>    No corresponding graphic US-ASCII:
>
>    URLs are written only with the graphic printable characters of the
>    US-ASCII coded character set. The octets 80-FF hexadecimal are not
>    used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
>    control characters; these must be encoded.
>
> Now, Zope could well support UTF-8 ids, and translate URLs
> appropriately, but in the meantime you could use the same scheme?

IDNA (http://www.ietf.org/rfc/rfc3490.txt) and Punycode
(http://www.faqs.org/rfcs/rfc3492.html) may be of some use.

Jonathan
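As a rough illustration of the Punycode suggestion, Python ships a `punycode` codec that maps arbitrary Unicode text to an ASCII-only string and back. The helper names `to_ascii_id` and `from_ascii_id` are hypothetical, and using Punycode for object ids is only an assumption made for this sketch; IDNA itself is defined for host names, not for URL paths.

```python
def to_ascii_id(text):
    """Encode Unicode text as an ASCII-only Punycode string (hypothetical helper)."""
    return text.encode("punycode").decode("ascii")


def from_ascii_id(ascii_id):
    """Recover the original Unicode text from its Punycode form."""
    return ascii_id.encode("ascii").decode("punycode")


title = "メインページ"
ascii_id = to_ascii_id(title)

print(ascii_id)                 # ASCII-only representation of the title
print(from_ascii_id(ascii_id))  # the original title again
```

Unlike percent-escaping, the encoded form is not readable by a browser as the original title, so this trades the charset ambiguity for ids that look opaque in the address bar.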
Re: [Zope-dev] Non-ASCII characters in URLs
On Mon, Apr 7, 2008 at 1:37 AM, Alexander Limi <[EMAIL PROTECTED]> wrote:
> Is there a good technical explanation for why Zope doesn't allow non-ASCII
> characters in URLs?

Because URLs don't allow non-ASCII characters?

> I'd like to be able to let URLs work like this example from Wikipedia:
>
> http://ja.wikipedia.org/wiki/メインページ

Your browser translates that into
http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8

> Is there a fundamental reason (i.e. Python objects can only be ASCII) or
> are these simply bugs that need to be fixed?

RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt) doesn't allow non-ASCII
characters in URLs:

   No corresponding graphic US-ASCII:

   URLs are written only with the graphic printable characters of the
   US-ASCII coded character set. The octets 80-FF hexadecimal are not
   used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
   control characters; these must be encoded.

Now, Zope could well support UTF-8 ids, and translate URLs
appropriately, but in the meantime you could use the same scheme?

-- 
Martijn Pieters
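The translation the browser performs for the Wikipedia URL can be checked with Python's standard `urllib.parse`, whose `quote` function encodes to UTF-8 by default before escaping. The variable names below are only for illustration.

```python
from urllib.parse import quote, unquote

title = "メインページ"

# Each katakana character becomes three UTF-8 octets, each written as %XX.
escaped = quote(title)
url = "http://ja.wikipedia.org/wiki/" + escaped
print(url)

# Unescaping with the same encoding recovers the original title.
assert unquote(escaped) == title
```

The resulting URL contains only US-ASCII characters, so it satisfies the RFC 1738 rule quoted above while still naming a non-ASCII page.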
Re: [Zope-dev] Non-ASCII characters in URLs
--On 6. April 2008 16:37:22 -0700 Alexander Limi <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Is there a good technical explanation for why Zope doesn't allow non-ASCII
> characters in URLs?
>
> I'd like to be able to let URLs work like this example from Wikipedia:
>
> http://ja.wikipedia.org/wiki/メインページ
>
> When I try adding an object with ID "メインページ" in Zope 2, I get the
> following error message:
>
> Error Type: BadRequest
> Error Value: The id "メインページ" contains characters illegal in URLs.
>
> Is there a fundamental reason (i.e. Python objects can only be ASCII) or
> are these simply bugs that need to be fixed?

As Paul indicated: the issue dates back to the times when there was only
ASCII in the URL world. Object IDs in particular have to be ASCII -
well... Zope came from the US :-)

Andreas
Re: [Zope-dev] Non-ASCII characters in URLs
On Sun, Apr 06, 2008 at 04:37:22PM -0700, Alexander Limi wrote:
> Hi,
>
> Is there a good technical explanation for why Zope doesn't allow non-ASCII
> characters in URLs?

I suspect it's only for hysterical raisins.

The code in question is in OFS/ObjectManager.py, in the checkValidId()
function. Non-ASCII characters trigger a match on the bad_id regular
expression search.

As I recall, if you look at the revision history, that code is very old.
There might even be an existing bug filed about this; I don't remember.

-- 
Paul Winkler
http://www.slinkp.com
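For readers without the Zope source at hand, the kind of check described here can be sketched as follows. This is an illustration only: the pattern, the `BadRequest` class and `check_valid_id` below are assumptions written for this example and are not copied from OFS/ObjectManager.py.

```python
import re

# Illustrative pattern (an assumption, not Zope's actual one): any character
# outside this small ASCII whitelist counts as a bad character.
bad_id = re.compile(r"[^a-zA-Z0-9_~,.$()# @\-]").search


class BadRequest(Exception):
    """Stand-in for the error type shown in the original report."""


def check_valid_id(id_):
    """Raise BadRequest if the id contains a character outside the whitelist."""
    if bad_id(id_) is not None:
        raise BadRequest(
            'The id "%s" contains characters illegal in URLs.' % id_
        )


check_valid_id("index_html")  # passes silently
try:
    check_valid_id("メインページ")
except BadRequest as exc:
    print(exc)
```

With a whitelist like this, lifting the restriction is less a matter of changing the pattern than of making sure every place that builds or parses URLs quotes and unquotes ids consistently.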
[Zope-dev] Non-ASCII characters in URLs
Hi,

Is there a good technical explanation for why Zope doesn't allow non-ASCII
characters in URLs?

I'd like to be able to let URLs work like this example from Wikipedia:

http://ja.wikipedia.org/wiki/メインページ

When I try adding an object with ID "メインページ" in Zope 2, I get the
following error message:

Error Type: BadRequest
Error Value: The id "メインページ" contains characters illegal in URLs.

Is there a fundamental reason (i.e. Python objects can only be ASCII) or
are these simply bugs that need to be fixed?

Curiously yours,

-- 
Alexander Limi · http://limi.net