2011/5/9 Mindaugas Žakšauskas <min...@gmail.com>:
> On Mon, May 9, 2011 at 2:03 PM, Konstantin Kolinko
> <knst.koli...@gmail.com> wrote:
> <..>
>> If ";" is part of the actual path, it must be escaped.
>>
>> If ";" starts a "path parameter", it must not be escaped. One well-known
>> example is the ";jsessionid" path parameter.
>
> Thanks for your answer. Is this rule just a "de facto" rule, or is it
> documented anywhere in RFC3986/RFC2396?
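To illustrate the ";" rule quoted above (this is a sketch, not code from the thread), here is a minimal Java example. `encodePathSegment` is a hypothetical helper. Like the approach described in the question, it escapes everything that is not an RFC 3986 unreserved character, so a literal ";" inside a segment becomes %3B. The ";" that introduces the ";jsessionid" path parameter is appended unescaped. The path and session id are made-up values.

```java
import java.nio.charset.StandardCharsets;

public class PathSegmentEncoding {

    // Hypothetical helper: percent-encodes one path segment, keeping only the
    // RFC 3986 "unreserved" characters (ALPHA / DIGIT / "-" / "." / "_" / "~").
    // Every other character, including ';', is encoded as UTF-8 %XX bytes.
    static String encodePathSegment(String segment) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A ';' that is part of the actual file name is escaped...
        String segment = encodePathSegment("report;2011.pdf");
        // ...while the ';' that starts the path parameter is left as-is.
        String path = "/docs/" + segment + ";jsessionid=ABC123";
        System.out.println(path); // /docs/report%3B2011.pdf;jsessionid=ABC123
    }
}
```

Escaping everything outside the unreserved set is conservative. RFC 3986 also permits some other characters in a path segment, but an over-escaped segment is still a valid one.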

As you wrote, it is RFC 3986, per [1]:
http://tools.ietf.org/html/rfc3986

> Extending my question, is there a clear criterion which would define
> which characters always need escaping and which don't? At the moment I
> am escaping everything that is not unreserved [1], but I am not sure
> about SEOability and user-friendliness - this especially concerns paths
> with international characters in URLs, e.g. http://site/pathąčęė

It is up to the browser how to display such URLs. Many browsers have a
setting that controls how such URLs are displayed. For examples of i18n
addresses, try browsing a non-English Wikipedia.

> I have also found a similar Tomcat bug [2], but it is addressing a
> slightly different issue.

[2] is not a bug. It is an invalid report. It is useful reading, though.

> If anyone wants to benefit from this, I have just added 50 bonus points
> to my SO question [3]. The main question I want to get an answer for is -
> which characters can and which need escaping, both in terms of RFC and
> Tomcat.
>
> 1. According to RFC 3986, unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
> 2. https://issues.apache.org/bugzilla/show_bug.cgi?id=51132
> 3. http://stackoverflow.com/questions/5913623/rfc3986-which-pchars-need-to-be-percent-encoded

BTW, take a look at the java.net.URI class and its URI.toString() and
URI.toURL() methods.

Just one example (not 100% related to your case, but one that comes up
frequently): to convert a File to a proper URL, the correct code is
File.toURI().toURL(), because that takes care of % encodings, while the
old File.toURL() method does not.

2011/5/9 André Warnier <a...@ice-sa.com>:
> (like a space encoded as a "+", and a "+"
> encoded as %xy),

Andre, one small correction: it sometimes causes confusion, but encoding
a space as '+' works only in the query part of the URL. The unambiguous
way to encode a space, regardless of its position in the URL, is %20.
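A minimal sketch of these points, assuming Java 10 or later (needed for `URLEncoder.encode(String, Charset)`). The host "site", the sample paths, and the helper names `pathUri`, `formEncode` and `fileUrl` are illustrative only. The multi-argument `java.net.URI` constructor quotes characters that are illegal in a path, so a space becomes %20, and `toASCIIString()` additionally encodes non-ASCII characters as UTF-8. `URLEncoder` implements HTML form encoding instead, which turns a space into '+'. `File.toURI().toURL()` percent-encodes the file path.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UriEncodingDemo {

    // Builds http://site<path> with the multi-argument URI constructor, which
    // quotes characters that are illegal in a path (e.g. space -> %20).
    // toASCIIString() also encodes non-ASCII characters as UTF-8 %XX.
    static String pathUri(String path) {
        try {
            return new URI("http", "site", path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // HTML form encoding (application/x-www-form-urlencoded):
    // space -> '+', '+' -> %2B. Meant for query parameters, not paths.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // File -> URL via toURI(), which percent-encodes the path;
    // the deprecated File.toURL() does not.
    static String fileUrl(File file) {
        try {
            return file.toURI().toURL().toString();
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pathUri("/my page"));                      // http://site/my%20page
        System.out.println(pathUri("/path\u0105\u010D\u0119\u0117")); // "/pathąčęė", UTF-8 encoded
        System.out.println(formEncode("a b+c"));                      // a+b%2Bc
        System.out.println(fileUrl(new File("/tmp/my file.txt")));
    }
}
```

The same space therefore comes out as %20 from `URI` and as '+' from `URLEncoder`. Only %20 is safe outside the query string.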

Encoding a space as '+' comes from the "URL encoding" scheme
(application/x-www-form-urlencoded) defined by the HTML standard, in the
chapter that describes how HTML forms are submitted.

Best regards,
Konstantin Kolinko