2011/5/9 Mindaugas Žakšauskas <min...@gmail.com>:
> On Mon, May 9, 2011 at 2:03 PM, Konstantin Kolinko
> <knst.koli...@gmail.com> wrote:
> <..>
>> If ";" is part of the actual path, it must be escaped.
>>
>> If ";" starts a "path parameter" it must be unescaped. One well-known
>> example is ";jsessionid" path parameter.
>
> Thanks for your answer. Is this rule just a "de facto" rule, or is it
> documented anywhere in RFC3986/RFC2396?

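It is documented in RFC 3986, the one you cite in [1]. See section 3.3
("Path"), which mentions ";" as a delimiter for parameters of a path segment:
http://tools.ietf.org/html/rfc3986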
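
To illustrate the ";" rule with java.net.URI, here is a minimal sketch. The
host "site", the sample path and the class name SemicolonSketch are made up
for illustration. The raw path keeps the escaped and the unescaped semicolon
distinct, while the decoded path does not, so the escaping has to be done
before the URL is assembled.

```java
import java.net.URI;

final class SemicolonSketch {
    // Path exactly as written in the URI, escapes preserved.
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // Path with percent-escapes decoded; the two kinds of ";" become
    // indistinguishable here.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // "%3B" is a literal ";" inside the segment name,
        // the bare ";" starts the jsessionid path parameter.
        String uri = "http://site/a%3Bb;jsessionid=123";
        System.out.println(rawPath(uri));     // /a%3Bb;jsessionid=123
        System.out.println(decodedPath(uri)); // /a;b;jsessionid=123
    }
}
```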

> Extending my question, are there clear criteria which would define
> which characters always need escaping and which don't? At the moment I
> am escaping everything that is not unreserved [1], but I am not sure
> about SEOability and user-friendliness - this especially concerns path
> with international characters in URLs, e.g. http://site/pathąčęė

How those URLs are shown is up to the browser. Many browsers have a
setting that controls how such URLs are displayed. For an example of
i18n addresses, try browsing a non-English Wikipedia.

> I have also found a similar Tomcat bug [2], but it is addressing
> slightly different issue.

[2] is not a bug; it is an invalid report. It is useful reading, though.

> If anyone wants to benefit from this, I have just added 50 bonus points to
> my SO question [3]. The main question I want to get an answer to is -
> which characters can and which need escaping, both in terms of RFC and
> Tomcat.

> 1. According to RFC 3986, unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
> 2. https://issues.apache.org/bugzilla/show_bug.cgi?id=51132
> 3. http://stackoverflow.com/questions/5913623/rfc3986-which-pchars-need-to-be-percent-encoded

BTW, take a look at the java.net.URI class and its URI.toString() and
URI.toURL() methods.
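
A minimal sketch of what java.net.URI does with an unescaped path, assuming
a made-up host "site" and a hypothetical helper fromRawPath. The
multi-argument constructor quotes characters that are illegal in a path
(such as a space) but leaves non-ASCII letters alone; toASCIIString() then
percent-encodes those as UTF-8. Note that ";" is legal in a path, so this
constructor does not escape it for you.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriEscapingSketch {
    // Build an http URI from an unescaped path. The multi-argument
    // constructor percent-encodes characters that are illegal in a path.
    static URI fromRawPath(String host, String path) {
        try {
            return new URI("http", host, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // \u0105 and \u010D are the letters "ą" and "č".
        URI uri = fromRawPath("site", "/path \u0105\u010D");
        System.out.println(uri.toString());      // non-ASCII letters kept as-is
        System.out.println(uri.toASCIIString()); // http://site/path%20%C4%85%C4%8D
        System.out.println(fromRawPath("site", "/a;b")); // http://site/a;b
    }
}
```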

Just one example (not 100% related to your case, but one that happens
frequently):
to convert a File to a proper URL, the correct code is to call

File.toURI().toURL()

because that takes care of % encodings, while the old File.toURL()
method does not.
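
As a small sketch of the conversion above (the file name and the class name
FileUrlSketch are only illustrations, and the absolute part of the printed
URL depends on the platform and working directory):

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

final class FileUrlSketch {
    // Convert via URI so that characters such as a space are percent-encoded.
    static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The file does not need to exist for the conversion to work.
        URL url = toUrl(new File("my file.txt"));
        System.out.println(url); // the space appears as %20
    }
}
```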


2011/5/9 André Warnier <a...@ice-sa.com>:
> (like a space encoded as a "+", and a "+"
> encoded as %xy),

Andre, one small correction:
It sometimes causes confusion, but encoding a space as '+' works only
in the query part of the URL.
The unambiguous way to encode a space, regardless of its position in the URL, is %20.

Encoding a space as '+' is part of the "URL encoding" scheme
(application/x-www-form-urlencoded) defined by the HTML standard, in the
chapter that describes how HTML forms are submitted.
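
The difference can be seen in Java. This is a minimal sketch with
hypothetical helper names (formEncode, pathEncode) and a made-up host
"site": java.net.URLEncoder implements the HTML form encoding, while the
java.net.URI constructor applies the path rules.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class SpaceEncodingSketch {
    // HTML form encoding: space becomes '+', a literal '+' becomes %2B.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Path encoding: space becomes %20, '+' is legal in a path and is kept.
    static String pathEncode(String path) {
        try {
            return new URI("http", "site", path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a b+c"));  // a+b%2Bc
        System.out.println(pathEncode("/a b+c")); // /a%20b+c
    }
}
```

So URLEncoder is suitable for query parameter values, not for path segments.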


Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
