Hi,

Thanks very much for your answers. For reference, I will sum up what I have managed to get out of this discussion. Please correct me if I am wrong.
My problem wasn't charset incompatibility between client and server, as it is the same party that produces the URLs and consumes them (and yes, we do use UTF-8 everywhere and have useBodyEncodingForURI set to true). Anyway, it was an interesting read to get the whole picture, including Punycode. I hope others benefited from this, too.

What I wanted to clarify was the exact set of characters that need percent-encoding. Initially I thought that it all boiled down to different character classes, but that turned out to be incorrect (the semicolon vs. bracket case).

My other concern was internationalized (i18n) paths, and it was good advice from Konstantin to have a look at Wikipedia. For example, a link to "botánico" in the Spanish Wikipedia is printed as <a href="/wiki/Bot%C3%A1nica" title="Botánica">, and browsers seem to be able to show it percent-decoded without any special effort. I only slipped here because I initially used [1], which does not encode (at least) some characters correctly. I ended up using a modified copy of java.net.URI::appendEncoded(StringBuilder, char), as it is private there and semicolons do not get escaped [2].

My conclusion is to percent-encode everything that is not unreserved. It might be sub-optimal, as some characters, such as brackets, do not need encoding, but I would rather be safe than sorry.

[1] http://stackoverflow.com/questions/573184/java-convert-string-to-valid-uri-object/3332864#3332864

[2] The final code that does the escaping:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharacterCodingException;
    import sun.nio.cs.ThreadLocalCoders; // JDK-internal class, also used by java.net.URI

    private static final String UNRESERVED =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~";

    private static final char[] hexDigits = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
    };

    // stolen from java.net.URI and modified to ensure semicolons, etc. get encoded
    private static void appendEncoded(StringBuilder sb, char c) {
        ByteBuffer bb = null;
        try {
            // encode the single character as UTF-8 bytes
            bb = ThreadLocalCoders.encoderFor("UTF-8").encode(CharBuffer.wrap("" + c));
        } catch (CharacterCodingException x) {
            assert false;
        }
        // emit each byte as %XX
        while (bb.hasRemaining()) {
            int b = bb.get() & 0xff;
            sb.append('%');
            sb.append(hexDigits[(b >> 4) & 0x0f]);
            sb.append(hexDigits[b & 0x0f]);
        }
    }

    // to escape, one needs to iterate over all characters and escape if
    // !isUnreserved(yourChar)
    private static boolean isUnreserved(char c) {
        return UNRESERVED.indexOf(c) != -1;
    }

Regards,
Mindaugas
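
P.S. The snippet in [2] leaves the iteration over the input as an exercise and depends on a JDK-internal class. For anyone who wants something to paste and try, here is a minimal self-contained sketch of the same "percent-encode everything that is not unreserved" rule. It is not the code actually in use: the class name PercentEncoder and the method name encode are made up for illustration, and it uses String.getBytes with StandardCharsets.UTF_8 instead of ThreadLocalCoders. It also walks the string by code point rather than by char, on the assumption that characters outside the BMP should come out as a single UTF-8 sequence.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or Tomcat.
final class PercentEncoder {
    // RFC 3986 "unreserved" set: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private PercentEncoder() {}

    /** Percent-encodes every character of the input that is not unreserved. */
    static String encode(String segment) {
        StringBuilder sb = new StringBuilder(segment.length() * 3);
        int i = 0;
        while (i < segment.length()) {
            int cp = segment.codePointAt(i);
            int len = Character.charCount(cp);
            if (len == 1 && UNRESERVED.indexOf(cp) != -1) {
                // Unreserved characters pass through unchanged.
                sb.append((char) cp);
            } else {
                // Encode the whole code point so a surrogate pair stays together.
                // (A lone surrogate is replaced by '?' by getBytes, giving %3F.)
                byte[] bytes = segment.substring(i, i + len)
                                      .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    sb.append('%')
                      .append(HEX[(b >> 4) & 0x0f])
                      .append(HEX[b & 0x0f]);
                }
            }
            i += len;
        }
        return sb.toString();
    }
}
```

With this sketch, PercentEncoder.encode("Botánica") gives "Bot%C3%A1nica", matching the Wikipedia href quoted above, and both the semicolon and the brackets in "a;b[c]" get escaped. Note that "/" is not unreserved and is encoded too, so the method should be applied to each path segment separately and not to a whole path.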