Thank you for another reply, Chris! I was secretly hoping that somebody would stand up and tell me that I have missed something obvious, but the more I look into this issue, the messier it seems. But let's not get ahead of things.

I apologize for the inconsistency in the log lines I posted in my earlier message. I had tried to replace (and anonymize) the URL path elements with s/ts_core_virtual_repository/contextroot/, s/TeamCenterEmulator/subcontext/ and s/geek/localhost/, but I obviously missed some references. Sorry for the confusion. Even though the cat's out of the bag, I will stick to the replacements for consistency's sake.

>> Notice that mod_alias has erronously (considering the use case in
>> question) re-encoded the URL, causing %2B to change into '+' and %3C
>> to change into equivalent %3c.
>
> Note that mod_jk is not involved, here: mod_alias is performing the
> redirect and mod_jk does not get involved. Also, the change from %3C to
> %3c is not really a problem: HTTP allows either upper or lowercase
> %-encoded URI elements (see section 2.2 of
> http://www.ietf.org/rfc/rfc1738.txt).

This is correct. The superseding RFC 3986 states: "If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent", but it continues: "For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings", which makes one choice preferable to the other.

[JkOptions +ForwardUriProxy]
>> Now, if I manually modify the address bar to access
>> http://localhost/contextroot/subcontext/sites/one%2Bone%3Cfour,
>> Apache HTTPD access log now shows:
>> 131.177.146.160 [11/Jan/2010:12:53:37 +0200] "GET
>> /ts_core_virtual_repository/TeamCenterEmulator/sites/one%2Bone%3Cfour
>> HTTP/1.1" +200 worker1(worker1) 399 15625
>
> Good.
>
>> but Tomcat access log still shows:
>> 131.177.146.160 - - [11/Jan/2010:12:53:34 +0200] "GET
>> /ts_core_virtual_repository/TeamCenterEmulator/sites/one+one%3Cfour
>> HTTP/1.1" 200 399
>
> Right: that's wrong.
>
>> and my application sees after decoding the URL: sites/one one<four
>
> Given that Tomcat saw one+one%3Cfour, this is correct decoding.
>
> What does the mod_jk log show for this request?

I reran the test with JK request logging on, and the log shows:

[Wed Jan 13 12:14:13 2010] worker1 GET /contextroot/subcontext/sites/one%2Bone%3Cfour HTTP/1.1 200 0.000000

so both the Apache HTTPD and mod_jk logs show the correctly encoded URL from the browser request.

>> Quite interesting: No URL rewriting should occur at Apache HTTPD,
>> because the RedirectMatch rule does not match, but the URLs in HTTPD
>> and Tomcat access logs are semantically different.
>
> Well, the RedirectMatch rule does match for the first request, and it
> definitely appears that mod_alias is mangling your URL. Have you tried
> snooping the HTTP conversation to make sure it's not your web browser
> that is misinterpreting the 302 response from httpd?

Yes, I have. Here is a telnet session as proof.

==============================
<username>@<hostname>:~ $ telnet localhost 80
Trying 131.177.146.160...
Connected to <hostname>.
Escape character is '^]'.
GET /sites/one%2Bone%3C HTTP/1.0

HTTP/1.1 302 Found
Date: Wed, 13 Jan 2010 09:57:32 GMT
Server: Apache/2.2.14 (Win32) mod_jk/1.2.28
Location: http://localhost/contextroot/subcontext/sites/one+one%3c
Content-Length: 274
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://localhost/contextroot/subcontext/sites/one+one%3c">here</a>.</p>
</body></html>
==============================

It looks to me like both mod_alias and mod_jk (with +ForwardUriProxy) decode the URL and do not subsequently re-encode the '+' character. I do not have any C coding experience, but I attempted to check the mod_jk source code for this. When +ForwardUriProxy is on, mod_jk uses the function int jk_canonenc(const char *x, char *y, int maxlen) for encoding.
Here are the juicy bits (slightly reformatted for brevity):

    /* characters which should not be encoded */
    char *allowed = "~$-_.+!*'(),;:@&=";
    /* characters which much not be en/de-coded */
    char *reserved = "/";

    for (i = 0, j = 0; ch != '\0' && j < maxlen; i++, j++, ch=x[i]) {
        if (strchr(reserved, ch)) {
            /* always handle '/' first */
            y[j] = ch;
            continue;
        }
        if (!JK_ISALNUM(ch) && !strchr(allowed, ch)) {
            /* recode it, if necessary */
            if (j+2<maxlen) {
                jk_c2hex(ch, &y[j]);
                j += 2;
            } else {
                return JK_FALSE;
            }
        } else {
            y[j] = ch;
        }
    }

Note that '+' is in the allowed list, so a '+' that has already been decoded from %2B is passed through as a literal '+' and never turned back into %2B.

mod_alias and mod_rewrite are already reported to suffer from similar encoding problems. There are several bug reports; the best I could find was https://issues.apache.org/bugzilla/show_bug.cgi?id=32328. The report at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=421820 is also useful in trying to find out what is going on, as is RFC 3986.

All this is making my head hurt, but my guess is that the original URL (still available as r->unparsed_uri) is decoded in Apache HTTPD at a very early stage. By the time mod_jk or other dispatchers activate, the r->uri they handle may already be the result of multiple URI manipulations by mod_rewrite and other modules, and for that reason it can be considered unsafe to mindlessly re-encode some of its reserved characters. But this is only my first guess.

+ForwardUriCompatUnparsed solves the mod_jk part of the problem _for me_, but while the HTTPD people are working on bug 32328 (since 2007), could it be beneficial for mod_jk to offer a fifth forwarding mode as a workaround for mod_jk users? Maybe one taking a list of characters to be encoded as an extra argument?

Unfortunately, I still have no idea how to configure the URL redirection in Apache HTTPD so that the plus characters are preserved in encoded form. Does anyone have any ideas or hints?

Thanks for any help!
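
For anyone who wants to see the '+' round trip without setting up a server, here is a minimal standalone sketch of what I think is happening. It is not the httpd or mod_jk code: sketch_decode, sketch_encode, round_trip and hex_value are names I made up, the decoder only reflects my guess that %XX is decoded early and '+' is left alone, and the encoder merely imitates the allowed/reserved lists from the jk_canonenc excerpt above, with the standard isalnum() standing in for JK_ISALNUM.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Assumed early decoding step: %XX becomes the raw byte, everything else
 * (including a literal '+') is copied unchanged.
 * Returns 1 on success, 0 if the output buffer is too small. */
static int sketch_decode(const char *in, char *out, size_t outlen)
{
    size_t i, j = 0;
    assert(in != NULL && out != NULL);
    if (outlen == 0) return 0;
    for (i = 0; in[i] != '\0'; i++) {
        char ch = in[i];
        if (j + 1 >= outlen) return 0;          /* keep room for '\0' */
        if (ch == '%' && hex_value(in[i + 1]) >= 0 && hex_value(in[i + 2]) >= 0) {
            ch = (char)(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
            i += 2;                             /* skip the two hex digits */
        }
        out[j++] = ch;
    }
    out[j] = '\0';
    return 1;
}

/* Re-encoding step imitating the excerpt: '/' , alphanumerics and the
 * "allowed" characters pass through; everything else becomes %XX. */
static int sketch_encode(const char *in, char *out, size_t outlen)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char allowed[] = "~$-_.+!*'(),;:@&=";  /* note the '+' */
    size_t i, j = 0;
    assert(in != NULL && out != NULL);
    if (outlen == 0) return 0;
    for (i = 0; in[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)in[i];
        if (ch == '/' || isalnum(ch) || strchr(allowed, ch) != NULL) {
            if (j + 1 >= outlen) return 0;
            out[j++] = (char)ch;
        } else {
            if (j + 3 >= outlen) return 0;      /* three chars plus '\0' */
            out[j++] = '%';
            out[j++] = hex[ch >> 4];
            out[j++] = hex[ch & 0x0F];
        }
    }
    out[j] = '\0';
    return 1;
}

/* Decode, then re-encode, the way I suspect the request path is handled. */
static int round_trip(const char *in, char *out, size_t outlen)
{
    char decoded[256];
    return sketch_decode(in, decoded, sizeof decoded)
        && sketch_encode(decoded, out, outlen);
}
```

Calling round_trip("/sites/one%2Bone%3Cfour", buf, sizeof buf) leaves "/sites/one+one%3Cfour" in buf, which is the same path that showed up in my Tomcat access log: the '<' is re-encoded, but the '+' is not, because once it has been decoded there is no way to tell it apart from a '+' that was literal in the first place.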

Tero Karttunen