On 13/01/2010 22:47, Tero Karttunen wrote:
> Thank you for another reply, Chris! I was secretly hoping that
> somebody would stand up and tell me that I have missed something
> obvious, but the more I look into this issue, the messier it seems.
> But let's not get ahead of things.
>
> I apologize for the inconsistency in the log lines I posted in my
> earlier message. I had tried to replace (and anonymize) the URL path
> elements with s/ts_core_virtual_repository/contextroot/,
> s/TeamCenterEmulator/subcontext/ and s/geek/localhost/, but I had
> obviously missed some references. Sorry for the confusion. Even though
> the cat's out of the bag, I will stick to the replacements for
> consistency's sake.

Just a thought: mod_jk doesn't always play nicely with other modules if
those modules try to manipulate the URI. Have you tried mod_proxy_http?

Mark

>
>>> Notice that mod_alias has erroneously (considering the use case in
>>> question) re-encoded the URL, causing %2B to change into '+' and %3C
>>> to change into equivalent %3c.
>>
>> Note that mod_jk is not involved, here: mod_alias is performing the
>> redirect and mod_jk does not get involved. Also, the change from %3C to
>> %3c is not really a problem: HTTP allows either upper or lowercase
>> %-encoded URI elements (see section 2.2 of
>> http://www.ietf.org/rfc/rfc1738.txt).
>
> This is correct. The superseding RFC 3986 states: "If two URIs differ
> only in the case of hexadecimal digits used in percent-encoded octets,
> they are equivalent", but it also continues: "For consistency, URI
> producers and normalizers should use uppercase hexadecimal digits for
> all percent-encodings", making one choice preferable to the other.
>
> [JkOptions +ForwardUriProxy]
>>> Now, if I manually modify the address bar to access
>>> http://localhost/contextroot/subcontext/sites/one%2Bone%3Cfour,
>>> Apache HTTPD access log now shows:
>>> 131.177.146.160 [11/Jan/2010:12:53:37 +0200] "GET
>>> /ts_core_virtual_repository/TeamCenterEmulator/sites/one%2Bone%3Cfour
>>> HTTP/1.1" +200 worker1(worker1) 399 15625
>>
>> Good.
>>
>>> but Tomcat access log still shows:
>>> 131.177.146.160 - - [11/Jan/2010:12:53:34 +0200] "GET
>>> /ts_core_virtual_repository/TeamCenterEmulator/sites/one+one%3Cfour
>>> HTTP/1.1" 200 399
>>
>> Right: that's wrong.
>>
>>> and my application sees after decoding the URL: sites/one one<four
>>
>> Given that Tomcat saw one+one%3Cfour, this is correct decoding.
>>
>> What does the mod_jk log show for this request?
>
> I reran the test with JK request logging on, and the log shows:
> [Wed Jan 13 12:14:13 2010] worker1 GET
> /contextroot/subcontext/sites/one%2Bone%3Cfour HTTP/1.1 200 0.000000
>
> so both Apache HTTPD and mod_jk logs show the correctly encoded URLs
> from the browser request.
>
>>> Quite interesting: No URL rewriting should occur at Apache HTTPD,
>>> because the RedirectMatch rule does not match, but the URLs in HTTPD
>>> and Tomcat access logs are semantically different.
>>
>> Well, the RedirectMatch rule does match for the first request, and it
>> definitely appears that mod_alias is mangling your URL. Have you tried
>> snooping the HTTP conversation to make sure it's not your web browser
>> that is misinterpreting the 302 response from httpd?
>
> Yes I have. Here is a telnet session for proof.
>
> ==============================
> <username>@<hostname>:~ $ telnet localhost 80
> Trying 131.177.146.160...
> Connected to <hostname>.
> Escape character is '^]'.
> GET /sites/one%2Bone%3C HTTP/1.0
>
> HTTP/1.1 302 Found
> Date: Wed, 13 Jan 2010 09:57:32 GMT
> Server: Apache/2.2.14 (Win32) mod_jk/1.2.28
> Location: http://localhost/contextroot/subcontext/sites/one+one%3c
> Content-Length: 274
> Connection: close
> Content-Type: text/html; charset=iso-8859-1
>
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>302 Found</title>
> </head><body>
> <h1>Found</h1>
> <p>The document has moved <a
> href="http://localhost/contextroot/subcontext/sites/one+one%3c">here</a>.</p>
> </body></html>
> ==============================
>
> It looks to me like both mod_alias and mod_jk (with +ForwardUriProxy)
> decode the URLs and do not subsequently re-encode the '+' character.
>
> I do not have any C coding experience, but I attempted to check mod_jk
> source code for this. mod_jk uses int jk_canonenc(const char *x, char
> *y, int maxlen) function for encoding when +ForwardUriProxy is on.
> Here are the juicy bits (slightly reformatted for brevity):
>
> /* characters which should not be encoded */
> char *allowed = "~$-_.+!*'(),;:@&=";
> /* characters which much not be en/de-coded */
> char *reserved = "/";
> for (i = 0, j = 0; ch != '\0' && j < maxlen; i++, j++, ch=x[i]) {
>     if (strchr(reserved, ch)) { /* always handle '/' first */
>         y[j] = ch;
>         continue;
>     }
>     if (!JK_ISALNUM(ch) && !strchr(allowed, ch)) { /* recode it, if necessary */
>         if (j+2<maxlen) {
>             jk_c2hex(ch, &y[j]);
>             j += 2;
>         }
>         else {
>             return JK_FALSE;
>         }
>     }
>     else {
>         y[j] = ch;
>     }
> }
>
> mod_alias and mod_rewrite are already reported to suffer from similar
> encoding problems. There are several bug reports; the best I could
> find was https://issues.apache.org/bugzilla/show_bug.cgi?id=32328.
> Link http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=421820 is also
> useful in trying to find out what is going on, as is RFC 3986.
>
> All this is making my head hurt, but what I guess is going on is that
> the original URL (still available as r->unparsed_uri) is being decoded
> in Apache HTTPD at a very early stage, and once mod_jk or other
> dispatchers activate, the r->uri they handle can already be a result
> of multiple URI manipulations by mod_rewrite and other modules, and
> for that reason it can be considered unsafe to mindlessly re-encode
> some of its reserved characters. But this is only my first guess.
>
> +ForwardUriCompatUnparsed solves the mod_jk part of the problem _for
> me_, but while HTTPD people are working on bug 32328 (since 2007),
> could it be beneficial for mod_jk to maybe offer a fifth Forwarding
> mode as a workaround for the problem for mod_jk users? Maybe taking a
> list of characters to be encoded as an extra argument?
>
> Unfortunately, I still have no ideas on how to configure the URL
> redirection for Apache HTTPD so that the plus-characters are preserved
> in encoded format. Does anyone have any ideas or hints?
>
> Thanks for help!
>
> Tero Karttunen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
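
For anyone following along, the decode-then-re-encode round trip described
above can be reproduced in a few lines of standalone C. This is only a
minimal sketch, not the mod_jk or httpd code: pct_decode, canon_encode and
hex_val are hypothetical helper names made up for illustration, the
"allowed" set is copied from the quoted excerpt, and uppercase hex output
is an assumption. It shows why '+' never gets turned back into %2B once
the URI has been decoded: '+' is in the set of characters that are left
alone, while '<' is not.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: percent-decode `in` into `out` (outlen bytes,
 * including the terminating NUL). Returns 0 on success, -1 on a
 * malformed escape or if the output buffer is too small. */
static int pct_decode(const char *in, char *out, size_t outlen)
{
    size_t j = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0) return -1;
    for (size_t i = 0; in[i] != '\0'; i++) {
        int ch = (unsigned char)in[i];
        if (ch == '%') {
            int hi = hex_val((unsigned char)in[i + 1]);
            /* Only look at the second digit if the first was valid,
             * so we never read past the end of the string. */
            int lo = hi < 0 ? -1 : hex_val((unsigned char)in[i + 2]);
            if (lo < 0) return -1;
            ch = hi * 16 + lo;
            i += 2;
        }
        if (j + 1 >= outlen) return -1;
        out[j++] = (char)ch;
    }
    out[j] = '\0';
    return 0;
}

/* Hypothetical helper: re-encode a decoded path, leaving '/',
 * alphanumerics and the "allowed" characters from the quoted excerpt
 * untouched. Everything else becomes %XX (uppercase hex assumed). */
static int canon_encode(const char *in, char *out, size_t outlen)
{
    static const char allowed[] = "~$-_.+!*'(),;:@&=";
    static const char hex[] = "0123456789ABCDEF";
    size_t j = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0) return -1;
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)in[i];
        if (ch == '/' || isalnum(ch) || strchr(allowed, ch) != NULL) {
            if (j + 1 >= outlen) return -1;
            out[j++] = (char)ch;       /* '+' takes this branch */
        } else {
            if (j + 3 >= outlen) return -1;
            out[j++] = '%';            /* '<' takes this branch */
            out[j++] = hex[ch >> 4];
            out[j++] = hex[ch & 0x0F];
        }
    }
    out[j] = '\0';
    return 0;
}
```

Feeding one%2Bone%3Cfour through pct_decode gives one+one<four, and
passing that result to canon_encode gives one+one%3Cfour, which is the
same path that shows up in the Tomcat access log. Once the first decode
has happened, a literal '+' and a decoded %2B can no longer be told
apart, so an encoder working only from the decoded string cannot restore
the original.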