SHORT SUMMARY:
===============
When Apache HTTPD is used for both URL manipulation (request redirecting) and load-balancing with mod_jk, the plus-characters seem to behave strangely during character re-encoding. Is this a bug or a feature, and how should the system be set up to ensure correct operation?

MY SYSTEM SET-UP:
=================
My application is running on two separate Tomcat 6.0.20 instances. Apache HTTPD 2.2.14 (with mod_jk 1.2.28) acts as a facade and load balancer for my application. Apache HTTPD performs two functions: it redirects the short URLs /sites/* to the longer actual URLs /contextroot/subcontext/sites/*, and it forwards requests for the actual URLs to the loadbalancer worker, which in turn uses the Tomcat instances via AJP connectors. The redirects are served by mod_alias, not mod_rewrite.

Here are the most relevant lines in httpd.conf:

    LogFormat "%a %t \"%r\" %X%>s %{JK_LB_LAST_NAME}n(%{JK_LB_FIRST_NAME}n) %b %D" ts
    CustomLog "|bin/rotatelogs logs/access-%Y%m%d-%H%M%S.log 5M" ts
    JkMount /contextroot/* loadbalancer
    AllowEncodedSlashes On
    RedirectMatch ^/sites(.*)$ /contextroot/subcontext/sites$1

Notice that I have not set any explicit forwarding options, which in this mod_jk version implies +ForwardURIProxy.

To begin with: calling the Tomcat HTTP connectors directly with http://localhost:8082/contextroot/subcontext/sites/one%2Bone%3Cthree works correctly, and the application decodes the URL to "sites/one+one<three" (with the correct preceding context path, of course). The problems begin when I try to access the application through Apache HTTPD.

OBSERVED BEHAVIOR WITH +ForwardURIProxy: (implicit)
=====================================
I try to access the URL http://localhost/sites/one%2Bone%3Cthree with the Internet Explorer browser.

Apache HTTPD access log:

    131.177.146.160 [11/Jan/2010:12:42:49 +0200] "GET /sites/one%2Bone%3Cthree HTTP/1.1" +302 -(-) 263 0
    131.177.146.160 [11/Jan/2010:12:42:49 +0200] "GET /contextroot/subcontext/sites/one+one%3cthree HTTP/1.1" +200 worker2(worker2) 399 0

Apache Tomcat access log ("standard" access valve):

    131.177.146.160 - - [11/Jan/2010:12:44:59 +0200] "GET /contextroot/subcontext/sites/one+one%3Cthree HTTP/1.1" 200 399

What my application actually sees after decoding the URL:

    sites/one one<three

Notice that mod_alias has erroneously (considering the use case in question) re-encoded the URL, causing %2B to change into '+' and %3C to change into the equivalent %3c.

Now, if I manually modify the address bar to access http://localhost/contextroot/subcontext/sites/one%2Bone%3Cfour, the Apache HTTPD access log shows:

    131.177.146.160 [11/Jan/2010:12:53:37 +0200] "GET /ts_core_virtual_repository/TeamCenterEmulator/sites/one%2Bone%3Cfour HTTP/1.1" +200 worker1(worker1) 399 15625

but the Tomcat access log still shows:

    131.177.146.160 - - [11/Jan/2010:12:53:34 +0200] "GET /ts_core_virtual_repository/TeamCenterEmulator/sites/one+one%3Cfour HTTP/1.1" 200 399

and my application sees after decoding the URL:

    sites/one one<four

This is interesting: Apache HTTPD should not rewrite the URL here, because the RedirectMatch rule does not match, yet the URLs in the HTTPD and Tomcat access logs are semantically different.

OBSERVED BEHAVIOR WITH +ForwardURICompatUnparsed:
=============================================
Let's see what happens when I add this to httpd.conf:

    JkOptions +ForwardURICompatUnparsed

Let's try to access the URL http://localhost/sites/one%2Bone%3Cfive with Internet Explorer.

Apache HTTPD access log:

    131.177.146.160 [11/Jan/2010:12:58:04 +0200] "GET /sites/one%2Bone%3Cfive HTTP/1.1" +302 -(-) 278 0
    131.177.146.160 [11/Jan/2010:12:58:04 +0200] "GET /contextroot/subcontext/sites/one+one%3cfive HTTP/1.1" +200 worker1(worker1)

Apache Tomcat access log:

    131.177.146.160 - - [11/Jan/2010:12:58:04 +0200] "GET /ts_core_virtual_repository/TeamCenterEmulator/sites/one+one%3cfive HTTP/1.1" 200 399

What my application actually sees after decoding:

    sites/one one<five

The behavior is similar to what was observed before.

Now, I again manually modify the address bar to access http://localhost/contextroot/subcontext/sites/one%2Bone%3Csix. This time, the Apache HTTPD access log shows:

    131.177.146.160 [11/Jan/2010:13:00:20 +0200] "GET /contextroot/subcontext/sites/one%2Bone%3csix HTTP/1.1" +200 worker1(worker1) 399 15625

and the Tomcat access log shows:

    131.177.146.160 - - [11/Jan/2010:13:00:18 +0200] "GET /contextroot/subcontext/sites/one%2Bone%3csix HTTP/1.1" 200 399

This time my application sees the intended request after decoding the URL:

    sites/one+one<six

THE ISSUE
=========
Both mod_alias and mod_jk seem to decode and re-encode the URL during their processing, and it seems that once the URL has been decoded, neither module re-encodes the '+' character. Is this intentional and desirable?

JkOptions +ForwardURICompatUnparsed seems to offer a workaround for this problem on the load balancer side, but that solution has its own problems (incompatibility with mod_rewrite among other things, although that does not seem to apply to browser redirects). I have not yet found a similar workaround for mod_alias. (mod_rewrite does have the [B] option, "encode backreferences", but my brief experiments with [B,R] failed miserably, with the results being "one%252bone%253csix" and similar double-encoded garbage.)

How should I change my configuration so that http://localhost/sites/one%2Bone%3Cthree gives the same results as http://localhost:8082/contextroot/subcontext/sites/one%2Bone%3Cthree?

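To make the suspected round trip concrete, the following is a minimal Python sketch. It is only a model of what the logs above suggest, not the actual mod_alias or mod_jk code: the helper name lossy_reencode and the choice to treat '+' as a character that needs no escaping are assumptions made for illustration. The sketch also assumes that the application decodes the path form-style (turning '+' into a space), which matches the "one one" results seen above.

```python
from urllib.parse import quote, unquote, unquote_plus


def lossy_reencode(path: str) -> str:
    """Decode all %XX escapes, then escape again while leaving '+' alone.

    Hypothetical model of the observed behavior, not real httpd code.
    """
    return quote(unquote(path), safe="/+")


original = "/sites/one%2Bone%3Cthree"

# The round trip turns %2B into a literal '+', while '<' is escaped again.
forwarded = lossy_reencode(original)
print(forwarded)                  # /sites/one+one%3Cthree

# A form-style decoder now reads the literal '+' as a space.
print(unquote_plus(forwarded))    # /sites/one one<three

# Without the round trip, the same decoder yields the intended path.
print(unquote_plus(original))     # /sites/one+one<three

# Escaping an already escaped path escapes the '%' itself (double encoding).
double_encoded = quote(original, safe="/")
print(double_encoded)             # /sites/one%252Bone%253Cthree
```

The last line shows the same kind of double encoding as the "one%252bone%253csix" result from the [B,R] experiment; only the case of the hex digits differs.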
Best Regards,
Tero Karttunen