SHORT SUMMARY:
===============
When Apache HTTPD is used for both URL manipulation (request
redirecting) and load-balancing with mod_jk, the plus-characters seem
to behave strangely during character re-encoding. Is this a bug or a
feature, and how should the system be set up to ensure correct
operation?

MY SYSTEM SET-UP:
=================
My application is running on two separate Tomcat 6.0.20 instances.
Apache HTTPD 2.2.14 (with mod_jk 1.2.28) acts as facade and load
balancer for my application. Apache HTTPD performs two functions: it
performs URL redirection from short URLs /sites/* to longer actual
URLs /contextroot/subcontext/sites/, and it forwards the requests to
the actual URLs to the loadbalancer, which in turn utilizes the Tomcat
instances via AJP connectors. The redirect requests are being served
by mod_alias, not mod_rewrite.

Here are the most relevant lines in httpd.conf:

LogFormat "%a %t \"%r\" %X%>s %{JK_LB_LAST_NAME}n(%{JK_LB_FIRST_NAME}n) %b %D" ts
CustomLog "|bin/rotatelogs logs/access-%Y%m%d-%H%M%S.log 5M" ts
JkMount /contextroot/* loadbalancer
AllowEncodedSlashes On
RedirectMatch ^/sites(.*)$ /contextroot/subcontext/sites$1

Notice that I have not set any explicit forwarding options, which in
this mod_jk version implies +ForwardURIProxy.

To begin with: calling the Tomcat HTTP connector directly with
http://localhost:8082/contextroot/subcontext/sites/one%2Bone%3Cthree
works correctly, and the application decodes the URL to
"sites/one+one<three" (with the correct preceding context path, of
course). The problems begin when I try to access the application
through Apache HTTPD.

OBSERVED BEHAVIOR WITH +ForwardURIProxy (implicit):
===================================================
I try to access the URL http://localhost/sites/one%2Bone%3Cthree with
Internet Explorer.
Apache HTTPD access log:
131.177.146.160 [11/Jan/2010:12:42:49 +0200] "GET
/sites/one%2Bone%3Cthree HTTP/1.1" +302 -(-) 263 0
131.177.146.160 [11/Jan/2010:12:42:49 +0200] "GET
/contextroot/subcontext/sites/one+one%3cthree HTTP/1.1" +200
worker2(worker2) 399 0
Apache Tomcat access log ("standard" access valve):
131.177.146.160 - - [11/Jan/2010:12:44:59 +0200] "GET
/contextroot/subcontext/sites/one+one%3Cthree HTTP/1.1" 200 399
What my application actually sees after decoding the URL: sites/one one<three

Notice that mod_alias has erroneously (considering the use case in
question) re-encoded the URL, causing %2B to change into '+' and %3C
to change into the equivalent %3c.
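
To make the effect concrete, here is a small sketch in Python (used only
for illustration; it is not part of my set-up). It assumes the
application decodes the path with form-style rules, where '+' means a
space, as java.net.URLDecoder does. That assumption is mine; the variable
names are made up for the example.

```python
from urllib.parse import unquote, unquote_plus

direct = "sites/one%2Bone%3Cthree"    # path as sent straight to Tomcat
redirected = "sites/one+one%3cthree"  # path after the 302 issued by mod_alias

# Form-style decoding (assumed application behavior): '+' becomes a space.
form_direct = unquote_plus(direct)
form_redirected = unquote_plus(redirected)

# Path-style decoding: '+' is an ordinary character and is left alone.
path_redirected = unquote(redirected)

print(form_direct)      # sites/one+one<three
print(form_redirected)  # sites/one one<three
print(path_redirected)  # sites/one+one<three
```

The change from %3C to %3c is harmless, but once %2B has become a
literal '+', a form-style decoder can no longer tell it from an encoded
space.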

Now, if I manually modify the address bar to access
http://localhost/contextroot/subcontext/sites/one%2Bone%3Cfour,
Apache HTTPD access log now shows:
131.177.146.160 [11/Jan/2010:12:53:37 +0200] "GET
/ts_core_virtual_repository/TeamCenterEmulator/sites/one%2Bone%3Cfour
HTTP/1.1" +200 worker1(worker1) 399 15625
but Tomcat access log still shows:
131.177.146.160 - - [11/Jan/2010:12:53:34 +0200] "GET
/ts_core_virtual_repository/TeamCenterEmulator/sites/one+one%3Cfour
HTTP/1.1" 200 399
and my application sees after decoding the URL: sites/one one<four

Quite interesting: no URL rewriting should occur in Apache HTTPD,
because the RedirectMatch rule does not match, yet the URLs in the HTTPD
and Tomcat access logs are semantically different.

OBSERVED BEHAVIOR WITH +ForwardURICompatUnparsed:
=============================================
Let's see what happens when I add to httpd.conf:
JkOptions +ForwardURICompatUnparsed

Let's try to access the URL http://localhost/sites/one%2Bone%3Cfive with
Internet Explorer.
Apache HTTPD access log:
131.177.146.160 [11/Jan/2010:12:58:04 +0200] "GET
/sites/one%2Bone%3Cfive HTTP/1.1" +302 -(-) 278 0
131.177.146.160 [11/Jan/2010:12:58:04 +0200] "GET
/contextroot/subcontext/sites/one+one%3cfive HTTP/1.1" +200
worker1(worker1)
Apache Tomcat access log:
131.177.146.160 - - [11/Jan/2010:12:58:04 +0200] "GET
/ts_core_virtual_repository/TeamCenterEmulator/sites/one+one%3cfive
HTTP/1.1" 200 399
What my application actually sees after decoding: sites/one one<five

The behavior is similar to what was observed before.

Now, I again manually modify the address bar to access
http://localhost/contextroot/subcontext/sites/one%2Bone%3Csix. This
time, the Apache HTTPD access log shows:
131.177.146.160 [11/Jan/2010:13:00:20 +0200] "GET
/contextroot/subcontext/sites/one%2Bone%3csix HTTP/1.1" +200
worker1(worker1) 399 15625
and Tomcat access log shows:
131.177.146.160 - - [11/Jan/2010:13:00:18 +0200] "GET
/contextroot/subcontext/sites/one%2Bone%3csix HTTP/1.1" 200 399
This time my application sees the intended request after decoding the
URL: sites/one+one<six

THE ISSUE
=========
Both mod_alias and mod_jk seem to decode and re-encode the URL during
their processing, and it seems that once the URL has been decoded,
neither module will re-encode the '+' character. Is this intentional and
desirable? The JkOptions setting ForwardURICompatUnparsed seems to offer
a workaround for this problem for the load balancer, but the solution
has its own problems (incompatibility with mod_rewrite among other
things, although that does not seem to apply to browser redirects). I
have not found a similar workaround for mod_alias yet.
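
The suspected decode/re-encode cycle can be modelled roughly as follows.
This is only a Python sketch of the behavior I observe, not the actual
httpd or mod_jk code; the function decode_then_reencode is hypothetical,
and treating '+' as a character that is never escaped is my assumption
based on the logs above.

```python
from urllib.parse import quote, unquote

def decode_then_reencode(path: str) -> str:
    """Hypothetical model: decode the path fully, then escape it again
    while treating '/' and '+' as characters that need no escaping."""
    return quote(unquote(path), safe="/+")

original = "/sites/one%2Bone%3Cthree"
roundtrip = decode_then_reencode(original)

print(roundtrip)  # /sites/one+one%3Cthree
```

In this model '<' is escaped again (Python emits %3C where httpd logs
%3c), but the information that '+' was originally written as %2B is
lost.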

(mod_rewrite does have the [B] option, "encode backreferences", but my
brief experiments with [B,R] failed miserably with the results being
"one%252bone%253csix" and similar double-encoded garbage).

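The double-encoded result looks like what one gets when an already
encoded string is escaped a second time, so that every '%' itself
becomes %25. A minimal Python sketch of that effect (illustrative only;
I do not know whether mod_rewrite works this way internally):

```python
from urllib.parse import quote

already_encoded = "one%2Bone%3Csix"

# Escaping again turns each '%' into '%25', giving double-encoded output.
double_encoded = quote(already_encoded, safe="")

print(double_encoded)  # one%252Bone%253Csix
```

Apart from the case of the hex digits, this matches the garbage I saw
with [B,R].
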
How should I change my configuration so that
http://localhost/sites/one%2Bone%3Cthree gives the same results
as http://localhost:8082/contextroot/subcontext/sites/one%2Bone%3Cthree?

Best Regards,
Tero Karttunen
