When Apache HTTPD is used for both URL manipulation (request
redirecting) and load balancing with mod_jk, plus characters seem
to behave strangely during character re-encoding. Is this a bug or a
feature, and how should the system be set up to ensure correct behavior?

My application is running on two separate Tomcat 6.0.20 instances.
Apache HTTPD 2.2.14 (with mod_jk 1.2.28) acts as a facade and load
balancer for my application. Apache HTTPD performs two functions: it
redirects short URLs /sites/* to the longer actual
URLs /contextroot/subcontext/sites/, and it forwards requests for
the actual URLs to the load balancer, which in turn uses the Tomcat
instances via AJP connectors. The redirects are served
by mod_alias, not mod_rewrite.

Here are the most relevant lines in httpd.conf:

LogFormat "%a %t \"%r\" %X%>s
CustomLog "|bin/rotatelogs logs/access-%Y%m%d-%H%M%S.log 5M" ts
JkMount /contextroot/* loadbalancer
AllowEncodedSlashes On
RedirectMatch ^/sites(.*)$ /contextroot/subcontext/sites$1

Notice that I have not set any explicit forwarding options, which in
this mod_jk version implies +ForwardURIProxy.

To begin with: calling the Tomcat HTTP connectors directly
works correctly, and the application decodes the URL to
"sites/one+one<three" (with the correct preceding context path, of
course). The problems begin when I try to access the application
through Apache HTTPD.

OBSERVED BEHAVIOR WITH +ForwardURIProxy (implicit):
I try to access URL http://localhost/sites/one%2Bone%3Cthree with
Internet Explorer browser.
Apache HTTPD access log:
[11/Jan/2010:12:42:49 +0200] "GET /sites/one%2Bone%3Cthree HTTP/1.1" +302 -(-) 263 0
[11/Jan/2010:12:42:49 +0200] "GET /contextroot/subcontext/sites/one+one%3cthree HTTP/1.1" +200 worker2(worker2) 399 0
Apache Tomcat access log ("standard" access valve):
- - [11/Jan/2010:12:44:59 +0200] "GET /contextroot/subcontext/sites/one+one%3Cthree HTTP/1.1" 200 399
What my application actually sees after decoding the URL: sites/one one<three

Notice that mod_alias has erroneously (considering the use case in
question) re-encoded the URL, causing %2B to change into '+' and %3C
to change into the equivalent %3c.
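
To make the suspected mechanism concrete, here is a minimal Python sketch. It does not reproduce mod_alias itself; decode_then_reencode is a hypothetical stand-in for a step that decodes the path and then re-encodes it while leaving '+' unescaped. It also assumes that the application decodes with form-style rules (where '+' means a space), which is what the observed "one one<three" suggests. Python's quote() emits uppercase hex digits, so the result shows %3C where the HTTPD log shows %3c.

```python
from urllib.parse import quote, unquote, unquote_plus


def decode_then_reencode(path: str) -> str:
    """Hypothetical model of a proxy step: decode the path, then
    re-encode it, treating '/' and '+' as safe (left unescaped)."""
    return quote(unquote(path), safe="/+")


original = "/sites/one%2Bone%3Cthree"
rewritten = decode_then_reencode(original)

# After the round trip, %2B has become a literal '+'.
print(rewritten)

# Form-style decoding (assumed here) turns the literal '+' into a space.
print(unquote_plus(rewritten))

# The same decoder applied to the untouched URL keeps the '+'.
print(unquote_plus(original))
```

The point of the sketch is only that the information "this plus was escaped" is lost once the path has been decoded, so a later decoder that treats '+' as a space can no longer recover it.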

Now, if I manually modify the address bar to access
Apache HTTPD access log now shows: [11/Jan/2010:12:53:37 +0200] "GET
HTTP/1.1" +200 worker1(worker1) 399 15625
but Tomcat access log still shows: - - [11/Jan/2010:12:53:34 +0200] "GET
HTTP/1.1" 200 399
and my application sees after decoding the URL: sites/one one<four

Quite interesting: No URL rewriting should occur at Apache HTTPD,
because the RedirectMatch rule does not match, but the URLs in HTTPD
and Tomcat access logs are semantically different.

Let's see what happens when I add to httpd.conf:
JkOptions +ForwardURICompatUnparsed

Let's try to access the URL http://localhost/sites/one%2Bone%3Cfive with
Internet Explorer.
Apache HTTPD access log:
[11/Jan/2010:12:58:04 +0200] "GET /sites/one%2Bone%3Cfive HTTP/1.1" +302 -(-) 278 0
[11/Jan/2010:12:58:04 +0200] "GET /contextroot/subcontext/sites/one+one%3cfive HTTP/1.1" +200
Apache Tomcat access log:
- - [11/Jan/2010:12:58:04 +0200] "GET HTTP/1.1" 200 399
What my application actually sees after decoding: sites/one one<five

The behavior is similar to what was observed before.

Now, I again manually modify the address bar to access
http://localhost/contextroot/subcontext/sites/one%2Bone%3Csix. This
time the Apache HTTPD access log shows:
[11/Jan/2010:13:00:20 +0200] "GET /contextroot/subcontext/sites/one%2Bone%3csix HTTP/1.1" +200 worker1(worker1) 399 15625
and the Tomcat access log shows:
- - [11/Jan/2010:13:00:18 +0200] "GET /contextroot/subcontext/sites/one%2Bone%3csix HTTP/1.1" 200 399
Now my application sees the intended request after decoding the
URL: sites/one+one<six
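
The difference between the two forwarding behaviors can be sketched in Python as follows. This is only a model under assumptions, not mod_jk's actual code: forward_uri is a hypothetical function in which unparsed=True passes the request path through untouched, while unparsed=False decodes and re-encodes it with '+' left literal. As before, the application is assumed to use form-style decoding.

```python
from urllib.parse import quote, unquote, unquote_plus


def forward_uri(raw_path: str, unparsed: bool) -> str:
    """Hypothetical model of the two forwarding modes.

    unparsed=True  -> pass the original request path through as-is.
    unparsed=False -> decode, then re-encode with '/' and '+' unescaped.
    """
    if unparsed:
        return raw_path
    return quote(unquote(raw_path), safe="/+")


raw = "/contextroot/subcontext/sites/one%2Bone%3csix"

for unparsed in (False, True):
    forwarded = forward_uri(raw, unparsed)
    # Show what the backend receives and what a form-style decoder sees.
    print(unparsed, forwarded, unquote_plus(forwarded))
```

In this model only the pass-through mode delivers "one+one<six" to the application, which matches what I observed with the unparsed option.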

Both mod_alias and mod_jk seem to decode and re-encode the URL during
their processing, and it seems that once the URL has been decoded, neither
module re-encodes the '+' character. Is this intentional and desirable?
The JkOptions setting +ForwardURICompatUnparsed seems to offer a workaround
for this problem for the load balancer, but the solution has its own problems
(incompatibility with mod_rewrite among other things, although that does
not seem to apply to browser redirects). I have not found a similar
workaround for mod_alias yet.

(mod_rewrite does have the [B] option, "encode backreferences", but my
brief experiments with [B,R] failed miserably with the results being
"one%252bone%253csix" and similar double-encoded garbage).

How should I change my configuration so that
http://localhost/sites/one%2Bone%3Cthree gives the same results
as http://localhost:8082/contextroot/subcontext/sites/one%2Bone%3Cthree?
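
As a side note on the double-encoded results from my [B,R] experiment: the following Python sketch shows how such strings arise when an already-escaped segment is escaped a second time. It does not model mod_rewrite's [B] flag itself; it only illustrates the general effect. Python's quote() emits uppercase hex digits, whereas my results had lowercase ones.

```python
from urllib.parse import quote, unquote

segment = "one+one<six"

# First pass: '+' and '<' are percent-encoded.
encoded_once = quote(segment, safe="")

# Second pass: the '%' signs themselves are encoded as %25.
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)
print(encoded_twice)

# A single decode of the double-encoded form only undoes the outer layer.
print(unquote(encoded_twice))
```

A receiver that decodes once therefore ends up with the percent-encoded text rather than the intended characters.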

Best Regards,
Tero Karttunen
