Thank you for another reply, Chris! I was secretly hoping that
somebody would stand up and tell me that I have missed something
obvious, but the more I look into this issue, the messier it seems.
But let's not get ahead of things.

I apologize for the inconsistency in the log lines I posted in my
earlier message. I had tried to replace (and anonymize) the URL path
elements with s/ts_core_virtual_repository/contextroot/,
s/TeamCenterEmulator/subcontext/ and s/geek/localhost/, but I had
obviously missed some references. Sorry for the confusion. Even though
the cat's out of the bag, I will stick to the replacements for
consistency's sake.

>> Notice that mod_alias has erronously (considering the use case in
>> question) re-encoded the URL, causing %2B to change into '+' and %3C
>> to change into equivalent %3c.
>
> Note that mod_jk is not involved, here: mod_alias is performing the
> redirect and mod_jk does not get involved. Also, the change from %3C to
> %3c is not really a problem: HTTP allows either upper or lowercase
> %-encoded URI elements (see section 2.2 of
> http://www.ietf.org/rfc/rfc1738.txt).

This is correct. The superseding RFC 3986 states (section 2.1): "If two
URIs differ only in the case of hexadecimal digits used in
percent-encoded octets, they are equivalent", and continues: "For
consistency, URI producers and normalizers should use uppercase
hexadecimal digits for all percent-encodings", making one choice
preferable to the other.
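
To illustrate the RFC 3986 recommendation, here is a minimal sketch of
upper-casing the hex digits of percent-encoded octets. The function name
demo_normalize_pct_case is made up for this example; it is not part of
HTTPD or mod_jk.

```c
#include <ctype.h>

/* Hypothetical helper (not HTTPD or mod_jk code): upper-cases the two hex
 * digits of every well-formed %XX triplet in place. Everything else,
 * including a literal '+', is left untouched. */
static void demo_normalize_pct_case(char *uri)
{
    for (char *p = uri; *p != '\0'; p++) {
        if (p[0] == '%'
            && isxdigit((unsigned char)p[1])
            && isxdigit((unsigned char)p[2])) {
            p[1] = (char)toupper((unsigned char)p[1]);
            p[2] = (char)toupper((unsigned char)p[2]);
            p += 2; /* skip the two digits just handled */
        }
    }
}
```

Applied to "/sites/one+one%3c" this only changes %3c to %3C; it cannot
bring back a %2B that has already been turned into '+'.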

[JkOptions +ForwardUriProxy]
>> Now, if I manually modify the address bar to access
>> http://localhost/contextroot/subcontext/sites/one%2Bone%3Cfour,
>> Apache HTTPD access log now shows:
>> 131.177.146.160 [11/Jan/2010:12:53:37 +0200] "GET
>> /ts_core_virtual_repository/TeamCenterEmulator/sites/one%2Bone%3Cfour
>> HTTP/1.1" +200 worker1(worker1) 399 15625
>
> Good.
>
>> but Tomcat access log still shows:
>> 131.177.146.160 - - [11/Jan/2010:12:53:34 +0200] "GET
>> /ts_core_virtual_repository/TeamCenterEmulator/sites/one+one%3Cfour
>> HTTP/1.1" 200 399
>
> Right: that's wrong.
>
>> and my application sees after decoding the URL: sites/one one<four
>
> Given that Tomcat saw one+one%3Cfour, this is correct decoding.
>
> What does the mod_jk log show for this request?

I reran the test with JK request logging on, and the log shows:
[Wed Jan 13 12:14:13 2010] worker1 GET
/contextroot/subcontext/sites/one%2Bone%3Cfour HTTP/1.1 200 0.000000

so both Apache HTTPD and mod_jk logs show the correctly encoded URLs
from the browser request.

>> Quite interesting: No URL rewriting should occur at Apache HTTPD,
>> because the RedirectMatch rule does not match, but the URLs in HTTPD
>> and Tomcat access logs are semantically different.
>
> Well, the RedirectMatch rule does match for the first request, and it
> definitely appears that mod_alias is mangling your URL. Have you tried
> snooping the HTTP conversation to make sure it's not your web browser
> that is misinterpreting the 302 response from httpd?

Yes, I have. Here is a telnet session as proof.

==============================
<username>@<hostname>:~ $ telnet localhost 80
Trying 131.177.146.160...
Connected to <hostname>.
Escape character is '^]'.
GET /sites/one%2Bone%3C HTTP/1.0

HTTP/1.1 302 Found
Date: Wed, 13 Jan 2010 09:57:32 GMT
Server: Apache/2.2.14 (Win32) mod_jk/1.2.28
Location: http://localhost/contextroot/subcontext/sites/one+one%3c
Content-Length: 274
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a
href="http://localhost/contextroot/subcontext/sites/one+one%3c">here</a>.</p>
</body></html>
==============================

It looks to me like both mod_alias and mod_jk (with +ForwardUriProxy)
decode the URLs and do not subsequently re-encode the '+' character.

I do not have any C coding experience, but I attempted to check the
mod_jk source code for this. When +ForwardUriProxy is on, mod_jk encodes
the URI with the function
int jk_canonenc(const char *x, char *y, int maxlen).
Here are the juicy bits (slightly reformatted for brevity):

    /* characters which should not be encoded */
    char *allowed = "~$-_.+!*'(),;:@&=";
    /* characters which much not be en/de-coded */
    char *reserved = "/";
    for (i = 0, j = 0; ch != '\0' && j < maxlen; i++, j++, ch=x[i]) {
        if (strchr(reserved, ch)) { /* always handle '/' first */
            y[j] = ch;
            continue;
        }
        if (!JK_ISALNUM(ch) && !strchr(allowed, ch)) { /* recode it, if necessary */
            if (j+2<maxlen) {
                jk_c2hex(ch, &y[j]);
                j += 2;
            }
            else {
                return JK_FALSE;
            }
        }
        else {
            y[j] = ch;
        }
    }
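
Note that '+' is in the allowed list above, so once %2B has been decoded
to '+' it is never encoded again. The following self-contained sketch
reproduces that decode-then-encode round trip. It is not mod_jk code: the
demo_* functions are made up for illustration, and the uppercase hex
output is an assumption based on the %3C seen in the Tomcat access log.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: percent-decode src into dst.
 * dst must have room for strlen(src) + 1 bytes. */
static void demo_unescape(const char *src, char *dst)
{
    while (*src != '\0') {
        if (src[0] == '%'
            && isxdigit((unsigned char)src[1])
            && isxdigit((unsigned char)src[2])) {
            char hex[3] = { src[1], src[2], '\0' };
            *dst++ = (char)strtol(hex, NULL, 16);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}

/* Hypothetical re-encoder using the same allowed set as the snippet above.
 * Returns 0 on success, -1 if dst is too small. */
static int demo_canonenc(const char *src, char *dst, size_t dstlen)
{
    static const char allowed[] = "~$-_.+!*'(),;:@&=";
    static const char hexdigits[] = "0123456789ABCDEF";
    size_t j = 0;

    for (; *src != '\0'; src++) {
        unsigned char ch = (unsigned char)*src;
        if (ch == '/' || isalnum(ch) || strchr(allowed, ch)) {
            if (j + 1 >= dstlen)
                return -1;
            dst[j++] = (char)ch;        /* kept as-is, '+' included */
        } else {
            if (j + 3 >= dstlen)
                return -1;
            dst[j++] = '%';             /* everything else becomes %XX */
            dst[j++] = hexdigits[ch >> 4];
            dst[j++] = hexdigits[ch & 0x0F];
        }
    }
    dst[j] = '\0';
    return 0;
}

/* Decode the request path, then re-encode it, as a proxy-style forwarder
 * would. Returns 0 on success, -1 if a buffer is too small. */
static int demo_round_trip(const char *request_path, char *out, size_t outlen)
{
    char decoded[256];
    if (strlen(request_path) >= sizeof(decoded))
        return -1;
    demo_unescape(request_path, decoded);
    return demo_canonenc(decoded, out, outlen);
}
```

Feeding "/sites/one%2Bone%3Cfour" through demo_round_trip gives
"/sites/one+one%3Cfour", which is the same shape as the URL in the Tomcat
access log.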

mod_alias and mod_rewrite have already been reported to suffer from
similar encoding problems. There are several bug reports; the best one I
could find is https://issues.apache.org/bugzilla/show_bug.cgi?id=32328.
The Debian report at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=421820 is also useful
for working out what is going on, as is RFC 3986.

All this is making my head hurt, but my guess is that Apache HTTPD
decodes the original URL (still available as r->unparsed_uri) at a very
early stage. By the time mod_jk or another dispatcher runs, the r->uri it
handles may already be the result of several URI manipulations by
mod_rewrite and other modules, so blindly re-encoding some of its
reserved characters could be considered unsafe. But this is only my
first guess.

+ForwardUriCompatUnparsed solves the mod_jk part of the problem _for
me_, but while the HTTPD people are working on bug 32328 (since 2007),
could it be beneficial for mod_jk to offer a fifth forwarding mode as a
workaround for mod_jk users? Maybe one that takes a list of characters
to be encoded as an extra argument?
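
To make the idea concrete, here is a rough sketch of what such an encoder
could look like. Everything in it is hypothetical: no such mode exists in
mod_jk, and demo_canonenc_forced and its force_encode parameter are names
invented for this example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variant of the encoder quoted earlier: any character listed
 * in force_encode is always written as %XX, even if it is in the allowed
 * set. '/' is still passed through untouched.
 * Returns 0 on success, -1 if dst is too small. */
static int demo_canonenc_forced(const char *src, const char *force_encode,
                                char *dst, size_t dstlen)
{
    static const char allowed[] = "~$-_.+!*'(),;:@&=";
    static const char hexdigits[] = "0123456789ABCDEF";
    size_t j = 0;

    for (; *src != '\0'; src++) {
        unsigned char ch = (unsigned char)*src;
        int keep = (ch == '/')
            || (!strchr(force_encode, ch)
                && (isalnum(ch) || strchr(allowed, ch)));
        if (keep) {
            if (j + 1 >= dstlen)
                return -1;
            dst[j++] = (char)ch;
        } else {
            if (j + 3 >= dstlen)
                return -1;
            dst[j++] = '%';
            dst[j++] = hexdigits[ch >> 4];
            dst[j++] = hexdigits[ch & 0x0F];
        }
    }
    dst[j] = '\0';
    return 0;
}
```

With force_encode set to "+", the decoded path "/sites/one+one<four" is
written out as "/sites/one%2Bone%3Cfour". The catch is that the encoder
cannot tell a '+' that arrived as %2B from one that arrived as a literal
'+', so every '+' in the path would be encoded.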

Unfortunately, I still have no idea how to configure the URL
redirection in Apache HTTPD so that the plus characters are preserved
in encoded form. Does anyone have any ideas or hints?

Thanks for your help!

Tero Karttunen
