Re: mod_jk: plus-character causes %-encoding problems

Tero Karttunen Thu, 14 Jan 2010 12:34:40 -0800

> Why is '+' decoded to ' ' in the path part of the URL?
> That is, I think, wrong.


This is an interesting theory. If true, it could explain the observed
behavior, but I cannot completely follow it.

> The '+' char has no special meaning in HTTP/1.1 (RFC 2616) [1], so in
> the path part of the URL it just means itself, the plus sign.

On the other hand, the same RFC provides a counter-example. Look at
section 3.2.3 "URI Comparison". It says that characters other than
those in the "reserved" and "unsafe" sets are equivalent to their
%-encoded counterparts. The reserved set as defined in RFC 2396 (and
in the later RFC 3986 that obsoletes it) includes the '+' character.

I believe section 3.2.3 means that the characters in the reserved
set are not equivalent to their %-encoded counterparts, and in this
way, /contextroot/subcontext/sites/one+one%3cfive IS NOT equivalent to
/contextroot/subcontext/sites/one%2Bone%3cfive when doing URI
comparison.
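
To make the comparison rule concrete, here is a minimal sketch that
uses java.net.URI.equals as a stand-in for an RFC-style comparison. It
is only an illustration: the class and method names are made up for
this example, and java.net.URI follows its own documented rules
(comparing raw paths, ignoring the case of hex digits in escapes),
which I am assuming is close enough to what section 3.2.3 describes.

```java
import java.net.URI;

final class UriComparisonSketch {
    /**
     * Compares two URI strings with java.net.URI.equals, which looks at
     * the raw (still %-encoded) path and ignores hex-digit case only.
     */
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        String literalPlus = "/contextroot/subcontext/sites/one+one%3cfive";
        String escapedPlus = "/contextroot/subcontext/sites/one%2Bone%3cfive";
        String upperHex    = "/contextroot/subcontext/sites/one+one%3Cfive";

        // '+' and "%2B" are not treated as the same character.
        System.out.println(sameUri(literalPlus, escapedPlus)); // false
        // "%3c" and "%3C" differ only in hex-digit case.
        System.out.println(sameUri(literalPlus, upperHex));    // true
    }
}
```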

> It is the HTML Forms spec [2] that makes it special, defining
> "urlencoding" used when submitting web forms through HTTP. It has
> special meaning only in the query part of the URL and only because of
> that part of HTML spec.

The HTML Forms spec does define the application/x-www-form-urlencoded
format, but I can't tell from the spec whether it is limited to just
the query part.

>> What my application actually sees after decoding: sites/one one<five
>
> What is your application code here? Where and how do you obtain the
> "decoded" value?

I am using Apache Commons URLCodec to decode the URL. This widely-used
utility class does not make the distinction between path and query
parts...
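
To show what that difference looks like in practice, here is a small
sketch using only the JDK. java.net.URLDecoder is used as a stand-in
for URLCodec on the assumption that both apply the same form-style
rule ('+' becomes a space); java.net.URI.getPath() decodes %-escapes
but leaves '+' alone. The class and method names are invented for the
example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

final class PlusDecodingSketch {
    /** Form-style decoding: %-escapes are decoded and '+' becomes ' '. */
    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Path-style decoding: %-escapes are decoded, '+' stays a plus sign. */
    static String pathDecode(String rawPath) {
        return URI.create(rawPath).getPath();
    }

    public static void main(String[] args) {
        String raw = "/sites/one+one%3cfive";
        System.out.println(formDecode(raw)); // /sites/one one<five
        System.out.println(pathDecode(raw)); // /sites/one+one<five
    }
}
```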

Let me explain my application before I show the code. As you could
guess from its name, TeamCenterEmulator, my application emulates a set
of former URLs, continuing to serve the pre-existing links while the
legacy application is retired.

My application is configured with a CSV file containing a mapping
between a URL and the resource it is supposed to serve (in a dynamic
fashion, it is not a simple file). Say, the application could contain
the following mapping:

<former url>                <response>
/sites/foo                  file1
/sites/bar                  file2
/sites/one%2Bone%3cthree    file3
/sites/one%2Bone%3cfour     file4
/sites/one%2Bone%3cfive     file5
...

Once the application initializes, it reads the mapping into memory,
and if the request matches the former url EXACTLY, the matching
response is returned. This is the application spec. Note here that by
RFC 2616-compliant URI comparison, my application must regard request
/sites/one+one%3cfive as a non-match!

Here is doGet from my servlet. Note that I am trimming the URL to
start from the "sites" part for obvious reasons...

        protected void doGet(HttpServletRequest request,
                        HttpServletResponse response)
                        throws ServletException, IOException {
                super.doGet(request, response);
                // Lazily load the URL mapping named by the init parameter.
                if (config == null) {
                        config = new ConfigurationFactory().createConfiguration(
                                getServletContext().getInitParameter("teamCenterURLMapping"));
                }
                // Prefix to strip: "<context path>/<servlet name>/".
                String urlSnippet = getServletContext().getContextPath()
                                + "/" + getServletConfig().getServletName() + "/";
                String url = "";
                if (request.getRequestURI().length() > urlSnippet.length()) {
                        url = request.getRequestURI().substring(urlSnippet.length());
                }
                try {
                        TeamCenterConfigurationItem item = config.findByURL(url);
                        [...]
                } catch (UnknownUrlException e) {
                        ...
                }
        }

I am not going to post ConfigurationFactory, because it is not
interesting. It basically builds a HashMap based on the CSV file that
has URLCodec.decode()'d former urls as its keys, with the idea that if
we URL-decode the incoming request, we can search the HashMap for
matches.

Here is how the above-mentioned findByURL method does just that:

        public TeamCenterConfigurationItem findByURL(String url)
                        throws UnknownUrlException {
                URLCodec codec = new URLCodec("UTF8");
                try {
                        url = codec.decode(url);
                        logger.info(url);
                } catch (DecoderException e) {
                        logger.error(e);
                        throw new UnknownUrlException(url);
                }
                if (config.containsKey(url)) {
                        return config.get(url);
                }
                throw new UnknownUrlException(url);
        }
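
For anyone who wants to reproduce the idea without the rest of the
application, the build-and-lookup scheme described above can be
sketched roughly as follows. This is not the real ConfigurationFactory:
the class name, the String[][] input in place of the CSV file, and the
use of java.net.URLDecoder instead of URLCodec are all assumptions made
to keep the example self-contained.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

final class MappingLookupSketch {
    // Keys are the decoded former URLs, values are the responses.
    private final Map<String, String> byDecodedUrl = new HashMap<>();

    /** Each row is {former url, response}, standing in for one CSV line. */
    MappingLookupSketch(String[][] rows) {
        for (String[] row : rows) {
            byDecodedUrl.put(decode(row[0]), row[1]);
        }
    }

    /** Decodes the request the same way as the keys; null means no match. */
    String find(String requestUrl) {
        return byDecodedUrl.get(decode(requestUrl));
    }

    /** Form-style decoding (stand-in for URLCodec): '+' becomes a space. */
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MappingLookupSketch mapping = new MappingLookupSketch(new String[][] {
            {"/sites/foo", "file1"},
            {"/sites/one%2Bone%3cfive", "file5"},
        });
        System.out.println(mapping.find("/sites/one%2Bone%3cfive")); // file5
        System.out.println(mapping.find("/sites/one+one%3cfive"));   // null
    }
}
```

In this sketch the key is stored as "/sites/one+one<five", while a
request containing a literal '+' decodes to "/sites/one one<five", so
the two only match when the request arrives with %2B intact.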

What do you think? Is my approach valid? Am I somehow abusing
URLCodec? Should the request be (partially) decoded in some other way?

Best Regards,
Tero Karttunen

