> Why is '+' decoded to ' ' in the path part of the URL?
> That is, I think, wrong.
This is an interesting theory. If true, it could explain the observed behavior, but I cannot completely follow it.

> The '+' char has no special meaning in HTTP/1.1 (RFC 2616) [1], so in
> the path part of the URL it just means itself, the plus sign.

On the other hand, the same RFC provides a counter-example. Look at section 3.2.3, "URI Comparison". It says that characters other than those in the "reserved" and "unsafe" sets are equivalent to their %-encoded counterparts. The reserved set, as defined in RFC 2396 (and in RFC 3986, which obsoletes it), includes the '+' character. I take section 3.2.3 to mean that characters in the reserved set are NOT equivalent to their %-encoded counterparts, so that for the purposes of URI comparison

  /contextroot/subcontext/sites/one+one%3cfive

IS NOT equivalent to

  /contextroot/subcontext/sites/one%2Bone%3cfive

> It is the HTML Forms spec [2] that makes it special, defining
> "urlencoding" used when submitting web forms through HTTP. It has
> special meaning only in the query part of the URL and only because of
> that part of HTML spec.

The HTML Forms spec does define the application/x-www-form-urlencoded encoding, but I can't tell from the spec whether it is limited to the query part.

>> What my application actually sees after decoding: sites/one one<five
>
> What is your application code here? Where and how do you obtain the
> "decoded" value?

I am using Apache Commons URLCodec to decode the URL. This widely used utility class does not distinguish between the path and query parts...

Before I post the code, let me explain what my application does. As you could guess from its name, TeamCenterEmulator, it emulates a set of former URLs, continuing to serve the pre-existing links while the legacy application is retired. The application is configured with a CSV file that maps each former URL to the resource it is supposed to serve (the resource is generated dynamically; it is not a simple file).
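To make the path/query distinction concrete, here is a minimal JDK-only sketch (Java 10+ for the Charset overload). java.net.URLDecoder, like URLCodec, performs application/x-www-form-urlencoded decoding and therefore turns '+' into a space, whereas java.net.URI.getPath() decodes only %XX escapes and leaves '+' as a literal plus sign. The class name PlusDecodingDemo is just for illustration:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PlusDecodingDemo {

    // Form-style decoding (same scheme as URLCodec): '+' becomes ' ',
    // and %XX escapes are decoded.
    public static String formDecode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Path-style decoding: %XX escapes are decoded, '+' stays a '+'.
    // URI.create throws IllegalArgumentException on malformed input.
    public static String pathDecode(String s) {
        return URI.create(s).getPath();
    }

    public static void main(String[] args) {
        String path = "/sites/one+one%3cfive";
        System.out.println(formDecode(path)); // /sites/one one<five
        System.out.println(pathDecode(path)); // /sites/one+one<five
    }
}
```

Note that path-style decoding still collapses "%2B" and "+" into the same character, so on its own it does not give the section 3.2.3 behavior either.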
Say the application contains the following mapping:

  <former url>                  <response>
  /sites/foo                    file1
  /sites/bar                    file2
  /sites/one%2Bone%3cthree      file3
  /sites/one%2Bone%3cfour       file4
  /sites/one%2Bone%3cfive       file5
  ...

When the application initializes, it reads the mapping into memory, and if a request matches a former URL EXACTLY, the matching response is returned. This is the application spec. Note that under RFC 2616-compliant URI comparison, my application must treat the request /sites/one+one%3cfive as a non-match!

Here is doGet from my servlet. Note that I am trimming the URL to start from the "sites" part for obvious reasons...

  protected void doGet(HttpServletRequest request, HttpServletResponse response)
          throws ServletException, IOException {
      super.doGet(request, response);
      if (config == null) {
          config = new ConfigurationFactory().createConfiguration(
                  getServletContext().getInitParameter("teamCenterURLMapping"));
      }
      String urlSnippet = getServletContext().getContextPath() + "/"
              + getServletConfig().getServletName() + "/";
      String url = "";
      if (request.getRequestURI().length() > urlSnippet.length()) {
          url = request.getRequestURI().substring(urlSnippet.length());
      }
      try {
          TeamCenterConfigurationItem item = config.findByURL(url);
          [...]
      } catch (UnknownUrlException e) {
          ...
      }
  }

I am not going to post ConfigurationFactory, because it is not interesting. It basically builds a HashMap from the CSV file, keyed by the URLCodec.decode()'d former URLs, with the idea that if we URL-decode the incoming request, we can look it up in the HashMap.
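If the exact-match spec is meant to follow the section 3.2.3 reading above, one option is to skip form decoding altogether and compare normalized, still-encoded paths. The sketch below swaps in the percent-encoding normalization from RFC 3986 section 6.2.2 (upper-case the hex digits, decode only unreserved characters), which is close to, but not literally, the RFC 2616 rule. A literal '+' and "%2B" therefore stay distinct. The names UriKeyNormalizer and normalize are hypothetical, and the HashMap stands in for the one ConfigurationFactory builds:

```java
import java.util.HashMap;
import java.util.Map;

public class UriKeyNormalizer {

    // RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Value of an ASCII hex digit, or -1 if c is not one.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Decodes %XX escapes of unreserved characters, upper-cases the hex
    // digits of all other escapes, and copies everything else (including
    // a literal '+' and malformed escapes such as "%ct") unchanged.
    public static String normalize(String path) {
        StringBuilder out = new StringBuilder(path.length());
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c == '%' && i + 2 < path.length()) {
                int hi = hexValue(path.charAt(i + 1));
                int lo = hexValue(path.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    int value = (hi << 4) | lo;
                    if (isUnreserved(value)) {
                        out.append((char) value);
                    } else {
                        out.append('%')
                           .append(Character.toUpperCase(path.charAt(i + 1)))
                           .append(Character.toUpperCase(path.charAt(i + 2)));
                    }
                    i += 3;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Keys are stored normalized but still percent-encoded.
        Map<String, String> mapping = new HashMap<>();
        mapping.put(normalize("/sites/one%2Bone%3cfive"), "file5");

        System.out.println(mapping.get(normalize("/sites/one%2bone%3Cfive"))); // file5
        System.out.println(mapping.get(normalize("/sites/one+one%3cfive")));   // null
    }
}
```

Since HttpServletRequest.getRequestURI() returns the path without decoding it, the trimmed url from doGet could in principle be passed to normalize() directly, with the CSV keys normalized the same way at load time.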
Here is how the above-mentioned findByURL method does just that:

  public TeamCenterConfigurationItem findByURL(String url) throws UnknownUrlException {
      URLCodec codec = new URLCodec("UTF-8");
      try {
          url = codec.decode(url);
          logger.info(url);
      } catch (DecoderException e) {
          logger.error(e);
          throw new UnknownUrlException(url);
      }
      if (config.containsKey(url)) {
          return config.get(url);
      }
      throw new UnknownUrlException(url);
  }

What do you think? Is my approach valid? Am I somehow abusing URLCodec? Should the request be (partially) decoded in some other way?

Best Regards,
Tero Karttunen