[multiple inline responses]

Rainer Jung wrote:
> I doubt that such URLs are invalid - not based on any code inspection,
> but simply on the fact that mod_jk decoded percent encoding before
> forwarding for a long time (5.5 years, from Oct. 2001 to May 2007,
> version 1.2.0 to 1.2.22). Since version 1.2.24 any bytes in the URI
> expected to be unsafe are percent encoded before forwarding. At least
> that's the default. If you use a non-default ForwardURIxxx option via
> "JkOptions", then that behavior depends on the chosen setting.
>
> Nevertheless it makes sense to check and clarify.
>
> Which mod_jk version and JkOptions are you using?

We were indeed running with the "2007" default, i.e. the one resulting in ForwardURICompat, which has an appropriate warning in the docs. But my point is not that a change in Tomcat could hit "us" - we will correct our config this week. My point is that invalidating these URLs could break sites for folks who don't follow this mailing list and just update to the latest Tomcat ;-)

Mark Thomas wrote:
<snip>
> While it is a little surprising that getRequestURI() returns characters
> outside of those defined for uric by RFC2396 given the circumstances I
> think it is reasonable (for AJP) since that is what Tomcat received.
> Arguably a byte that represents a character not in uric should be
> re-encoded using %nn before including it in the return value for
> getRequestURI() but I don't see a need to implement that. If it was
> causing a problem somehow then I could be persuaded otherwise.
>
> I am more surprised by the HTTP connector. Looking at the code it is
> clear why this happens. The sequence is:
>
> 1. %nn -> byte
> 2. normalise
> 3. convert to characters
>
> Bytes that should have been %nn encoded but have not, simply skip the
> first stage and then continue as normal.
>
> Where this could get messy is when the client converts multibyte
> characters to bytes using one encoding and Tomcat converts those bytes
> to characters using a different encoding. However, while this might
> cause unexpected behaviour from the client's point of view I don't see
> how this could cause a problem for Tomcat. Any sequence of bytes that
> Tomcat ends up processing from stage 2 as a result of byte -> char
> conversion issues onwards could be sent legally using %nn encoding.
>
> Tomcat could justifiably reject these requests as not conforming to RFC
> 2616. That said, RFC2616 also encourages servers to be tolerant about
> what they receive from clients and I think this falls into that
> category.
> As long as such behaviour does not cause a problem for Tomcat
> I think it is reasonable to leave the current behaviour as is.
>
> That leaves the behaviour of getRequestURI(). It is returning what the
> client sent so no issue there. Again given a specific issue I'd be
> prepared to look at %nn encoding for characters not in uric. I agree
> access to the bytes would be ideal but since bytes are only necessary
> when going above and beyond what is required by RFC 2616 it isn't
> surprising that the Servlet EG opted to return a String here.

I think we are talking about four alternatives on how to handle this. Here's my 2c about them:

1) Leave as is
   I wish getRequestURI() was declared with a byte[] return value... It hurts to see these bytes copied straight into a string. I like this alternative the least.

2) Invalid, throw an error back at the client
   This is strict and clear, but might surprise some folks if implemented in a point release.

3) Decode binary chars in getRequestURI() according to URIEncoding (i.e., returning a fully decoded value)
   This follows Postel's law. I like Postel's law.

4) Revert binary chars in getRequestURI() back to URL encoding (i.e., returning a value with % notation)
   This follows Postel's law. I like Postel's law.

Best regards
Mike
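
To make alternatives 3 and 4 a bit more concrete, here is a minimal Java sketch of what each could do to the raw request bytes. This is not Tomcat code: the class RequestUriSketch, its two methods, and the idea of starting from a byte[] are all assumptions made for illustration. The "safe" set is the uric definition from RFC 2396 (reserved, unreserved, and '%' for sequences that are already escaped).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tomcat or the Servlet API.
final class RequestUriSketch {

    // uric per RFC 2396: alphanum + mark + reserved, plus '%' so that
    // sequences the client already escaped pass through untouched.
    private static final String URIC =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
            + "-_.!~*'()" + ";/?:@&=+$," + "%";

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Alternative 4: put every byte outside uric back into %nn notation. */
    static String reencode(byte[] rawUri) {
        StringBuilder sb = new StringBuilder(rawUri.length);
        for (byte b : rawUri) {
            int v = b & 0xFF;
            if (URIC.indexOf(v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return sb.toString();
    }

    /** Alternative 3: turn %nn into bytes, then decode everything with URIEncoding. */
    static String decode(byte[] rawUri, Charset uriEncoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(rawUri.length);
        for (int i = 0; i < rawUri.length; i++) {
            int b = rawUri[i] & 0xFF;
            if (b == '%' && i + 2 < rawUri.length) {
                int hi = hexValue(rawUri[i + 1]);
                int lo = hexValue(rawUri[i + 2]);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;          // skip the two hex digits just consumed
                    continue;
                }
            }
            out.write(b);            // raw byte, or a '%' not followed by two hex digits
        }
        return new String(out.toByteArray(), uriEncoding);
    }

    private static int hexValue(byte b) {
        return Character.digit((char) (b & 0xFF), 16); // -1 if not a hex digit
    }

    public static void main(String[] args) {
        // "/caf" followed by the two UTF-8 bytes of e-acute, sent unescaped by the client.
        byte[] raw = {'/', 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        System.out.println(reencode(raw)); // prints /caf%C3%A9
        System.out.println(decode(raw, StandardCharsets.UTF_8).equals("/caf\u00e9"));
    }
}
```

One difference worth keeping in mind: with the alternative 3 style of decoding, an escaped %2F and a literal "/" come out identical, so the caller can no longer tell them apart, whereas the alternative 4 style keeps the value within what uric allows and leaves that distinction intact.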