To tell the truth, I spent half the day just working out what you were 
talking about, and I had to run a number of tests before I actually did. I 
now believe that the issue you raised is a real problem. In the meantime, I 
suggest a workaround at the end of this post.

My conclusions are:

1. mod_rewrite actually un-escapes the URL path
I think it should not, but there may be a good reason for it.

2. the built-in RewriteMap function "escape" does not escape reserved 
characters like & and $
I think there is a need for a built-in function that actually does encode these 
characters. Maybe that function should be the currently existing escape 
function, or maybe a new function needs to be added.



Given the following configuration:
   RewriteLogLevel 9
   RewriteMap encode int:escape
   RewriteRule ^/folder/([^/]*)/([^/]*) /cgi-bin/printenv?vara=${encode:$1}&varb=$2 [PT,QSA]

I would expect the query "GET /folder/apples&oranges/more?varc=rock%26roll" to 
be handled in the following way:

1:  (2) init rewrite engine with requested uri /folder/apples&oranges/more
2:  (3) applying pattern '^/folder/([^/]*)/([^/]*)' to uri 
'/folder/apples&oranges/more'
3:  (5) map lookup OK: map=encode key=apples&oranges -> val=apples%26oranges
4:  (2) rewrite /folder/apples&oranges/more -> 
/cgi-bin/printenv?vara=apples%26oranges&varb=more
5:  (3) split uri=/cgi-bin/printenv?vara=apples%26oranges&varb=more -> 
uri=/cgi-bin/printenv, args=vara=apples%26oranges&varb=more&varc=rock%26roll
6:  (2) forcing '/cgi-bin/printenv' to get passed through to next API 
URI-to-filename handler

Instead of the expected line 3, what actually happens is:
 (5) map lookup OK: map=encode key=apples&oranges -> val=apples&oranges

which tells me that the escape function, for whatever reason, does not escape 
reserved characters except ";" (I did not test "/" and "?").


Furthermore I would expect "GET /folder/apples%26oranges/more" to be handled in 
the following way:

1:  (2) init rewrite engine with requested uri /folder/apples%26oranges/more
2:  (3) applying pattern '^/folder/([^/]*)/([^/]*)' to uri 
'/folder/apples%26oranges/more'
3:  (5) map lookup OK: map=encode key=apples%26oranges -> val=apples%2526oranges
4:  (2) rewrite /folder/apples%26oranges/more -> 
/cgi-bin/printenv?vara=apples%2526oranges&varb=more
5:  (3) split uri=/cgi-bin/printenv?vara=apples%2526oranges&varb=more -> 
uri=/cgi-bin/printenv, args=vara=apples%2526oranges&varb=more&varc=rock%26roll
6:  (2) forcing '/cgi-bin/printenv' to get passed through to next API 
URI-to-filename handler

Instead of the expected line 1, I get 
 (2) init rewrite engine with requested uri /folder/apples&oranges/more

i.e. mod_rewrite un-escapes the URL path, which is carried over to the 
remainder of the processing.




I _know_ it is mod_rewrite that un-escapes the URL path, and only the URL path, 
for two reasons. When I request "GET 
/cgi-bin/printenv?vara=apples%26oranges&varb=more", which is not processed by 
mod_rewrite, it comes out as expected at the other end. And when a request that 
is processed by mod_rewrite carries an escaped character in its query string, 
that query string makes it through unchanged.
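
To show why the un-escaped "&" matters once it is substituted into the query 
string, here is a small stand-alone Perl sketch. It is not mod_rewrite code: 
unescape and parse_query are names I made up for illustration, and the parsing 
is deliberately simplified (it ignores "+" as an encoded space, for instance). 
It only mimics what a CGI script would typically do with the resulting args.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode %XX sequences (illustrative helper, not a mod_rewrite function).
sub unescape {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a query string into [name, value] pairs: first on "&", then on the
# first "=", and only then decode each piece.
sub parse_query {
    my ($query) = @_;
    my @pairs;
    for my $part (split /&/, $query) {
        my ($name, $value) = split /=/, $part, 2;
        push @pairs, [ unescape($name), unescape($value // '') ];
    }
    return @pairs;
}

my $segment = 'apples%26oranges';    # path segment as sent by the client

# Substituting the still-escaped segment keeps a single value for vara.
my @kept = parse_query("vara=$segment&varb=more");

# Substituting the un-escaped segment turns "&" into a delimiter.
my @lost = parse_query('vara=' . unescape($segment) . '&varb=more');

printf "%s=%s\n", @$_ for @kept;
print "--\n";
printf "%s=%s\n", @$_ for @lost;
```

In the first case the script sees vara with the value "apples&oranges" plus 
varb; in the second it sees vara=apples, a stray parameter called "oranges" 
with an empty value, and varb.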


The problem can be circumvented by implementing your own escape function along 
the lines of:
-------
encode.pl:
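#!/usr/bin/perl
use strict;
use warnings;

# Unbuffered output: with buffering, mod_rewrite would wait for a reply
# that is still sitting in the buffer.
$| = 1;

while (<STDIN>) {
   s/%/%25/g;   # must come first, or the % added below gets re-encoded
   s/&/%26/g;

   # Add other translation rules here

   print;
}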
--------

   RewriteEngine On
   RewriteLog /u01/apachetest/logs/rewrite_log
   RewriteLogLevel 9

   RewriteMap encode prg:/u01/apachetest/conf/encode.pl

   RewriteRule ^/folder/([^/]*)/([^/]*) /cgi-bin/printenv?vara=${encode:$1}&varb=${encode:$2} [PT,QSA]

Then the request GET /folder/apples&%25oranges/more?varc=rock%26roll will end 
up as
(2) rewrite /folder/apples&%oranges/more -> 
/cgi-bin/printenv?vara=apples%26%25oranges&varb=more
(3) split uri=/cgi-bin/printenv?vara=apples%26%25oranges&varb=more -> 
uri=/cgi-bin/printenv, args=vara=apples%26%25oranges&varb=more&varc=rock%26roll
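
If adding translation rules one character at a time gets tedious, a single 
substitution could cover everything outside the RFC 2396 "unreserved" set 
(alphanumerics plus - _ . ! ~ * ' ( )). The following is only a sketch: 
escape_component is a name I made up, and I have tested the idea on sample 
strings only, not inside a running RewriteMap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode every character that is not in the RFC 2396 "unreserved"
# set. escape_component is an illustrative name, not part of mod_rewrite.
sub escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Sample keys, as mod_rewrite would pass them after un-escaping the path.
for my $key ('apples&oranges', 'apples&%oranges', 'rock$roll') {
    printf "%s => %s\n", $key, escape_component($key);
}
```

To use it as a map program, the for loop would be replaced by a loop that 
reads one key per line from STDIN, chomps the newline, and prints the encoded 
key followed by a newline, with output buffering turned off.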


Hope this works.

If nobody else replies with an explanation as to why mod_rewrite behaves the 
way it does, maybe you should file a bug report...

-ascs











From RFC 2396:

2.2. Reserved Characters

   Many URI include components consisting of or delimited by, certain
   special characters.  These characters are called "reserved", since
   their usage within the URI component is limited to their reserved
   purpose.  If the data for a URI component would conflict with the
   reserved purpose, then the conflicting data must be escaped before
   forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","

   The "reserved" syntax class above refers to those characters that are
   allowed within a URI, but which may not be allowed within a
   particular component of the generic URI syntax; they are used as
   delimiters of the components described in Section 3.


2.4.2. When to Escape and Unescape
                                              Normally, the only time
   escape encodings can safely be made is when the URI is being created
   from its component parts; each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or
   not escaping a character will change its semantics. Likewise, a URI
   must be separated into its components before the escaped characters
   within those components can be safely decoded.


