Re: [WWWOFFLE-Users] is an URI with & and no ? valid?

Andrew M. Bishop Tue, 19 Jun 2007 21:06:45 -0700

Miernik <[EMAIL PROTECTED]> writes:

> I have a problem with a website:
> 
> Sniffed direct connection headers:
> 
> GET /register.html&termsread=1&agree_to_terms=1 HTTP/1.0
...


> HTTP/1.1 200 OK
...


> Now the same through wwwoffle:
> 
> GET /register.html%26termsread=1%26agree_to_terms=1 HTTP/1.0
...

> HTTP/1.1 301 Moved Permanently
> Location: http://www.moneymakergroup.com/register.html&termsread=1&agree_to_terms=1
...


> GET /register.html%26termsread=1%26agree_to_terms=1 HTTP/1.0
...

> HTTP/1.1 301 Moved Permanently
> Location: http://www.moneymakergroup.com/register.html&termsread=1&agree_to_terms=1
...

> .... and so on redirected back and forth endlessly.
> 
> Is this WWWOFFLE's fault or the website's? Any workarounds (providing I want it
> proxied and cached)?

It is the fault of the website.

At the very least it should accept the URL encoding that has been
applied to the '&' character:

-------------------- RFC 1123 --------------------
      1.2.2  Robustness Principle

         At every layer of the protocols, there is a general rule whose
         application can lead to enormous benefits in robustness and
         interoperability:

                "Be liberal in what you accept, and
                 conservative in what you send"

-------------------- RFC 1123 --------------------
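
To make the loop in the quoted headers concrete, here is a small Python model of it.  It is only a sketch under assumptions: I am guessing that the site compares the raw request path against its preferred form, and the names proxy_encode, strict_server and liberal_server are made up for this illustration (none of this is WWWOFFLE or the site's real code).

```python
from urllib.parse import unquote

# The path the site redirects to in the quoted Location header.
PREFERRED = "/register.html&termsread=1&agree_to_terms=1"

def proxy_encode(path):
    # Stand-in for the proxy's rewriting: '&' in the path becomes %26.
    return path.replace("&", "%26")

def strict_server(path):
    # Assumed behaviour: compare the raw path, redirect anything else.
    if path == PREFERRED:
        return 200, None
    return 301, PREFERRED

def liberal_server(path):
    # "Liberal in what you accept": decode %xx escapes before comparing.
    if unquote(path) == PREFERRED:
        return 200, None
    return 301, PREFERRED

def fetch_via_proxy(server, path, max_redirects=5):
    # Follow redirects, re-encoding the path on every request.
    for hops in range(max_redirects + 1):
        status, location = server(proxy_encode(path))
        if status == 200:
            return status, hops
        path = location
    return status, max_redirects

print(fetch_via_proxy(strict_server, PREFERRED))   # never reaches 200
print(fetch_via_proxy(liberal_server, PREFERRED))  # succeeds at once
```

In this model the strict server never sees its preferred spelling, so every request is answered with another 301, while the server that decodes before comparing answers the first request.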

> Why does WWWOFFLE want to substitute & with %26 so much?

There is a document README.URL that is in the doc directory of the
WWWOFFLE source archive that explains how URLs are formed.

One of the key points is that WWWOFFLE must be able to always
recognise the same URL even if it is represented differently with the
encoding of some characters as %xx.  Keeping a character or encoding
it still references the same entity even if the URL looks different
(unless the character is being used for its reserved purpose at that
time).

-------------------- RFC 2396 --------------------
   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  If the
   given URI scheme defines a canonicalization algorithm, then
   unreserved characters may be unescaped according to that algorithm.
   For example, "%7e" is sometimes used instead of "~" in an http URL
   path, but the two are equivalent for an http URL.
-------------------- RFC 2396 --------------------
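
As an illustration of the equivalence described in that excerpt, the following Python sketch unescapes only the characters that RFC 2396 lists as unreserved (alphanumerics and the "mark" characters) and leaves every other escape alone.  The function name unescape_unreserved is invented for this example, and the choice to upper-case the remaining hex digits is my assumption, not something taken from the RFC text quoted above or from WWWOFFLE.

```python
import re
import string

# RFC 2396 unreserved characters: alphanumerics plus the "mark" set.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

def unescape_unreserved(path):
    """Decode %xx only where the result is an unreserved character."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        # Keep the escape, but normalise the hex digits to upper case.
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)

# "%7e" and "~" end up as the same path; the escaped '/' stays escaped.
print(unescape_unreserved("/%7euser/a%2fb"))
print(unescape_unreserved("/~user/a%2Fb"))
```

Both calls print the same string, which is the sense in which two differently written URLs can be recognised as one.  An escaped reserved character such as %2F is deliberately left alone, since decoding it could change the meaning of the path.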


There is also a comment in the source code (miscencdec.c) that
explains the particular choices that I made:

 /*
   The characters in the range 0x00-0x1f and 0x7f-0xff are always disallowed.
   The '%' character is always disallowed because it is the quote character.
   RFC 1738 section 2.2 calls " <>"#%{}|\^~[]`" unsafe characters, I make an
   exception for '~'.
   RFC 1738 section 2.2 calls ";/?:@=&" reserved characters, I make an
   exception for ";/:=".
   I disallow "'" because it may lead to confusion.
 */
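
A rough Python rendering of the rules in that comment may help.  This is a sketch of one reading of the comment, not the C code in miscencdec.c: the name encode_path is hypothetical, the input is assumed to be an already-decoded path, and the use of upper-case hex digits is an assumption.

```python
# Characters the comment says must be encoded in a path:
#   - the "unsafe" set from RFC 1738, except '~'
#   - the "reserved" set ";/?:@=&", except ";/:=" (leaving "?@&")
#   - the single quote
DISALLOWED = set(' <>"#%{}|\\^[]`') | set("?@&") | {"'"}

def encode_path(path):
    """Percent-encode an already-decoded path following the comment."""
    out = []
    for byte in path.encode("utf-8"):
        ch = chr(byte)
        # 0x00-0x1f and 0x7f-0xff are always encoded.
        if byte <= 0x1F or byte >= 0x7F or ch in DISALLOWED:
            out.append("%{:02X}".format(byte))
        else:
            out.append(ch)
    return "".join(out)

print(encode_path("/register.html&termsread=1&agree_to_terms=1"))
print(encode_path("/~user/a b"))
```

With these rules the '&' characters in the path from the original question come out as %26 while '/', '=' and '~' are left as they are, which matches the request seen through the proxy.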

Later on in RFC 1738 it shows that the '&' character is allowed in the
path part of a URL.  The decision was made for WWWOFFLE that some
characters would be encoded in the path and some would not.  Since
encoding or not doesn't change the entity that is referenced it
shouldn't make a difference except to readability.

Over the years people have argued about several of the choices that
have been made with respect to encoding certain characters.  I can't
keep changing WWWOFFLE backwards and forwards to meet people's
requests.  There is one character that some servers insist must be
encoded and others insist must not be encoded (two separate WWWOFFLE
bug reports).  That case has no solution so I have given up trying to
keep WWWOFFLE up to date with every bug report since the server is
definitely in error and WWWOFFLE is in a grey area.

If you changed the encoding in WWWOFFLE then this site would work, but
any cached files stored using the old encoding method would become
inaccessible.  They would need to be renamed before they could be
accessed.

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.9/user.html
