Miernik <[EMAIL PROTECTED]> writes:

> I have a problem with a website:
>
> Sniffed direct connection headers:
>
> GET /register.html&termsread=1&agree_to_terms=1 HTTP/1.0
> ...
> HTTP/1.1 200 OK
> ...
>
> Now the same through wwwoffle:
>
> GET /register.html%26termsread=1%26agree_to_terms=1 HTTP/1.0
> ...
> HTTP/1.1 301 Moved Permanently
> Location:
> http://www.moneymakergroup.com/register.html&termsread=1&agree_to_terms=1
> ...
> GET /register.html%26termsread=1%26agree_to_terms=1 HTTP/1.0
> ...
> HTTP/1.1 301 Moved Permanently
> Location:
> http://www.moneymakergroup.com/register.html&termsread=1&agree_to_terms=1
> ...
> .... and so on redirected back and forth endlessly.
>
> Is this WWWOFFLE's fault of the website? Any workarounds (providing I want it
> proxied and cached)?

It is the fault of the website.  At the very least it should accept the
URL encoding that has been applied to the '&' character:

-------------------- RFC 1123 --------------------
1.2.2  Robustness Principle

   At every layer of the protocols, there is a general rule whose
   application can lead to enormous benefits in robustness and
   interoperability:

          "Be liberal in what you accept, and
           conservative in what you send"
-------------------- RFC 1123 --------------------

> Why does WWWOFFLE want to substitute & with %26 so much?

There is a document README.URL that is in the doc directory of the
WWWOFFLE source archive that explains how URLs are formed.  One of the
key points is that WWWOFFLE must be able to always recognise the same URL
even if it is represented differently with the encoding of some
characters as %xx.  Keeping a character or encoding it still references
the same entity even if the URL looks different (unless the character is
being used for its reserved purpose at that time).

-------------------- RFC 2396 --------------------
   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  If the
   given URI scheme defines a canonicalization algorithm, then
   unreserved characters may be unescaped according to that algorithm.
   For example, "%7e" is sometimes used instead of "~" in an http URL
   path, but the two are equivalent for an http URL.
-------------------- RFC 2396 --------------------

There is also a comment in the source code (miscencdec.c) that explains
the particular choices that I made:

/*
  The characters in the range 0x00-0x1f and 0x7f-0xff are always disallowed.
  The '%' character is always disallowed because it is the quote character.
  RFC 1738 section 2.2 calls " <>"#%{}|\^~[]`" unsafe characters, I make an exception for '~'.
  RFC 1738 section 2.2 calls ";/?:@=&" reserved characters, I make an exception for ";/:=".
  I disallow "'" because it may lead to confusion.
*/

Later on, RFC 1738 shows that the '&' character is allowed in the path
part of a URL.

The decision was made for WWWOFFLE that some characters would be encoded
in the path and some would not.  Since encoding a character or not doesn't
change the entity that is referenced, it shouldn't make any difference
except to readability.

Over the years people have argued about several of the choices that have
been made with respect to encoding certain characters.  I can't keep
changing WWWOFFLE backwards and forwards to meet people's requests.  There
is one character that some servers insist must be encoded and others
insist must not be encoded (two separate WWWOFFLE bug reports).  That case
has no solution, so I have given up trying to keep WWWOFFLE up to date
with every bug report, since the server is definitely in error and
WWWOFFLE is in a grey area.

If you change WWWOFFLE then it would work, but it would make any cached
files that use the old encoding method inaccessible.  They would need to
be renamed before they could be accessed.

-- 
Andrew.

----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.9/user.html
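
The encoding rules in the miscencdec.c comment quoted above, combined with the RFC 2396 idea of canonicalizing escapes, can be illustrated with a short C sketch. This is a minimal illustration under stated assumptions, not WWWOFFLE's actual code: the names must_encode(), hexval() and canonical_path() are hypothetical, and the sketch only handles the path part of a URL. It decodes every valid %xx escape and then re-encodes only the characters that the comment lists as disallowed, so "&" and "%26", or "~" and "%7e", end up as the same string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not WWWOFFLE's code: returns 1 if character c
   must appear as %XX in a canonical URL path, following the rules in
   the miscencdec.c comment. */
static int must_encode(unsigned char c)
{
    if (c <= 0x1f || c >= 0x7f)           /* control and 8-bit characters */
        return 1;
    if (c == '%')                         /* the quote character itself */
        return 1;
    if (strchr(" <>\"#{}|\\^[]`", c))     /* RFC 1738 unsafe, except '~' */
        return 1;
    if (strchr("?@&", c))                 /* RFC 1738 reserved, except ";/:=" */
        return 1;
    if (c == '\'')                        /* disallowed to avoid confusion */
        return 1;
    return 0;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical: decode every valid %XX escape in a path, then re-encode
   only the characters must_encode() selects.  Assumption: this also
   merges "%2F" with "/", which a real proxy may need to treat
   differently when '/' is used for its reserved purpose.
   The caller frees the result. */
char *canonical_path(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i;
    char *out, *o;

    assert(in != NULL);
    len = strlen(in);
    out = malloc(3 * len + 1);            /* worst case: every byte -> %XX */
    if (out == NULL)
        return NULL;

    o = out;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        /* Decode a well-formed %XX escape; a stray '%' is kept as data. */
        if (c == '%' && i + 2 < len &&
            hexval((unsigned char)in[i + 1]) >= 0 &&
            hexval((unsigned char)in[i + 2]) >= 0) {
            c = (unsigned char)(hexval((unsigned char)in[i + 1]) * 16 +
                                hexval((unsigned char)in[i + 2]));
            i += 2;                       /* skip the two hex digits */
        }

        if (must_encode(c)) {
            *o++ = '%';
            *o++ = hex[c >> 4];
            *o++ = hex[c & 0x0f];
        } else {
            *o++ = (char)c;
        }
    }
    *o = '\0';
    return out;
}
```

In this sketch, canonical_path("/register.html&termsread=1") and canonical_path("/register.html%26termsread=1") both return "/register.html%26termsread=1", so a cache built this way would store the two request lines from the original report under one name. The flip side, as the post notes, is that a server which only accepts the unencoded form will never see the literal '&' from such a proxy.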
