Hello,

    This temporary trunk fix adds support for request URIs containing
'|' characters. Such URIs are used by popular Amazon product (and
probably other) sites: /images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

Without this fix, all requests for affected URIs time out while Squid
waits for the end of request headers it has already received (*).


The proper long-term fix is to allow any character in the URI as long as
we can reliably parse the request line (and, later, the URI components).
There is no point in hurting users by rejecting requests while slowly
accumulating the list of benign characters used by web sites but
prohibited by some RFC.


HTH,

Alex.
P.S. (*) which is probably another parsing regression bug (not addressed
by this temporary patch or the long-term fix discussed above), but I
have not tested whether the very latest trunk still suffers from this
other bug.
Temporary fix to restore compatibility with Amazon (and probably other) pages.

This temporary fix adds support for request URIs containing '|' characters.
Such URIs are used by popular Amazon product (and probably other) sites:
/images/I/ID1._RC|ID2.js,ID3.js,ID4.js_.js

The proper long-term fix is to allow any character in the URI as long as we
can reliably parse the request line (and, later, the URI components). There
is no point in hurting users by rejecting requests while slowly accumulating
the list of benign characters used by web sites but prohibited by some RFC.

=== modified file 'src/http/one/RequestParser.cc'
--- src/http/one/RequestParser.cc	2015-04-10 11:02:44 +0000
+++ src/http/one/RequestParser.cc	2015-06-24 05:45:41 +0000
@@ -110,40 +110,43 @@ static CharacterSet
 uriValidCharacters()
 {
     CharacterSet UriChars("URI-Chars","");
 
     /* RFC 3986 section 2:
      * "
      *   A URI is composed from a limited set of characters consisting of
      *   digits, letters, and a few graphic symbols.
      * "
      */
     // RFC 3986 section 2.1 - percent encoding "%" HEXDIG
     UriChars.add('%');
     UriChars += CharacterSet::HEXDIG;
     // RFC 3986 section 2.2 - reserved characters
     UriChars += CharacterSet("gen-delims", ":/?#[]@");
     UriChars += CharacterSet("sub-delims", "!$&'()*+,;=");
     // RFC 3986 section 2.3 - unreserved characters
     UriChars += CharacterSet::ALPHA;
     UriChars += CharacterSet::DIGIT;
     UriChars += CharacterSet("unreserved", "-._~");
+    UriChars.add('|'); // used by Amazon
+    // XXX: To be real-world compatible, accept anything we can reliably parse
+    // (e.g., any non-whitespace?), and not just what RFC says is valid.
 
     return UriChars;
 }
 
 int
 Http::One::RequestParser::parseUriField(Http1::Tokenizer &tok)
 {
     // URI field is a sequence of ... what? segments all have different valid charset
     // go with non-whitespace non-binary characters for now
     static CharacterSet UriChars = uriValidCharacters();
 
     /* Arbitrary 64KB URI upper length limit.
      *
      * Not quite as arbitrary as it seems though. Old SquidString objects
      * cannot store strings larger than 64KB, so we must limit until they
      * have all been replaced with SBuf.
      *
      * Not that it matters but RFC 7230 section 3.1.1 requires (RECOMMENDED)
      * at least 8000 octets for the whole line, including method and version.
      */

_______________________________________________
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev
