Please note: most of my comments in this long email are informational;
they cover things which need to be taken into consideration when touching
the parser.
Audit proper follows at the end.

I request performance testing of the final version of this change vs the
latest trunk at the time. If possible, an emulated Slow-Loris DoS attack
of up to 1MB of random printable ASCII and whitespace characters,
drip-fed in 1-5 bytes every 15 sec (followed by 1 LF), would be nice too.
Average socket close rate from the Squid end is what matters. Though
that's a wishlist item, since I've not done such a test on Squid since
~2010.

On 23/07/2015 4:05 p.m., Alex Rousskov wrote:
> Hello,
>
> The attached patch implements a simpler and more robust HTTP request

It is less robust, as many potential attacks are now accepted by the
Parser as valid HTTP messages. These are implied in RFC 7230 section 9.2,
though only the common attacks are called out there since there are so
many possibilities. Squid may "work" as a consequence only because this
code was the primary security protection layer. Internal parsers later on
(if any) are sometimes less robust and picky. If this Parser is made
extremely generic, the later parsing logic is going to have to change to
find these same problems, which will only make Squid reject the same
non-compliant traffic - after much wasted processing of the message.

In the threads we have had around parsing, I am getting the impression
you think of the HTTPbis WG documents as restrictive, arbitrary committee
designs. On the contrary, they are explicitly tested documents based on
what real-world traffic has been actively observed doing over the past 10
years. HTTPbis has been luckier than most WGs in that regard: more actual
implementers are participating than academics.

Where the real-world traffic was desirable, the behaviour was extended to
be allowed. That introduced things like CR, VT, FF as 'tolerant'
request-line delimiters, use of '#' for #fragment in URLs, an explicit
minimum URL length of 4KB, a request method length limit, CR*CR
tolerance, etc.
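To make the 'tolerant' delimiter point concrete, here is a minimal sketch
(in Python rather than Squid's C++; the function and constant names are
mine, not Squid's) of the RFC 7230 section 3.5 lenient request-line
splitting: besides SP, a lenient recipient MAY treat HTAB, VT, FF and
bare CR as field delimiters.

```python
# Hypothetical sketch of RFC 7230 section 3.5 "tolerant" request-line
# splitting. Not Squid code; illustration only.
import re

# SP, HTAB, VT, FF and bare CR may all act as the SP separator
TOLERANT_DELIMITERS = r"[ \t\x0b\x0c\r]+"

def split_request_line(line: str) -> list[str]:
    """Split a request-line on any run of tolerant whitespace."""
    return re.split(TOLERANT_DELIMITERS, line.strip(" \t\x0b\x0c\r"))
```

For example, `split_request_line("GET\r/index.html\x0cHTTP/1.1")` still
yields the three fields despite the odd delimiters.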
Where the real-world traffic was tracked down to malicious behaviour or
to causing indirect vulnerability exposure, it was explicitly forbidden.
That introduced things like removal of the URI user-info segment, an
explicit single-digit HTTP version, forbidding of whitespace after the
"HTTP/1.1" token, a single SP delimiter in 'strict' parse, etc.

With that in mind while reading your next point ...

> line parser. After getting a series of complaints and seeing admins
> adding more and more exceptional URL characters to the existing trunk

... I hope you can see the real problem is the admins' approach to that
traffic. They are not just allowing it through Squid (with any
interpretation problems we have). They are also dangerously exposing all
those non-Squid implementations with known vulnerabilities to how this
traffic gets *output* after the syntax interpretation by Squid.

I caved with r14157 on the URI-internal CharacterSet we accept, since I
don't have such high confidence in the RFC 3986 changes matching the real
world as much as the HTTPbis documents. That may prove to have been a
mistake, since the key HTTPbis people had a lot of input to that RFC
publication too.

AFAIK, the only non-permitted URI characters in current trunk are the
*binary* CTL characters. Squid does not even check for correct
positioning of delimiters in tolerant parse at this stage of the
processing. Which makes me wonder exactly what these "more and more"
extra characters the admins are adding are. I hope you are referring to
the r14157 set, as opposed to people blindly trying to let those
request-line injection attacks Section 9.2 talks about work through their
proxies.

> parser, I took the plunge and rewrote the existing code using the
> approach recently discussed on this list.

This Http1::RequestParser has one purpose, and one only: to identify a
standards-compliant (RFC 1945 or 7230) HTTP/1 request-line. All other
inputs are undefined behaviour intended to be handled by other Parsers.
Some in future, some already existing (e.g. FTP). I have no problem with
adding a Parser for handling non-HTTP garbage. But pretending this
traffic is HTTP is plain wrong. It makes more sense to just TUNNEL the
traffic instead of trying to apply a Parser to it. Anyhow, we have been
over that already.

>
> The primary changes are: Removed incremental parsing and revised parsing
> sequence to accept virtually any URI (by default and also configurable
> as before).

Firstly, I note that you still have incremental parsing in the code
itself, just not for the fields internal to the request-line. The result
of this change is that Squid will become vulnerable to even the most
trivial of "Slow Loris" attacks.

The very old Squid used to pass the request-line buffer through this
first-stage parser, then apply a second-stage parser for sanity checks,
and a third-stage parser for character-set checking. One of those would
(relatively) quickly reject a whole attack message based on request-line
contents alone. That was all merged down into the incremental Parser in
the form of the syntax and character-set validation you complain about.

Of course a Squid without input validation will work faster on
most/compliant traffic than one that spends time validating inputs. A
Squid which re-parses the message from scratch on every byte received is
vulnerable to Slow Loris even in common compliant-HTTP message parsing.
This was a minor problem for large 64KB-ish URLs, but not significant
enough to fix and issue advisories etc. Now that SBuf raises the Squid
input limit to 2MB just for the URL, we would have to do something about
it. I prefer incremental parse, which fixes both that and the CPU load
issues.

>
> Known side effects:
>
> * Drastically simpler code.

.. and a much more vulnerable Squid. Whole categories of attack against
both Squid and its backends are now tolerated when they were rejected or
greatly restricted earlier.
+ side channel attack on backends or cache in the method name, raised
  from 16 bytes to arbitrary length of attack code.
+ cache poisoning via request pipeline. Great!
+ XSS attack in both method name and URI. (Also adding shell code
  injection, where previously it was just SQL injection.)
+ Shellshock vulnerability exposure to previously protected backends.
+ increased SQL-injection exposure to backends via URL.
+ increased JSON-injection exposure to backends.

(Though r14157 allows this last bunch too, it was restricted so as not
to occur in POSTs or arbitrary methods.)

> * Some unit test case adjustments.
> * The new parser no longer treats some request lines ending with
> "HTTP/1.1" as HTTP/0.9 requests for URIs that end with "HTTP/1.1".

Following 'random garbage' with "HTTP/1.1" does not make it HTTP
compliant. This adds direct and explicit violations of RFC 1945, 3986
and 7230.

> * The new parser no longer re-allocates character sets while parsing
> each request.

Okay. I like that bit a lot. :-)

>
> Also removed hard-coded 16-character method length limit because I did
> not know why that limit was needed _and_ there was a TODO to make it
> configurable. Please let me know if there was a good reason for that
> limit, and I will restore it.

It is there to restrict damage from active malware in HTTP/1. Namely
Slow Loris, random-method cache busting, and use of method names as an
open botnet C&C channel. For backward compatibility it is an
implementation-decided limit, but some small-ish size does need to be
imposed. The smaller it can be made, the better protected Squid is. So
configurable makes sense.

FWIW: all HTTP and extension methods are now registered. At the time I
wrote the parser none exceeded 10 characters and it was looking like
none would be long. I see someone has found an RFC that now pushes it up
to 17.

>
> No changes to parsing HTTP header fields (a.k.a. the MIME block) were
> intended.
>
> This patch requires "suffix parsing and skipping" patch from my previous
> email to the list. Like that patch, this one is based on trunk r14083. I
> will update to the current trunk if the patches are approved. Please let
> me know if you want to review the updated versions as well.

It needs to be compared against r14157 or later. Particularly since one
of your claims is that it works better for the traffic using the extra
characters whose support was only added in r14157 ;-)

FWIW: all my statements regarding "current trunk" are referring to trunk
r14172. That does not affect the code audit details, just the
applicability of some of your statements and claims vs my counter-points.
I think I've used "current trunk" where r14157 was relevant, but may have
missed some.

>
> The patch changes are further detailed below.
>
>
> * Removal of incremental request line parsing.
>
> Squid parsed the request line incrementally. That optimization was
> unnecessary:
> - most request lines are short enough to fit into one network I/O,
> - the long lines contain only a single long field (the URI), and
> - the user code must not use incomplete parsing results anyway.
>
> Incremental parsing made code much more complex and possibly slower than
> necessary.
>
> The only place where incremental parsing of request lines potentially
> makes sense is the URI field itself, and only if we want to accept URIs
> exceeding request buffer capacity. Neither the old code, nor the
> simplified one do that right now.

With this patch as-is, method length will begin to impact on that
criterion as well.

Non-incremental parsing will also re-introduce admin complaints about
high CPU consumption on traffic where they receive long URLs over a
number of very small packets, i.e. traffic which is normal and valid but
looks like Slow Loris (or actually is). Note that real-world URLs have a
*minimum* accepted limit of 4KB now, and sightings have been measured
with URLs as high as 200KB in length from some domains.
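A toy cost model (a Python sketch with hypothetical names, not Squid
code) of why a parser that restarts from scratch on every received byte
degrades quadratically under drip-fed input, while an incremental parser
stays linear:

```python
# Illustrative sketch only. Compares total bytes examined by a
# restart-from-scratch parser vs an incremental one when a slow client
# drips one byte per network read without ever sending the LF.

def restart_parse(buf: bytes) -> int:
    """Scan the whole buffer for LF on every call; return bytes examined."""
    examined = 0
    for octet in buf:
        examined += 1
        if octet == 0x0A:
            break
    return examined

class IncrementalParser:
    """Remember the scan offset between calls so each byte is seen once."""
    def __init__(self) -> None:
        self.offset = 0

    def parse(self, buf: bytes) -> int:
        examined = 0
        while self.offset < len(buf) and buf[self.offset] != 0x0A:
            examined += 1
            self.offset += 1
        return examined

def drip_feed_cost(total_bytes: int) -> tuple[int, int]:
    """Simulate a Slow-Loris client dripping one non-LF byte per I/O."""
    buf = b""
    inc = IncrementalParser()
    restart_cost = incremental_cost = 0
    for _ in range(total_bytes):
        buf += b"a"                      # one dripped byte, still no LF
        restart_cost += restart_parse(buf)
        incremental_cost += inc.parse(buf)
    return restart_cost, incremental_cost
```

For 1000 dripped bytes the restarting parser examines 500500 bytes in
total (the 1+2+...+1000 triangle), while the incremental one examines
exactly 1000; at a 2MB SBuf limit the gap becomes crippling.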
Most origin servers won't accept >32KB though. Note also, with SBuf
processing the natural limit of Squid buffering is raised from 64KB to
2MB. We have known Gigabit installations where the config value is
significantly higher than 64KB (old Squid forced it down). With current
trunk those installations would have been using their
actually-configured buffer sizes.

Also, a major bug: consider Squid parsing a pipeline of requests which
are delivered in a single packet, and thus all exist in the buffer
simultaneously:

"
GET http://example.com/ HTTP/1.1\n
\n
GET http://attacker.invalid/ HTTP/1.1\n
X-Custom: HTTP/1.1\n
"

All Squid default installations use relaxed parse with violations
enabled. The new tok.prefix(uriFound, UriCharacters()) will treat this
all as a single request-line: a
"GET http://example.com/...attacker.invalid/ HTTP/1.1" label. This has
four nasty side effects:

1) Request-injection vulnerability. The server named in the first
request now gets to reply with whatever it likes to the second request,
and Squid will treat it as the response to whatever third request
arrived in the later packet(s) on the connection.

2) Information leak vulnerability #1. All the client headers,
credentials, Cookies, or otherwise which would have been sent in the
first request's mime headers are passed in a "URL" which may be logged,
published or whatever (be it Squid's own log, or the example.com
origin).

3) Information leak vulnerability #2. All the client headers,
credentials, Cookies, or otherwise which would have been sent in the
second request's mime headers are passed to the server handling the
first request.

4) Information leak vulnerability #3. The combined size of the two
requests may trigger Squid's URL-too-big error (or, if we are lucky,
invalid-request). The error page contains a copy of the "URL". This
results in the mime headers for the first request being presented to
components inside the browser sandboxing.
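A toy model of that bug (a Python sketch under my own naming, not
Squid's actual Tokenizer API): once LF is wrongly included in the
relaxed URI character set, a greedy prefix match swallows the entire
pipeline into one "URI".

```python
# Hypothetical illustration of the pipeline-merging bug. Names and the
# parse logic are simplified stand-ins for tok.prefix()/UriCharacters().

URI_CHARS_FIXED = set(range(0x21, 0x7F)) | {0x20, 0x0D}   # LF excluded
URI_CHARS_BROKEN = URI_CHARS_FIXED | {0x0A}               # LF included: bug

def relaxed_parse(buf: bytes, uri_chars: set) -> tuple[str, str]:
    """Toy relaxed parse: method, then a greedy URI prefix, then
    reverse-parse the trailing version token off the greedy match."""
    method, rest = buf.split(b" ", 1)
    i = 0
    while i < len(rest) and rest[i] in uri_chars:   # tok.prefix() analog
        i += 1
    uri, _, version = rest[:i].rpartition(b" ")
    return uri.decode(), version.decode()

pipeline = (b"GET http://example.com/ HTTP/1.1\n"
            b"\n"
            b"GET http://attacker.invalid/ HTTP/1.1\n"
            b"X-Custom: HTTP/1.1\n")
```

With the LF-free set, `relaxed_parse(pipeline, URI_CHARS_FIXED)` returns
`("http://example.com/", "HTTP/1.1")`; with LF in the set, the second
request line and its header end up inside the reported "URI".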
Which otherwise may not have been able to access them. We already have
recent complaints about this (4) being used by advertising malware to
scrape user credentials and session cookies.

>
> * Accept virtually any URI (when allowed).
>
> 1. USE_HTTP_VIOLATIONS now allow any URI characters except spaces.

Except? Ironic. SP (0x20) in a URL is the most common malformed-URI
character. Whenever violations are disabled or the strict parser is
tested, broken services using un-encoded SP in URLs are always the first
to be identified.

The only 0x00-0xFF character not actively seen in "URLs" in real-world
traffic is LF. And I'm not entirely sure about that either; it may have
been an artifact of LF use in the measurement tools HTTPbis used to
collect the dataset. In any case we can't accept bare LF within a URL,
for practical issues around finding the end of line.

NP: yes, it may sound like I'm agreeing with you on accepting any
characters. However, I distinguish between characters which are seen
mainly in real-world *attack* traffic versus characters seen mainly in
real-world *user* traffic. That matters a lot, and admins complain about
blocking either, in my experience.

> 2. relaxed_header_parser allows spaces in URIs.

By "spaces" you mean the extended RFC 7230 compliant whitespace. Good.

> The combination of #1 and #2 allows virtually any URI that Squid can
> isolate by stripping method (prefix) and HTTP/version (suffix) off the
> request line. This approach allows Squid to forward slightly malformed
> (in numerous ways) URIs instead of misplacing on the Squid admin the
> burden of explaining why something does not work going through Squid but
> works fine when going directly or through another popular proxy (or
> through an older version of Squid!).

The current trunk accepts more real-world traffic malformations than
older Squid did:

* CR and NUL characters are accepted within URLs.
* CR, VT, FF are accepted as delimiters between request-line fields.
* Whole empty lines are accepted prior to any request.
* Series of multiple CR (0-N) are accepted in the line terminator.
* '#' is now accepted in URLs.

... and that is with --disable-http-violations. After you convinced me
to open the URI field to RFC 3986 violations (trunk r14157), the only
new restrictions in current trunk AFAIK are method length and binary CTL
codes.

>
> URIs in what Squid considers an HTTP/0.9 request obey the same rules.
> Whether the rules should differ for HTTP/0 is debatable, but the current
> implementation is the simplest possible one, and the code makes it easy
> to add complex rules.

Which means we can no longer claim RFC 1945 compliance. The old RFC is
remarkably restrictive: MUST-level requirements on the individual
characters in the URL (via RFC 1738), on their positioning, and on the
method / prefix "GET " (with exactly one explicit SP character delimiter
required).

NP: RFC 7230 formally dropped the requirement to also support the RFC
1945 HTTP/0.9 simple-request syntax. It has not been seen in any of the
traffic measured by the HTTPbis WG in the past 5 years, so it is now
optional for on-wire traffic parsing. I left the RFC 1945 compliance in
to reduce the unnecessary restrictions on what Squid accepts. Removing
it alone would allow us to dramatically simplify the logic in the
existing parser (~15%?) without any of the other changes you are trying
to get through in this patch.

>
> * Code simplification.
>
> RequestParser::parseRequestFirstLine() is now a simple sequence of
> sequential if statements. There is no longer a path dedicated for the
> [primarily academic?] strict parser. The decisions about parsing
> individual fields and delimiters are mostly isolated to the
> corresponding methods.

Strict parsing is popular in some circles outside academia. Namely
anyone interested in reporting and fixing the issues promoted under the
description "watermelon metrics" a few years back. CDN and Cloud
providers with Squid frontends also have an interest.
Since their RESTful services are explicitly bounded in regards to URI
needs, earlier input validation means less load on the backend service.
They are also in a prime position to fix the internal logic producing
malformed URLs. The strict parse not having to bother with 2-pass
reverse parsing offers faster REST transaction latency.

As I mentioned when the topic of this re-write came up earlier, I
request performance testing for this change. We lost a lot of
performance optimizations in the last re-write, and any further
polishing needs to counter that one way or another.

>
> * Unit test cases adjustments.
>
> Removal of incremental request line parsing means that we should not
> check parsed fields when parsing fails or has not completed yet.
>
> Some test cases made arguably weird decisions apparently to accommodate
> the old parser. The expectations of those test cases are [more] natural now.

Which ones? The units were *all* testing actual real-world input cases
or an equivalent of real-world cases. Most of them are actually testing
groups of situations as spot-checks. I only marked those which seemed
unusually weird to me as "real world", to prevent others also feeling
like removing them. Your definition of weird may vary, and the set may
be incomplete regarding true weirdness. Case in point: the way Bing
sends raw JSON code in URLs is not tested for, and the URI variant of
the Shellshock malware PoC is not tested for.

>
> I did not attempt to fix rampant test code duplication, but I did add
> optional (and disabled by default) debugging, to help pin-point failures
> to test sub-cases that CPPUNIT cannot see.
>
> Changing request methods to "none" in test sub-cases with invalid input
> was not technically necessary because the new code ignores the method
> when parsing fails, but it may help whoever would decide to reduce test
> code duplication (by replacing hand-written expected outcomes for failed
> test cases with a constant assignment or function call).
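For what it's worth, the de-duplication idea in that last quoted
paragraph could look something like this (a Python analog of the CPPUNIT
C++ in the patch; the names are illustrative, not from the patch):

```python
# Hypothetical sketch: one factory builds the expected outcome for every
# invalid-input sub-case, instead of hand-writing the method/URI/version
# expectations in each failed-parse test.
from dataclasses import dataclass

@dataclass
class Expect:
    parsed: bool
    method: str
    uri: str
    version: str

def failed_parse() -> Expect:
    """Canonical expectation when parsing fails: no fields are usable."""
    return Expect(parsed=False, method="none", uri="", version="none")

# every invalid-input sub-case then reuses the one canonical outcome
```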
Now the audit...

* I believe we can retain incremental parsing despite your changes here.
  As noted above, its use was a fairly major protection against some
  very trivial attack vectors.
  - I agree the current incremental implementation is not optimal, but
    rather fix it than remove it completely, IMO.

* Please use doxygen '///' comments to document.

* The new CharacterSet methods seem like they should be moved to the
  parent Http::Parser, since they will also be useful for generic URI
  parsing within both Request and Reply mime headers.
  - Although note that I was hoping to follow up the bug 1961 changes
    with a class URI parser that gets called by RequestParser to handle
    the URI field properly, instead of the multi-pass parse we have for
    URI now. So they may not exist long-term anyway.

* Please also note that current trunk does not use ::Parser::Tokenizer
  since r14113.

in Http::One::RequestParser::parseMethodField():

* assert() within the Parser is not acceptable.
  - If there is any possibility of that case happening, it is a DoS
    vulnerability in Squid, which MUST be handled cleanly with a parser
    error.

* maxMethodLength is a Security Considerations requirement of RFC 7230.
  - Part of the protections against injection attacks implied by
    section 9.2 (the category of method field injection
    vulnerabilities).
  - Also, more specifically, a Slow Loris attack protection.
  - We/admins get to decide what it is, but a limit should be kept.
    Probably doubling it to 32 is better than 16 now?

in Http::One::RequestParser::parseUriField():

* Changes to maxUriLength allow SquidString assertions to be
  encountered. Sorry if the comment was not clear enough. The entire
  request-line (method, URI, version) has to be stored in a 64KB String
  object later for error page display, storage and/or ICP/HTCP query
  use, etc.
  - Also, now that the method has no length limit, the calculation for
    methods of 64KB length may produce a negative value for the
    available/accepted URL size.
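A sketch of the two limits just discussed (hypothetical Python with
illustrative constants, not Squid code): an over-long method must
produce a clean parse error rather than an assert, and the URI allowance
must be clamped so it can never go negative.

```python
# Illustration only. Constants and names are mine; 32 is the suggested
# doubled method limit, 64KB the String storage limit mentioned above.

MAX_METHOD_LENGTH = 32
MAX_REQUEST_LINE = 64 * 1024

class ParseError(Exception):
    """Clean parser error; the remote client must never trip an assert."""

def check_method(method: bytes) -> bytes:
    if len(method) > MAX_METHOD_LENGTH:
        raise ParseError("method too long")   # error reply, not assert()
    return method

def max_uri_length(method: bytes) -> int:
    # method + SP + uri + SP + "HTTP/1.1" must fit in MAX_REQUEST_LINE
    budget = MAX_REQUEST_LINE - len(method) - len(b"  HTTP/1.1")
    return max(budget, 0)     # clamp: the allowance can never go negative
```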
    This is another reason to restrict methods.
  - I know CoAdvisor does not like it. But the RFC it is testing for
    does not actually say we have to support 64KB, just a minimum of
    4KB. Apparently the real-world server limit is 32KB for software
    other than Squid anyway. So a 64KB minus 36 bytes URI length limit
    should not be an issue.

in Http::One::RequestParser::UriCharacters():

* Strict parse should always output StrictChars, regardless of the
  violations build.
  - Consider that config directive being OFF to mean the admin wants
    strict compliant parsing, not partially-relaxed parsing.
  - Violations are for affecting relaxed parse mode, since that is just
    extending the relaxation beyond what the standards permit.

* RelaxedCharsPlusWhite includes the LF character.
  - This will result in some traffic parsing the entire Mime header
    portion of messages as part of the URL. See above for details.

in Http::One::RequestParser::parseHttpVersionField():

* "Rewrite if we ever _need_ to return 505"
  - This has already come to pass. HTTP/2 prefixes are already used on
    port 443 and port 80 traffic by all software listed as supporting
    "direct" HTTP/2 in
    <https://github.com/http2/http2-spec/wiki/Implementations>. Probably
    others as well, since that set includes frameworks like Perl
    Protocol::HTTP2, not just specific applications.
  - The restriction of HTTP-version from N.N numbers to DIGIT.DIGIT
    characters was another explicitly tested HTTPbis WG change.

in Http::One::RequestParser::skipDelimiter():

* Please note how the HTTP Parser uses HttpTokenizer for specific
  skip*() things like what this function does.

in Http::One::RequestParser::parseRequestFirstLine():

* The search for LF is done before method validation.
  - This has a major impact on real-world traffic handling:
    + Squid hangs when *any* protocol lacking a LF character in its
      client request handshake uses port 80 or 443.
    + Oh yeah, and Christos' "unknown protocol" handling feature of
      Squid-4 is now useless for those.
  - The existing code was VERY carefully crafted to validate the prefix
    method+delimiter before trying to find the LF and reverse-parse the
    remaining line.
  - Validating the method first is also significantly faster for
    rejecting binary protocols.

in the src/tests/testHttp1Parser.cc file:

* Please do not use #ifdef.
  - Use #if defined() for things we do not control, where mere existence
    is sufficient.
  - Use plain #if for things where the 0 vs 1 value matters.

* Please use std::endl instead of "\n" debug line terminators.
  - It ensures the proper portable newline syntax is used for the OS
    being tested on.

* In testResults(), one FYI:
  - CPPUNIT macros catch asserts and exceptions for code run inside
    them. This used to be useful for catching unexpected throws and
    asserts during
    CPPUNIT_ASSERT_EQUAL(expect.parsed, output.parse(input)).

* The "in method" -> "after method" change is incorrect IMO:
  - The HTTP syntax has the SP delimiter as the end of the method. An SP
    delimiter is present in the test input. So the \0 byte and \x16 are
    *within* the syntax pre-whitespace delimited area. Bytes in that
    area are *all* called "method" in HTTP.
    In the real-world traffic the \0 case was derived from, the sender
    was intending to send 3 bytes of method but incorrectly sent 4 and
    included the method c-string terminator *inside* its emitted method.
    The real-world traffic the \x16 case is derived from includes both a
    binary framing protocol with valid chars mistaken for a method, and
    a DoS with the sender just delivering random characters. Both
    embedded binary in random prefix/suffix positions relative to the
    method string. Again, the binary is inside the method field when
    interpreted as HTTP.
  - These tests are specifically doing triple duty: testing the case
    where binary prefixed by "GET" may be confused with the GET method,
    the case of binary in methods, and the case where binary is treated
    as whitespace. The \x16 is the generic test; the \0 is a special
    case for injection attack detection and NUL handling correctness.
  - In summary: if you feel the test case needs another alphanumeric
    letter after the binary code to suit your idea of "in method", you
    can add one. Otherwise this is needless change.

* Now that you have made both relaxed and strict parsers produce
  identical output for the "GET  HTTP/1.1\r\n" test case:
  - Please add a second test for the remaining case of
    "GET  HTTP/9.9\r\n", which should now be what produces an HTTP/0.9
    request for the URI "HTTP/9.9" (relaxed) or an invalid-request error
    (strict).
  - Note the parser violates RFC 1945 by accepting two sequential SP
    characters as the method delimiter. The standard defines exactly
    one, with the second being part of the URI (yuck). I won't ask for
    fixing that.

* The too-long method case will need re-instating in appropriate form
  with the method length limit.

I'm happy to see this new code passes all the units with so few changes
necessary to them. Though that does suggest we need a looped test to
validate all CTL characters individually, instead of just a single
"GET\x16" method test.

I think that is all.

Amos

_______________________________________________
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev