#22233: Reconsider behavior on .z URLs with Accept-Encoding header
-------------------------------------------------+-------------------------
 Reporter:  nickm                                |          Owner:  ahf
     Type:  defect                               |         Status:  assigned
 Priority:  Medium                               |      Milestone:  Tor: unspecified
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  034-triage-20180328,                 |  Actual Points:
            034-removed-20180328                 |
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:  Sponsor4
-------------------------------------------------+-------------------------
Comment (by Hello71):

Replying to [comment:4 yawning]:
> Replying to [comment:3 arma]:
> > FYI, my wget didn't send any accept-encoding header. Neither did Sebastian's. Maybe Yawning's did? You can tell it to *add* an accept-encoding header, but then what do you expect.
>
> `wget http://example.com` on my system does this:
>
> {{{
> GET / HTTP/1.1
> User-Agent: Wget/1.19.1 (linux-gnu)
> Accept: */*
> Accept-Encoding: identity
> Host: example.com
> Connection: Keep-Alive
> }}}
>
> Python's HTTP client also includes the header with `identity`.
>
> > I think the issue here is more that there are two ways to indicate you want compression -- adding a .z to the url, and saying so in the accept-encoding header -- and we should build the two by two decision matrix and do the smart thing for all four cases.
>
> Yes. The existing code tries to treat `.z` as `Accept-Encoding: deflate`, which is a shortcut, and not always correct. Assuming we do not want to double compress, what I would consider working behavior looks like:
>
> || File || Accept-Encoding || Action ||
> || `foo` || N/A || `foo` ||
> || `foo` || `identity` || `Content-Encoding: identity`, `foo` ||
> || `foo` || `deflate` || `Content-Encoding: deflate`, `deflate(foo)` ||
> || `foo` || `identity, deflate` || `Content-Encoding: deflate`, `deflate(foo)` ||
> || `foo` || `identity, gzip` || `Content-Encoding: gzip`, `gzip(foo)` ||
> || `foo` || `gzip` || `Content-Encoding: gzip`, `gzip(foo)` ||
> || `foo` || `deflate, gzip` || `Content-Encoding: gzip`, `gzip(foo)` ||
> || `foo.z` || N/A || `deflate(foo)` ||
> || `foo.z` || `identity` || `Content-Encoding: identity`, `deflate(foo)` ||
> || `foo.z` || `deflate` || `406 Not Acceptable` ||
> || `foo.z` || `identity, deflate` || `Content-Encoding: identity`, `deflate(foo)` ||
> || `foo.z` || `identity, gzip` || `Content-Encoding: identity`, `deflate(foo)` ||
> || `foo.z` || `gzip` || `406 Not Acceptable` ||
> || `foo.z` || `deflate, gzip` || `406 Not Acceptable` ||
>
> (`gzip` used as a placeholder algorithm for "Something that is supported that is not `deflate`")
>
> The current code mishandles the cases in the table that should either double compress or return `406`.

I believe this is not consistent with modern HTTP and web client behavior. I am fairly sure that modern web clients do one of the following. Either they:

1. send `Accept-Encoding: deflate, gzip` (or `gzip, deflate`);
2. if the response has `Content-Encoding: deflate` or `gzip`, transparently decompress it;
3. process the decompressed content as the type indicated in `Content-Type`.

or they:

1. do not send `Accept-Encoding`, or send `Accept-Encoding: identity`;
2. do not decompress the content;
3. process the content as the type indicated in `Content-Type`.

Note that not sending any `Accept-Encoding` can safely be handled exactly like sending `Accept-Encoding: identity`. RFC 7231 (https://tools.ietf.org/html/rfc7231#section-5.3.4) says a request without the header means the user agent has no preference regarding content-codings, so an unencoded response is always acceptable. I am fairly sure that this behavior also does not depend on the file extension of the URL.

Therefore, it is not correct to return 406 just because the server thinks compressing the content is pointless. This is not just the case for gzipped files. It also applies to image files, video files, font files, and so on, which are too many for a browser to even attempt a comprehensive list of file extensions. Instead, the server should simply not compress the content, not send `Content-Encoding: identity`, and send it as is. You can see this behavior by running, for example, `curl --compressed -v torproject.org`: compression is offered, but the server doesn't bother, so it just doesn't compress the response.

This is supported by https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding, which says "As long as the identity value, meaning no encoding, is not explicitly forbidden, by an identity;q=0 or a *;q=0 without another explicitly set value for identity, the server must never send back a 406 Not Acceptable error."
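A minimal Python sketch of the rule argued for above. The server sends the body as is whenever identity is acceptable, never compresses a `.z` body a second time, and returns `406` only when identity is explicitly refused. It assumes a server that can produce only `deflate` and `gzip` and that knows whether the requested resource is already compressed (a `.z` URL). The names `SERVER_CODINGS`, `parse_accept_encoding`, `identity_acceptable` and `choose_response` are hypothetical and are not Tor's code. Breaking ties between equally weighted codings by list order is an arbitrary choice.

```python
# Hypothetical sketch, not Tor's implementation: codings this server can produce.
SERVER_CODINGS = ("deflate", "gzip")


def parse_accept_encoding(header):
    """Parse an Accept-Encoding value into a {coding: qvalue} dict.

    Returns None if the header is absent, which RFC 7231 treats as
    "no preference". Items with an unparsable q-value are skipped.
    """
    if header is None:
        return None
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = None  # malformed weight: drop this item
                break
        if q is not None:
            prefs[coding.lower()] = q
    return prefs


def identity_acceptable(prefs):
    """True unless identity is excluded by "identity;q=0", or by "*;q=0"
    without a more specific entry for identity (RFC 7231, section 5.3.4)."""
    if prefs is None:
        return True
    if "identity" in prefs:
        return prefs["identity"] > 0
    if "*" in prefs:
        return prefs["*"] > 0
    return True


def choose_response(already_compressed, accept_encoding):
    """Return (status, content_encoding) for one request.

    already_compressed: True for a `.z` URL whose body is deflate(foo).
    accept_encoding: the raw header value, or None if it was not sent.
    content_encoding is None when no Content-Encoding header is sent.
    """
    prefs = parse_accept_encoding(accept_encoding)
    if not already_compressed and prefs:
        # Only codings the client explicitly lists with q > 0 are considered.
        candidates = [c for c in SERVER_CODINGS if prefs.get(c, 0) > 0]
        if candidates:
            # max() returns the first of several equal maxima, so ties
            # follow the (arbitrary) order of SERVER_CODINGS.
            return (200, max(candidates, key=lambda c: prefs[c]))
    # Otherwise send the body as is (foo, or deflate(foo) for a `.z` URL),
    # never compressing twice; 406 only if identity is explicitly refused.
    if identity_acceptable(prefs):
        return (200, None)
    return (406, None)
```

For example, `choose_response(True, "deflate, gzip")` returns `(200, None)`, so the `.z` body is sent unchanged. `choose_response(False, "identity;q=0")` returns `(406, None)` because the client refuses identity and lists no coding the server supports.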
Therefore, I think your table should look more like this:

> || File || Accept-Encoding || Action ||
> || `foo` || none or `identity` || no Content-Encoding, `foo` ||
> || `foo` || `deflate` || `Content-Encoding: deflate`, `deflate(foo)` ||
> || `foo` || `gzip` || `Content-Encoding: gzip`, `gzip(foo)` ||
> || `foo` || `deflate, gzip` || `Content-Encoding: deflate` or `gzip`, `deflate(foo)` or `gzip(foo)` respectively ||
> || `foo.z` || none or `identity` || no Content-Encoding, `deflate(foo)` ||
> || `foo.z` || `deflate` || no Content-Encoding, `deflate(foo)` ||
> || `foo.z` || `gzip` || no Content-Encoding, `deflate(foo)` ||
> || `foo.z` || `deflate, gzip` || no Content-Encoding, `deflate(foo)` ||

I doubt there exist any actual modern web clients that do not fit one of these. If there are, it's probably fine to send them whatever, as long as they accept it, explicitly or implicitly.

Note that this guarantees that anybody who requests `foo` will see the actual contents of `foo` in their browser, or saved to their disk, or wherever. Additionally, anybody who requests `foo.z` will always receive a deflated version of `foo`, and (theoretically) will not have their browser decompress it behind their back. Also, we never compress anything twice.

For what it's worth, my wget also sends `Accept-Encoding: identity` by default. I'm using wget 1.19.5.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22233#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs