I have found a problem when WWWOFFLE uses ETags in queries. The problem is not with WWWOFFLE but with the web server it connects to.
Some of you might have noticed that WWWOFFLE (2.7, not 2.6) sometimes reloads images that shouldn't be reloaded. They obviously haven't changed, and an analysis of their content shows no modification. My WWWOFFLE is set to re-validate after 10 minutes (I'm on broadband and I use WWWOFFLE as a "standard" proxy), so I see this more often than others who only check pages once per session.

Doing manual requests to the web server shows that the same entity can have multiple ETags. This can be explained if the site is served by a pool of servers, each one generating a slightly different value. This is bad for WWWOFFLE, since it uses this value in its validating queries to the originating server: depending on which server answers, the ETag can be different, and the server sends the whole file again. After a series of tests I can tell that this is not uncommon; any server pool out there might behave like this.

I've added an option to WWWOFFLE (Online section) so that it only uses ETags when necessary. You can read a technical explanation below.

You'll find 3 patches attached to this message:
- one for the standard version of WWWOFFLE 2.7g/h (it works on both),
- one for those who have already modified it with the "preserve cache" patch from Paul Rombouts (also works on 2.7g/h),
- and a last one that adds a comment describing the option to your config file (it will give one error message because it tries 2 different ways to patch, but it works).

I've tested the fix over the last few weeks, and it works perfectly.

Enjoy

------

Here is a technical description of the problem.

First an example: www.garfield.com

    $ telnet www.garfield.com 80
    GET http://www.garfield.com/images/front/Jan03_09.jpg HTTP/1.1
    Host: www.garfield.com
    If-Modified-Since: Thu, 09 Jan 2003 23:01:04 GMT

The "304" replies will come with these 3 different ETags:

    ETag: "40deb2-33ce-3e1dff30"
    ETag: "1e9fa4-33ce-3e1dff30"
    ETag: "1a789c-33ce-3e1dff30"

You have to close the telnet connection after each request to see the problem.
Give it at least 10 tries before a different ETag shows up, and don't forget the empty line after the headers.

Currently WWWOFFLE generates an "If-None-Match:" header in its requests if an ETag is present in its cached copy. My first fix for the problem was to add an option to disable the use of ETags completely. Then I read the RFC (2616) and found a more subtle solution.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

------
14.19 ETag
...
The entity tag MAY be used for comparison with other entities from the same resource.
------

Here is the first problem. How can we (and WWWOFFLE) know that we are talking to the same "resource" when there is more than one server behind the same domain or, worse, behind the same IP address?

This section links to 13.3.3 "Weak and Strong Validators", which explains what kinds of validators we can expect in headers and how they can be used in requests. It tells us that "ETag" is a strong validator and that "Last-Modified" is a weak one. It also gives us a way to use "Last-Modified" as a strong validator:

------
...
- The validator is about to be used by a client in an If-Modified-Since or If-Unmodified-Since header, because the client has a cache entry for the associated entity, and
- That cache entry includes a Date value, which gives the time when the origin server sent the original response, and
- The presented Last-Modified time is at least 60 seconds before the Date value.
...
This method relies on the fact that if two different responses were sent by the origin server during the same second, but both had the same Last-Modified time, then at least one of those responses would have a Date value equal to its Last-Modified time. The arbitrary 60-second limit guards against the possibility that the Date and Last-Modified values are generated from different clocks, or at somewhat different times during the preparation of the response. An implementation MAY use a value larger than 60 seconds, if it is believed that 60 seconds is too short.
------

My solution is to use a strong validator in the request (if one is available), but not the ETag if another strong validator ("Last-Modified") is available.

I've added a test to the code that generates the queries. If an ETag is present, it checks whether the "Last-Modified" date can be considered strong by comparing the dates in "Last-Modified" and "Date". If both are present and the gap is greater than 60 seconds, "Last-Modified" is considered strong, and no "If-None-Match:" header is added.

AMB doesn't like this solution; he prefers the option disabling the header. I prefer mine because it doesn't weaken the queries made by WWWOFFLE.

Here is a table with the strength of the queries (use a fixed-pitch font to read it). Keep in mind that it only applies when an ETag is present; other queries are identical in standard WWWOFFLE, with my patch, and with AMB's preferred solution.

                                    0 ... 60s     60s ... 10min    10min ... always
    ETag (standard)                 use-strong    use-strong*      use-strong*
    ETag only if <60s (this patch)  use-strong    no/yes-strong**  no/yes-strong**
    ETag disabled (AMB)             no-weak       no-strong        no-strong

    use/no = whether the ETag is sent; strong/weak = strength of the validator used
    (*)  can give a false positive (200 instead of 304 for an identical entity)
    (**) if one of the dates is missing, the ETag is used

You can see that my solution ensures that the query uses a strong validator. It's the best of both worlds: it doesn't have the false-positive problem, and it always uses a strong validator (if one is available).

Those who want to discuss the technical part of this fix are free to join the developers' list (I'll give the address to anyone interested).

-- Marc
AlwaysUseETag.tar.gz
