I have found a problem when WWWOFFLE uses ETags in queries. The problem is
not with WWWOFFLE but with the web server it connects to.

Some of you might have noticed that WWWOFFLE (2.7, not 2.6) sometimes
reloads images that shouldn't be reloaded. They obviously haven't changed
and an analysis of their content shows no modification.
My WWWOFFLE is set to re-validate after 10 minutes (I'm on broadband and I
use WWWOFFLE as a "standard" proxy), so I see it more often than others who
only check pages once per session.

Doing manual requests to the web server shows that the same entity can have
multiple ETags. It could be explained if the site is served by a pool of
servers, each one generating a slightly different value. This is bad for
WWWOFFLE since it uses this value in its validating queries to the
originating server. Depending on which server you're getting, the ETag can
be different, and the server sends the whole file once again.
After a series of tests I can tell that this is not uncommon. Any server
pool out there might behave like this.
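
To make the "slightly different value" concrete, here is a small sketch that
takes apart the three ETags quoted in the technical description further down.
It assumes the tags follow an "inode-size-mtime" layout in hexadecimal, which
is a common default for Apache-style servers; that layout is my assumption,
the server itself does not say so. The helper name split_etag is made up for
the example.

```python
from datetime import datetime, timezone

# The three ETags returned for the same image (taken from this message).
ETAGS = [
    '"40deb2-33ce-3e1dff30"',
    '"1e9fa4-33ce-3e1dff30"',
    '"1a789c-33ce-3e1dff30"',
]

def split_etag(etag):
    """Split an ETag ASSUMED to be "inode-size-mtime", all fields in hex."""
    inode, size, mtime = (int(part, 16) for part in etag.strip('"').split("-"))
    return inode, size, datetime.fromtimestamp(mtime, tz=timezone.utc)

fields = [split_etag(etag) for etag in ETAGS]
for inode, size, mtime in fields:
    print(f"first field={inode:#x}  size={size} bytes  mtime={mtime.isoformat()}")
```

Under that assumption the second and third fields are identical in all three
tags (13262 bytes, modified 2003-01-09 23:01:04 UTC, which is exactly the
date used in the If-Modified-Since header of the example). Only the first
field changes, which is what you would expect if each machine in the pool
stores its own copy of the same file.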

I've added an option to WWWOFFLE (Online section) so that it only uses
ETags when necessary. You can read a technical explanation below.
You'll find 3 patches attached to this message: one for the standard version
of WWWOFFLE 2.7g/h (it works on both), and one for those who have already
modified it with the "preserve cache" patch from Paul Rombouts (works on
2.7g/h too). The last one adds the comment for the option to your config
file (it will give one error message because it tries 2 different ways to
patch, but it works).

I've tested the fix for the last few weeks, and it works perfectly.

Enjoy


------
Here is a technical description of the problem:

First an example:
www.garfield.com

$ telnet www.garfield.com 80

GET http://www.garfield.com/images/front/Jan03_09.jpg HTTP/1.1
Host: www.garfield.com
If-Modified-Since: Thu, 09 Jan 2003 23:01:04 GMT


"304" replies will come with these 3 different ETags.
ETag: "40deb2-33ce-3e1dff30"
ETag: "1e9fa4-33ce-3e1dff30"
ETag: "1a789c-33ce-3e1dff30"

You have to close telnet after each request to see the problem, and give
it at least 10 tries before a different ETag shows up. Don't forget the empty
line after the headers.
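
Repeating the telnet session by hand gets tedious, so here is a rough sketch
that automates it: one fresh connection per request, the same three headers,
and the distinct ETags collected at the end. The function names are made up
for this example, and the "Connection: close" header is my addition (it is
not in the telnet session above) so that the server closes the connection
after each reply.

```python
import socket

HOST = "www.garfield.com"               # host from the example above
PATH = "/images/front/Jan03_09.jpg"
SINCE = "Thu, 09 Jan 2003 23:01:04 GMT"

def build_request(host, path, since):
    """Build the conditional GET, ending with the mandatory empty line."""
    return (
        f"GET http://{host}{path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"If-Modified-Since: {since}\r\n"
        "Connection: close\r\n"          # added so each connection ends cleanly
        "\r\n"
    ).encode("ascii")

def extract_etag(raw_response):
    """Return the ETag header value from raw response bytes, or None."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "etag":
            return value.strip()
    return None

def collect_etags(host, path, since, tries=10):
    """Send the same request over separate connections; return distinct ETags."""
    seen = set()
    for _ in range(tries):
        with socket.create_connection((host, 80), timeout=10) as sock:
            sock.sendall(build_request(host, path, since))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        seen.add(extract_etag(b"".join(chunks)))
    return seen

# Example (needs network access):
#   print(collect_etags(HOST, PATH, SINCE))
```

If the set that comes back contains more than one value, the site shows the
problem described here.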


Currently WWWOFFLE generates an "If-None-Match:" header in its requests if an
ETag is present in its cached copy.

My first fix for the problem was to add an option that completely disables
the use of ETags. Then I read the RFC (2616) and found a more subtle
solution.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

------
14.19 ETag 
... The entity tag MAY be used for comparison with other entities from the
same resource. 
------

Here is the first problem. How can we (and WWWOFFLE) know that the server
is the same "resource" if there is more than one behind the same domain, or
worse, behind the same IP?

This paragraph links to:

13.3.3 Weak and Strong Validators

which explains what kind of validators we can expect in headers, and how
they can be used in requests. It tells us that "ETag" is a strong validator
and that "Last-Modified" is a weak one.

It also gives us a way to use "Last-Modified" as a strong validator.

------
...
      - The validator is about to be used by a client in an If-
        Modified-Since or If-Unmodified-Since header, because the client
        has a cache entry for the associated entity, and

      - That cache entry includes a Date value, which gives the time
        when the origin server sent the original response, and

      - The presented Last-Modified time is at least 60 seconds before
        the Date value.
...
This method relies on the fact that if two different responses were sent by
the origin server during the same second, but both had the same Last-Modified
time, then at least one of those responses would have a Date value equal to
its Last-Modified time. The arbitrary 60-second limit guards against the
possibility that the Date and Last-Modified values are generated from
different clocks, or at somewhat different times during the preparation of the
response. An implementation MAY use a value larger than 60 seconds, if it is
believed that 60 seconds is too short.
------

My solution is to use a strong validator in the request (if one is
available), but not the ETag if another strong validator ("Last-Modified")
is available.

I've added a test in the code that generates the queries.
If an ETag is present, it checks whether the "Last-Modified" date can be
considered strong by comparing the dates in "Last-Modified" and "Date". If
both are present and the gap is greater than 60 seconds, it considers
"Last-Modified" strong. If "Last-Modified" is strong, it doesn't add an
"If-None-Match:" header.
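
The test described above can be sketched like this. This is not the code from
the patch (WWWOFFLE is written in C); it is only an illustration of the logic
in Python, with made-up function names. It also assumes that the cached
"Last-Modified" value is what gets sent back as "If-Modified-Since".

```python
from email.utils import parsedate_to_datetime

MIN_GAP_SECONDS = 60  # the limit quoted from RFC 2616 section 13.3.3

def last_modified_is_strong(cached, min_gap=MIN_GAP_SECONDS):
    """True if both dates are present and Date is more than min_gap seconds
    after Last-Modified (the rule as described in this message)."""
    last_modified = cached.get("Last-Modified")
    date = cached.get("Date")
    if not last_modified or not date:
        return False                      # a date is missing: not strong
    try:
        gap = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return False                      # unparsable date: not strong
    return gap.total_seconds() > min_gap

def conditional_headers(cached):
    """Pick the validators to send, given the headers of the cached copy."""
    request = {}
    if "Last-Modified" in cached:
        # Assumption: the cached Last-Modified is reused as If-Modified-Since.
        request["If-Modified-Since"] = cached["Last-Modified"]
    if "ETag" in cached and not last_modified_is_strong(cached):
        # Fall back to the ETag only when Last-Modified cannot be trusted.
        request["If-None-Match"] = cached["ETag"]
    return request

cached = {
    "ETag": '"40deb2-33ce-3e1dff30"',
    "Last-Modified": "Thu, 09 Jan 2003 23:01:04 GMT",
    "Date": "Fri, 10 Jan 2003 08:00:00 GMT",
}
print(conditional_headers(cached))
```

With the headers in the example, the gap is several hours, so only
"If-Modified-Since" is sent. If "Date" is missing or within 60 seconds of
"Last-Modified", the ETag is used as before.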


AMB doesn't like this solution. Instead he prefers the option that disables
the header.

I prefer mine because it doesn't weaken the queries made by WWWOFFLE.

Here is a table with the strength of the queries (use a fixed-pitch font
to read it).
Keep in mind that it only applies if an ETag is present (other queries are
identical in standard WWWOFFLE, with my patch, or with AMB's preferred
solution).

                       0 ... 60s     60s ... 10min     10min ...
always ETag            use-strong    use-strong*       use-strong*
ETag only if <60       use-strong    no/yes-strong**   no/yes-strong**
ETag disabled          no-weak       no-strong         no-strong

(*)  can give a false positive (200 instead of 304 with an identical entity)
(**) if one of the dates is missing, the ETag is used.


You can see that my solution ensures that the query uses a strong validator.
It's the best of both worlds: it doesn't have the problem of false
positives, and it always uses a strong validator (if one is available).


Those who want to discuss the technical part of this fix are free to join
the developers' list. (I'll give the address to those interested)

-- 
Marc

Attachment: AlwaysUseETag.tar.gz
Description: Binary data
