Will do tomorrow.
On 18/09/2008, at 10:39 PM, Amos Jeffries wrote:
Mark Nottingham wrote:
I've got a user who's running a pair of peered accelerators, using
both stale-while-revalidate and max_stale.
Occasionally, they see extremely old content being served; e.g., if
CC: max-age is 60s, they might see something go by which is
1000-3000 seconds old (but still within the max_stale window).
The pattern that appears to trigger this is when a resource with an
in-cache 200 response starts returning 404s; when this happens,
Squid will start returning TCP_NEGATIVE_HIT/200's. E.g. (traffic
driven by squidclient),
1221713703.815 0 127.0.0.1 TCP_STALE_HIT/200 5234 GET http://server1//5012904 - NONE/- application/json
1221713703.979 164 0.0.0.0 TCP_ASYNC_MISS/404 193 GET http://server1/5012904 - FIRST_UP_PARENT/back-end-server1 text/plain
1221713711.431 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
1221713720.978 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
1221713723.483 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
As you can see, stale-while-revalidate kicks in, and the async
refresh brings back a 404, but that doesn't get stored properly.
Looking at the code, I *think* the culprit is storeNegativeCache(),
which will, assuming that max_stale is set (either in squid.conf or
response headers), block the new response from updating the cache
-- no matter what its status code is.
It makes sense to do this for 5xx status codes, because they're
often transient, and reflect server-side problems. It doesn't make
as much sense to do this for 4xx status codes, which reflect client-
side issues. In those cases, you always want to update the cache
with the most recent response (and potentially negative cache it,
if the server is silly enough to not put a freshness lifetime on it).
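As a rough sketch of the policy I have in mind (purely illustrative helper, not something that exists in the tree today):

    /* Illustrative only: treat 5xx as "transient" server-side errors
     * that justify continuing to serve the stale entry; anything else
     * (notably 4xx) should be allowed to replace it. */
    static int
    statusIsTransientError(int status)
    {
        return status >= 500 && status < 600;
    }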
The interesting thing, BTW, is that this only happens when
collapsed forwarding is on, because this check in httpReplyProcessHeader:
    if (neighbors_do_private_keys && !Config.onoff.collapsed_forwarding)
        httpMaybeRemovePublic(entry, reply);
masks this behaviour.
Thoughts? I'm not 100% sure of this diagnosis, as the use of peering and
stale-while-revalidate makes things considerably more complex, but
I've had pretty good luck reproducing it... I'm happy to attempt a
fix, but wanted input on what approach people preferred. Left to my
own devices, I'd add another condition to this in
storeNegativeCache():
    if (oe && !EBIT_TEST(oe->flags, KEY_PRIVATE) && !EBIT_TEST(oe->flags, ENTRY_REVALIDATE))
to limit it to 5xx responses.
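i.e. roughly (sketch only; I haven't checked exactly how the new reply's
status is reachable at that point, so the e->mem_obj->reply->sline.status
part is a guess):

    if (oe && !EBIT_TEST(oe->flags, KEY_PRIVATE) &&
        !EBIT_TEST(oe->flags, ENTRY_REVALIDATE) &&
        /* new: only keep serving the old entry when the fresh response
         * is a transient 5xx; the status lookup here is approximate */
        statusIsTransientError(e->mem_obj->reply->sline.status))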
I'd agree with you based on that analysis. Can you add a bugzilla
entry with a patch that does it?
Amos
--
Please use Squid 2.7.STABLE4 or 3.0.STABLE9
--
Mark Nottingham [EMAIL PROTECTED]