On 9/09/2012 4:19 a.m., Alex Rousskov wrote:
On 09/07/2012 09:13 PM, Amos Jeffries wrote:

Also, any revalidation requests done later must be done on the
original request URL. Not the stored URL nor the potentially different
current client request URL.
This sounds like a very important point that could justify storing the
original request URL -- exactly the kind of information I was asking
for, thank you!

Why do we have to use the original request URL for revalidation instead
of the current one? We use current, not original request headers (we do
not store the original ones), right? Is it better to combine current
headers with the original URL than it is to use the current URL with
current headers?

Revalidation requires very precise variant targeting to ensure that updated headers received from the revalidation do not corrupt the cached object copy. Regardless of what people may think, the YouTube URLs and other sites' URLs being de-duplicated with store-url *are* actually pointing at different files on different servers, with potentially different hashes or encoding details. This is particularly so in cases where the HD and standard-definition variants of a video are store-url mapped to the same cache object.

The URL and ETag are both critical details to preserve here, along with anything else that is used for specific Squid->upstream identification of the resource being revalidated.
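
As a rough illustration of that point (a minimal sketch, not Squid code: the `CachedEntry` record, its field names, and `build_revalidation_request` are all hypothetical), the conditional request is built only from what was recorded when the object was fetched, never from the current client request:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedEntry:
    # Hypothetical record; the field names are illustrative, not Squid's.
    original_url: str     # URL that was actually sent upstream for this object
    etag: Optional[str]   # validator returned by that upstream server, if any
    body: bytes


def build_revalidation_request(entry: CachedEntry) -> Dict[str, object]:
    """Build a conditional GET aimed at the exact variant that was stored."""
    headers: Dict[str, str] = {}
    if entry.etag is not None:
        # Ask the origin whether this specific variant is still current.
        headers["If-None-Match"] = entry.etag
    # Neither the store URL nor the current client URL is used here.
    return {"method": "GET", "url": entry.original_url, "headers": headers}
```

With this shape, a client asking for the same video via a different mirror URL would still trigger a revalidation against the server and variant the cached copy came from.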

The store URL rewriting feature essentially assumes that any request URL
that maps to URL X is equivalent and, hence, any response to any request
URL that maps to URL X is equivalent. Why not use that assumption when
revalidating? If we receive a 304, we can keep using the stored content.
If we receive new response content, should not we assume that the stored
content [under the original URL] is stale as well?

Assumes is the right word. They are equivalent only in the proxy administrator's thoughts, which may be right or wrong. We have to let them be wrong sometimes and cause display problems for clients, but we should not let them cause local cache corruption by having revalidation update a cached object's metadata from an incorrect variant source.


Again, I am not trying to say that using original URL for revalidation
is wrong -- I am just trying to understand what the design constraints are.

We could simply re-fetch and store a new copy using the new client request details. Revalidation is an optimization, but it requires correct identification of the particular resource and variant we have in cache. That goes for anything in cache; store-url is just tricky in that the client-side request can't give us accurate details for the server side.


Thank you,

Alex.
P.S. The above still does not justify storing the rewritten URL(s), of
course.

No. I think those are only useful for key purposes and can be discarded once the object in cache is located for a HIT, or stored for a MISS.
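
A small sketch of what "only useful for key purposes" could look like, under loud assumptions: the `normalize` rule, the SHA-256 key, and the in-memory `cache` dict are all made up for illustration and do not reflect Squid's actual store. The rewritten URL lives only long enough to compute the key; the entry itself keeps the original request URL.

```python
import hashlib
import re
from typing import Callable, Dict, Tuple


def normalize(url: str) -> str:
    # Hypothetical store-url rule: collapse numbered mirror hosts into one name.
    return re.sub(r"^http://cdn\d+\.example\.com/", "http://cdn.example.internal/", url)


def cache_key(request_url: str, rewrite: Callable[[str], str] = normalize) -> str:
    # The rewritten URL exists only inside this function and is not stored.
    return hashlib.sha256(rewrite(request_url).encode("utf-8")).hexdigest()


# key -> (original request URL, body); illustrative in-memory stand-in for a cache
cache: Dict[str, Tuple[str, bytes]] = {}


def lookup_or_store(request_url: str, fetch: Callable[[str], bytes]) -> Tuple[str, Tuple[str, bytes]]:
    key = cache_key(request_url)
    if key in cache:
        return "HIT", cache[key]
    # MISS: fetch with the client's URL and remember that URL, not the rewritten one.
    cache[key] = (request_url, fetch(request_url))
    return "MISS", cache[key]
```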

Amos
