On 9/09/2012 4:19 a.m., Alex Rousskov wrote:
On 09/07/2012 09:13 PM, Amos Jeffries wrote:
Also, any revalidation requests done later must be done on the
original request URL. Not the stored URL nor the potentially different
current client request URL.
This sounds like a very important point that could justify storing the
original request URL -- exactly the kind of information I was asking
for, thank you!
Why do we have to use the original request URL for revalidation instead
of the current one? We use current, not original request headers (we do
not store the original ones), right? Is it better to combine current
headers with the original URL than it is to use the current URL with
current headers?
Revalidation requires very precise variant targeting to ensure that
updated headers received from the revalidation do not corrupt the cached
object copy. Regardless of what people may think, the YouTube URLs and
other sites being de-duplicated with store-url *are* actually pointing at
different files on different servers, with potentially different hashes
or encoding details. This matters particularly in the cases where the HD
and standard definition variants of a video are store-url mapped to the
same cache object.
The URL and ETag are both critical details to preserve here, as is
anything else which is used for specific Squid->upstream identification
of the resource being revalidated.
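
To illustrate the targeting described above, here is a minimal sketch in Python (not Squid's C++), assuming the cache entry keeps the URL it was fetched from together with its validators. The names CachedEntry and build_revalidation_request are hypothetical and do not correspond to anything in the Squid sources. Dropping client-supplied validators is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedEntry:
    # Hypothetical fields, not Squid's actual structures.
    original_url: str                    # URL the object was actually fetched from
    etag: Optional[str]                  # validator returned by that server
    last_modified: Optional[str] = None


def build_revalidation_request(entry: CachedEntry,
                               client_headers: Dict[str, str]) -> Dict[str, object]:
    """Build a conditional GET aimed at the exact variant held in cache."""
    headers = dict(client_headers)       # current client headers are reused
    # Assumption: validators sent by the client describe the client's copy,
    # not ours, so they are dropped (header names are case-insensitive).
    for name in list(headers):
        if name.lower() in ("if-none-match", "if-modified-since"):
            del headers[name]
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    # The target is the stored original URL, not the store-url key and not
    # the URL of the current client request.
    return {"method": "GET", "url": entry.original_url, "headers": headers}
```

The point of the sketch is only that the URL and the ETag travel together: an ETag from one server is sent back to that same server.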
The store URL rewriting feature essentially assumes that any request URL
that maps to URL X is equivalent and, hence, any response to any request
URL that maps to URL X is equivalent. Why not use that assumption when
revalidating? If we receive a 304, we can keep using the stored content.
If we receive new response content, should we not assume that the stored
content [under the original URL] is stale as well?
Assumes is the right word. They are equivalent only in the proxy
administrator's thoughts, which may be wrong or right. We have to let
them be wrong sometimes and cause client display problems, but we
should not let them cause local cache corruption, with revalidation
updating cached objects' metadata from incorrect variant sources.
Again, I am not trying to say that using original URL for revalidation
is wrong -- I am just trying to understand what the design constraints are.
We could simply re-fetch and store a new copy from the new client
request details. Revalidation is an optimization, but requires correct
identification of the particular resource and variant we have in cache.
That goes for anything in cache; store-url is just tricky in that the
client-side request cannot give us accurate details for the server side.
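
The two options above (revalidate the exact variant, or simply re-fetch from the current client request) could be sketched roughly as follows. This is Python for illustration only; plan_upstream_request and apply_reply are hypothetical names, and the handling of replies other than 200 and 304 is a simplification of this sketch.

```python
from typing import Dict, Optional


def plan_upstream_request(client_url: str,
                          original_url: Optional[str],
                          etag: Optional[str]) -> Dict[str, object]:
    """Revalidate only when the cached variant can be identified exactly."""
    if original_url and etag:
        # We know which resource and variant we hold: conditional request.
        return {"url": original_url, "conditional": True,
                "headers": {"If-None-Match": etag}}
    # Otherwise fall back to a plain fetch using the current client details.
    return {"url": client_url, "conditional": False, "headers": {}}


def apply_reply(status: int, conditional: bool) -> str:
    """Decide what happens to the cached object once the reply arrives."""
    if conditional and status == 304:
        return "update-cached-headers"    # safe: same variant confirmed
    if status == 200:
        return "replace-cached-object"    # store the new copy
    return "forward-without-caching"      # simplification for other statuses
```

The fallback path costs a full transfer, but it never merges headers from one variant into a cached copy of another.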
Thank you,
Alex.
P.S. The above still does not justify storing the rewritten URL(s), of
course.
No. I think those are only useful for key purposes and can be discarded
once the object in cache is located for a HIT, or stored for a MISS.
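
A rough sketch of that key-only use, in Python for illustration. The rewrite rule, the example hostnames, the SHA-256 key and the Cache class are all assumptions made for this sketch, not how Squid actually builds or stores its keys.

```python
import hashlib
import re
from typing import Dict, Optional


def store_url_rewrite(url: str) -> str:
    """Hypothetical rewrite: collapse numbered mirror hosts to one name."""
    return re.sub(r"^http://v\d+\.example\.com/",
                  "http://video.example.internal/", url)


def store_key(method: str, url: str) -> str:
    # The hash choice is arbitrary here; only the key needs the rewritten URL.
    return hashlib.sha256((method + " " + url).encode("utf-8")).hexdigest()


class Cache:
    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def store(self, request_url: str, etag: str, body: bytes) -> None:
        """MISS path: the rewritten URL is used for the key, then dropped."""
        key = store_key("GET", store_url_rewrite(request_url))
        self._entries[key] = {"original_url": request_url,
                              "etag": etag, "body": body}

    def lookup(self, request_url: str) -> Optional[dict]:
        """HIT path: the rewritten URL is needed only to locate the entry."""
        key = store_key("GET", store_url_rewrite(request_url))
        return self._entries.get(key)
```

Storing via one mirror URL and looking up via another finds the same entry, and what comes back carries the original request URL and ETag rather than the rewritten URL.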
Amos