Re: [squid-users] caching for 60 minutes, ignoring any header

Ron Klein Mon, 23 Sep 2013 02:13:47 -0700

I'll describe the real scenario in a more detailed way, but I can't disclose all of it.

There are a few machines, let's name them M1 to M9, that are processing data. From time to time, those machines should make HTTP requests to external servers that are business partners. All of these HTTP requests are in the same format and have the following request headers:

* User-Agent: undisclosed_user_agent
* Accept-Encoding: gzip, deflate
* Host: the_hostname_of_the_external_server
* Expect: [nothing]
* Pragma: [nothing]
That's it, nothing more, nothing less.

On those servers, as we agreed, there should be an xml file in a specific path. For instance:

http://foo.com/bar/daily-orders.xml
(I can't disclose the exact path here)

These files are re-generated from time to time. How often? I can't tell, and it's not up to me.

Now, since there are a few thousand business partners that generate these xml files for my business, I thought that caching these xml files on a single machine would be a good idea, since it should reduce external traffic.

Therefore, I installed Squid3 on a specific machine, and updated the M1-M9 HTTP clients to use the proxy server instead of directly fetching the xml files.

For business considerations, when an xml file is cached, I don't need it to be as fresh as possible. I want to reduce outgoing traffic as much as possible.

My business partners don't care about it either. They also don't want to change anything at all in their web servers. That's a fact I can't change whatsoever.

All I want is to have a local copy of the xml file for every external server, that would be considered "fresh" from T0 to T0+60 minutes. For my business needs, that's what I need. And if some of the xml files are cached somewhere else, which is a rare scenario for this case, then I can ignore that (business-wise).

I initially thought that the favicons example would simplify things (since a lot of web sites have favicons, and it's common knowledge), but I wasn't aware of the special case of favicons. I apologize for the time wasted on my simplified example.


I hope I shed more light on the subject.

Thanks!

On 23-Sep-13 11:21, Amos Jeffries wrote:

On 23/09/2013 7:21 p.m., Ron Klein wrote:
My example of favicons was to simplify the question. The real case is different.
Then please tell us the real details. In full if possible.
favicon is one of the special-case types of URLs, and as Eliezer and I already mentioned there are some specific usages for them which directly cause problems with your stated goals, or even with using it as a simplified test case. Perhaps your real case is also using similar special-case URLs with other problems - but nobody can assist with that if you hide details.
So please at least avoid "favicon" references for the remainder of this discussion. You have indicated that they are irrelevant.
I want to cache all "favicons" (that is, other resources, internally used) for 60 minutes.
For a given "favicon", I'd like to have the following caching policy:
Anywho, ignoring all the protocol and UA special-case behaviour factoids because you said that was a fake example...
The period of 60 minutes should start when the first consumer consumes the favicon. Let's mark the time for that first request as T0 (T Zero).
Your policy assumes and requires that your proxy is the only one between users and the origin server. If your upstream at any stage has a proxy, the object age will not meet your T0 criterion - this is why Last-Modified and Age headers are used in HTTP: to indicate an object's time since creation regardless of whether the object might have been newly generated by the origin, altered by an intermediary, or stored for some time by an intermediary or the origin itself (server-side caching or static archive).
FWIW: I am working with a client at present who wants to do this type of caching for every URL in existence, but only for a few minutes. They have a growing list of domain names where the policy has to be disabled due to problems it causes to user traffic.
During T0 until T0+60 minutes, this favicon should be considered as "fresh", in terms of caching.
The single value of 60 in the refresh_pattern line "max" field along with override-expire override-lastmod meets the above criteria.
However as I said earlier, freshness does not guarantee a HIT. There are many other HTTP features which need to be considered on top of that freshness to determine whether it HITs or MISSes.
After T0+60 minutes, this favicon should be considered as "stale", in terms of caching, and should be re-fetched by Squid, upon request.
There is no such thing as a refetch in HTTP caching.
There is only MISS or REFRESH. The revalidation may happen transparently at any time and you never see it.
The favicon would be cached even if the original server explicitlyinstructed not to cache nor store the favicon.
The refresh_pattern ignore-private and ignore-no-store meet that criteria in a way. The object resulting from the current transaction will be left in the cache regardless of what might happen to it on any future or past ones.
Yes, I know it might be considered a bad practice,
As stated, your caching policy is not particularly bad. The use/need of ignore-private and ignore-no-store is the only bad thing and the strong sign that you are possibly violating some law...
and perhaps illegal to some readers,
... so consulting a lawyer is recommended.
We provide those controls in Squid for specific use-cases. Yours may or may not be one of those; it is hard to tell from a fake example.
but I assure you that the other servers (the real web servers) that provide the responses, are business partners and they gave me their approval to override their caching policy. However, they don't want to change their configuration and it's totally up to me to create my caching layer.
They may not be willing to alter their public cache controls, but Surrogate-Control features available in Squid offer an alternative targeted caching policy to be emitted by their servers for your proxy. This assumes they are willing to set up such an alternative policy and you configure your proxy as a reverse-proxy for their traffic.
Your whole problem would be solved by the upstream simply sending:
Surrogate-Control: max-age=3600;your_proxy_fqdn
And another thing: the clients are not web browsers. The clients consuming these resources ("favicons" for sake of simplicity) are software components using HTTP as their transport protocol.
Thanks for any advice on the subject.
Well...
you have a set of URLs with undefined behaviour differences from the notably special-case ones in your example ... being fetched by clients with undefined but very big behaviour differences from the UA which would be fetching your example URLs ...
... and you want us to help with specific details about why your config is not working as expected?
As the old cliche goes: "insufficient data".

Amos
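
For reference, the refresh_pattern options named in the reply above could be combined roughly as in the following squid.conf fragment. This is only a sketch under assumptions: the \.xml$ regex is a hypothetical stand-in for the undisclosed paths, and setting the "min" field to 60 as well as "max" is an assumption based on the Squid documentation, which describes override-expire and override-lastmod as enforcing the min age.

```
# Hypothetical sketch, not a tested configuration.
# Field order: regex  min(minutes)  percent  max(minutes)  options
# The regex below is an assumed placeholder for the real (undisclosed) paths.
refresh_pattern -i \.xml$ 60 100% 60 override-expire override-lastmod ignore-private ignore-no-store
```

As pointed out in the reply, an object being fresh under such a rule still does not guarantee a HIT, and ignore-private / ignore-no-store should only be used with the agreement of the origin servers' owners.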
