On 23/12/10 03:55, Volker-Yoblick, Adam wrote:
Ah, sorry, I wasn't clear. The URIs are NOT unique after the GUID part is removed. That was
the purpose of this exercise, to make them "more unique", but not so unique so
that every deploy gets stored using a different cache key for the exact same file.
Our tool doesn't send If-Modified-Since headers, and even if it did, we don't
care about that. We want to cache forever, or at least until squid cleans up
old files using its replacement algorithm.
IMS is not about caching so much as about what gets sent to the client.
Anyway, the non-uniqueness is the problem to avoid. Your custom header
looks workable for your specific needs. Just check that it is not
causing a lot of needless MISSes, since the custom header undoes the
GUID removal.
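One way to check for needless MISSes is to tally the result codes in access.log. The sketch below assumes Squid's native access.log format, where the fourth whitespace-separated field is the result code plus HTTP status (e.g. TCP_MISS/200). The function and struct names are hypothetical, and any code containing "HIT" (such as TCP_MEM_HIT) is counted as a hit; codes that contain neither "HIT" nor "MISS" land in `other`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Tally of cache outcomes seen in access.log lines (hypothetical helper).
struct HitMissCount {
    long hits = 0;
    long misses = 0;
    long other = 0;
};

// Assumes Squid's native access.log layout: the fourth field is
// "RESULT_CODE/HTTP_STATUS", e.g. "TCP_MISS/200" or "TCP_MEM_HIT/200".
HitMissCount countHitMiss(const std::vector<std::string>& lines) {
    HitMissCount c;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string timestamp, elapsed, client, code;
        if (!(in >> timestamp >> elapsed >> client >> code)) {
            ++c.other;  // malformed or truncated line
            continue;
        }
        // Keep only the part before '/', e.g. "TCP_MISS".
        const std::string result = code.substr(0, code.find('/'));
        if (result.find("HIT") != std::string::npos) {
            ++c.hits;
        } else if (result.find("MISS") != std::string::npos) {
            ++c.misses;
        } else {
            ++c.other;  // e.g. TCP_DENIED, TCP_REFRESH_UNMODIFIED
        }
    }
    return c;
}
```

Feeding it the lines of access.log (read with std::getline) before and after enabling the custom header gives a rough view of whether the MISS share went up.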
To clarify what's in my head about your scenario (please correct me
where I'm wrong):
- You have a bunch of objects under a URL http://example.com/$GUID/*
- some of the objects are identical copies
- some are distinct for each $GUID value
- some of these are expected to change over time and need to be re-fetched
correct?
I think you need to store the non-identical ones with unchanged URL
while the identical ones can be stored with a shared URL.
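That split can be sketched as a store-URL rewrite that strips the GUID only for objects known to be identical across deploys. This is a minimal sketch under stated assumptions: URLs have the form http://host/GUID/path, the set of shared paths is known in advance, and `storeUrlFor` and `sharedPaths` are hypothetical names, not Squid APIs.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: given a URL of the assumed form http://host/GUID/path,
// return the URL to use for cache storage. Paths listed in `sharedPaths`
// (byte-identical for every GUID) lose the GUID segment so all deploys share
// one entry; everything else keeps its original, GUID-bearing URL.
std::string storeUrlFor(const std::string& url,
                        const std::set<std::string>& sharedPaths) {
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) return url;

    const std::size_t hostEnd = url.find('/', scheme.size());
    if (hostEnd == std::string::npos) return url;
    const std::size_t guidEnd = url.find('/', hostEnd + 1);
    if (guidEnd == std::string::npos) return url;

    const std::string path = url.substr(guidEnd);  // e.g. "/lib/a.dll"
    if (sharedPaths.count(path) == 0) return url;  // distinct per GUID: keep as-is

    // Drop "/GUID" but keep the host and the path.
    return url.substr(0, hostEnd) + path;
}
```

The design choice is to default to the unchanged URL: an object is only merged across GUIDs when it is explicitly known to be shared, so a wrong guess costs a MISS rather than serving the wrong file.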
Caching the non-unique ones requires ETag variant support, which Squid
does not (yet) do properly, and it sounds like the source server does
not support it either.
Amos
-----Original Message-----
From: Amos Jeffries [mailto:squ...@treenet.co.nz]
Sent: Tuesday, December 21, 2010 11:43 PM
To: squid-users@squid-cache.org
Subject: Re: [squid-users] Hacking squid to handle custom http header lines
On 22/12/10 11:15, Volker-Yoblick, Adam wrote:
Greetings,
This e-mail shows you how to modify the squid source to support your
own custom http header lines in 3 easy steps. Feel free to ignore if
you don't care =)
I recently modified my local version of the squid 3.1.9 source so it
would play nice with some of our proprietary tools. Basically, I had
to modify the URL that was getting included in the MD5 used as the key
for cache storage/lookup. (see my e-mail with subject "Caching Content
from Dynamic URLs (very hacky solution)" for code details)
Because I modified the URL, I ran into some collisions, where the "relative url" for two
different files was the same, meaning the cache "key" was the same. Essentially, the
cache would think a requested file was already cached, but the cached file did not match the
requested one.
This is why we are reluctant to allow it, particularly in the absence
of an absolute URL.
So I needed a way to make the MD5 used for each file more unique without using
the full URL (since the full URL in our case is super dynamic and ends up being
TOO unique). I decided to add a custom header line to the HTTP requests our
tool sends (since our tool knows the expected last-modified time of the file
it's requesting). Here's what the HTTP request header looks like:
GET http://*****/foo.txt HTTP/1.0
Connection: Keep-Alive
Expected-DateTime: 1292611588
As you can see, the custom header line is "Expected-DateTime", followed by an
integer.
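Because the value is an integer (1292611588 appears to be a Unix timestamp), one option is to validate and normalise it before it reaches the cache key, so that formatting differences such as leading zeros do not split one object across several keys. A minimal sketch; `parseExpectedDateTime` and `canonicalExpectedDateTime` are hypothetical helpers, not part of Squid.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>

// Parse an Expected-DateTime header value such as "1292611588".
// Returns std::nullopt for empty, non-numeric, signed, or out-of-range
// input, so a malformed header can be ignored rather than hashed.
std::optional<long long> parseExpectedDateTime(const std::string& value) {
    // strtoll alone would accept leading whitespace and a sign;
    // be strict and allow plain decimal digits only.
    if (value.empty() || value.find_first_not_of("0123456789") != std::string::npos)
        return std::nullopt;
    errno = 0;
    char* end = nullptr;
    const long long n = std::strtoll(value.c_str(), &end, 10);
    if (errno == ERANGE || *end != '\0') return std::nullopt;
    return n;
}

// Canonical text form for feeding into a cache key, so that
// "0001292611588" and "1292611588" map to the same key.
std::optional<std::string> canonicalExpectedDateTime(const std::string& value) {
    const auto n = parseExpectedDateTime(value);
    if (!n) return std::nullopt;
    return std::to_string(*n);
}
```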
Then, I made the following changes to the squid source:
1. In HttpHeader.h, add a new value to the http_hdr_type enum.
The one I added was called HDR_EXPECTED_DATETIME.
2. In HttpHeader.cc, I made two changes.
The first was in the HeadersAttrs array. I had to add the name of the
header, the enum, and the data type, as follows:
{"Expected-DateTime", HDR_EXPECTED_DATETIME, ftStr}
I left it a string, but it could have been an int as well.
The second change was to add the enum to the RequestHeadersArr array.
static http_hdr_type RequestHeadersArr[] = {
HDR_AUTHORIZATION, HDR_FROM, HDR_HOST,
HDR_IF_MATCH, HDR_IF_MODIFIED_SINCE, HDR_IF_NONE_MATCH,
HDR_IF_RANGE, HDR_MAX_FORWARDS, HDR_PROXY_CONNECTION,
HDR_PROXY_AUTHORIZATION, HDR_RANGE, HDR_REFERER,
HDR_REQUEST_RANGE,
HDR_USER_AGENT, HDR_X_FORWARDED_FOR, HDR_SURROGATE_CAPABILITY,
HDR_EXPECTED_DATETIME
};
Note that it goes in this array because I'm supporting the new header
line in http requests only.
If you want it supported in replies as well, you should probably put it
in the GeneralHeadersArr array.
3. In store_key_md5.cc, we need to make sure the new header line gets used when
calculating the cache key.
In storeKeyPublicByRequestMethod, after
"SquidMD5Update(&M, (unsigned char *) relativeUrl, strlen(relativeUrl));",
I added the following:
// Add the last-modified time from the header into the cache key.
// This prevents two files with the same relativeUrl from
// stomping on each other.
if (request->header.has(HDR_EXPECTED_DATETIME))
{
    const char *expectedDateTime =
        request->header.getStr(HDR_EXPECTED_DATETIME);
    SquidMD5Update(&M, (unsigned char *) expectedDateTime,
                   strlen(expectedDateTime));
}
As you can see, I'm checking if the header has an entry of type
"HDR_EXPECTED_DATETIME". If it does, I get the value, and I update the MD5 used
as the cache key.
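The effect of that change on the key can be sketched outside Squid. This is an illustration only: 64-bit FNV-1a stands in for Squid's MD5 purely to keep the example short (it is not what Squid uses and is not collision-resistant), the leading method byte is an assumption, and `KeyHasher` and `publicKey` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Stand-in for Squid's MD5 context: 64-bit FNV-1a.
class KeyHasher {
public:
    void update(const void* data, std::size_t len) {
        const auto* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < len; ++i) {
            state_ ^= p[i];
            state_ *= 1099511628211ULL;  // FNV 64-bit prime
        }
    }
    std::uint64_t digest() const { return state_; }
private:
    std::uint64_t state_ = 14695981039346656037ULL;  // FNV 64-bit offset basis
};

// Key built from a method byte, the relative URL and, when present, the
// Expected-DateTime value. A NUL separator is hashed before the header
// value so that URL "a1" + value "23" cannot produce the same byte
// stream as URL "a12" + value "3".
std::uint64_t publicKey(unsigned char methodId,
                        const std::string& relativeUrl,
                        const std::optional<std::string>& expectedDateTime) {
    KeyHasher h;
    h.update(&methodId, sizeof(methodId));
    h.update(relativeUrl.data(), relativeUrl.size());
    if (expectedDateTime) {
        const unsigned char sep = 0;
        h.update(&sep, 1);
        h.update(expectedDateTime->data(), expectedDateTime->size());
    }
    return h.digest();
}
```

Requests without the header keep the URL-only key, while the same URL with different Expected-DateTime values gets separate entries. The separator is a small hardening over the excerpt above, which appends the header value directly after relativeUrl and so in principle lets two different URL/value pairs feed identical bytes to the hash.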
I think some further investigation needs to be done as to why the MD5
collisions are occurring when the URIs are supposedly unique per object after the
stripped piece is removed.
It sounds like either the stripped bit is not as unique as you thought, or the
objects are not as cacheable as you thought. They are certainly not "unique" as
per your earlier claims.
FWIW: is any of this needed if your tool sends If-Modified-Since headers?
Amos
--
Please be using
Current Stable Squid 2.7.STABLE9 or 3.1.9
Beta testers wanted for 3.2.0.3
--
Please be using
Current Stable Squid 2.7.STABLE9 or 3.1.9
Beta testers wanted for 3.2.0.3