A while ago there was a talk about storing the original request URL in the swap file's meta data.

Now it strikes me again while testing something.
The code in question:
http://bazaar.launchpad.net/~squid/squid/trunk/view/head:/src/StoreMetaURL.cc#L39 (25 lines of code)

##start
bool
StoreMetaURL::checkConsistency(StoreEntry *e) const
{
    assert (getType() == STORE_META_URL);

    debugs(20, DBG_IMPORTANT, "storeClientReadHeader: URL checkConsistency wasn't used ");
    return true;

    if (!e->mem_obj->original_url)
        return true;

    if (strcasecmp(e->mem_obj->original_url, (char *)value)) {
        debugs(20, DBG_IMPORTANT, "storeClientReadHeader: URL mismatch");
        debugs(20, DBG_IMPORTANT, "\t{" << (char *) value << "} != {" << e->mem_obj->original_url << "}");
        return false;
    }

    return true;
}

##end

This code is responsible for checking the consistency of a cached file/object URL against the currently requested URL.
It is used in store_client.cc, and was moved from there in newer revisions.
In the old revision 4338 it states that the meaning of this code is:
"Check the meta data and make sure we got the right object."

The problem is that it is only checked while a file is being fetched from UFS (that is what I have checked); when served from RAM it won't be checked. The result is that when the store_url_rewrite feature is used, the check flags an inconsistency between the request URL and the object in cache_dir (naturally, since the object was stored under the rewritten URL).

Disabling this check will make my life easy with store_url, taking it from "not working" to "working".

So I have a couple of options for how to "fix" the issue:
1. disable this check.
2. disable this check only for store_url_rewritten requests.
3. add the store_url meta object into the cache file and use it to identify the expected URL.
4. add an on/off switch to disable this check.
5. others?

After a small talk with Alex I sat down and made some calculations about the MD5 collision risk.
The store index key is a hash of the string "method byte + URL".
For most caches that I know of, the probability of a collision is very low considering the number of objects and URLs involved.

Yes, we are talking about many, many objects, so a collision is possible in principle. But the key is not only the URL hash: other unknowns, like request and response headers, push the whole calculation even further from reality, taking it from a 2^64 chance of collision to more than 2^124. It seems to me it will take quite some time before I see a hash collision (I have never seen one).

What do you think?
Have you seen a real-world collision scenario?

Eliezer
