A while ago there was a talk about storing the original request URL in the swap file's meta data.

Now it strikes me again while testing something.
The code in question:
http://bazaar.launchpad.net/~squid/squid/trunk/view/head:/src/StoreMetaURL.cc#L39 (25 lines of code)

##start
bool
StoreMetaURL::checkConsistency(StoreEntry *e) const
{
    assert (getType() == STORE_META_URL);

    debugs(20, DBG_IMPORTANT, "storeClientReadHeader: URL checkConsistency wasn't used ");
    return true;

    if (!e->mem_obj->original_url)
        return true;

    if (strcasecmp(e->mem_obj->original_url, (char *)value)) {
        debugs(20, DBG_IMPORTANT, "storeClientReadHeader: URL mismatch");
        debugs(20, DBG_IMPORTANT, "\t{" << (char *) value << "} != {" << e->mem_obj->original_url << "}");
        return false;
    }

    return true;
}

##end

This code is responsible for checking the consistency of a cached file/object URL against the currently requested URL.
It is used in store_client.cc, and was moved from there in newer revisions.
In the old revision 4338 it states that the meaning of this code is:
"Check the meta data and make sure we got the right object."

The problem is that it is only checked while a file is being fetched from UFS (that is what I have checked); when served from RAM it won't be checked. The result is that when the store_url_rewrite feature is used, the check flags an inconsistency between the request URL and the object in cache_dir (naturally, since the object was stored under the rewritten URL).

Disabling this check will make my life easy with store_url, taking it from "not working" to "working".

So I have a couple of options for how to "fix" the issue:
1. disable this check.
2. disable this check only for store_url_rewritten requests.
3. add the store_url meta object into the cache file and use it to identify the expected URL.
4. add an on/off switch to disable this check.
5. others?

After a small talk with Alex I sat down and made some calculations about the MD5 collision risk.
The store index key is a hash of the string "method byte + URL".
For most caches that I know of, the probability of a collision is very low considering the number of objects and URLs involved.

Yes, we are talking about many, many objects, so a collision is possible in principle. But the key is not only the URL hash: other unknowns, like request and response headers, push the whole calculation even further from reality, taking it from a 2^64 chance of collision to more than 2^124. It seems to me it will take quite some time before I see a hash collision (I have never seen one).

What do you think?
Have you seen a real-world collision scenario?

Eliezer
