Miha Vrhovnik wrote:
> "Alexander Bauer" <[email protected]> wrote on 6.1.2010 10:48:31:
>> Hi Miha,
> Hi Alexander,
>
>> what do you mean by collisions?
>> Two different Message-IDs with the same hash?
>> In that case you should never get 2% collisions.
>
> If you re-read my message closely, especially the part about my stats, you'll
> see that I wrote "Minimum is a zero about 2% of messages." That means that
> approx. 2% of my messages have no actual Message-Id. Also, as you quoted, the
> field is optional, although if it is not present most up-to-date mail servers
> will add it.
OK, now I understand. You are right that a simple hash of the Message-ID is not enough for a dup-check, of course. But it is a nice solution to the field-length problem.

>> A hash over the whole header is not a good idea, because if you get a
>> message twice from different mail servers, you will not detect the duplicate.
> True. But I would still compare at least the Date and From fields.

Yes, I think this is a very good approach. An MD5 or SHA1 hash of the normalized values of Message-ID + Date + From should give a good identifier for dup-checks.

Regards,
Alex

_______________________________________________
synalist-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/synalist-public
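The dup-check identifier proposed in this thread (a hash over normalized Message-ID + Date + From) could look roughly like the sketch below. It is only an illustration in Python, not code from the thread or from any mail library. The normalization rules (strip angle brackets from the Message-ID, convert the Date to UTC, compare only the lower-cased address part of From) and all function names are assumptions chosen for the example. Because the output is a fixed-length hex digest (40 characters for SHA-1), it also addresses the field-length problem, and it still yields a key for the roughly 2% of messages without a Message-ID.

```python
import hashlib
from datetime import timezone
from email import message_from_string
from email.message import Message
from email.utils import parseaddr, parsedate_to_datetime
from typing import Optional


def normalize_message_id(value: Optional[str]) -> str:
    # Hypothetical rule: drop surrounding whitespace and the angle brackets.
    # Case is kept, since the left part of a Message-ID may be case-sensitive.
    if not value:
        return ""
    return str(value).strip().strip("<>").strip()


def normalize_date(value: Optional[str]) -> str:
    # Hypothetical rule: parse the RFC 5322 date and express it in UTC so the
    # same instant written with different offsets gives the same string.
    if not value:
        return ""
    try:
        dt = parsedate_to_datetime(str(value))
    except (TypeError, ValueError):
        # Unparseable date: fall back to the raw value with collapsed whitespace.
        return " ".join(str(value).split())
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S")


def normalize_from(value: Optional[str]) -> str:
    # Hypothetical rule: ignore the display name and lower-case the address.
    # (Lower-casing the local part is a simplification, not strictly RFC-correct.)
    if not value:
        return ""
    return parseaddr(str(value))[1].strip().lower()


def dup_check_key(msg: Message) -> str:
    parts = [
        normalize_message_id(msg.get("Message-ID")),
        normalize_date(msg.get("Date")),
        normalize_from(msg.get("From")),
    ]
    # NUL separator so that adjacent fields cannot run into each other.
    joined = "\x00".join(parts)
    # SHA-1 hex digest: always 40 characters, regardless of header lengths.
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()
```

The same message delivered via two different servers (different `Received` headers, different Date offset for the same instant, different display name) would map to the same key, while the whole-header hash rejected in the thread would not. Swapping `hashlib.sha1` for `hashlib.md5` gives a 32-character key instead.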
