Miha Vrhovnik wrote:
> "Alexander Bauer" <[email protected]> wrote on 6.1.2010 10:48:31:
>> Hi Miha,
> Hi Alexander,
> 
>> What do you mean by collisions?
>> Two different Message-IDs with the same hash?
>> In that case you should never get 2% collisions.
> 
> If you re-read my message closely, especially the part about my stats, you'll 
> see that I wrote "Minimum is a zero about 2% of messages." That means that 
> approx. 2% of my messages have no Message-Id at all. Also, as you quoted, 
> the field is optional, although most up-to-date mail servers will add it 
> if it is not present.

OK, now I understand. You are right that a simple hash of the Message-ID 
is not enough for a dup-check, of course, since some messages have no 
Message-ID at all. But it is a nice solution for the field-length problem.
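
To illustrate the field-length point, here is a minimal Python sketch (not 
Synapse code). The function name message_id_key is hypothetical, and SHA-1 is 
only one possible choice of hash. It maps a Message-ID of any length to a 
40-character key and returns None when the header is missing, which is 
exactly the case a Message-ID hash alone cannot handle.

```python
import hashlib


def message_id_key(message_id):
    """Return a fixed-length hex key for a Message-ID, or None if it is missing.

    Hypothetical helper for illustration; not part of any library.
    """
    if message_id is None or not message_id.strip():
        # Header absent or empty: there is nothing to hash.
        return None
    normalized = message_id.strip()
    # A SHA-1 hex digest is always 40 characters, whatever the input length.
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()
```

The key always fits a fixed-width column, but the None case still needs a 
fallback.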

>> A hash over the whole header is not a good idea, because if you get a 
>> message twice from different mail servers, you will not detect the duplicate.
> True. But I would still compare at least Date and From fields.

Yes, I think this is a very good approach.
An MD5 or SHA-1 hash of the normalized values of Message-ID + Date + From 
should make a good identifier for dup-checks.
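
A rough Python sketch of what I mean (not Synapse code). The name dedup_key 
and the normalization rules are assumptions for illustration: the Message-ID 
is trimmed, the Date is converted to UTC, and only the lower-cased address 
part of From is kept. Other normalizations may suit real mail better.

```python
import hashlib
from datetime import timezone
from email.utils import parseaddr, parsedate_to_datetime


def dedup_key(message_id, date, from_):
    """Hash normalized Message-ID + Date + From into one identifier.

    Hypothetical helper; the normalization choices below are assumptions.
    """
    mid = (message_id or "").strip()

    raw_date = (date or "").strip()
    try:
        dt = parsedate_to_datetime(raw_date)
        if dt.tzinfo is None:
            # No usable zone information: treat the time as UTC (assumption).
            dt = dt.replace(tzinfo=timezone.utc)
        norm_date = dt.astimezone(timezone.utc).isoformat()
    except (TypeError, ValueError):
        # Unparsable or missing Date: fall back to the raw text.
        norm_date = raw_date

    # Keep only the address, dropping the display name; lower-casing the
    # whole address is a simplification.
    addr = parseaddr(from_ or "")[1].lower()

    # A NUL separator keeps the three fields from running into each other.
    joined = "\x00".join([mid, norm_date, addr])
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()
```

Two copies of the same message fetched via different mail servers give the 
same key, because no Received headers are involved. A message without a 
Message-ID still gets a key from Date and From.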

Regards,
Alex


_______________________________________________
synalist-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/synalist-public