Re: Bayes duplicate message detection algorithm?

RW Fri, 13 May 2016 10:57:37 -0700

On Fri, 13 May 2016 12:44:40 -0500 (CDT)
David B Funk wrote:

> What algorithm does Bayes use to detect that it has already 'seen' a
> given message?
> 
> When I receive a bolus (say 40~60) of 'phish' messages from a
> compromised Hotmail/gmail/yahoo account which are mostly the same
> (body, many headers same, only recipients, Message-ID, Date, and a
> few Received headers are different) if I feed all of them to Bayes,
> it will learn only about 10% of them, the other 90% will be ignored
> as 'already seen'.
> 
> So how does Bayes decide that it has 'already seen' a given message
> when it actually hasn't (it has already seen one that is -almost-
> identical)?


It's a hash of the Date header and part of the body.
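
To illustrate the idea only, here is a rough Python sketch. It is not SpamAssassin's actual code: the SHA-1 digest, the 1024-byte body cutoff, the NUL separator, and the names seen_key and learn are all assumptions made for the example.

```python
import hashlib
from email import message_from_string

# Assumed cutoff for "part of the body"; chosen only for illustration.
BODY_PREFIX_BYTES = 1024


def seen_key(raw_message: str) -> str:
    """Build a hex digest from the Date header and a prefix of the body."""
    msg = message_from_string(raw_message)
    date = msg.get("Date", "")
    # Headers end at the first blank line (assumes "\n" line endings).
    _, _, body = raw_message.partition("\n\n")
    prefix = body.encode("utf-8", "replace")[:BODY_PREFIX_BYTES]
    data = date.encode("utf-8", "replace") + b"\0" + prefix
    return hashlib.sha1(data).hexdigest()


def learn(raw_message: str, seen: set) -> bool:
    """Return True if the message is new, False if its key was already seen."""
    key = seen_key(raw_message)
    if key in seen:
        return False
    seen.add(key)
    return True
```

In this sketch, two messages with the same Date header and the same opening body text get the same key even when their recipients and Message-ID differ, so the second one is skipped.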
