Hello Bart, Devs,

Friday, February 13, 2004, 12:33:27 PM, you wrote, concerning Bayes:

BS> (I hope the use of message-id for this goes by the wayside soon,
BS> before spammers get the bright idea to steal old message-id headers
BS> from nonspam usenet or list archives and insert them into newly
BS> generated spam.)

Actually, a new spam-detecting mechanism could be to look for duplicate
message ids. I've received multiple spams all using the same message id.

a) If a ham is sent to my domain with four recipients here, then because
   of the way I run SA, I could process that email four times, once for
   each mailbox. That's expected. And it's expected that each of those
   emails will have identical bodies, and identical subjects.

b) I receive spam where in a given day I can receive similar spam,
   identical message ids, but with different subject headers (usually
   random words or letters added to a subject), and/or with different
   bodies (sometimes minor random differences, sometimes very different
   messages).

c) I receive spam where on Jan 2 I can receive spam with a given message
   ID, and I can receive spam (similar or not) with identical message ids
   on Jan 14, Jan 30, Feb 12, etc.

I suggest that if we could store a record with three or four fields,
message-id, checksum(subject), checksum(body), and maybe time(firstseen),
we could use this as a database, and apply a rule (maybe named
DUPLICATE_MESSAGEID) where either (1) checksums don't match, or
(2) time(now) is significantly different from time(firstseen).

Does this seem like a worthwhile approach?

Bob Menschel
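
P.S. To make the proposal concrete, here is a rough sketch in Python of
the record and the rule described above. It is only an illustration under
stated assumptions, not SpamAssassin code: the function names, the use of
SHA-1 as the checksum, the in-memory dict standing in for the database,
and the 24-hour meaning of "significantly different" are all hypothetical.

```python
import hashlib
import time

# Assumed threshold for "time(now) is significantly different from
# time(firstseen)"; 24 hours is an arbitrary illustrative value.
MAX_AGE_SECONDS = 24 * 60 * 60


def checksum(text):
    """Hex SHA-1 digest of the text (the choice of SHA-1 is an assumption)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def duplicate_messageid(db, message_id, subject, body, now=None):
    """Return True if the hypothetical DUPLICATE_MESSAGEID rule should fire.

    db maps message-id -> (checksum(subject), checksum(body), time(firstseen)).
    """
    now = time.time() if now is None else now
    seen = (checksum(subject), checksum(body))
    record = db.get(message_id)
    if record is None:
        # First sighting: store the record; there is nothing to compare yet.
        db[message_id] = seen + (now,)
        return False
    subject_sum, body_sum, first_seen = record
    # (1) checksums don't match, or (2) too long since the id was first seen.
    return seen != (subject_sum, body_sum) or now - first_seen > MAX_AGE_SECONDS
```

With this sketch, case (a) does not fire, because the same id arrives
again with the same subject and body shortly after the first copy. Case
(b) fires on the checksum mismatch, and case (c) fires on the age check. A
real deployment would need a persistent store and some expiry policy in
place of the dict.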
