Hi everyone,

Could anyone shed some light for me on how and when (and especially why) Bayes often generates its own Message-Ids when learning, instead of using the one provided in the message? I have a lot of Message-Ids that are '@sa-generated' in my Bayes database.
This also makes it a bit hard to check if a message was indeed correctly learned, because the real Message-Id never makes it into bayes_seen.


This is the scenario I'm worried about:
1.) A spam (e.g. a stock spam) goes through our filter machine and gets falsely autolearned as ham. I've seen this happen quite a few times.

2.) If the recipient happens to be one of our Exchange users, they forward it back (as an attachment). The original mail is extracted from the attachment and fed back into Bayes.

If the Message-Id is not the same when the spam comes around the second time, it does get learned as spam, but the earlier, wrong autolearning as ham is never corrected. In theory that would mean you could never get a Bayes probability over 50% for that particular spam, which indeed seems to be happening for some of the recent stock spams.
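
To illustrate where the 50% figure comes from, here is a minimal sketch using a naive per-token probability (spam frequency divided by the sum of spam and ham frequencies). This is a simplification for illustration only, not SpamAssassin's actual scoring formula; the function name and the corpus sizes are made up for the example.

```python
def token_spam_probability(spam_hits, ham_hits, nspam, nham):
    """Naive per-token spam probability from learn counts (simplified model)."""
    spam_freq = spam_hits / nspam  # fraction of learned spam containing the token
    ham_freq = ham_hits / nham     # fraction of learned ham containing the token
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical database with equally sized spam and ham corpora.
NSPAM = NHAM = 1000

# Case 1: the token was wrongly autolearned as ham, then learned as spam
# under a different id, so the ham entry was never forgotten.
uncorrected = token_spam_probability(spam_hits=1, ham_hits=1, nspam=NSPAM, nham=NHAM)

# Case 2: the message was recognised as already seen, so the ham learning
# was forgotten before the message was relearned as spam.
corrected = token_spam_probability(spam_hits=1, ham_hits=0, nspam=NSPAM, nham=NHAM)

print(uncorrected, corrected)
```

Under these assumptions, a token that occurs only in that one spam stays stuck at 0.5 in the first case, while in the second case it moves fully to the spam side.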

Regards, Paul Boven.



