On 15/11/17 15:16, Reindl Harald wrote:


On 15.11.2017 at 15:47, Sebastian Arcus wrote:
On 15/11/17 09:56, Reindl Harald wrote:

On 15.11.2017 at 09:41, Sebastian Arcus wrote:
I can't really train the bayesian filter on these emails, as it would start to affect the classification of ham emails

this is an unproven claim!

we have phishing mails here in bayes which are classified with BAYES_99 where my human eyes can hardly distinguish them from the original messages classified with BAYES_00 - you just need to train both and bayes will find the differences over time

I'm not sure I understand this. In my limited knowledge of how bayesian filters work, I assumed that if the words are the same/similar between emails, they should produce similar bayes scores, no? Do you have any links to explanations of how this would work? I am keen not to affect the bayes databases I have built over time in the wrong way

bayes also takes headers into account, as well as a lot of invisible stuff. fact is that we block all the DHL phishing mails which have existed over the last years, and a short while ago i saw an apparently new one with a foreign envelope/from address failing SPF, where a dhl.com server sent on behalf of the customer, and that thing was correctly classified with BAYES_00 even without whitelist_auth
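
As a rough illustration of why identical wording need not give identical bayes results, here is a minimal Python sketch. It uses plain naive-Bayes combining, which is simpler than the combining SpamAssassin actually does, and every token name and probability below is invented for the example. The point is only that when the body tokens are neutral, the header-derived tokens decide the outcome.

```python
from math import prod

def combine(probs):
    """Plain naive-Bayes combining of per-token spam probabilities.

    This is a simplification for illustration, not SpamAssassin's
    actual combining method.
    """
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)

# Hypothetical per-token spam probabilities (made-up numbers).
# The visible body text is the same in both messages and is neutral.
body = [0.5, 0.5, 0.5]             # e.g. "parcel", "delivery", "tracking"

# Hypothetical header-derived tokens that differ between the two mails.
phish_headers = [0.95, 0.90]       # e.g. unfamiliar relay, odd mailer
legit_headers = [0.05, 0.10]       # e.g. familiar relay, usual mailer

phish_score = combine(body + phish_headers)
legit_score = combine(body + legit_headers)

print(f"phishing: {phish_score:.3f}  legitimate: {legit_score:.3f}")
```

With these made-up numbers the two messages land at opposite ends of the scale even though their body tokens are identical, which is the effect described above once both variants have been trained.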

and yes, i have QA scripts iterating over all the spam and ham samples collected since 2014, which test the current bayes classification, alert if spam does not get BAYES_99 or ham not BAYES_00, and in that case run "sa-retrain.sh sample-path", which makes 5 copies with some modified headers like message-id and retrains them
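
The contents of sa-retrain.sh are not shown in the thread, so the following Python sketch is only a guess at the two core steps described above: checking which BAYES_* rule a sample hit, and producing 5 copies of a sample with a fresh Message-ID. The function names and the "retrain.invalid" domain are hypothetical. The report text is assumed to come from something like `spamassassin -t`, and the copies would then be handed to `sa-learn --spam` or `sa-learn --ham`; neither call is made here.

```python
import re
from email import message_from_bytes
from email.utils import make_msgid

# Expected rule per sample type, as described in the post.
EXPECTED = {"spam": "BAYES_99", "ham": "BAYES_00"}
BAYES_RE = re.compile(r"\bBAYES_\d{2,3}\b")

def bayes_rule(report):
    """Return the first BAYES_* rule name found in a report, or None."""
    match = BAYES_RE.search(report)
    return match.group(0) if match else None

def needs_retrain(kind, report):
    """True if a 'spam' or 'ham' sample did not hit its expected rule."""
    return bayes_rule(report) != EXPECTED[kind]

def make_variants(raw, copies=5):
    """Return `copies` versions of a raw message, each with a new Message-ID.

    Assumption: only Message-ID is changed; the original script is said
    to modify "some" headers, which ones exactly is unknown.
    """
    variants = []
    for _ in range(copies):
        msg = message_from_bytes(raw)
        del msg["Message-ID"]          # no error if the header is absent
        msg["Message-ID"] = make_msgid(domain="retrain.invalid")  # hypothetical domain
        variants.append(msg.as_bytes())
    return variants
```

A wrapper would loop over the sample directories, call `needs_retrain` on each report, and feed the output of `make_variants` to the learner only for the samples that fail the check.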

Interesting - thank you for the details. Is this for your personal mailbox(es) - or a larger setup?
