On 15/11/17 15:16, Reindl Harald wrote:
On 15.11.2017 at 15:47, Sebastian Arcus wrote:
On 15/11/17 09:56, Reindl Harald wrote:
On 15.11.2017 at 09:41, Sebastian Arcus wrote:
I can't really train the bayesian filter on these emails, as it
would start to affect ham emails classification
this is an unproven claim!
we have phishing mails here which bayes classifies as BAYES_99 although
my human eyes can hardly distinguish them from the original messages
classified as BAYES_00 - you just need to train both and bayes will
find the differences over time
I'm not sure I understand this? In my limited knowledge of how
bayesian filters work, I assumed that if the words are the
same/similar between emails, they should produce similar bayes scores,
no? Do you have any links to explanations of how this would work - as
I am keen not to skew the bayes databases I have built up over time
bayes also takes headers into account as well as a lot of invisible
stuff. fact is that we block all the DHL phishing mails seen over the
last years, and recently i saw some apparently new ones with a foreign
envelope/from address failing SPF, where a dhl.com server sent on behalf
of the customer, and even without whitelist_auth they were correctly
classified as BAYES_00
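
To illustrate the point about headers, here is a minimal toy sketch of a
naive Bayes classifier that tokenizes header values alongside body words.
It is not SpamAssassin's actual tokenizer or probability combiner; the
class name, the "H<name>:" token prefix, the smoothing and the sample
addresses are all assumptions made up for this example. It only shows that
two mails with an identical body can still end up on opposite sides once
header tokens are part of the training data.

```python
import math
from collections import Counter


def tokens(headers, body):
    """Body words plus header words; header tokens get a prefix so that
    a word seen in a header is a different token than the same word in
    the body (hypothetical scheme, for illustration only)."""
    toks = [w.lower() for w in body.split()]
    for name, value in headers.items():
        toks += [f"H{name}:{w.lower()}" for w in value.split()]
    return toks


class ToyBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.msgs = {"spam": 0, "ham": 0}

    def train(self, label, toks):
        # Count each token once per message.
        self.counts[label].update(set(toks))
        self.msgs[label] += 1

    def spam_probability(self, toks):
        # Sum per-token log odds with add-one smoothing.
        log_odds = 0.0
        for t in set(toks):
            p_spam = (self.counts["spam"][t] + 1) / (self.msgs["spam"] + 2)
            p_ham = (self.counts["ham"][t] + 1) / (self.msgs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# Same body text in both mails; only the (made-up) headers differ.
body = "your parcel is waiting please confirm delivery"
ham_headers = {"Received": "mx1.dhl.com", "From": "noreply@dhl.com"}
spam_headers = {"Received": "host.example.net", "From": "noreply@dhl-delivery.example"}

bayes = ToyBayes()
bayes.train("ham", tokens(ham_headers, body))
bayes.train("spam", tokens(spam_headers, body))

ham_p = bayes.spam_probability(tokens(ham_headers, body))
spam_p = bayes.spam_probability(tokens(spam_headers, body))
print(f"ham-like mail: {ham_p:.2f}, spam-like mail: {spam_p:.2f}")
```

The body tokens occur equally often in both classes, so they contribute
nothing to the score; the header tokens alone separate the two mails.
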
and yes, i have QA scripts iterating over all the spam and ham samples
collected since 2014 which test the current bayes classification and
alert if spam does not get BAYES_99 or ham does not get BAYES_00; in
that case i run "sa-retrain.sh sample-path", which makes 5 copies with
some modified headers like message-id and retrains them
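
The QA loop described above could be sketched roughly as follows. This is
not the actual sa-retrain.sh, whose contents are not shown in this thread.
The function names, the "retrain.invalid" Message-ID domain and the sample
paths are hypothetical, and how the BAYES_* rule name is obtained for a
sample (for example by parsing the report from "spamassassin -t") is
deliberately left to a caller-supplied function.

```python
import email
from email.utils import make_msgid

# Expected bayes rule per corpus, as described in the mail above.
EXPECTED = {"spam": "BAYES_99", "ham": "BAYES_00"}


def find_misclassified(samples, bayes_rule_for):
    """samples: iterable of (path, label) with label 'spam' or 'ham'.
    bayes_rule_for: callable returning the BAYES_* rule hit for a path
    (hypothetical hook; wire it to your own scanner)."""
    return [(path, label) for path, label in samples
            if bayes_rule_for(path) != EXPECTED[label]]


def make_retrain_copies(raw, copies=5):
    """Return `copies` variants of a raw message (bytes), each with a
    fresh Message-ID, ready to be fed to the learner one by one."""
    out = []
    for _ in range(copies):
        msg = email.message_from_bytes(raw)
        del msg["Message-ID"]  # no error if the header is absent
        msg["Message-ID"] = make_msgid(domain="retrain.invalid")
        out.append(msg.as_bytes())
    return out


# Demo with made-up data instead of a real corpus and scanner.
fake_results = {"spam/1.eml": "BAYES_99",
                "spam/2.eml": "BAYES_50",
                "ham/1.eml": "BAYES_00"}
samples = [("spam/1.eml", "spam"), ("spam/2.eml", "spam"), ("ham/1.eml", "ham")]
bad = find_misclassified(samples, fake_results.get)

raw = (b"From: a@example.org\r\nMessage-ID: <orig@example.org>\r\n"
       b"Subject: test\r\n\r\nbody\r\n")
retrain_copies = make_retrain_copies(raw)
print(bad, len(retrain_copies))
```

Each copy would then be passed to "sa-learn --spam" or "sa-learn --ham"
depending on the sample's label.
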
Interesting - thank you for the details. Is this for your personal
mailbox(es) - or a larger setup?