On 11.12.2015 at 18:42, Martin Gregorie wrote:
For instance, I have two portmanteau rules, SALE (contains sales phrases like "huge discount") and PRODUCT (contains phrases like "fur coat"), that are ANDed by a meta called SALESPAM. The nice thing about this approach is that, once the SALE and PRODUCT lists have grown to a decent size, the SALESPAM meta starts to fire on previously unseen combinations without generating FPs. The only downside is that, unlike Bayes, you have to build the lists manually, but that's probably no worse to do than building a hand-crafted Bayes DB like Reyndl does.
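The rule layout described above could look roughly like the following in a local .cf file. This is only a sketch: the double-underscore sub-rule names (the usual convention for unscored helper rules), the second phrase in each list, the description and the score are assumptions for illustration; only the "huge discount" and "fur coat" phrases and the SALESPAM meta name come from the post.

```
# Sketch only: the phrase lists are placeholders and would be grown by hand over time.
# "clearance sale" and "leather jacket" are hypothetical entries, not from the post.
body     __SALE     /\b(?:huge discount|clearance sale)\b/i
body     __PRODUCT  /\b(?:fur coat|leather jacket)\b/i

# The meta fires only when both lists match, so it can hit combinations
# of a sales phrase and a product phrase that were never seen together before.
meta     SALESPAM   (__SALE && __PRODUCT)
describe SALESPAM   Sales phrase combined with a product phrase
score    SALESPAM   3.0   # hypothetical score, tune for your own mail flow
```

Keeping the two lists as unscored sub-rules means neither phrase list adds to the score on its own; only the combination does.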
hand crafted bayes? worse? what nonsense. what's handcrafted there? that i don't trust autolearn and don't like autoexpire? well, how many of you trained Christmas spam this year while my bayes already knew it from last year?
how many of you are training the same spam types again and again because spammers are aware of autoexpire and just need to stop using a campaign for some weeks until 99% of default setups have forgotten about it?
what i do is just KEEP all training messages so that i can rebuild my bayes at any point in time without starting to learn from scratch
since "bayes_token_sources all" came with the last release, and "normalize_charset 1", which i enabled later, changed its behavior with the latest release, i know why - well, i knew it from the first moment: "keep the corpus in case something in the tokenizer changes later"
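A setup along these lines might be sketched as follows. The post does not show the actual configuration, so the two bayes_auto_* lines are inferred from "don't trust autolearn and don't like autoexpire", and the corpus paths in the comments are hypothetical.

```
# Sketch only, inferred from the post; not the author's actual local.cf.
bayes_auto_learn     0     # no automatic learning, train by hand only
bayes_auto_expire    0     # never let old tokens expire on their own
bayes_token_sources  all   # tokenize all available sources
normalize_charset    1     # normalize message text before tokenizing

# Because the training messages are kept, the database can be rebuilt
# after a tokenizer change, for example (paths are hypothetical):
#   sa-learn --clear
#   sa-learn --spam /path/to/corpus/spam
#   sa-learn --ham  /path/to/corpus/ham
```

Rebuilding from the kept corpus matters here because tokens produced under the old tokenizer settings may no longer match what the new settings generate.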