perl -p -i -e 's/__/T_/g' /usr/share/spamassassin/updates_spamassassin_org/*

This converts the rules. I'm doing something very interesting. It's going to take a few days to see if it works.

I'm applying the same techniques of my evolution filter to the SA rule names.

I extract the names and then run them through a program that creates all combinations up to 4 levels deep and learns those combos as either spam or ham.
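A minimal sketch of the combination step, assuming "up to 4 levels" means every unordered group of 1 to 4 rule names that hit on a message. The function name rule_combos and the sample rule names are illustrative only, not taken from the actual filter.

```python
from itertools import combinations

def rule_combos(rule_names, max_level=4):
    """Return every combination of 1..max_level rule names as a set of tuples."""
    # Sort and de-duplicate so each combination has a single canonical form.
    names = sorted(set(rule_names))
    combos = set()
    for level in range(1, max_level + 1):
        combos.update(combinations(names, level))
    return combos

# Hypothetical hit list for one message (names after the __ to T_ rename).
hits = ["T_HAS_FROM", "T_MIME_VERSION", "T_TOCC_EXISTS"]
combos = rule_combos(hits)
print(len(combos))
```

The number of combinations grows quickly with the number of rules that hit, which is presumably why the depth is capped.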

Then, after building the ham and spam corpus sets, I take the test message, create its set of rule combinations, and then do set compares against the ham and spam sets.

What I'm looking for is combos matching ham and NOT matching spam - or - combinations matching spam and NOT matching ham.
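The comparison described here maps onto plain set operations. This is a sketch under the assumption that each combination is stored as a tuple of rule names; the function name one_sided_matches and the toy data are hypothetical.

```python
def one_sided_matches(message_combos, ham_combos, spam_combos):
    """Count message combos seen only in ham and only in spam."""
    # Combos the message shares with ham that were never learned as spam.
    ham_only = (message_combos & ham_combos) - spam_combos
    # Combos the message shares with spam that were never learned as ham.
    spam_only = (message_combos & spam_combos) - ham_combos
    return len(ham_only), len(spam_only)

# Toy learned sets, for illustration only.
ham_combos = {("T_HAS_FROM",), ("T_HAS_FROM", "T_MIME_VERSION")}
spam_combos = {("T_HAS_FROM",), ("T_CT_DYING",), ("T_BENEFICIARY", "T_CT_DYING")}
message_combos = {("T_HAS_FROM",), ("T_CT_DYING",), ("T_BENEFICIARY", "T_CT_DYING")}

ham_score, spam_score = one_sided_matches(message_combos, ham_combos, spam_combos)
print(ham_score, spam_score)
```

Combos that appear in both learned sets, like ("T_HAS_FROM",) here, contribute nothing to either side.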

In theory I should be able to create thousands of combination rules for both ham and spam that all have a very high probability of being accurate. It's just an interesting experiment to see how well it works.

Right now I have 151728 ham combinations and 113632 spam combinations. Of those only 22933 are in both sets. It's only been learning for one day. I want to see where it is after a week.
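Worked out from those three numbers, and assuming "in both sets" means exact duplicates between the two sets, the one-sided counts are:

```python
ham_total = 151728
spam_total = 113632
in_both = 22933

# Combinations seen only in ham, and only in spam.
ham_only = ham_total - in_both     # 128795
spam_only = spam_total - in_both   # 90699

print(ham_only, spam_only)
```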

By changing the rules from __ to T_ I exposed a lot more rule names. The way this works is that I don't need to know which rules are ham rules or spam rules in advance. And I don't need to score them. The filter figures it all out on its own. So the rule names are just information.
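For illustration, here is the effect of the rename on one rule line from this thread. This sketch is a more conservative variant than the Perl one-liner: s/__/T_/g replaces every double underscore, while the pattern below only touches a leading __ on an identifier. The function name expose_hidden_rules is made up for the example.

```python
import re

# Match "__" only at the start of an identifier: not preceded by a word
# character, and followed by a letter or digit.
LEADING_DUNDER = re.compile(r'(?<![A-Za-z0-9_])__(?=[A-Za-z0-9])')

def expose_hidden_rules(line):
    """Rename hidden-rule prefixes from __ to T_ in one rule-file line."""
    return LEADING_DUNDER.sub('T_', line)

line = "meta      CT_GOD_BENEFICIARY       __CT_GOD && __CT_VICTIM"
renamed = expose_hidden_rules(line)
print(renamed)
```

Both the definition of a hidden rule and every meta rule that refers to it get renamed the same way, so the references stay consistent.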

I think this trick will make SA far more accurate. We'll see. I want to give it till at least Friday for the system to learn. I'm also storing hit counts so that I could pick out maybe the best 1000 rules and publish them.
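A rough sketch of the hit-count idea, assuming a simple in-memory counter keyed by combination. It ranks by raw hit count only; how the real filter decides which rules are "best" is not stated here, and the names record_hits and best_rules are hypothetical.

```python
from collections import Counter

hit_counts = Counter()

def record_hits(combos):
    """Add one hit for each combination seen in a message."""
    hit_counts.update(combos)

def best_rules(n=1000):
    """Return the n combinations with the most hits, most frequent first."""
    return [combo for combo, _ in hit_counts.most_common(n)]

# Toy data: one combo seen twice, another seen once.
record_hits([("T_CT_DYING",), ("T_BENEFICIARY", "T_CT_DYING")])
record_hits([("T_CT_DYING",)])
print(best_rules(1))
```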

Anyhow - that's what I'm up to and so far results are good. But because it's early in the learning cycle most messages are not yet producing significant scores. The ones that are producing scores are making the right call, however.



On 02/02/16 20:19, Dave Funk wrote:
You can do that, but it requires editing all your rule files, although then you see those matches in all your reports.

If you just want to test one particular message, just use the -D option to spamassassin and grep for ' got hit: '

Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ======> got hit: "<"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ======> got hit: "negative match"
Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ======> got hit: "<YES>"
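If the rule names are wanted as a list rather than as raw debug lines, something like the following could pull them out. The pattern is inferred only from the sample lines above, so treat it as an assumption about the debug format rather than a guaranteed one; hit_rule_names is an illustrative name.

```python
import re

# Inferred from the sample: "ran <type> rule <NAME> ======> got hit"
HIT_RE = re.compile(r'dbg: rules: ran \w+ rule (\S+) =+> got hit')

def hit_rule_names(debug_output):
    """Return rule names from 'got hit' debug lines, in order of appearance."""
    return HIT_RE.findall(debug_output)

sample = "\n".join([
    'Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"',
    'Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"',
])
names = hit_rule_names(sample)
print(names)
```

This assumes the -D output for the message has already been captured into a string.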

(Yes, Marc, you probably already know this, this is for the other people who might be following this thread ;)

On Tue, 2 Feb 2016, Marc Perkel wrote:

Never mind ....

I found that if I change __ to T_ that it does what I want.


On 02/02/16 18:05, Marc Perkel wrote:

On 02/02/16 17:55, Marc Perkel wrote:
Normally SA creates a header that has a list of the names of rules that matched. It skips the listing of hidden rules that start with __ .

Is there a command where I can easily tell SA to include the hidden rules in the report in the headers so I can see all of it?


I'm also, I suppose, asking it to list rules that match but produce no score.

body __LATE_RICH_RELATIVE /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i

body __CT_CLICK /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)? this link\b/i

body      __BENEFICIARY            /\bbeneficiary\b/i

body __CT_BEGGER /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i

body __CT_CONTACT /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i

body __CT_REPLY_TO_ME /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i

body __CT_DYING /\b(diagnosed with|months to live|dying of|transplant)\b/i

body      __CT_UNITED_NATIONS      /\bUnited Nations?\b/i

meta __CT_STRANGER CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE

meta __CT_MONEY CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$

meta __CT_VICTIM __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING

meta __CT_FORM FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT

meta __CT_CONFIDENTIAL CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2

meta __CT_NOW CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND

meta      CT_GOD_BENEFICIARY       __CT_GOD && __CT_VICTIM
describe  CT_GOD_BENEFICIARY       God and Beneficiary
score     CT_GOD_BENEFICIARY       4

meta      CT_GOD_BEGGER            __CT_GOD && __CT_BEGGER
describe  CT_GOD_BEGGER            Begging in Religious Language
score     CT_GOD_BEGGER            3






--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
