Re: Question about spam report header

Marc Perkel Tue, 02 Feb 2016 21:44:02 -0800

perl -p -i -e 's/__/T_/g' /usr/share/spamassassin/updates_spamassassin_org/*

This converts the rules. I'm doing something very interesting. It's going to take a few days to see if it works.
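For anyone following along, here is a rough Python sketch of what that substitution does to a single rule line. The function name expose_hidden_rules is made up for illustration. Since the replacement is global, both rule definitions and references to them inside meta rules get renamed the same way.

```python
def expose_hidden_rules(text: str) -> str:
    """Replace every '__' with 'T_', mirroring the perl s/__/T_/g above.

    Hypothetical helper for illustration only; it works on a string and
    does not touch any files.
    """
    return text.replace("__", "T_")


line = "meta      CT_GOD_BEGGER            __CT_GOD && __CT_BEGGER"
print(expose_hidden_rules(line))
# prints: meta      CT_GOD_BEGGER            T_CT_GOD && T_CT_BEGGER
```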

I'm applying the same techniques as my evolution filter to the SA rule names.

I extract the names and then run them into a program that creates all combinations up to 4 levels and learns those combos as either spam or ham.
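A minimal sketch of the combination step in Python, not the actual program. It assumes "up to 4 levels" means every group of 1 to 4 distinct rule names, that order within a group does not matter, and that a combo is stored as the names joined with "+". The function names and the "+" separator are invented for the example.

```python
from itertools import combinations


def rule_combos(rule_names, max_level=4):
    """Return every combination of 1..max_level rule names as a set of strings.

    Names are sorted first so the same group of rules always yields the
    same key, regardless of the order in which the rules fired.
    """
    names = sorted(set(rule_names))
    combos = set()
    for level in range(1, max_level + 1):
        for combo in combinations(names, level):
            combos.add("+".join(combo))
    return combos


def learn(corpus_set, rule_names, max_level=4):
    """Add one message's combinations to a ham or spam corpus set."""
    corpus_set.update(rule_combos(rule_names, max_level))


spam_set = set()
learn(spam_set, ["T_CT_GOD", "T_CT_BEGGER", "T_BENEFICIARY"])
print(len(spam_set))  # 3 singles + 3 pairs + 1 triple = 7
```

The count grows quickly: a message that hits 20 rules already produces 20 + 190 + 1140 + 4845 = 6195 combinations under these assumptions.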

Then, after building the ham and spam corpus sets, I take the test message, create its set of rule combinations, and then do set compares against the ham and spam sets.

What I'm looking for is combos matching ham and NOT matching spam - or - combinations matching spam and NOT matching ham.
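The set comparison described above could look roughly like this in Python. It is only a sketch: the function names, the plain count-based scoring, and the "unsure" verdict are assumptions for illustration, not how the real filter weighs things.

```python
def score_message(message_combos, ham_set, spam_set):
    """Count the message's combos seen only in ham and only in spam."""
    ham_only = (message_combos & ham_set) - spam_set
    spam_only = (message_combos & spam_set) - ham_set
    return len(ham_only), len(spam_only)


def verdict(message_combos, ham_set, spam_set):
    """Hypothetical decision rule: whichever side has more one-sided combos."""
    ham_hits, spam_hits = score_message(message_combos, ham_set, spam_set)
    if ham_hits > spam_hits:
        return "ham"
    if spam_hits > ham_hits:
        return "spam"
    return "unsure"


ham_set = {"A", "A+B", "C"}
spam_set = {"C", "D", "C+D"}
message = {"A", "A+B", "C"}

print(score_message(message, ham_set, spam_set))  # (2, 0)
print(verdict(message, ham_set, spam_set))        # ham
```

Combo "C" is in both sets, so it counts for neither side. Only "A" and "A+B" carry information here.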

In theory I should be able to create thousands of combination rules for both ham and spam that all have a very high probability of being accurate. It's just an interesting experiment to see how well it works.

Right now I have 151728 ham combinations and 113632 spam combinations. Of those only 22933 are in both sets. It's only been learning for one day. I want to see where it is after a week.
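To put those numbers in terms of usable combinations, here is the arithmetic as a short Python snippet. The variable names are just for illustration. The figures are the ones quoted above.

```python
ham_total = 151728   # ham combinations learned so far
spam_total = 113632  # spam combinations learned so far
in_both = 22933      # combinations present in both sets

ham_only = ham_total - in_both               # 128795 usable as ham indicators
spam_only = spam_total - in_both             # 90699 usable as spam indicators
distinct = ham_total + spam_total - in_both  # 242427 distinct combinations

print(ham_only, spam_only, distinct)
print(f"{in_both / ham_total:.1%} of ham combos also appear in spam")
print(f"{in_both / spam_total:.1%} of spam combos also appear in ham")
```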

By changing the rules from __ to T_ I exposed a lot more rule names. The way this works is that I don't need to know what rules are ham rules or spam rules in advance. And I don't need to score them. The filter figures it all out on its own. So the rule names are just information.

I think this trick will make SA far more accurate. We'll see. I want to give it till at least Friday for the system to learn. I'm also storing hit counts so that I could pick out maybe the best 1000 rules and publish them.
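One possible way to keep hit counts and pull out the top combinations, sketched in Python. It assumes "best" simply means "hit most often", which may not be the ranking actually used. The function names are hypothetical.

```python
from collections import Counter


def record_hits(hit_counts, matching_combos):
    """Bump the counter for every one-sided combo a message matched."""
    hit_counts.update(matching_combos)


def best_rules(hit_counts, n=1000):
    """Return the n most frequently hit combos, most frequent first."""
    return [combo for combo, _count in hit_counts.most_common(n)]


spam_hits = Counter()
record_hits(spam_hits, ["A+B", "C"])
record_hits(spam_hits, ["A+B", "C", "D"])
record_hits(spam_hits, ["A+B"])

print(best_rules(spam_hits, n=2))  # ['A+B', 'C']
```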

Anyhow - that's what I'm up to and so far results are good. But because it's early in the learning cycle most messages are not yet producing significant scores. The ones that are producing scores are making the right call, however.




On 02/02/16 20:19, Dave Funk wrote:

You can do that, but it requires editing all your rule files, although then you see those matches in all your reports.
If you just want to test one particular message, just use the -D option to spamassassin and grep for ' got hit: '
Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ======> got hit: "<"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ======> got hit: "negative match"
Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ======> got hit: "<YES>"
(Yes, Marc, you probably already know this; this is for the other people who might be following this thread ;)
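If you would rather collect the rule names programmatically than eyeball the grep output, something like the following Python sketch could work. It assumes the debug output has already been captured as text, one entry per line, in the format shown above, with whitespace between the rule name and the arrow. The regex and function name are my own guesses at a reasonable parser, not part of SpamAssassin.

```python
import re

# Matches e.g.: dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
HIT_RE = re.compile(r'dbg: rules: ran (\w+) rule\s+(\S+)\s+=+>\s+got hit: "(.*)"')


def parse_hits(debug_text):
    """Return (rule_type, rule_name, matched_text) for each 'got hit' line."""
    hits = []
    for line in debug_text.splitlines():
        match = HIT_RE.search(line)
        if match:
            hits.append(match.groups())
    return hits


sample = (
    'Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"\n'
    'a line without a hit (placeholder, not real output)\n'
    'Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"\n'
)

for rule_type, name, text in parse_hits(sample):
    print(rule_type, name, text)
```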
On Tue, 2 Feb 2016, Marc Perkel wrote:
Never mind ....

I found that if I change __ to T_ that it does what I want.


On 02/02/16 18:05, Marc Perkel wrote:
On 02/02/16 17:55, Marc Perkel wrote:
Normally SA creates a header that has a list of the names of rules that matched. It skips the listing of hidden rules that start with __.
Is there a command where I can easily tell SA to include the hidden rules in the report in the headers so I can see all of it?
I'm also, I suppose, asking it to list rules that match but produce no scores.
body __LATE_RICH_RELATIVE /\blate.{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i
body __CT_CLICK /\b(click(ing)?(here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i
body      __BENEFICIARY            /\bbeneficiary\b/i
body __CT_BEGGER /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i
body __CT_CONTACT /\b((contact(?:ing) you|contact(information|me|email|number|us)|your contact))|to (inform|email) you/i
body __CT_REPLY_TO_ME /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i
body __CT_DYING /\b(diagnosed with|months to live|dying of|transplant)\b/i
body      __CT_UNITED_NATIONS      /\bUnited Nations?\b/i
meta __CT_STRANGER CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE
meta __CT_MONEY CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$
meta __CT_VICTIM __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING
meta __CT_FORM FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT
meta __CT_CONFIDENTIAL CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2
meta __CT_NOW CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND
meta      CT_GOD_BENEFICIARY       __CT_GOD && __CT_VICTIM
describe  CT_GOD_BENEFICIARY       God and Beneficiary
score     CT_GOD_BENEFICIARY       4

meta      CT_GOD_BEGGER            __CT_GOD && __CT_BEGGER
describe  CT_GOD_BEGGER            Begging in Religious Language
score     CT_GOD_BEGGER            3


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
