Re: Re: No rule updates since 1/1/17

David Jones Sat, 25 Aug 2018 07:13:38 -0700

Tom,

Let me know if you are still interested in setting up a masschecker. That goes for anyone on this list as well. I have worked out the sorting issue pretty well now, and my ena-weekX masscheckers are now the largest contributors to the RuleQA corpus, keeping the nightly rule scoring updating regularly over the past year.


http://ruleqa.spamassassin.org/ (see the ena-weekX in the green box)

New/more masscheckers are always welcome and will help you learn the best way to tune your SA platform to get every last drop of accuracy from your local meta rules. We could really use masscheckers whose primary language is not English to add/improve core SA rules.


Here's my setup:

- I have an iRedMail server to which I send split copies of most of my email, addressed to an internal-only email domain, "sa.ena.net".

- The iRedmail server has Sieve rules (easily managed by RoundCube)based on certain rule hits and scores from my main Internet edgeMailScanner filtering that move them into Ham and Spam folders asunread. Mail scoring in the middle -- not high enough for obvious Spamor low enough for obvious Ham are left in the main Inbox.

- I spend a few minutes each day visually scanning the Subjects of the unread email, then mark them as Read.

- If I find a zero-hour spam in the main Inbox, I move it to a SpamCop folder. A script runs every 5 minutes to check the SpamCop folder, strips off some extra Received headers from my internal hops, then submits the message as an attachment to my SpamCop account.

- A script moves the Maildir email to 4 other masschecker VMs to split out the load so they are able to submit their results quickly. Ena-week0 is the most recent week of ham/spam, which is still on the iRedMail server. Ena-week1-4 are running on the other 4 masschecker VMs to give a total of 5 weeks of recent corpus. I currently have 100,939 Ham and 292,001 Spam in ena-week0-4.

- I run local Bayes training on the ena-week0 Ham and Spam folders into my Redis-based Bayes storage, which is shared across my 8 MailScanner nodes and my iRedMail/amavis server. This method has proven to keep my Bayes scores very accurate.
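
The Sieve filing described above amounts to a three-way split on the edge filter's verdict. As a rough illustration only, here is a minimal Python sketch of that decision. It assumes the score travels in a header named `X-Spam-Score` and uses made-up thresholds of 0.0 and 6.0; the header name and both cutoffs are hypothetical, and the real Sieve rules also key on specific rule hits, which this sketch ignores.

```python
from email import message_from_string
from email.message import Message

# Hypothetical cutoffs; tune them to your own edge filter's scoring.
HAM_MAX = 0.0
SPAM_MIN = 6.0


def triage_folder(msg: Message, score_header: str = "X-Spam-Score") -> str:
    """Pick the folder for a message based on its upstream spam score."""
    raw = msg.get(score_header)  # header name is an assumption, not MailScanner's
    try:
        score = float(raw)
    except (TypeError, ValueError):
        return "INBOX"  # missing or unparsable score: leave it for a human
    if score >= SPAM_MIN:
        return "Spam"
    if score <= HAM_MAX:
        return "Ham"
    return "INBOX"  # the ambiguous middle stays in the main Inbox


if __name__ == "__main__":
    sample = message_from_string("X-Spam-Score: 12.5\nSubject: hi\n\nbody\n")
    print(triage_folder(sample))  # prints "Spam"
```

Keeping the middle band in the Inbox is what makes the daily visual scan worthwhile: only the uncertain mail needs a human decision.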
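
The SpamCop step can be sketched in Python with only the standard library. This is not Dave's actual script: it assumes the internal hops always add a fixed number of Received headers at the top of the message (the hypothetical `internal_hops` parameter), and the SpamCop submission address, which is specific to each SpamCop account, is left as a parameter. Scheduling it every 5 minutes (for example from cron) and actually sending the report, for instance with `smtplib.SMTP.send_message`, are left out.

```python
import re
from email import message_from_bytes
from email.message import EmailMessage


def strip_internal_received(raw: bytes, internal_hops: int) -> bytes:
    """Drop the topmost `internal_hops` Received headers; the body is untouched."""
    # The header block ends at the first empty line.
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:
        head, rest = raw, b""
    else:
        head, rest = raw[:match.start()], raw[match.start():]
    # Split into header fields; folded continuation lines start with space or tab.
    fields = re.split(rb"\r?\n(?=[^ \t])", head)
    kept, dropped = [], 0
    for field in fields:
        # Each relay prepends its Received header, so internal hops come first.
        if dropped < internal_hops and field[:9].lower() == b"received:":
            dropped += 1
            continue
        kept.append(field)
    eol = b"\r\n" if b"\r\n" in head else b"\n"
    return eol.join(kept) + rest


def build_report(raw: bytes, internal_hops: int, sender: str, submit_addr: str) -> EmailMessage:
    """Wrap the cleaned spam as a message/rfc822 attachment for SpamCop."""
    report = EmailMessage()
    report["From"] = sender
    report["To"] = submit_addr  # your account-specific SpamCop submission address
    report["Subject"] = "spam report"
    report.set_content("Spam attached.\n")
    cleaned = message_from_bytes(strip_internal_received(raw, internal_hops))
    report.add_attachment(cleaned)  # becomes a message/rfc822 attachment
    return report
```

Working on the raw header bytes, rather than re-adding headers through the `email` API, keeps the remaining Received chain in its original order and position, which is what SpamCop parses.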
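
One way to decide which masschecker a message belongs on is to bucket it by age into 7-day windows. The Python sketch below is an assumption about how such a split could work, not the script described above. It uses each message's Maildir delivery time (`MaildirMessage.get_date()`, i.e. the file's modification time) rather than the forgeable Date header, and the actual copy to the VMs (rsync, scp, etc.) is left out.

```python
import mailbox
import time
from typing import Iterator, Optional, Tuple

SECONDS_PER_WEEK = 7 * 24 * 3600


def week_bucket(delivered: float, now: float, weeks: int = 5) -> Optional[str]:
    """Map a delivery timestamp to ena-week0..ena-week{weeks-1}, or None if too old."""
    age = max(now - delivered, 0.0)  # clock skew: treat future mail as brand new
    index = int(age // SECONDS_PER_WEEK)
    return f"ena-week{index}" if index < weeks else None


def plan_moves(maildir_path: str, now: Optional[float] = None) -> Iterator[Tuple[str, str]]:
    """Yield (message key, target bucket) for mail that should leave ena-week0."""
    now = time.time() if now is None else now
    box = mailbox.Maildir(maildir_path, create=False)
    for key, msg in box.iteritems():
        bucket = week_bucket(msg.get_date(), now)
        # None means older than the 5-week window; such mail would (hypothetically) be expired.
        if bucket is not None and bucket != "ena-week0":
            yield key, bucket
```

Bucketing by delivery time means a message moves to the next masschecker exactly once per week, so each VM always holds one contiguous week of corpus.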
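
The Bayes training step boils down to running `sa-learn --ham` and `sa-learn --spam` against the two folders. A small Python wrapper is sketched below. The folder paths are hypothetical, it points at each Maildir's `cur/` subdirectory because messages already marked as Read live there, and it assumes the Redis backend is already configured in SpamAssassin's own config so that `sa-learn` writes to the shared store. With `dry_run=True` it only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List


def sa_learn_commands(ham_maildir: str, spam_maildir: str) -> List[List[str]]:
    """Build sa-learn invocations for the read (cur/) mail in each folder."""
    return [
        ["sa-learn", "--ham", str(Path(ham_maildir) / "cur")],
        ["sa-learn", "--spam", str(Path(spam_maildir) / "cur")],
    ]


def train(ham_maildir: str, spam_maildir: str, dry_run: bool = True) -> None:
    for cmd in sa_learn_commands(ham_maildir, spam_maildir):
        if dry_run:
            print(" ".join(cmd))  # show what would run
        else:
            subprocess.run(cmd, check=True)  # raises if sa-learn exits non-zero


if __name__ == "__main__":
    # Hypothetical paths to the ena-week0 Ham and Spam Maildir folders.
    train("/path/to/Maildir/.Ham", "/path/to/Maildir/.Spam")
```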


Hope someone finds this information helpful.

Dave


On 01/20/2017 01:02 PM, Tom Hendrikx wrote:

On 20-01-17 19:46, David Jones wrote:

From: Kevin Golding <k...@caomhin.org>
Sent: Friday, January 20, 2017 11:59 AM
To: users@spamassassin.apache.org
Subject: Re: No rule updates since 1/1/17

On Fri, 20 Jan 2017 17:26:01 -0000, Bill Keenan
<developerli...@wjkeenan.org> wrote:

What is the fix needed so /usr/bin/sa-update starts getting updates? I
too have not received an update from updates.spamassassin.org since
1-Jan-17.

Besides updates.spamassassin.org, what other rule sets are commonly
used? Hundreds of spam messages are getting through with only
updates.spamassassin.org rules.

This seems like a good time to mention
https://wiki.apache.org/spamassassin/NightlyMassCheck
If more people can contribute, even just a small corpus of mail, then
updates will be published more frequently. At the moment a very small
number of people provide data, meaning there is very little margin for
error.

I would like to help with the nightly masscheck but I don't have the
resources to manually check ham and spam.  This also gets into the
grey area of how people define spam.  I also have a very good MTA
setup with RBLs and DNS checks that block most of the spam before
it reaches SA in MailScanner.  My SA only has to block a very small
percentage of what I define as spam, so I am not sure how helpful
my mail filtering platform can be, even though it's very accurate.

Dave

I think I can say the same about my platform, but since this issue keeps
popping up I just applied for an account to find out if my
contribution could help. I can't speculate, so I'm just gonna try and
see if it helps :)

Kind regards,
        Tom


--
David Jones
