[ removed cross-posting to SURBL list ] Chris Santerre <[EMAIL PROTECTED]> writes:
> Sort of. I didn't know you guys did that nightly. Very nice. I'm
> looking for a more localized process that doesn't require a run of
> anything. Having counters generated means just checking totals. And
> even if one didn't use the exact same rules as another, they could
> easily combine totals for the ones they do.

One of the problems with developing entirely on local messages is that
you inevitably end up with a huge bias from your corpus: first in the
development of the rules, and second in testing them. One of the nice
things about the nightly run is that you get graded on how well your
rule works on other corpora that you can't tune against, at least not
easily.

Sometimes there is a bug that needs to be fixed, so I do ask mass-check
submitters for FPs and FNs from time to time. But if you end up
"fixing" rules via exceptions (especially more than one or two), the
rule is probably not going to be stable once you get outside of the
larger test set.

>> We've been doing this for well over a year and it works great. If only
>> we had more active developers working on rules...

> I'm not quite sure how to take that last line.

We need active rule developers. New rules used to make their way into
CVS relatively quickly because that was the only place for them to go.
SARE is making very nice strides in developing new rules, but those
rules aren't being integrated into SpamAssassin quickly at all, and
everyone is suffering:

- it's more work for users
- there's less QA, and SARE rules are only scored manually
- SpamAssassin isn't being maintained in a way that integrates these
  rules efficiently and with low overlap, so speed and efficiency suffer

I'm not saying that I want SARE to go away! SARE does a better job of
tracking new rule sets than was possible before, but we need to avoid
falling into a non-optimal pattern of where effort is going.
Developers come and go, and we've maintained a strong core team for the
Perl code in SpamAssassin, but the number of people actively working on
rules is lower now (since January, about 2/3 of the SA 3.0 test rule
work has been done by one person, and 94% by two people).

What I think would work better, and what I'd like to see:

- Some of the experienced SARE developers also become SpamAssassin
  developers (with commit access soon enough) so that the best rules
  are quickly integrated into the SVN tree.

- Use (and further development) of the SpamAssassin project's
  infrastructure to ship rule updates based on SARE rules for existing
  SpamAssassin releases.

and the big one:

- Shift from using maintenance releases for rule updates to automated
  official rule updates for stable SpamAssassin releases (think: a cron
  job that you can trust).

  - There are a number of killer rules in SA 3.0 SVN that have been
    through extensive QA and would require minimal development to test.
    Those could have been deployed at general-release quality for 2.6x;
    I'd like to see something set up now for 3.0 SVN.

  - The perceptron is also fast to run, so with a bit of work to make
    it easier to run (and especially if we can get rid of score sets):

    - we can use it to generate scores for new rules
    - and eventually, all scores can also be updated regularly

  - In addition, the plug-in architecture of SA 3.0 will make it
    somewhat more feasible to do automated updates for non-trivial
    rules, so now is the time.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting