For various reasons (some political, some technical) we don't use Bayes here. It can be very frustrating, but I'm sure you guys know what it's like to have your hands tied by corporate wrangling.
 
The reason I proposed more complex logic than the one you suggest was to handle down-scoring rules that perform poorly as well as up-scoring effective ones. With a fixed score, you run the risk of either setting it too low, so the system takes too long to learn, or too high (it has been demonstrated that this can cause chaotic behaviour in some systems). By using a function that calculates X based on the overall score of the message and the other rules hit, diminished by the learn rate, the system can quickly cover a large gap, but when the distance between the two scores becomes small, the changes to the score values are appropriately small, tending the system towards stability (assuming spammers don't change tactics).
 
Should two particular rules commonly occur together, this would also have the effect of balancing score changes across them both, relative to their base values.
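To make the idea concrete, here is a minimal sketch of that kind of update function. Everything here is my own illustration, not an existing SpamAssassin interface: the function name, the `margin` parameter, and the exact formula are assumptions. The adjustment is proportional to the gap between the message's total score and the boundary, shared across the rules that hit in proportion to their current scores, and damped by the learn rate, so changes shrink as the gap closes.

```python
def adjust_scores(rule_scores, hit_rules, threshold, learn_rate, is_spam, margin=1.0):
    """Nudge the scores of the rules that hit a training message so the
    total moves towards the correct side of the spam/ham boundary.

    rule_scores : dict mapping rule name -> current score
    hit_rules   : list of rule names that fired on this message
    threshold   : the spam/ham boundary (e.g. 5.0)
    learn_rate  : 0 < learn_rate <= 1, chosen by the administrator
    is_spam     : True if the trainer says the message is spam
    margin      : how far past the boundary we would like the total to sit
    """
    total = sum(rule_scores[r] for r in hit_rules)
    target = threshold + margin if is_spam else threshold - margin
    # Only move when the total is on the wrong side of the target:
    # a large gap gives a large change, a small gap a small one.
    gap = max(0.0, target - total) if is_spam else min(0.0, target - total)
    delta = learn_rate * gap
    # Share the change across the rules that hit, in proportion to each
    # rule's current score, so commonly co-occurring rules are balanced
    # relative to their base values.
    weights = {r: abs(rule_scores[r]) for r in hit_rules}
    weight_sum = sum(weights.values())
    for r in hit_rules:
        share = weights[r] / weight_sum if weight_sum else 1.0 / len(hit_rules)
        rule_scores[r] += delta * share
    return rule_scores
```

With a learn rate of 0.5, a missed spam that scored 3.0 on two rules would have those rules pushed up by half the remaining gap on the first pass, then by progressively smaller amounts on later passes as the total approaches the target.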
 
I'd like to get into doing this, but work is swamped (I don't get to play with spam all day :( ). If other people are keen on doing this then maybe we can get a collaboration going.
 
R


From: Chris Santerre [mailto:[EMAIL PROTECTED]
Sent: 02 March 2005 15:16
To: Gray, Richard; users@spamassassin.apache.org
Subject: RE: Potential new auto-learning strategy

There has been a lot of talk about dynamic scoring. Most people argue that Bayes is a good substitute for it already. But not if you don't use Bayes ;)
 
I think it's a worthy idea for testing, although the logic could be fairly simple. For example: running the top-hitting-rules script in a cron job, pulling out the top N rules, and adding X points to them based on the hits. That's something I've wanted to play with, but I've had no time.
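For what it's worth, the simple scheme could look something like this. The function name, the hit-count input, and the constants are all made up for illustration; a real version would parse the output of SA's top-rules report in the cron job rather than take a dict.

```python
def bump_top_rules(hit_counts, rule_scores, top_n=10, bump=0.1):
    """Add a fixed bump to the scores of the N most frequently hitting
    rules, as a crude form of dynamic scoring.

    hit_counts  : dict mapping rule name -> number of hits in the period
    rule_scores : dict mapping rule name -> current score
    top_n       : how many of the top-hitting rules to adjust
    bump        : fixed number of points (the "X") added to each
    """
    # Sort rule names by hit count, highest first, and keep the top N.
    top = sorted(hit_counts, key=hit_counts.get, reverse=True)[:top_n]
    for rule in top:
        rule_scores[rule] = rule_scores.get(rule, 0.0) + bump
    return rule_scores
```

This is exactly the fixed-score approach Richard warns about above: the bump is constant regardless of how well-scored the rules already are, which is what motivates the gap-based function instead.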
 
--Chris
-----Original Message-----
From: Gray, Richard [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 02, 2005 7:03 AM
To: users@spamassassin.apache.org
Subject: Potential new auto-learning strategy

I saw an article a while back about some DJs who were using Perl as a mixing tool by writing Perl code that edited itself while it ran in a loop. I thought this was kind of cool.
 
I studied AI at university, and remember a good bit of discussion regarding feedback systems.
 
So, to combine the two, I was thinking of how to use SA in a similar structure, and propose a dynamic weighting system for SA rules. Start from the scores that a base installation of SA gives its rules; then, when shown messages to learn from, the system modifies the score weightings of the rules rather than training the Bayes system.
 
I'll not throw out a discussion regarding learning rates and so on just yet, but I can imagine the logic being loosely based on how much influence the rule had on the total score, the distance of the final result from the spam/ham boundary, and the learning rate chosen by the administrator.
 
Any feedback?
 
R


---------------------------------------------------
This email from dns has been validated by dnsMSS Managed Email Security and is free from all known viruses.

For further information contact [EMAIL PROTECTED]
