http://bugzilla.spamassassin.org/show_bug.cgi?id=2910
------- Additional Comments From [EMAIL PROTECTED] 2004-01-11 12:10 -------
Subject: RE: Fast SpamAssassin score learning tool.

Thought it might be useful to archive the related discussion on the SA dev
list, so I am repeating a couple of the related e-mails below.

> From: Henry Stern [mailto:[EMAIL PROTECTED]
> Sent: Saturday, January 10, 2004 11:50 AM
> To: 'Gary Funck'; [email protected]
> Cc: 'Spam Assassin Dev'; [EMAIL PROTECTED]
> Subject: RE: Neural Net scoring
>
> > -----Original Message-----
> > From: Gary Funck [mailto:[EMAIL PROTECTED]
> > Sent: January 10, 2004 3:29 PM
> > To: [email protected]
> > Cc: Spam Assassin Dev; [EMAIL PROTECTED] (Henry Stern)
> > Subject: RE: Neural Net scoring
> >
> > Thanks. Here's the link:
> > http://bugzilla.spamassassin.org/show_bug.cgi?id=2910
> >
> > This looks interesting. I echo Sidney's follow-up:
> >
> > "That's impressive. How close are the results to those of the GA? That's
> > actually two questions: 1) How close is the scoring that the perceptron
> > comes up with to the scoring that the GA comes up with? and 2) How much
> > difference in spam categorization results is there between using the
> > scores generated by the perceptron and those generated by the GA?"
>
> Some of the scores are the same, others are different. The GA has some
> added constraints that are required because it works on a global level (it
> looks at the mean performance of solutions over the training set), whereas
> stochastic gradient descent looks at performance on individual examples.
>
> > This approach looks like it does a good job of mixing some of the
> > benefits of the current additive scoring approach and a neural net. The
> > final neural net that is derived is much simpler than a full-fledged
> > net, but it has the advantage of being simple to understand, and maps
> > well onto the existing framework.
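[Editor's note: the global-vs-local distinction Henry draws above can be sketched in a few lines. The data, loss function, and learning rate below are invented for illustration; this is not the actual GA or perceptron tool, just a minimal contrast between scoring a candidate by its mean loss over the whole training set (the GA's fitness view) and updating weights one example at a time (the SGD view).]

```python
# Toy rule-hit vectors and spam/ham labels, made up for illustration.
training_set = [([1, 0, 1], 1), ([0, 1, 0], 0), ([1, 1, 0], 1)]

def loss(w, x, y):
    # squared error of a simple additive score against the label
    s = sum(wi * xi for wi, xi in zip(w, x))
    return (s - y) ** 2

def ga_fitness(w):
    # global view: mean performance over the whole training set
    return sum(loss(w, x, y) for x, y in training_set) / len(training_set)

def sgd_step(w, x, y, lr=0.1):
    # local view: gradient step on the loss of a single example
    s = sum(wi * xi for wi, xi in zip(w, x))
    return [wi - lr * 2 * (s - y) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]
before = ga_fitness(w)
for x, y in training_set * 50:   # many passes of per-example updates
    w = sgd_step(w, x, y)
after = ga_fitness(w)
print(before > after)  # the local SGD steps also drive down the global mean loss
```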
> The current additive scoring approach is precisely equivalent to a
> perceptron with a linear transfer function and a threshold activation
> function. What I do is use a different activation function for training
> (threshold activation functions are discontinuous and therefore not
> differentiable) and then map the results to a threshold perceptron.
>
> > It would've been interesting to see what sorts of scores this approach
> > produced, and how well they worked in practice. (There's also a question
> > of copyright that would need to be resolved for this approach to gain
> > wider use.)
>
> Once the preprocessing stuff is worked out, I'll write a white paper that
> discusses the results in detail. As for copyright, I've signed an Apache
> CLA.
>
> Henry
> ----------------------------------------------------------------------
> From: Phillip Evans [mailto:[EMAIL PROTECTED]
> Sent: Saturday, January 10, 2004 5:40 PM
> To: [EMAIL PROTECTED]
> Subject: Re: New rule type suggestion
>
> G'day. I think you're thinking too deeply about this <g>. To clarify:
>
> MLPs basically do two things to determine a result:
> 1. identify features; and
> 2. correlate between those features.
>
> One problem with ANNs, particularly in the area of text processing, is
> getting something meaningful into them. This is why things like Hidden
> Markov Models (i.e. statistical models) are more commonly used (NB: this
> is a completely unsubstantiated statement based on work I did years ago).
>
> SA is already identifying features, so IMO we don't need an ANN to do
> that. What we need is something that can correlate features to
> classifications. But wait! We have one of those already: the Bayes engine.
>
> NB: I don't think that the correlation between the presence of certain
> rules in a message and that message being classified as spam is all that
> complex - it certainly doesn't need a hidden layer in an MLP.
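[Editor's note: Henry's equivalence and training trick can be illustrated concretely. The sketch below trains a single perceptron with a differentiable sigmoid activation via per-example gradient descent, then reads the learned weights back as SpamAssassin-style additive rule scores with a decision threshold. The rule-firing data, learning rate, and epoch count are all invented for illustration; Henry's actual tool and preprocessing are not shown here.]

```python
import math
import random

# Hypothetical feature vectors: x[i] = 1 if rule i fired on the message.
# Labels: 1 = spam, 0 = ham. Generated data is made up for illustration.
random.seed(0)
RULES = 4

def make_msg(spam):
    # spammy messages tend to fire rules 0 and 1; hammy ones rule 3
    probs = [0.9, 0.8, 0.3, 0.1] if spam else [0.1, 0.1, 0.3, 0.7]
    return [1 if random.random() < p else 0 for p in probs], 1 if spam else 0

data = [make_msg(spam=i % 2 == 0) for i in range(400)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Train with a differentiable (sigmoid) activation via stochastic
# gradient descent, one example at a time.
w = [0.0] * RULES
b = 0.0
lr = 0.5
for epoch in range(20):
    random.shuffle(data)
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        g = p - y  # gradient of the log-loss w.r.t. the pre-activation
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

# Map back to a threshold perceptron, i.e. additive scoring: each weight
# is a rule score, and the sigmoid crosses 0.5 where sum(scores) >= -b.
def classify(x):
    return sum(wi * xi for wi, xi in zip(w, x)) >= -b

accuracy = sum(classify(x) == (y == 1) for x, y in data) / len(data)
```

Once trained, the weights `w` play exactly the role of per-rule scores in the existing additive framework, which is the mapping Henry describes.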
> Spam messages are being generated by people (drongos, granted, but people
> nonetheless) and are heavily constrained by the protocol they have to use.
> There's no scope here for an obtuse n-dimensional inverse bicubic
> relationship that only ANNs (and rocket scientists with too much time on
> their hands) can identify. Having said that, there's certainly an argument
> for automatically determining rule weightings.
>
> Now I haven't been working with SA for long and I don't know Perl (you
> might get sick of me saying that over the next few weeks), so I don't know
> the internals of the SA Bayes engine. I am going to assume that the Bayes
> engine works as others out there (e.g. POPFile) work: by tokenising the
> message text and then weighting features.
>
> The SA rules are identifying features that the Bayes engine doesn't
> currently identify. The idea would be to identify the features using the
> SA rules and feed those into the Bayes engine for consideration. Now the
> Bayes engine can (automagically) weight the rule-based features and hey
> presto! - we have the meta-rule rule. Not only that, you have visibility
> of the weighting assigned to each rule, so humans can easily tweak them
> without getting inexplicable results. Alternatively you could just feed
> additional tokens based upon the rules into the current Bayes processing.
>
> As a final comment, the existing SA rule weightings are manually set, and
> this seems to be causing problems that people are now trying to solve
> (using, for example, the Fast SA Score Learning Tool). If you wanted to, I
> reckon you could change the existing SA rules engine to be completely
> Bayes-driven (i.e. take away the manually set weightings altogether). This
> might require initially writing some rules for identifying valid e-mail so
> it can identify what messages *should* look like, but this rule set
> shouldn't need to change much over time.
>
> Phil.
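[Editor's note: Phil's "feed rule hits into the Bayes engine as extra tokens" idea can be sketched as a toy naive-Bayes-style weighter. The rule names, counts, and smoothing choice below are invented for illustration and are not SpamAssassin's actual Bayes implementation; the point is only that each rule's weight is learned from labelled mail yet remains visible and tweakable.]

```python
from collections import Counter

# Per-token hit counts in spam and ham, plus message totals.
spam_counts, ham_counts = Counter(), Counter()
n_spam = n_ham = 0

def train(rule_hits, is_spam):
    """Record which rules fired on one labelled message."""
    global n_spam, n_ham
    counts = spam_counts if is_spam else ham_counts
    if is_spam:
        n_spam += 1
    else:
        n_ham += 1
    for rule in rule_hits:
        counts["RULE:" + rule] += 1  # rule hit becomes a pseudo-token

def spamminess(rule):
    # Per-token spam probability with Laplace smoothing: this is the
    # "automagic" weighting, and it is inspectable by a human.
    token = "RULE:" + rule
    p_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham)

# Toy corpus with invented rule names.
for _ in range(40):
    train(["FROM_FORGED", "HTML_ONLY"], is_spam=True)
for _ in range(40):
    train(["IN_WHITELIST"], is_spam=False)

print(spamminess("FROM_FORGED"))   # learned to be a strong spam indicator
print(spamminess("IN_WHITELIST"))  # learned to be a strong ham indicator
```

A real classifier would combine these per-token probabilities across all tokens in a message, but even this fragment shows the visibility Phil wants: each rule's learned weight is a single number you can read and adjust.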
>
> PS: I don't want to imply that the Fast SA Score Learning Tool isn't the
> best thing since sliced bread. It looks like pretty cool stuff to me -
> keep up the good work Henry!

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
