Duncan Findlay writes:
> Hi everybody,
>
> As you may already know, Steven Birk and I have been working on our
> 4th year undergraduate project in Math and Engineering at Queen's
> University.
>
> The goal of our project was to examine the use of logistic regression
> as a potential replacement for the Perceptron/GA currently used by the
> SpamAssassin project.
>
> It's now done, and it's available here:
> http://people.apache.org/~duncf/FindlayBirkThesis.pdf
>
> Basically, we've found a technique that shows promise as a possible
> replacement, but requires some modifications in order to handle some
> of the restrictions the SpamAssassin project puts on scores.
>
> I hope to try to make those modifications in the next month or so, but
> I have no idea how well it will turn out, or how easy it will be.
>
> The paper may be an interesting read for people not too familiar with
> the way the scoring process works now, as it discusses many of the
> issues that differentiate the scoring process from most other machine
> learning problems. (Then again, it might just be boring.)

Thanks Duncan, a great read, and it looks promising!

Would it help, btw, if we came up with a spec for what a score-generation
tool needs to generate, in terms of score ranges and so on? That would
also be useful for the future (I'm sure there'll be more... ;). It would
be related to http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376 ...

--j.