Re: Score Generation for Apache SpamAssassin

Justin Mason Thu, 26 Apr 2007 04:16:22 -0700

Duncan Findlay writes:
> Hi everybody,
> 
> As you may already know, Steven Birk and I have been working on our
> 4th year undergraduate project in Math and Engineering at Queen's
> University.
> 
> The goal of our project was to examine the use of logistic regression
> as a potential replacement for the Perceptron/GA currently used by the
> SpamAssassin project.
> 
> It's now done, and it's available here:
> http://people.apache.org/~duncf/FindlayBirkThesis.pdf
> 
> Basically, we've found a technique that shows promise as a possible
> replacement, but requires some modifications in order to handle some
> of the restrictions the SpamAssassin project puts on scores.
> 
> I hope to try to make those modifications in the next month or so, but
> I have no idea how well it will turn out, or how easy it will be.
> 
> The paper may be an interesting read for people not too familiar with
> the way the scoring process works now, as it discusses many of the
> issues that differentiate the scoring process from most other machine
> learning problems. (Then again, it might just be boring.)


thanks Duncan -- a great read, and looks promising!

Would it help btw if we came up with a spec for what a score-generation
tool needs to generate, in terms of score ranges and so on?
This would also be useful for the future (I'm sure there'll be
more... ;)

that'd be related to
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376 ...

--j.
