Hi -- this would indeed be possible. Just copy the contents of the ruleset dir in /usr/share/spamassassin to a new directory, throw out most of it, and keep just 23_bayes.cf. Then when you run spamd, tell it to use that rules dir instead of the default.
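A rough sketch of that step in Python, in case it helps. The helper names, the target directory, and the port are made up for illustration. I'm assuming spamd's --configpath (-C) option to point it at the alternate rules dir and --port (-p) so that several instances can run side by side; check both against the spamd man page for your version.

```python
import shutil
from pathlib import Path
from typing import List


def make_bayes_only_rules_dir(default_rules: Path, target: Path) -> Path:
    """Create `target` and copy only 23_bayes.cf into it (hypothetical helper)."""
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy2(default_rules / "23_bayes.cf", target / "23_bayes.cf")
    return target


def spamd_command(rules_dir: Path, port: int) -> List[str]:
    """Build a spamd command line that uses `rules_dir` instead of the default.

    Assumes --configpath selects the default-rules directory and --port
    selects the listening port; one such command per word list.
    """
    return ["spamd", f"--configpath={rules_dir}", f"--port={port}"]
```

Something like make_bayes_only_rules_dir(Path("/usr/share/spamassassin"), Path("/some/dir/wordlist-a")) followed by running spamd_command(...) with a free port would give you one instance; the target path and port are placeholders.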
You should probably also add a 99_local.cf which contains a "bayes_path"
directive pointing at the custom Bayes dbs for the word list you're using.
Each server would then have to use a different rules dir, so you'd run
multiple servers, one for each word list.

And also -- turn off "bayes_auto_learn" ;)

--j.

On Mon, Mar 23, 2009 at 23:11, Randy J. Ray <rj...@corp.oodle.com> wrote:
> Having gone over the FAQ and other doc-sections on the wiki, I haven't been
> able to answer my questions. So here's hoping the user-community can help!
>
> My company is currently using a home-brew solution for applying naive Bayes
> filtering to data. Currently, what we're doing is basically spam filtering
> on email messages that pass through our system. However, we have a need to
> do filtering on other content, filtering that isn't the same as
> spam-testing. In a nutshell, we currently use the "bogofilter" application
> to classify messages, and invoke it with different word-list files to
> represent different filtering requirements. But this isn't going to scale
> well for us as written, and I'm the lucky soul tasked with coming up with a
> better way.
>
> I'd like to adapt SA to this, if I can. I've used it in the past (and my ISP
> for my personal email is fiercely loyal to it), but only ever for basic
> email analysis. What I need, in this case, is a scalable Bayesian
> classifier. I see from the docs that using SA will get me a usable
> client/server model, which would take care of most of the scaling issues by
> making it easier for us to move the classifier to a dedicated machine (if
> needed, or at least a less-loaded one). What I *can't* puzzle out from the
> docs, is how to set up such a daemon to do *only* the Bayes part, not the
> rest of the typical spam checking (for one thing, these won't be email
> messages and thus will not have any SMTP headers at all).
> Also, I (we) would need to be able to either have the one daemon
> dynamically choose the database/word-list to use when judging a message,
> or run multiple instances that each look at a different db/word-list.
>
> Is this do-able with SA? I had hoped that there would be a more general
> solution around bogofilter, either a client/server application pair or a
> more API/library-based interface to calling it for training and for
> evaluation. But there isn't (not that I can find, anyway). And SA is a
> system with a long history and a solid code-base, so it seemed worthwhile to
> at least check and see if this was possible.
>
> Thanks in advance for any help, advice, etc.
>
> Randy
> --
> """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
> Randy J. Ray Oodle, Inc.
> http://www.oodle.com
> rj...@corp.oodle.com