hi --

This would indeed be possible -- just take the contents of the ruleset
dir in /usr/share/spamassassin , throw out most of it, and keep just
23_bayes.cf . Then when you run spamd, tell it to use that rules dir
instead of the default.

You should probably also add a 99_local.cf which contains a
"bayes_path" directive pointing at the custom bayes dbs for the word
list you're using.  Each server would then have to use a different
rules dir, so you'd run multiple servers, one for each word list.

and also -- turn off "bayes_auto_learn" ;)
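
Something like this for one word list's 99_local.cf, as an untested
sketch (the /var/lib/sa-bayes path and the "wordlist-a" name are made
up for illustration; substitute your own):

    # 99_local.cf in the rules dir for one word list (hypothetical paths)
    use_bayes         1
    # base path and filename prefix for this word list's bayes db files
    bayes_path        /var/lib/sa-bayes/wordlist-a/bayes
    # keep scan results from feeding back into the db
    bayes_auto_learn  0

Each spamd instance would then be started against its own rules dir
and port.  I believe the relevant spamd switches are --configpath and
--port, but check the spamd man page for your version.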

--j.

On Mon, Mar 23, 2009 at 23:11, Randy J. Ray <rj...@corp.oodle.com> wrote:
> Having gone over the FAQ and other doc-sections on the wiki, I haven't been
> able to answer my questions. So here's hoping the user-community can help!
>
> My company is currently using a home-brew solution for applying naive Bayes
> filtering to data. Currently, what we're doing is basically spam filtering
> on email messages that pass through our system. However, we have a need to
> do filtering on other content, filtering that isn't the same as
> spam-testing. In a nutshell, we currently use the "bogofilter" application
> to classify messages, and invoke it with different word-list files to
> represent different filtering requirements. But this isn't going to scale
> well for us as written, and I'm the lucky soul tasked with coming up with a
> better way.
>
> I'd like to adapt SA to this, if I can. I've used it in the past (and my ISP
> for my personal email is fiercely loyal to it), but only ever for basic
> email analysis. What I need, in this case, is a scalable Bayesian
> classifier. I see from the docs that using SA will get me a usable
> client/server model, which would take care of most of the scaling issues by
> making it easier for us to move the classifier to a dedicated machine (if
> needed, or at least a less-loaded one). What I *can't* puzzle out from the
> docs, is how to set up such a daemon to do *only* the Bayes part, not the
> rest of the typical spam checking (for one thing, these won't be email
> messages and thus will not have any SMTP headers at all). Also, I (we) would
> need to be able to either have the one daemon dynamically choose the
> database/word-list to use when judging a message, or run multiple instances
> that each look at a different db/word-list.
>
> Is this do-able with SA? I had hoped that there would be a more general
> solution around bogofilter, either a client/server application pair or a
> more API/library-based interface to calling it for training and for
> evaluation. But there isn't (not that I can find, anyway). And SA is a
> system with a long history and a solid code-base, so it seemed worthwhile to
> at least check and see if this was possible.
>
> Thanks in advance for any help, advice, etc.
>
> Randy
> --
> """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
> Randy J. Ray          Oodle, Inc.
>  http://www.oodle.com
> rj...@corp.oodle.com
>
>
