RW <rwmailli...@googlemail.com> writes:

>> I'm wanting to setup a spam trap, that should receive nothing but
>> actual spam, and feed that into spamassassin in some way. I'm
>> wondering the best way to automate feeding that data back to the
>> system.
>> 
>> Would it be best used for bayes tuning? It seems not, because it would
>> be 100% spam. 
>
> As long as there is ham from other sources and it doesn't ruin
> token retention, it shouldn't be a problem. Ideally you would only
> feed spam that doesn't reach BAYES_99 and is low-scoring.

That is the problem: our bayes database is not well fed. It's a global
database, and even with trusted 'feeders', it would drift fairly far
the wrong way, because people usually trained only with spam that did
not get caught, and didn't feel comfortable using their ham.

I've considered the idea of creating per-user bayes dbs, but then I
couldn't use a spam trap's caught spam to train all of those dbs,
because I wouldn't have a clear idea whether those individual bayes
dbs were getting any ham.
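
For the Bayes side, the suggestion of feeding only trap spam that is
low-scoring and doesn't reach BAYES_99 could be automated roughly as
below. This is only a sketch under assumptions: the trap mail has
already been scanned by SpamAssassin and carries the default
X-Spam-Status header, the 8.0 cut-off for "low-scoring" is a made-up
value, and should_train/train_as_spam are hypothetical helper names.
The only external command used is `sa-learn --spam <file>`.

```python
import re
import subprocess
from email import message_from_string

# Assumes the default header layout, e.g.
#   X-Spam-Status: No, score=3.1 required=5.0 tests=A,B autolearn=no
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")
# Capture everything after "tests=" up to the next "key=" field or the end;
# the header may be folded across lines, so allow embedded whitespace.
TESTS_RE = re.compile(r"tests=(.*?)(?=\s+[a-z_]+=|\s*$)", re.S)


def parse_status(raw_message):
    """Return (score, set_of_rule_names), or None if the message was not scanned."""
    status = message_from_string(raw_message).get("X-Spam-Status")
    if status is None:
        return None
    score_m = SCORE_RE.search(status)
    if score_m is None:
        return None
    tests_m = TESTS_RE.search(status)
    names = re.split(r"[,\s]+", tests_m.group(1)) if tests_m else []
    tests = {t for t in names if t and t != "none"}
    return float(score_m.group(1)), tests


def should_train(raw_message, max_score=8.0):
    """True for scanned trap spam that is low-scoring and missed BAYES_99.

    max_score is an illustrative threshold, not a recommended value.
    """
    parsed = parse_status(raw_message)
    if parsed is None:
        return False  # unscanned mail: skip rather than guess
    score, tests = parsed
    return score < max_score and "BAYES_99" not in tests


def train_as_spam(path):
    """Hand one message file to sa-learn as spam."""
    subprocess.run(["sa-learn", "--spam", path], check=True)
```

A cron job could walk the trap maildir, call should_train() on each
message and train_as_spam() on the ones that pass. It does nothing
about the missing ham, which still has to come from somewhere else.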

>> Would it be better to use it for mass-check and contribute some to
>> the overall rule scoring?
>
> If you use it for Bayes or mass-checks I'd suggest not relaxing any
> pre-SpamAssassin checks. Some people do that to keep the numbers up,
> but optimizing around spam that doesn't reach SpamAssassin seems like a
> bad idea to me.

Each of the mails is 100% spam, so what I'd like to do is have an
automated way to tune my rule scoring, or improve/add rules based on
what gets sent there.

If I have to inspect each message by hand and manually craft rules,
then this doesn't seem like it will scale very well at all.
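
One way to look at the trap in aggregate rather than message by
message is to tally which rules fire across the whole corpus. The
sketch below does not tune scores; it only reports how often each rule
hit, so that rules which rarely fire on known spam stand out. It
assumes you have already collected the X-Spam-Status header values
from the trap messages, and rules_hit/rule_hit_rates are hypothetical
names.

```python
import re
from collections import Counter

# Capture the rule list after "tests=" up to the next "key=" field or the end;
# folded headers may contain newlines and tabs inside the list.
TESTS_RE = re.compile(r"tests=(.*?)(?=\s+[a-z_]+=|\s*$)", re.S)


def rules_hit(status_header):
    """Return the set of rule names listed in one X-Spam-Status value."""
    m = TESTS_RE.search(status_header)
    if m is None:
        return set()
    return {t for t in re.split(r"[,\s]+", m.group(1)) if t and t != "none"}


def rule_hit_rates(status_headers):
    """Map each rule name to the fraction of messages it fired on."""
    counts = Counter()
    total = 0
    for header in status_headers:
        total += 1
        counts.update(rules_hit(header))
    if total == 0:
        return {}
    # most_common() orders rules from most to least frequent.
    return {rule: n / total for rule, n in counts.most_common()}
```

Since everything in the trap is spam, a rule with a high hit rate here
and a low hit rate on ham is a candidate for a higher score, but
judging that still needs a ham corpus to compare against.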

-- 
        micah
