> One thing I want to do is write a little C program that connects to Postgres 
> (or Perl but with a C client just like spamc/d) and reports on the tests that 
> *all* messages score on.

wouldn't it be easier to integrate this into spamd?  You'd already have 
your db client set up that way.

> For example:
> X-Spam-Status: No, hits=4 required=6 tests=FROM_NAME_EQ_FROM_ADDR

ahh, databases.  that's what I'm good at (can't seem to explain that to my 
boss, unfortunately, since perl programmers are in more demand)...

Sounds like you've got it right.  You'd need two tables, something like:

Create Table messages (
    m_id       bigserial primary key,
    score      numeric(5,2) not null default 0,
    date       timestamp not null,
    message    text not null default '',
    otherjunk  text not null default ''
);

Create Table logs (
    m_id       bigint not null references messages (m_id),
    -- SpamAssassin test name, e.g. FROM_NAME_EQ_FROM_ADDR
    hit_type   varchar(64) not null
);

Create Index logs_m_id_idx on logs (m_id);
Create Index logs_hit_type_idx on logs (hit_type);

damn, I'm rusty with this SQL stuff.  Anyway, you should really only need
these 2 tables, since most of your other calculations (hits per test,
average score per test, etc.) can be handled with joins between them.

I'll stop here since this kind of thing is probably better left for a dev
list.

> With this kind of database I can get a very good look at what tests are in 
> normal messages and which specific tests are scoring best this week/month as 
> compared to before.  (the date would also be inserted but I do those 
> automatically, not in the select itself.)
> What do you all think?

if you start doing something like this, there's a good chance that people
will want to use this information for something similar to the Razor
project (which I've avoided because of a rather large number of complaints
about false positives on its mailing list).  Once you have a database,
people are bound to want to use it to track more than just hit-usage
statistics.

imho, this is a great idea...  combine that with some advanced header/body 
parsing (à la SpamCop) and we could build a pretty hefty database of 
spammers.  (my one concern is that my *actual* opt-in lists still get 
marked as spam for things like removal statements, etc., so we'd need some 
way to whitelist certain companies that are known not to spam, or at least 
tell people to set up local whitelists for that kind of thing)...

-Chris


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
