> wouldn't it be easier to integrate this into spamd?  You'd already have
> your db client set up that way.

You're absolutely correct.  Duh on my part.  :-)

> Sounds like you've got it right.  You'd need two tables, something like:
>
> Create Table messages (
>     m_id       bigint primary key auto_increment,
>     score      float(5,2) not null default 0,
>     date       timestamp not null,
>     message    text not null,
>     otherjunk  text not null
> );
>
> Create Table logs (
>     m_id       bigint not null,
> # SpamAssassin hit identifier, e.g. FROM_NAME_EQ_FROM_ADDR
>     hit_type   smallint not null,
>
>     index(m_id),
>     index(hit_type)
> );

Heh, yeah, I was thinking of something like:

create sequence msgid_seq;

-- one row per test that fired on a given message
create table sascores (
        sa_msgid        int not null,
        sa_test         varchar(100) not null,
        sa_tscore       float not null,
        primary key (sa_msgid, sa_test)
);

and

-- one row per scanned message; ids come from msgid_seq
create table samsgs (
        sa_msgid        int primary key default nextval('msgid_seq'),
        sa_score        float not null,
        sa_cdate        timestamp not null default now()
);

(The sa_tscore column would record each test's score per message, so we can
track test scores over the life of the database.)
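To make the two-table layout concrete, here is a minimal sketch of recording one scanned message and pulling per-test totals back out. It assumes SQLite (Python's built-in sqlite3) as a stand-in for the PostgreSQL tables above, with SQLite's integer primary key playing the part of msgid_seq. record_message() and per_test_totals() are hypothetical helpers, and their `hits` dict of test name to score would in practice have to be pulled from spamd's report. EXAMPLE_TEST and all the scores are made up.

```python
import sqlite3

# Sketch only: SQLite stands in for PostgreSQL. "integer primary key" lets
# SQLite assign ids (replacing msgid_seq) and current_timestamp replaces now().
SCHEMA = """
create table samsgs (
    sa_msgid  integer primary key,
    sa_score  real not null,
    sa_cdate  timestamp not null default current_timestamp
);
create table sascores (
    sa_msgid  integer not null,
    sa_test   varchar(100) not null,
    sa_tscore real not null,
    primary key (sa_msgid, sa_test)
);
"""


def record_message(conn, hits):
    """Store one message plus every test it hit; return the new message id.

    `hits` maps test name -> score contributed (hypothetical input format).
    The message total is assumed to be the sum of the test scores.
    """
    total = sum(hits.values())
    cur = conn.execute("insert into samsgs (sa_score) values (?)", (total,))
    msgid = cur.lastrowid
    conn.executemany(
        "insert into sascores (sa_msgid, sa_test, sa_tscore) values (?, ?, ?)",
        [(msgid, test, score) for test, score in hits.items()],
    )
    conn.commit()
    return msgid


def per_test_totals(conn):
    """Hit count and summed score for each test, most frequent first."""
    return conn.execute(
        "select sa_test, count(*), sum(sa_tscore) from sascores "
        "group by sa_test order by count(*) desc, sa_test"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative data only; EXAMPLE_TEST is not a real SpamAssassin test.
    record_message(conn, {"FROM_NAME_EQ_FROM_ADDR": 1.5, "EXAMPLE_TEST": 2.0})
    record_message(conn, {"FROM_NAME_EQ_FROM_ADDR": 1.5})
    for row in per_test_totals(conn):
        print(row)
```

Storing sa_tscore per message, rather than keeping only a running total, means that if a test is rescored later, the history still shows the scores it had before and after.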

Now I really want to do this.  I'll see what I'm up to this weekend.  :-)

> If you start doing something like this, you run the risk
> that people will want to use this information for something
> similar to the Razor project (which I've avoided due to a rather large
> number of complaints about false positives on its mailing list), since
> once you have a database, people are bound to want to use it to track
> more than just hit-usage statistics.

What else can you really track with this besides scores and how the tests
react to current email styles?  I was also thinking of adding some data from
the headers to track where the email came from, but then again I don't want
to recreate Razor or yet another SA clone.  :-)

Offhand, how does Razor get false positives?  I thought it was MD5-based
and the email had to match exactly?
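Taking the MD5 description at face value (I haven't checked how Razor actually computes its signatures), a toy exact-hash catalog shows both sides of the question: a one-byte change defeats the match, but a legitimate message that someone reports will match for everyone who receives a byte-identical copy. ExactHashCatalog and body_digest are hypothetical names, and this is not Razor's implementation.

```python
import hashlib


def body_digest(body: str) -> str:
    """MD5 hex digest of a message body (assumes exact-match hashing)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


class ExactHashCatalog:
    """Toy set of reported-spam digests; illustrative, not Razor's design."""

    def __init__(self) -> None:
        self._reported: set[str] = set()

    def report(self, body: str) -> None:
        self._reported.add(body_digest(body))

    def is_listed(self, body: str) -> bool:
        return body_digest(body) in self._reported


if __name__ == "__main__":
    catalog = ExactHashCatalog()
    newsletter = "Weekly rates update.\nTo unsubscribe, reply REMOVE.\n"
    # One recipient mistakenly reports a legitimate newsletter...
    catalog.report(newsletter)
    # ...so every subscriber's identical copy now matches.
    print(catalog.is_listed(newsletter))  # True
    # Changing a single byte yields a different digest, so variants slip past.
    print(catalog.is_listed(newsletter.replace("Weekly", "weekly")))  # False
```

So exact hashing cannot confuse two different messages, but it cannot tell a bad report from a good one either.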

> IMHO, this is a great idea...  Combine that with some advanced header/body
> parsing (à la SpamCop) and we could create a pretty hefty database of
> spammers.  (My one concern is that I've noticed my *actual* opt-in
> lists are still getting marked as spam for things like removal statements,
> etc., so we'd need some way to whitelist certain companies that are known
> not to spam, or at least tell people to set up local whitelists for that
> kind of thing.)

Yes, that is why I'm thinking of creating this database: we can see which
tests consistently misfire and modify or eliminate them.  I have a terrible
problem with opt-in lists, as well as financial lists, being tagged.

Regards,
Andrew

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
