> wouldn't it be easier to integrate this into spamd? You'd already have
> your db client set up that way.
You're absolutely correct. duh on my part. :-)
> Sounds like you've got it right.. You'd need two tables, something like:
>
> Create Table messages (
> m_id bigint primary key auto_increment,
> score float(5,2) not null default 0,
> date timestamp not null,
> message text not null default '',
> otherjunk text not null default ''
> );
>
> Create Table logs (
> m_id bigint not null,
> #spamassassin hit identifier, e.g. FROM_NAME_EQ_FROM_ADDR
> hit_type smallint not null,
>
> index(m_id),
> index(hit_type)
> )
Heh yeah I was thinking something like
create sequence msgid_seq;
create table sascores(
sa_msgid int,
sa_test varchar(100),
sa_tscore float not null,
primary key (sa_msgid, sa_test)
);
and
create table samsgs(
sa_msgid int primary key,
sa_score float not null,
sa_cdate timestamp default now()
);
(the sa_tscore would be used to track test scores over the life of the
database.)
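As a rough sketch of how rows would land in those two tables, here is a small Python example. It uses SQLite in memory purely as a stand-in for the real database, so the column types are simplified and the sequence is replaced by SQLite's automatic row id. The record_message helper, the HYPOTHETICAL_TEST name, and the scores are made up for illustration; nothing here is actual spamd code.

```python
import sqlite3

# SQLite stands in for the real database; types are simplified and the
# msgid sequence is replaced by SQLite's automatic integer primary key.
SCHEMA = """
create table samsgs(
    sa_msgid integer primary key,
    sa_score float not null,
    sa_cdate timestamp default current_timestamp
);
create table sascores(
    sa_msgid int not null,
    sa_test varchar(100) not null,
    sa_tscore float not null,
    primary key (sa_msgid, sa_test)
);
"""

def record_message(conn, hits):
    """Store one scanned message (hypothetical helper).

    hits maps a test name to the score that test carried at scan time.
    """
    total = sum(hits.values())
    cur = conn.execute("insert into samsgs (sa_score) values (?)", (total,))
    msgid = cur.lastrowid
    conn.executemany(
        "insert into sascores (sa_msgid, sa_test, sa_tscore) values (?, ?, ?)",
        [(msgid, test, score) for test, score in hits.items()],
    )
    conn.commit()
    return msgid

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative values only; these are not real SpamAssassin scores.
msgid = record_message(conn, {"FROM_NAME_EQ_FROM_ADDR": 0.5, "HYPOTHETICAL_TEST": 2.0})
```

Storing the per-test score alongside each hit means a later change to a test's weight does not rewrite history.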
Now I really want to do this. I'll see what I'm up to this weekend. :-)
> if you start doing something like this, you're running into the chance
> that people will want to start using this information for something
> similar to the razor project (which I've avoided due to a rather large
> number of complaints about false positives on their mailing list). Since
> once you have a database, people are bound to want to use them to track
> more than just hit-usage statistics.
What can you really track with this besides the scores and how the tests
react to current email styles? I was also thinking of maybe adding some data
from the headers to track where the email came from, but then again I don't
want to recreate Razor or another SA clone. :-)
Offhand, how does Razor get false positives? I thought it was MD5-based, so
the email had to match exactly?
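To show what I mean by "exact": the sketch below hashes a whole message body with MD5. That is only my assumption about how the matching works, not a description of what Razor really does, and md5_signature and the sample text are invented for the example.

```python
import hashlib

def md5_signature(body: str) -> str:
    """Hex MD5 digest of the whole body (assumed scheme, not Razor's actual one)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()

original = "Buy now! Limited offer.\n"
variant = "Buy now! Limited offer. \n"  # one extra space before the newline

# The same text always gives the same signature...
same = md5_signature(original) == md5_signature(original)
# ...but a single added character gives a different one.
differs = md5_signature(original) != md5_signature(variant)
```

If that assumption holds, two messages only collide when their hashed content is identical byte for byte.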
> imho, this is a great idea... combine that with some advanced header/body
> parsing (ala spamcop) and we could create a pretty hefty database of
> spammers. (my one concern would be that I've noticed my *actual* opt-in
> lists are still getting marked as spam for things like removal statements,
> etc, so we'd need some way to whitelist certain companies that are known
> not to spam, or at least inform people to set up local whitelists for that
> kind of thing)...
Yes, that is why I'm thinking of creating this database -- we can see which
tests are consistently bad and modify/eliminate them. I have a terrible
problem with opt-in lists being tagged, as well as financial lists.
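A first cut at that kind of report could look like the sketch below, again with SQLite standing in for the real database. The rows, the TEST_A/TEST_B names, the 5.0 threshold, and the per_test_stats helper are all made up for illustration. It only counts how often each test fires and how often the message ended up at or above the threshold. It cannot say which of those were false positives, since that needs someone to label the messages, which the tables as sketched don't record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table samsgs(sa_msgid integer primary key, sa_score float not null);
create table sascores(
    sa_msgid int not null,
    sa_test varchar(100) not null,
    sa_tscore float not null,
    primary key (sa_msgid, sa_test)
);
""")

# Made-up sample data: three messages and the tests that hit on them.
conn.executemany("insert into samsgs values (?, ?)", [(1, 7.5), (2, 1.0), (3, 6.0)])
conn.executemany("insert into sascores values (?, ?, ?)", [
    (1, "TEST_A", 2.0), (1, "TEST_B", 5.5),
    (2, "TEST_A", 1.0),
    (3, "TEST_B", 6.0),
])

THRESHOLD = 5.0  # assumed tagging threshold, for illustration only

def per_test_stats(conn, threshold):
    """For each test: total hits, and hits on messages scored >= threshold."""
    return conn.execute("""
        select s.sa_test,
               count(*) as hits,
               sum(case when m.sa_score >= ? then 1 else 0 end) as hits_on_tagged
        from sascores s
        join samsgs m on m.sa_msgid = s.sa_msgid
        group by s.sa_test
        order by s.sa_test
    """, (threshold,)).fetchall()

stats = per_test_stats(conn, THRESHOLD)
```

A test that fires often but rarely on tagged mail, or one that single-handedly pushes known-good lists over the threshold, would be the first candidate for a closer look.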
Regards,
Andrew
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk