> wouldn't it be easier to integrate this into spamd? You'd already have
> your db client set up that way.

You're absolutely correct. Duh on my part. :-)

> Sounds like you've got it right.. You'd need two tables, something like:
>
> Create Table messages (
>   m_id bigint primary key auto_increment,
>   score float(5,2) not null default 0,
>   date timestamp not null,
>   message text not null default '',
>   otherjunk text not null default ''
> );
>
> Create Table logs (
>   m_id bigint not null,
>   #spamassassin hit identifier, eg. FROM_NAME_EQ_FROM_ADDR
>   hit_type smallint not null,
>
>   index(m_id),
>   index(hit_type)
> )

Heh, yeah, I was thinking something like

    create sequence msgid_seq;

    create table sascores(
        sa_msgid int,
        sa_test varchar(100),
        sa_tscore float not null,
        primary key (sa_msgid, sa_test)
    );

and

    create table samsgs(
        sa_msgid int primary key,
        sa_score float not null,
        sa_cdate datetime default text 'now'
    );

(The sa_tscore column would be used to track test scores over the life of the database.)

Now I really want to do this. I'll see what I'm up to this weekend. :-)

> if you start doing something like this, you're running into the chance
> that people will want to start using this information for something
> similar to the razor project (which I've avoided due to a rather large
> number of complaints about false positives on their mailing list). Since
> once you have a database, people are bound to want to use them to track
> more than just hit-usage statistics.

What can you really track with this besides scoring and the correlation between current email styles and how the tests react to them? I was also thinking of maybe adding some data from the headers to track where the email came from, but then again I don't want to recreate Razor or another SA clone. :-)

Offhand, how does Razor get false positives? I thought that it was MD5-based and the email had to be exact?

> imho, this is a great idea... combine that with some advanced header/body
> parsing (ala spamcop) and we could create a pretty hefty database of
> spammers.
> (my one concern would be that I've noticed my *actual* opt-in
> lists are still getting marked as spam for things like removal statements,
> etc, so we'd need some way to whitelist certain companies that are known
> not to spam, or at least inform people to set up local whitelists for that
> kind of thing)...

Yes, that is why I'm thinking of creating this database: we can see which tests are consistently bad and modify or eliminate them. I have a terrible problem with opt-in lists being tagged, as well as financial lists.

Regards,
Andrew

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
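
For anyone who wants to play with the idea, here is a minimal sketch of the two-table layout from this message. It is only an illustration under several assumptions: SQLite (via Python's sqlite3) stands in for whatever database is eventually used, explicit ids replace the sequence, the helpers `log_message` and `hits_below_threshold` are hypothetical names, the test name EXAMPLE_REMOVAL_TEST and all scores are made-up sample data, and the 5.0 threshold is an assumed cutoff.

```python
import sqlite3

# Schema adapted from the samsgs/sascores tables proposed in this message.
# Assumption: SQLite types stand in for the original dialect, and message
# ids are passed in explicitly instead of coming from msgid_seq.
SCHEMA = """
create table samsgs(
    sa_msgid integer primary key,
    sa_score real not null,
    sa_cdate text default current_timestamp
);
create table sascores(
    sa_msgid integer not null references samsgs(sa_msgid),
    sa_test text not null,
    sa_tscore real not null,
    primary key (sa_msgid, sa_test)
);
"""


def log_message(conn, msgid, hits):
    """Store one scanned message; hits maps test name -> test score."""
    total = sum(hits.values())
    conn.execute(
        "insert into samsgs(sa_msgid, sa_score) values (?, ?)",
        (msgid, total),
    )
    conn.executemany(
        "insert into sascores(sa_msgid, sa_test, sa_tscore) values (?, ?, ?)",
        [(msgid, test, score) for test, score in hits.items()],
    )


def hits_below_threshold(conn, threshold=5.0):
    """Per test: total hits, and hits on messages scoring below threshold."""
    rows = conn.execute(
        """
        select s.sa_test,
               count(*) as hits,
               sum(case when m.sa_score < ? then 1 else 0 end) as low_hits
        from sascores s
        join samsgs m on m.sa_msgid = s.sa_msgid
        group by s.sa_test
        order by s.sa_test
        """,
        (threshold,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data: two messages and two tests.
log_message(conn, 1, {"FROM_NAME_EQ_FROM_ADDR": 1.5, "EXAMPLE_REMOVAL_TEST": 4.0})
log_message(conn, 2, {"EXAMPLE_REMOVAL_TEST": 4.0})

report = hits_below_threshold(conn)
for test, hits, low_hits in report:
    print(test, hits, low_hits)
```

A report like this only shows how often each test fires and on which side of the threshold the message landed. Deciding that a test is actually "bad" would still need messages hand-labeled as spam or not, which this schema does not store.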