R: Auto_increment vs SERIAL key types

Giampaolo Tomassoni Tue, 10 Oct 2006 12:28:35 -0700

> Another issue may be AWL files, (I suppose a spamassassin question
> also?).  Every 'new' ip/email incoming will create a new  PRIMARY KEY
> (username,email,ip).  If two connections, one on each box, first one
> wins, replication stops and you need to manually issue a bunch of
> commands to skip (two?) transactions and restart slave.


To my opinion, the best way to implement awl is to have a table for each server 
which is basicly one-way replicated (from the only originating server to the 
others in the cluster). The table is to be made up of the fields timestamp, 
username, email, ip, and score. Please note I sayd just "score", not "count" + 
"totscore".

Then, the database may offer a view which merges the tables replicated from the 
various servers (the one "managed" by the server and the ones managed by the 
other servers) in such a way that spamassassin may simply access it like a 
"standard" awl table. Ie, something like:

        select username, email, ip, count(*) as count, sum(score) as totscore
        from (
                select username, email, ip, score from awl0
                union all select username, email, ip, score from awl1
                ...
                union all select username, email, ip, score from awlN
        )
        group by username, mail, ip

The view should be made in such a way that an insert or an update into it would 
automatically trigger an insert in the awl table managed by the server.

Of course, the underlying sql engine has to support views and, most important, 
updates to a view. Maybe I'm wrong, but this is something that mysql doesn't 
do. Besides, that's one of the reasons for which I prefer much more postgresql.

You may see that the timestamp field is defined but never used. The idea is 
that the timestamp field is meant to record the time at which a new entry 
entered into the database. This way one may also implement some methods to 
delete "stale" entries. Ie.: suppose a source (email+ip pair) was used to send 
mostly ham and it did does for, say, one year. It may have reached a very high 
totscore and count. Well, now suppose your reliable source started sending a 
lot of spam. Would you like to have to wait a month or so before its 
whitelistening score would start to lower enough to allow the spam detector not 
to pass that stuff? Well, no. One may, in example, have a sql script run, say, 
hourly from a cron job which deletes awl entries older than, say, three months.

Do you like it?

-----------------------------------
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100

R: Auto_increment vs SERIAL key types

Reply via email to