On Fri, 14 Nov 2003, Carl R. Friend wrote:

>    For the assembled group -- is it possible to do a DB lookup,
> either in an eval() or some other mechanism, in a "uri" rule?
> If we could do a DB lookup on URIs (or, more properly, the
> domain portion of URIs) I think that'd be a win (at, of course,
> the expense in human time).
>

I've been thinking about that exact topic. The Bayes engine
already parses and tokenizes hostnames from URIs (the UD: tokens).
If there were a hash DB made with the spam-site hostname as key and
score,description as value (something like the sendmail access db)
then it should be pretty easy to take those UD: tokens and do a
lookup and add results to total score.
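
To make the lookup step concrete, here is a rough sketch in
Python (not SpamAssassin's own Perl). Everything in it is an
assumption for illustration: a plain dict stands in for the
on-disk hash DB, and URI_DB, parse_value and score_hosts are
made-up names, not existing SpamAssassin code.

```python
# Hypothetical in-memory stand-in for the on-disk hash DB.
# Keys are hostnames; values are "score,description" strings.
URI_DB = {
    "spammer.com": "1.2,Spamhaus business site",
    "pills.example": "2.5,Pill vendor",
}


def parse_value(raw):
    """Split 'score,description' on the first comma only."""
    score, description = raw.split(",", 1)
    return float(score), description


def score_hosts(hosts, db=URI_DB):
    """Return (total, hits) for the hostnames seen in one message."""
    total = 0.0
    hits = []
    # Normalize case and a trailing dot, and count each host once.
    for host in sorted({h.lower().rstrip(".") for h in hosts}):
        raw = db.get(host)
        if raw is None:
            continue  # unknown host: contributes nothing
        score, description = parse_value(raw)
        total += score
        hits.append((host, score, description))
    return total, hits
```

A caller would pass in the hostnames already extracted from the
message and add the returned total to the message score.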

It should be much faster and use less memory than the various
collections of regex spam-host rules that have been discussed
here (such as William's or Chris's "evilrules" ;).
(I have a 15,000-line sendmail access db, and sendmail handles
it without any trouble ;).

Another advantage is that it would be possible to update the
database 'hot' (i.e. without having to kill and restart spamd,
as you have to do to update regex rules).
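
One way a 'hot' update could work, sketched in Python under
some loud assumptions: the table lives in a plain text file of
"host value" lines (a stand-in for a real hash DB), and the
reader re-checks the file's modification time on every lookup.
HotTable is a hypothetical name, not anything in spamd.

```python
import os


class HotTable:
    """Reload a 'host<whitespace>value' text file when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._table = {}

    def _load(self):
        table = {}
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                parts = line.split(None, 1)
                if len(parts) != 2:
                    continue  # skip malformed lines with no value
                table[parts[0].lower()] = parts[1]
        return table

    def lookup(self, host):
        # One stat() per lookup; reload only if the file changed.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            self._table = self._load()
            self._mtime = mtime
        return self._table.get(host.lower())
```

With a real on-disk hash the reload step may not be needed at
all, since each lookup can go straight to the file.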

It might even be possible to automate the updating of the
database: take hostnames found by Bayes in spam, do a DNS lookup,
and add the host if its IP is in spamhaus nets, is listed in
trusted DSBLs, has a short TTL, etc.
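
A sketch of the decision step only, in Python. It assumes the
DNS lookup has already been done elsewhere and that the IP and
TTL are handed in; the blocklist query is left out because it
needs network access. LISTED_NETS uses documentation address
ranges as placeholders, and SHORT_TTL is an arbitrary threshold,
so neither reflects real data.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), standing in
# for a real list of known spam-supporting netblocks.
LISTED_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
SHORT_TTL = 300  # seconds; arbitrary cut-off for "short TTL"


def signals(ip, ttl, listed_nets=LISTED_NETS, short_ttl=SHORT_TTL):
    """Return the names of the warning signs that apply to one host."""
    addr = ipaddress.ip_address(ip)
    found = []
    if any(addr in net for net in listed_nets):
        found.append("listed_net")
    if ttl < short_ttl:
        found.append("short_ttl")
    return found
```

How many signals should be required before a host is added is
a policy question that the sketch deliberately leaves open.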

I can see two possible implementations:
1) Have the value be just "score,description" and synthesize the
rule name from the hostname, e.g.:

spammer.com     1.2,Spamhaus business site
->      rule == L_URI_SPAMMER_COM
        score == 1.2
        description == "Spamhaus business site"

2) Have the value be a triple, "name,score,description", and
explicitly store all attributes:

spammer.com     MEDS_SPAMHAUS,1.2,Spamhaus business site

Option 1) would be simpler to update and use less memory;
option 2) would be more flexible and let you combine several
different sites into one class of rule.

Assumption: once a host matches a rule, subsequent matches on
that rule name would be ignored.
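
Both value layouts and the "first match per rule name wins"
assumption can be sketched together in Python. The function
names are hypothetical, and telling the two layouts apart by
whether the first field parses as a number is my own shortcut,
not part of the proposal; a real format would probably mark the
layout explicitly.

```python
import re


def rule_name_for(host):
    """Layout 1: synthesize L_URI_<HOST>, mapping non-alphanumerics to '_'."""
    return "L_URI_" + re.sub(r"[^A-Za-z0-9]", "_", host).upper()


def parse_entry(host, raw):
    """Return (rule_name, score, description) for either layout."""
    first, rest = raw.split(",", 1)
    try:
        score = float(first)
    except ValueError:
        # Layout 2: "name,score,description"
        score_text, description = rest.split(",", 1)
        return first, float(score_text), description
    # Layout 1: "score,description"
    return rule_name_for(host), score, rest


def score_message(hosts, db):
    """Sum scores, counting each rule name only once per message."""
    fired = {}
    for host in hosts:
        raw = db.get(host)
        if raw is None:
            continue
        name, score, description = parse_entry(host, raw)
        # setdefault keeps the first hit and ignores later ones.
        fired.setdefault(name, (score, description))
    total = sum(score for score, _ in fired.values())
    return total, fired
```

With layout 2, two hosts sharing a rule name add that rule's
score only once, which is the grouping behaviour described
above.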

Probably should also add some kind of time-stamp to each entry to
facilitate automated updating.
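
A sketch of how a timestamp could drive automated clean-up, in
Python. It assumes, purely for illustration, that the timestamp
is stored as epoch seconds in a trailing comma-separated field;
the field position and the names split_stamp and expire are not
part of the proposal.

```python
def split_stamp(raw):
    """Split 'value...,timestamp' into (value, timestamp_as_int)."""
    value, _, stamp = raw.rpartition(",")
    return value, int(stamp)


def expire(db, now, max_age):
    """Delete entries older than max_age seconds; return removed hosts."""
    stale = [
        host for host, raw in db.items()
        if now - split_stamp(raw)[1] > max_age
    ]
    for host in stale:
        del db[host]
    return stale
```

An updater could run something like this periodically, so that
hosts which stop showing up in spam eventually drop out.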

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
