From: Raul Dias [mailto:[EMAIL PROTECTED]
> 
> Hit Control+Enter before.
> 
> I want to write a few plugins to improve SA, so I would like to hear
> opinions on an AWL implementation keyed on hosts instead of addresses
> (a "HAWL").
> 
> The idea is to help flag hosts as spammers.
> 
> So, if a host is a zombie/spam server, it will help to keep false
> negatives from that host down.
> 
> If a host sends more spam than ham, legitimate users might be flagged as
> spam.  This could be annoying, but whitelists and AWL should help here.
> This is not much different from using a DNSRBL.
> 
> For hosts that send 50%/50% spam/ham, the HAWL would have no real use.
> 
> For hosts with more ham than spam, HAWL will lower FPs, but increase FNs.
> 
> I don't think that the mean value for HAWL should be calculated in the
> same way as for AWL, or at least it should have limits.
> 
> In the scenario of low spam and high ham from a host, the mean value
> could be really low, so that any spam coming from that host would be
> flagged as ham, which would defeat the use of SA in the first place.
> This is where a ratio limit would help.
> 
> So, some common configuration options would be:
>  - High limit
>  - Low limit
>  - Mean weight
>  - Mean multiplier (e.g. 0.5)
>  - host whitelist
>  - low ratio limit (the low ratio of spam/ham counts that would 
>    make HAWL be ignored [0] )
>  - high ratio limit (same, for upper limit).
> 
> 
> So, what do you think? What to change? What to take care of?

Raul,

I think your idea is not bad overall.

However, I wouldn't get too tangled up in weights and multipliers; I would
adopt something very close to the actual AWL implementation.
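
To make that concrete, here is a rough sketch in Python (not SA's Perl) of
an AWL-style adjustment keyed on the host. As far as I know, AWL pulls each
message's score toward the historic mean by a factor (0.5 by default); the
HostEntry class, the function names, the spam threshold, and the ratio
limits taken from your list are all assumptions for illustration, not
existing SA code.

```python
from dataclasses import dataclass


@dataclass
class HostEntry:
    """Hypothetical per-host history record (not an SA data structure)."""
    total_score: float = 0.0  # sum of the raw scores seen from this host
    count: int = 0            # number of messages seen from this host
    spam: int = 0             # messages that scored at or above the threshold
    ham: int = 0              # messages that scored below the threshold


def adjusted_score(entry, score, factor=0.5, low_ratio=0.1, high_ratio=10.0):
    """Pull `score` toward the host's historic mean, AWL style.

    The history is ignored when the host is unknown or when its spam/ham
    ratio falls outside [low_ratio, high_ratio] (assumed limits).
    """
    if entry.count == 0:
        return score
    ratio = entry.spam / entry.ham if entry.ham else float("inf")
    if ratio < low_ratio or ratio > high_ratio:
        return score
    mean = entry.total_score / entry.count
    return score + (mean - score) * factor


def record(entry, score, spam_threshold=5.0):
    """Add the raw (unadjusted) score of a message to the host's history."""
    entry.total_score += score
    entry.count += 1
    if score >= spam_threshold:
        entry.spam += 1
    else:
        entry.ham += 1
```

With the ratio guard in place, a host with almost no spam on record gets no
discount at all, which addresses your worry about a very low mean letting
spam through.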

The main problem with your approach is that spammers tend to spread their
mailings across many hosts. This may leave few scores (if any) to average for
each received message. Also, there should be some realtime databases reporting
per-IP spam/ham ratios (I see http://www.senderbase.org/ has some, but I didn't
dig too much into how to get their reports). So it could be useful to use
these realtime indexes instead of relying on a few entries in a database.

Finally, one way to do what you want is probably to use the very same AWL db:
its key is composed of both the sender's e-mail address and the first two
octets of the source IP. So one could SUM() (in the SQL sense) all the rows
for a given IP address (in practice, its first two octets) and perhaps also
average the incoming mail's score with the resulting value.
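
A minimal sketch of that SUM() idea, using Python's sqlite3 module with an
in-memory database. The awl table layout here (email, ip, count, totscore,
with ip holding the first two octets as text) is an assumption loosely
modelled on the SQL-backed AWL; the function names and the 0.5 factor are
made up for illustration.

```python
import sqlite3


def demo_db():
    """Build an in-memory table with an assumed AWL-like layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE awl (email TEXT, ip TEXT, count INTEGER, totscore REAL)"
    )
    conn.executemany(
        "INSERT INTO awl VALUES (?, ?, ?, ?)",
        [
            ("a@example.com", "10.1", 2, 10.0),
            ("b@example.com", "10.1", 2, 2.0),
            ("c@example.com", "10.2", 1, 50.0),
        ],
    )
    return conn


def prefix_mean(conn, ip):
    """Mean score over every row sharing the first two octets of `ip`."""
    prefix = ".".join(ip.split(".")[:2])
    total, count = conn.execute(
        "SELECT SUM(totscore), SUM(count) FROM awl WHERE ip = ?", (prefix,)
    ).fetchone()
    if not count:  # no rows for this prefix: SUM() returns NULL
        return None
    return total / count


def averaged_score(conn, ip, score, factor=0.5):
    """Pull `score` toward the per-prefix mean, if there is any history."""
    mean = prefix_mean(conn, ip)
    if mean is None:
        return score
    return score + (mean - score) * factor
```

Note that this aggregates per /16 rather than per host, so a clean host
shares its history with every neighbour in the same two-octet range.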

Giampaolo

> 
> -Raul Dias
> 
