Re: AWL functionality messed up?

Jeff Mincy Thu, 28 May 2009 05:27:25 -0700

   From: Linda Walsh <sa-u...@tlinx.org>
   Date: Wed, 27 May 2009 17:28:35 -0700
   
   Jeff Mincy wrote:
   >    From: Linda Walsh <sa-u...@tlinx.org>
   >    Date: Wed, 27 May 2009 12:48:43 -0700
   >    
   >    Bowie Bailey wrote:
   >    ----
   >    At face value, this seems very counter productive.
   >    
   > You still aren't understanding the wiki or the AWL scoring or what AWL
   > is trying to do.
   ----
        Ah, but it only seems I'm daft, today...:-)
   
   >    If I get spam from 1000 senders, they all end up in my
   >    AWL???
   >    
   > yes.   every email+ip address pair that sends you email winds up in
   > your AWL with an average score for that pair.  This is ok.
   ----
        GRRR....not so ok in my mindset, but ... and ... errr..
   well that only makes it more confusing, in a way...since I was
   only 99% certain that I'd never gotten any HAM from hostname
   '518501.com' (thinking for a short period that AWL might be classifying
   things by hosts as reliable or not, instead of, or in addition to
   by email-addr), but I'm 99.97% certain I've never gotten any HAM
   from user 'paypal.notify' (at) hostname '5185
   
It is using the relay IP address, not the hostname...
You've most likely received some other spam from this email+ip pair
that was scored as ham.  Hard to tell without seeing the original
scores.
   
   >    AWL should only be added to by emails judged to be 'ham' via
   >    the feed back mechanisms --, spammers shouldn't get bonuses for
   >    being repeat senders...
   >    
   > You are getting too attached to the 'whitelist' part of the name.
   > Pretend AWL stands for average weighting list.
   =====
        Aw...come on.  Isn't the world difficult enough without
   changing white to black or white to weighing?  I mean, we humans
   have enough trouble agreeing on what our symbols, "words" mean in
   relation to concepts and all without ya goin' and redefining perfectly
   good acceptable symbols to mean something else completely and still
   claim it to be some semblance of English.   No wonder most of the
   non-techno-literate humans on this world regard us techies with
   a hint of suspicion regarding the difficulty of problems.  We go around
   redefining words to suit reality and catch the heat when the rest of
   the world doesn't understand our meaning:
   
I don't think AWL is the best possible name for the functionality,
simply because it is easy to misinterpret.


   > AWL isn't whitelisting spammers.   It is pushing the score to the
   > average for that sender.   The sender can have a high average or a low
   > average.   
   ---
        An average?  So it keeps the scores of all the past emails of every
   email we ever got sent?  Must just store a weighted average -- otherwise
   the space (hmm...someone said something about 80MB+ auto-whitelist DB
   files?)....
   
AWL tracks the total score and the number of messages.
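
To make the arithmetic concrete, here is a minimal Python sketch of
that bookkeeping.  It is not the plugin's code: the class and function
names are made up for illustration, and the factor of 0.5 is an
assumption standing in for whatever adjustment strength is configured.
It keeps a running total and a count per email+ip pair, then pulls each
new score part of the way toward the pair's mean.

```python
from dataclasses import dataclass


@dataclass
class AwlEntry:
    """Running history for one (email address, relay IP) pair."""
    total_score: float = 0.0
    count: int = 0

    def mean(self):
        # No history yet means there is no average to pull toward.
        return self.total_score / self.count if self.count else None


def awl_adjust(entry, score, factor=0.5):
    """Return the adjusted score, then fold the unadjusted score into the history.

    factor is an assumption for this sketch: 0 leaves the score alone,
    1 scores the message exactly at the sender's historical mean.
    """
    mean = entry.mean()
    delta = 0.0 if mean is None else (mean - score) * factor
    entry.total_score += score
    entry.count += 1
    return score + delta


entry = AwlEntry()
first = awl_adjust(entry, 2.0)    # no history: score is unchanged
second = awl_adjust(entry, 12.0)  # pulled halfway toward the mean of 2.0
print(first, second, entry.mean())
```

In this sketch a spam that slipped through at 2.0 drags the next
message from the same pair down from 12.0 to 7.0, which is the FP/FN
effect discussed further down.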

        Why not call it the Historically Based Score Normalizer or
   HBSN module?  Db file could be "historical-norms" or something.
   
Call it BOB if that will help ...
   
   > If the previous email from a particular sender was FP or FN then AWL
   > will have an incorrect average and will wind up doing or trying to do
   > the wrong thing with subsequent email for that sender.
   ----
        Maybe it shouldn't add in the 'average' unless it exceeds
   the 'auto-learning threshold'??  I.e. something like the
   'bayes_auto_learn_threshold_nonspam' for HAM and the
   'bayes_auto_learn_threshold_spam' for SPAM.  Assuming it doesn't
   already do such a thing, it would make a little sense...so as
   not to train it on 'bad data'...
   
Perhaps.   I don't have a particularly strong opinion.

        When I run "sa-learn --spam <email>" over a message, can I
   assume (or is it the case) that telling SA, a message was 'spam'
   would assign a sufficiently large value to the 'HBSN' value for that
   sender to offset any incorrect value it had already accumulated (if
   that is likely to happen)?
   
Nope.

        Or might I at least assume that each "sa-learn" over a message
   will modify its AWL score appropriately?
   
No.  You shouldn't assume.  sa-learn doesn't modify the AWL entry.
You can use spamassassin --add-to-blacklist.

   > You can remove addresses using spamassassin --remove-from-whitelist
   ----
        Yes...saw that after visiting the wiki.  Is there a
   --show-whitelist-with-current-scores-and-their-weight switch as well
   (as opposed to one that only showed the addr's in the white list, or only
   showed the non-weighted scores)?
   
If I understand what you are asking for here, you can add an X-Spam-AWL
header that gives you the current scores:
  add_header all AWL awl=_AWL_, mean=_AWLMEAN_, count=_AWLCOUNT_, prescore=_AWLPRESCORE_
The awl scores are stored in a database file.  You can do db type
things with the awl file.
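
If you want to collect those numbers from delivered mail, something
like the following Python sketch would pull them out of the header
value.  The function name and the sample value are made up for
illustration, and it assumes the header comes out as comma-separated
key=value pairs matching the add_header line above.

```python
import re


def parse_awl_header(value):
    """Parse a value like 'awl=-2.5, mean=7.0, count=2, prescore=12.0'."""
    fields = {}
    for key, raw in re.findall(r"(\w+)=([^,\s]+)", value):
        try:
            fields[key] = float(raw)
        except ValueError:
            fields[key] = raw  # keep anything non-numeric as text
    return fields


# Hypothetical header value, not taken from a real message.
sample = "awl=-2.5, mean=7.0, count=2, prescore=12.0"
print(parse_awl_header(sample))
```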

        Thanks...and um...
        How difficult would it be to have the name of the module reflect
   what it's actually doing?  maybe roll out a name change with the next
   ".dot" release of SA?  (3.3? 3.4?)  Might alleviate some amount of
   confusion(?)...

This has come up before.   Changing what AWL stands for would probably
help.  Changing the acronym itself would be more work and would be
disruptive.

        Does the AWL also keep track of when it last saw an 'email' addr
   so it can 'expire' the oldest entries so the db doesn't grow to eventually
   consume all forms of matter and energy in the universe?  :-)
   
No.  It doesn't expire.  Yes.  It just grows mainly depending on how
much spam you've gotten.  Nearly every spam message has a different
made up or forged email address.

There is a check_whitelist script that can clean out various entries.

Also, you can store the AWL in an SQL db.  The sql table can have
timestamps and can do expiration.
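
As a rough illustration of timestamp-based expiration, here is a Python
sketch that uses sqlite3 as a stand-in for whatever SQL server you
would actually run.  The table layout and the column names (msgcount,
totscore, lastupdate) are assumptions for this example, not the schema
that ships with SpamAssassin.

```python
import sqlite3
import time

DAY = 86400  # seconds


def open_awl(path=":memory:"):
    """Create the hypothetical AWL table if it does not exist yet."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS awl (
               email      TEXT    NOT NULL,
               ip         TEXT    NOT NULL,
               msgcount   INTEGER NOT NULL,
               totscore   REAL    NOT NULL,
               lastupdate INTEGER NOT NULL,
               PRIMARY KEY (email, ip)
           )"""
    )
    return db


def record(db, email, ip, score, now=None):
    """Add one message's score to the pair and refresh its timestamp."""
    now = int(time.time()) if now is None else now
    cur = db.execute(
        "UPDATE awl SET msgcount = msgcount + 1, totscore = totscore + ?, "
        "lastupdate = ? WHERE email = ? AND ip = ?",
        (score, now, email, ip),
    )
    if cur.rowcount == 0:  # first message from this pair
        db.execute(
            "INSERT INTO awl (email, ip, msgcount, totscore, lastupdate) "
            "VALUES (?, ?, 1, ?, ?)",
            (email, ip, score, now),
        )


def expire(db, max_age_days, now=None):
    """Delete pairs not seen within max_age_days; return how many were removed."""
    now = int(time.time()) if now is None else now
    cur = db.execute(
        "DELETE FROM awl WHERE lastupdate < ?", (now - max_age_days * DAY,)
    )
    return cur.rowcount
```

Run from cron, something like expire(db, 180) would drop the one-off
forged senders while keeping the pairs you still hear from.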

-jeff
