On 30-Apr-2009, at 11:50, Charles Gregory wrote:
On Thu, 30 Apr 2009, LuKreme wrote:
First off, I suppose that if you get real mail from someone who has
only ever been seen as a spam sender, then yes, the first mail
would be penalized. But is this ever the case?
(nod) Any time someone's address has been used as a spoofed sender
before that legitimate sender makes first contact with a new
correspondent. But as I understand your logic, there is no 'rule' to
distinguish the 'first' AWL entry as 'special' from all the rest...
just that 'others' exist...
Right.
Let's lay out the logic here:
2 AWL is positive or does not exist
a Check for other AWL entries using same address but different hosts.
i If there is an AWL with a negative score, then multiply by -0.2 and add to score
So any AWL with a negative score still helps the new mail be negative?
The sender's legit mail helps new spam?
No, the sender's AWL HURTS new spam. If the score is -2 from the AWL,
then -2 * -0.2 = 0.4.
ii If there is an AWL with a positive score, under 5.0, then multiply by 0.1 and add
iii If there is an AWL with a positive score over 5.0, then multiply it by 0.4 and add
So in the unlikely event that spam (from a different server)
precedes legitimate mail, the legit sender gets a positive
adjustment before they have a chance to score negative...
As I understand it the AWL is added after all others, but yes, the
FIRST legitimate mail will be penalized.
Note that this logic will also be problematic when sender has
multiple mail servers. Many senders get a few points positive...
This will only be an issue if those multiple servers have positive AWL
scores.
c if total amount added is over some threshold, normalize on that threshold (3 points? 5? 8?)
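The proposed rule can be sketched in a few lines. This is a minimal, hypothetical sketch, assuming: the scores passed in are the AWL means stored for the same address on other hosts, a score of exactly 5.0 falls in the "over 5.0" bracket, a score of exactly 0 adds nothing, and "normalize on that threshold" means capping the total at the threshold (5.0 here, one of the values floated above). The function name is made up for illustration; nothing like it exists in SpamAssassin.

```python
from typing import Iterable


def cross_host_awl_adjustment(other_awl_scores: Iterable[float],
                              threshold: float = 5.0) -> float:
    """Hypothetical sketch of steps 2.a.i-iii and 2.c of the proposal.

    other_awl_scores: AWL mean scores for the same sender address but
    different originating hosts (assumed input, not a SpamAssassin API).
    threshold: cap on the total; the thread leaves it open (3, 5 or 8).
    """
    total = 0.0
    for awl in other_awl_scores:
        if awl < 0:
            total += awl * -0.2   # 2.a.i: negative entry, -2 becomes +0.4
        elif 0 < awl < 5.0:
            total += awl * 0.1    # 2.a.ii: mildly positive entry
        elif awl >= 5.0:
            total += awl * 0.4    # 2.a.iii: spammy entry (5.0 assumed "over")
        # an entry of exactly 0 contributes nothing
    # 2.c: "normalize on that threshold", read here as a simple cap
    return min(total, threshold)
```

With the -2 example from the thread, `cross_host_awl_adjustment([-2.0])` comes to about 0.4. Ten spoofed hosts each averaging 8.0 would add 32 points before the cap, so the result is the full threshold of 5.0 for any mail from that address.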
Now let's presume that the sender is spoofed by spammers on ten
different IPs, producing ten different AWL entries. How will you
distinguish the legit sender's IP (except by hoping they have scored
negative?)... You will simply add up ALL the IP AWLs and score
*any* mail from the sender with a significant positive adjustment....
As far as I can tell, though it's not easy to be sure, legitimate
senders have negative AWL scores.
3 AWL is negative
{ crickets }
But how often does that really happen? As I said, most people get a
*few* points on legit mail.
But it's not the points on the mail, it is only the AWL listing that
we're looking at.
The idea being that an average score of 0.8 will 'average' with a
fluke spammy mail and keep the score lower.... But your way is
adding those small scores to essentially ALL mail unless the lucky
sender never mentioned viag.... ooops. There goes *my* score.... LOL
OK, how do we parse out the AWL numbers, then, so we can see what sorts
of AWL numbers exist for legit senders? As I understand it, if an
email comes in from a known sender whose average is 0.8 and this email
scores 3.0, a negative AWL will be applied to normalize the email
closer to 0.8, right? The AWL score is not 0.8, but 3.0 - (AWL value)?
Maybe it makes sense to only do this check if the message has at
least scored positive?
Again, a significant proportion of ham gets a few points.
So yes, if b...@example.com has never emailed me except for a bunch
of spam, then yeah, the message is going to get bumped up in its
score, but how often does that happen? Does that ever happen?
Happens for me all the time. I get dictionary spam with a random
client's address as sender, and then I get an inquiry from the
client about all these 'bounces' they are receiving. Naturally, they
quote the bounce, which includes some spam sign, and the client is
off to a good start with a moderately spammy mail to me. (smile)
But bob could also e-mail you three or four times, getting a small
positive score, then you get spammed "from Bob" with high scores
from a botnet (and I usually get several copies of a spam like
that), and the next time bob e-mails, he gets logic 2.a.ii above
applied for each and every AWL for his address. Could be hefty....
Er... OK. Perhaps I am misunderstanding the AWL. As I understand it,
if a bunch of spam comes in from a server with average scores of 7.0
and a new message comes in with a score of 4, it will have a POSITIVE
AWL applied to normalize toward 7.0. If a message comes from a known
sender with an average score of 2, and this email scores 4, it will
get a NEGATIVE AWL score to normalize closer to 2.0, right? Since
this is a negative AWL, 2.a.ii would not apply; section 2 is skipped
entirely and we are at 3. AWL is negative => {crickets}.
Also, let's say b...@example.com sends a message after a bunch of
spams have been sent, and say that message scores -1.0, plus an AWL
adjustment of 5.0 based on the above.
I'm sure there are some people who *would* 'fit your model' and have
negative scores on their legit mail and not be hurt by the proposed
rule.
I think we are talking at cross purposes, and that's likely my fault.
I am talking about the AWL adjustment being either positive or
negative. Mail that is more spammy than usual will get penalized up.
Mail that is less spammy than usual will not be affected.
Which, for any Yahoo mailing list, will be a different server many
times.
And so if your Yahoo list scores slightly positive, all those
different Yahoo servers will add to the score. Ditto Hotmail,
Gmail, etc.
OK, if the value is 0.1 then it would take up to 50 outbound servers
with even distribution to add 5.0 points.
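The arithmetic is easy to check: with equal contributions the total grows linearly with the number of servers. A minimal sketch, assuming each server's AWL entry adds a fixed 0.1 points (for example, an entry averaging 1.0 under the 0.1 multiplier of step 2.a.ii); the function name and the targets are illustrative only:

```python
import math


def servers_needed(per_server: float, target: float) -> int:
    """Smallest number of equally contributing servers that reach target.

    The 1e-9 tolerance absorbs floating-point error (0.1 is not exact in
    binary), so 5.0 / 0.1 counts as exactly 50 servers.
    """
    if per_server <= 0:
        raise ValueError("per_server must be positive")
    return math.ceil(target / per_server - 1e-9)
```

`servers_needed(0.1, 5.0)` gives the 50 servers mentioned above; at the 0.4 multiplier for spammy entries the same 5.0 points take far fewer hosts.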
I can see what you *want* to do. I just don't see a practical way to
do it.
That's quite possible. As I said initially, it's just an idea I had to
make the AWL penalize botnets much more. If it can't be done, that's
fine. I think there's some promise here though.
I'm not married to this idea, I just think there's something here that
might be worth trying.
--
These budget numbers are not just estimates, these are the actual
results for the fiscal year that ended February the 30th.
- GWB