On 11/23/11 6:55 PM, Christian Grunfeld wrote:
Hi,
I have an idea to discuss here with the experts!
What is the MAIN difference between spam and ham?
...
...
Answer: spam is a "one-way ticket" and ham is 99.99% "round trip"!
What research can you cite for these figures? I beg to differ. Think of
all the ticketing systems, confirmation tickets by e-mail,
invoices by e-mail, mail to info@ addresses etc. Do you really, really
propose to blacklist them all, or mark them all as spam, the first time?
(legit notifications can be "one way ticket" but you can mark them as
ham later)
What do I mean? You never answer a spam message (or only very rarely).
Average users, who someone said are stupid and even more stupid when
they are in front of a machine, also don't respond to spammy messages,
at least not when they are marked as spam.
If your assumption were true, there would be no spam today. If nobody
ever answered spam messages, there would be no reason for spammers to
keep spamming.
So the idea is... these days, when the share of spam is about 80%
(pick whatever figure you want, as long as it is high enough), let's
start by marking all incoming mail as spam!
In the past, when the share of spam was 5% or 10%, it was quite logical
to do the reverse. That is, all incoming mail was treated as ham and we
tried a lot of methods to extract or mark the bad emails!
We have spent 15 years (so far) following the presumption-of-innocence
analogy of "Everyone charged with a criminal offence shall be presumed
innocent until proved guilty according to law". This approach
wastes a lot of resources because of the high share of spam!
Nowadays it's easier to invert the logic!
*mark all incoming mail as spam the first time
*always check the spam folder
Many users do this only sporadically, if at all. Some users don't know
where to find the spam folder. Some organizations do not deploy per-user
quarantine areas or spam folders, and so on.
*mark as ham....
What mechanism do you propose to have the MUA tell the MTA/MDA that
something is not spam? Also take into account the tens or maybe even
hundreds of different MUAs around (thick clients, webmail clients,
applications etc.) which would need to be modified to support your idea...
or (here is the relationship with the first question)
...just answer emails from the people you always communicate with, as
you always did. Here you complete the round trip and legitimize the sender!
For this we need a modified version of SA autowhitelist based not on
scores but on trusted or answered emails!
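A minimal sketch of the reply-based whitelist proposed above, assuming
nothing about SpamAssassin internals. The class and method names
(ReplyWhitelist, record_outgoing, classify) are hypothetical and only
illustrate the idea: a sender is ham only once the local user has
written to that address.

```python
# Hypothetical sketch: none of these names exist in SpamAssassin.
class ReplyWhitelist:
    def __init__(self):
        # Addresses the local user has sent mail to (the "round trip").
        self._answered = set()

    @staticmethod
    def _norm(address):
        # Assumption: addresses are compared case-insensitively.
        return address.strip().lower()

    def record_outgoing(self, recipient):
        """Call once per recipient of every message the user sends."""
        self._answered.add(self._norm(recipient))

    def classify(self, sender):
        """Return 'ham' for senders the user has written to, else 'spam'."""
        return "ham" if self._norm(sender) in self._answered else "spam"


wl = ReplyWhitelist()
first = wl.classify("alice@example.org")      # first contact: marked spam
wl.record_outgoing("Alice@Example.org")       # the user answers
second = wl.classify("alice@example.org")     # next message from the same sender
noreply = wl.classify("noreply@shop.example") # never answered, stays spam
```

Note that in this sketch a noreply@ sender can never leave the spam
state unless the user marks it as ham by some other means.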
Flaws ?
Yes, many. Think of automatic out-of-office replies, think of all the
messages that are sent from noreply@ addresses these days (where the
originating organization tries to make clear by naming it 'noreply@'
that replies are not welcome), think of (solicited) newsletters, mailing
lists etc.
False positives... yes, but ONLY the first time for each sender! Just
answer your good mails and they'll become ham next time. Mails not
answered (spam) remain spam the next time, and the next, and the next!
False negatives... yes, if someone impersonates a sender you trust in
the From: header (phishing). But this could be reduced using the same
method the autowhitelist uses, keeping sender/IP pairs in a DB.
Greylisting also uses DBs like this.
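The sender/IP pairing mentioned above could look roughly like this. It
is only a sketch: the SenderIpPairs class is hypothetical, the store is
an in-memory set rather than a real DB, and matching on a /24 network
instead of the exact address is an assumption made here for illustration.

```python
import ipaddress


# Hypothetical sketch: an in-memory stand-in for a sender/IP pair DB.
class SenderIpPairs:
    def __init__(self, prefix_len=24):
        # Assumption: IPs are grouped by network prefix, not exact address.
        self.prefix_len = prefix_len
        self._pairs = set()

    def _key(self, sender, ip):
        # strict=False lets us pass a host address and get its network.
        net = ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)
        return (sender.strip().lower(), net)

    def trust(self, sender, ip):
        """Remember that this sender legitimately mails from this network."""
        self._pairs.add(self._key(sender, ip))

    def is_trusted(self, sender, ip):
        """True only if this sender was seen from the same network before."""
        return self._key(sender, ip) in self._pairs


pairs = SenderIpPairs()
pairs.trust("alice@example.org", "192.0.2.10")
same_net = pairs.is_trusted("alice@example.org", "192.0.2.77")
forged = pairs.is_trusted("alice@example.org", "198.51.100.5")
```

A forged From: arriving from an unrelated network fails the lookup, so
it falls back to the default "spam" state instead of inheriting the
trusted sender's status.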
So, why do we have to waste resources on tons of rules, tons of Perl
code, tons of regexes if we know that 80% is spam?
Yes, why do you think the world spends so many resources on the spam
problem if the solution were so easy to implement...?
/rolf