Re: A New Approach: Find the Ham

2007-02-12 Thread Duncan Findlay
On Mon, Feb 12, 2007 at 11:00:06PM -0500, Duncan Findlay wrote: > On Sun, Feb 11, 2007 at 11:10:53PM -0500, Duncan Findlay wrote: > > I've read most of the e-mails on this topic and I think the underlying > > problem is that this method relies on knowing exactly which profiles > > (i.e. combination

Re: A New Approach: Find the Ham

2007-02-12 Thread Duncan Findlay
On Sun, Feb 11, 2007 at 11:10:53PM -0500, Duncan Findlay wrote: > I've read most of the e-mails on this topic and I think the underlying > problem is that this method relies on knowing exactly which profiles > (i.e. combinations of rules) valid ham can hit. After re-reading your message with your

Re: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread John Rudd
Kelson wrote: Tom Allison wrote: Personally, I think HTML email should be outright discarded from the start. If you look at this arguement presented by the OP then it reinforces the idea that most ascii is ham and most html is spam. Therefore, reject delivery of all html based email. Or to b

Re: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Kenneth Porter
--On Monday, February 12, 2007 12:50 PM -0800 Kelson <[EMAIL PROTECTED]> wrote: In other words, what can adequately replace text/html in the non-plaintext multipart/alternative section such that HTML becomes irrelevant for legitimate uses? Microsoft Word? PDF? RTF? Any of those would be wor
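For reference, the `multipart/alternative` structure under discussion carries the same content in several forms, ordered from plainest to richest. The reader's client displays the richest form it supports. Below is a minimal sketch using Python's standard-library `email` package. The addresses and body text are hypothetical placeholders.

```python
from email.message import EmailMessage


def build_alternative(plain: str, html: str) -> EmailMessage:
    """Build a multipart/alternative message: plain text first, HTML second.

    Alternatives are ordered from least to most faithful, so a text-only
    client can fall back to the first part.
    """
    msg = EmailMessage()
    msg["From"] = "alice@example.org"  # hypothetical addresses
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Formatted note"
    msg.set_content(plain)                     # the text/plain part
    msg.add_alternative(html, subtype="html")  # converts to multipart/alternative
    return msg


msg = build_alternative("Hello, *world*.", "<p>Hello, <b>world</b>.</p>")
print(msg.get_content_type())                            # multipart/alternative
print([p.get_content_type() for p in msg.iter_parts()])  # ['text/plain', 'text/html']
```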

Re: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Kelson
Gene Heskett wrote: With all due respect, that's 100% BS. MIME was invented to handle the non-ascii stuff, and does it very well except for M$, who couldn't follow a std rule with a loaded 44 magnum stuck in Bills ear. 100% BS? So end-users don't like formatting in their messages? Email is

RE: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Coffey, Neal
Gene Heskett wrote: > On Monday 12 February 2007 13:27, Kelson wrote: >> Now, if you can come up with another markup language for formatting >> email... >> >> [...] >> * And you can get all the major email clients to use it for formatted >> composition instead of HTML (so end users can still ma

Re: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Gene Heskett
On Monday 12 February 2007 13:27, Kelson wrote: >Tom Allison wrote: >> Personally, I think HTML email should be outright discarded from the >> start. If you look at this arguement presented by the OP then it >> reinforces the idea that most ascii is ham and most html is spam. >> Therefore, reject

HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Kelson
Tom Allison wrote: Personally, I think HTML email should be outright discarded from the start. If you look at this arguement presented by the OP then it reinforces the idea that most ascii is ham and most html is spam. Therefore, reject delivery of all html based email. Or to be more succinct

Re: A New Approach: Find the Ham

2007-02-12 Thread Dan
Duncan & Michael, Thank you for the careful thought and detailed input. Please read my "Protype Config" email of yesterday afternoon. This is not, as it appears, a weighted ham-finding rules approach but rather a non-weighted, ham-tuned spam-finding rules approach. It's unconventional

Re: A New Approach: Find the Ham

2007-02-12 Thread michael moncur
I agree that this isn't going to be the best approach. Detecting ham is simply more difficult: 1. New types of ham emerge more often than new types of spam. Spammers generally stick to tried-and-true subjects while ham is all over the place. 2. Ham is more personalized than spam. Everyone gets v

Re: A New Approach: Find the Ham

2007-02-11 Thread Duncan Findlay
Hey Dan, I've read most of the e-mails on this topic and I think the underlying problem is that this method relies on knowing exactly which profiles (i.e. combinations of rules) valid ham can hit. I see a number of problems: - How do we actually generate the profiles that are to be considered ha
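Duncan's objection becomes concrete in a hypothetical sketch that treats a "profile" as the exact set of rule names a message hits. The sketch passes only messages whose profile already appeared in known ham. The rule names are SpamAssassin-style and serve only as illustrations. This is not Dan's actual implementation.

```python
from typing import FrozenSet, Iterable, Set

Profile = FrozenSet[str]  # the exact set of rule names one message hits


def learn_ham_profiles(ham_hits: Iterable[Iterable[str]]) -> Set[Profile]:
    """Record the rule combination hit by each known-ham message."""
    return {frozenset(hits) for hits in ham_hits}


def is_known_ham(hits: Iterable[str], profiles: Set[Profile]) -> bool:
    """Pass a message only if its exact rule combination was seen in ham."""
    return frozenset(hits) in profiles


# Hypothetical rule hits from three known-ham messages.
profiles = learn_ham_profiles([
    {"HTML_MESSAGE"},
    {"HTML_MESSAGE", "BAYES_00"},
    set(),
])

print(is_known_ham({"BAYES_00", "HTML_MESSAGE"}, profiles))  # True
print(is_known_ham({"BAYES_00"}, profiles))                  # False: unseen combination
```

With n rules there are 2^n possible profiles. Any training corpus therefore covers only a small fraction of them, and legitimate mail with an unseen combination is treated as spam.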

Re: A New Approach: Find the Ham

2007-02-11 Thread .rp
On 10 Feb 2007 at 11:43, Dan wrote: > I've developed a new approach to scoring that I want to 1) share with > everyone and 2) make into a working system thats as accurate as what > I've already built, but easier to use. First, the theory: >[...] > NEW SITUATION > Ham is now the tiniest minorit

RE: A New Approach: Find the Ham

2007-02-11 Thread Philip Seccombe
On Sat, 10 Feb 2007 15:14:56 -0500, Miles Fidelman <[EMAIL PROTECTED]> wrote: >Dan wrote: >> I've developed a new approach to scoring that I want to 1) share with >> everyone and 2) make into a working system thats as acc

Re: A New Approach: Find the Ham

2007-02-11 Thread Theo Van Dinter
On Sat, Feb 10, 2007 at 08:22:41PM +, Nigel Frankcom wrote: > What do Theo, Matt & Co have to say? They've been doing this a lot > longer than us. Unless I'm missing something, this approach is the standard "block everything except for what we explicitly want to receive". Which is great, if y

RE: A New Approach: Find the Ham

2007-02-11 Thread Giampaolo Tomassoni
From: tom [mailto:[EMAIL PROTECTED] > > On Feb 10, 2007, at 3:19 PM, Giampaolo Tomassoni wrote: > > > From: Tom Allison [mailto:[EMAIL PROTECTED] > >> Personally, I think HTML email should be outright discarded from > >> the start. > >> If you look at this arguement presented by the OP then it >

Re: A New Approach: Find the Ham

2007-02-11 Thread tom
On Feb 10, 2007, at 3:19 PM, Giampaolo Tomassoni wrote: From: Tom Allison [mailto:[EMAIL PROTECTED] Personally, I think HTML email should be outright discarded from the start. If you look at this arguement presented by the OP then it reinforces the idea that most ascii is ham and most html is

Re: A New Approach: Find the Ham

2007-02-11 Thread Justin Mason
Long-time SpamAssassin users with a good memory might recall back in SpamAssassin 2.4x, we included quite a few ham-targeting rules, such as "was this sent using User-Agent: Mozilla?", "is this formatted like a reply to a previous message?", "does it include headers from a mailing list?" and "is i
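In SpamAssassin's additive scoring, ham-targeting rules like these carry negative scores. Those scores pull a message's total away from the spam threshold. Below is a hypothetical sketch of that mechanism. The rule names, patterns, and score values are invented for illustration. Only the 5.0 threshold matches SpamAssassin's default `required_score`.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rules: (name, header, regex, score). Negative scores mark
# ham-targeting rules; positive scores mark spam-targeting rules.
RULES: List[Tuple[str, str, str, float]] = [
    ("LOCAL_UA_MOZILLA", "User-Agent", r"Mozilla", -1.0),
    ("LOCAL_REPLY_SUBJ", "Subject", r"^Re:", -0.5),
    ("LOCAL_LIST_HDR", "List-Id", r".", -0.5),
    ("LOCAL_MONEY_SUBJ", "Subject", r"(?i)free money", 4.0),
]

REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold


def score(headers: Dict[str, str]) -> Tuple[float, List[str]]:
    """Sum the scores of every rule whose header regex matches."""
    total, hits = 0.0, []
    for name, header, pattern, points in RULES:
        value = headers.get(header)
        if value is not None and re.search(pattern, value):
            total += points
            hits.append(name)
    return total, hits


total, hits = score({"Subject": "Re: free money", "User-Agent": "Mozilla/5.0"})
print(total, hits, total >= REQUIRED_SCORE)
# 2.5 ['LOCAL_UA_MOZILLA', 'LOCAL_REPLY_SUBJ', 'LOCAL_MONEY_SUBJ'] False
```

Here the two ham-like headers offset part of the spam rule's score and leave the total under the threshold. A spammer can set any header that a legitimate client sets. That is a known weakness of header-based ham signals.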

Re: A New Approach: Find the Ham

2007-02-11 Thread John Andersen
On Saturday 10 February 2007, Dan wrote: > On Feb 10, 2007, at 14:38, Mathieu Bouchard wrote: > > How do you ever find FPs if you have so many TP to sort through > > that it's not even worth sorting through FP+TP to find the FP ? > > IMHO, that'd be why we assume that mails are ham rather than

Re: A New Approach: Find the Ham

2007-02-11 Thread John Rudd
Giampaolo Tomassoni wrote: From: Miles Fidelman [mailto:[EMAIL PROTECTED] Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system thats as accurate as what I've already built, but easier to use. First, the theory: NEW ASSUM

Re: A New Approach: Find the Ham

2007-02-10 Thread Burak Ueda
Good point, but will cause trouble UNLESS we find a way to recognize ham 100%. And it must be exactly 100% (99% won't be enough). As other users said, with the current system, if we can filter 70-80% of the spam, the remaining 20-30% will only be an annoyance, but ham will be delivered. But with the ne
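The asymmetry Burak describes can be expressed in numbers. Below is a hypothetical sketch with invented daily volumes and rates, not measurements from anyone's system. Each policy is charged only with the error it is prone to.

```python
def daily_misses(spam: int, ham: int,
                 spam_caught_pct: int, ham_recognized_pct: int) -> dict:
    """Messages each policy gets wrong per day, under hypothetical rates.

    Simplification: the default-allow filter is assumed never to block ham,
    and the default-deny filter never to pass spam, so each entry counts
    only the error that policy is prone to.
    """
    return {
        # spam the conventional filter misses: an annoyance
        "default_allow_spam_delivered": spam * (100 - spam_caught_pct) / 100,
        # ham the "find the ham" filter fails to recognize: lost mail
        "default_deny_ham_blocked": ham * (100 - ham_recognized_pct) / 100,
    }


result = daily_misses(spam=900, ham=100, spam_caught_pct=75, ham_recognized_pct=99)
print(result)
# {'default_allow_spam_delivered': 225.0, 'default_deny_ham_blocked': 1.0}
```

At these invented rates, the conventional filter delivers 225 spam messages a day. A ham recognizer that is right 99% of the time still blocks one legitimate message every day.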

Re: A New Approach: Find the Ham

2007-02-10 Thread Dan
On Feb 10, 2007, at 14:38, Mathieu Bouchard wrote: How do you ever find FPs if you have so many TP to sort through that it's not even worth sorting through FP+TP to find the FP ? IMHO, that'd be why we assume that mails are ham rather than assume that they are spam. I haven't found FP rev

Re: A New Approach: Find the Ham

2007-02-10 Thread Mathieu Bouchard
On Sat, 10 Feb 2007, Dan wrote: With Find the Ham, whitelisting is almost obsolete. When you find an FP, How do you ever find FPs if you have so many TP to sort through that it's not even worth sorting through FP+TP to find the FP ? IMHO, that'd be why we assume that mails are ham rather th

Re: A New Approach: Find the Ham

2007-02-10 Thread Raul Dias
> NEW SITUATION > Ham is now the tiniest minority of all email. > > NEW ASSUMPTION > All messages are spam unless x,y,z score says they're ham. > > NEW APPROACH > Block everything, then create rules to not catch what you do want. > ie, build tests that target the spam (keeping all the tests yo

Re: A New Approach: Find the Ham

2007-02-10 Thread Dan
On Feb 10, 2007, at 12:14, Miles Fidelman wrote: Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system thats as accurate as what I've already built, but easier to use. First, the theory: NEW ASSUMPTION All messages are s

Re: A New Approach: Find the Ham

2007-02-10 Thread Mark Samples
Is that the same as whitelisting? Maybe I do not understand, but a very rigorous approach would be a whitelist methodology in which, once a new account is created, the user sends email to everyone they want to communicate with, and it 'autowhitelists' those addresses, so you can only receive from those
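Below is a hypothetical sketch of the outbound-driven whitelist described here. Every address the user writes to is recorded, and inbound mail is accepted only from recorded addresses. The class and method names are illustrative.

```python
from email.utils import getaddresses, parseaddr
from typing import Set


class OutboundWhitelist:
    """Hypothetical sketch: accept inbound mail only from addresses the user has written to."""

    def __init__(self) -> None:
        self.allowed: Set[str] = set()

    def record_outgoing(self, *recipient_headers: str) -> None:
        """Add every address in the given To/Cc header values to the whitelist."""
        values = [h for h in recipient_headers if h]  # skip empty headers
        for _name, addr in getaddresses(values):
            if addr:
                self.allowed.add(addr.lower())

    def accepts(self, from_header: str) -> bool:
        """Accept only if the From address was previously written to."""
        _name, addr = parseaddr(from_header)
        return bool(addr) and addr.lower() in self.allowed


wl = OutboundWhitelist()
wl.record_outgoing("Alice <alice@example.org>, bob@example.net")
print(wl.accepts("ALICE@example.org"))              # True
print(wl.accepts("Mallory <mallory@example.com>"))  # False
```

The check trusts the `From` header, so it would still accept a forged sender address that belongs to a known correspondent.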

Re: A New Approach: Find the Ham

2007-02-10 Thread Dan
Clarifications: 1) I'm not talking about generating new rules. Rules stay the same. I'm describing a new scoring process only. 2) This would not be a replacement for SA, but an improvement: just a new way to process results already generated by SA. Ideally, this would be a replacement

RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Miles Fidelman [mailto:[EMAIL PROTECTED] > > Dan wrote: > > I've developed a new approach to scoring that I want to 1) share with > > everyone and 2) make into a working system thats as accurate as what > > I've already built, but easier to use. First, the theory: > > > > NEW ASSUMPTION >

Re: A New Approach: Find the Ham

2007-02-10 Thread Nigel Frankcom
On Sat, 10 Feb 2007 15:14:56 -0500, Miles Fidelman <[EMAIL PROTECTED]> wrote: >Dan wrote: >> I've developed a new approach to scoring that I want to 1) share with >> everyone and 2) make into a working system thats as accurate as what >> I've already built, but easier to use. First, the theory:

Re: A New Approach: Find the Ham

2007-02-10 Thread urgrue
This would be easier to filter. It would also be more adaptive to a statistical approach than a regex approach. Personally, I think HTML email should be outright discarded from the start. If you look at this arguement presented by the OP then it reinforces the idea that most ascii is ham a
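A narrower variant of this policy rejects only HTML-only mail and keeps `multipart/alternative` messages that also carry a plain-text part. Below is a hypothetical sketch using Python's standard-library `email` package. Whether such a check is a useful spam signal is exactly what the thread disputes.

```python
from email import message_from_string
from email.message import Message


def is_html_only(msg: Message) -> bool:
    """Return True if the message has a text/html part but no text/plain part."""
    # walk() yields the message itself plus every nested MIME part.
    types = {part.get_content_type() for part in msg.walk()}
    return "text/html" in types and "text/plain" not in types


html_only = message_from_string(
    "MIME-Version: 1.0\nContent-Type: text/html\n\n<p>Hi</p>\n"
)
alternative = message_from_string(
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\nContent-Type: text/plain\n\nHi\n"
    "--b\nContent-Type: text/html\n\n<p>Hi</p>\n"
    "--b--\n"
)
print(is_html_only(html_only))    # True
print(is_html_only(alternative))  # False
```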

Re: A New Approach: Find the Ham

2007-02-10 Thread urgrue
One consideration is that spam getting through is never more than an annoyance. Ham getting caught can be a big problem. So any kind of "deny by default" system has to deal with how to respond to people sending you mail that gets trapped and provide a way for the sender to "get approval". How
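A challenge/response queue is one way to give blocked senders a route to approval. Blocked mail is held rather than discarded, and the sender is approved after echoing back a one-time token. Below is a hypothetical sketch. The class and method names are illustrative, and sending the challenge message itself is left out.

```python
import secrets
from typing import Dict, List, Optional, Set


class ApprovalQueue:
    """Hypothetical challenge/response sketch for a deny-by-default filter."""

    def __init__(self) -> None:
        self.approved: Set[str] = set()
        self.pending: Dict[str, str] = {}     # challenge token -> sender
        self.held: Dict[str, List[str]] = {}  # sender -> held message bodies

    def receive(self, sender: str, body: str) -> Optional[str]:
        """Return None if delivered; otherwise hold the mail and return a token."""
        if sender in self.approved:
            return None
        self.held.setdefault(sender, []).append(body)
        token = secrets.token_hex(8)  # one-time token to send in the challenge
        self.pending[token] = sender
        return token

    def respond(self, sender: str, token: str) -> List[str]:
        """Approve the sender if the token is theirs; release their held mail."""
        if self.pending.get(token) != sender:
            return []
        del self.pending[token]
        self.approved.add(sender)
        return self.held.pop(sender, [])


q = ApprovalQueue()
token = q.receive("alice@example.org", "Hello")  # held, challenge issued
print(q.respond("alice@example.org", "wrong"))   # []
print(q.respond("alice@example.org", token))     # ['Hello']
print(q.receive("alice@example.org", "Again"))   # None (delivered)
```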

RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Tom Allison [mailto:[EMAIL PROTECTED] > > >> CHALLENGE > >> All filtering software is written to score for results that equal > >> spam -> catch the bad > >> > >> SOLUTION > >> Make filtering software score for results that equal ham -> uncatch > >> the good. > >> > >> > >> Your thoughts

Re: A New Approach: Find the Ham

2007-02-10 Thread Miles Fidelman
Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system thats as accurate as what I've already built, but easier to use. First, the theory: NEW ASSUMPTION All messages are spam unless x,y,z score says they're ham. NEW APPROA

Re: A New Approach: Find the Ham

2007-02-10 Thread Tom Allison
CHALLENGE All filtering software is written to score for results that equal spam -> catch the bad SOLUTION Make filtering software score for results that equal ham -> uncatch the good. Your thoughts? How can this method "spend less time and energy"? Aren't you going to build a "mirro

Re: A New Approach: Find the Ham

2007-02-10 Thread Nigel Frankcom
On Sat, 10 Feb 2007 20:52:17 +0100, "Giampaolo Tomassoni" <[EMAIL PROTECTED]> wrote: >From: Dan [mailto:[EMAIL PROTECTED] >> >> I've developed a new approach to scoring that I want to 1) share with >> everyone and 2) make into a working system thats as accurate as what >> I've already built,

RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Dan [mailto:[EMAIL PROTECTED] > > I've developed a new approach to scoring that I want to 1) share with > everyone and 2) make into a working system thats as accurate as what > I've already built, but easier to use. First, the theory: > > > > SITUATION > In the beginning, all email w

A New Approach: Find the Ham

2007-02-10 Thread Dan
I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory: SITUATION In the beginning, all email was ham. When spam came along, we left the ham alone
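The proposal changes only the default outcome, which shows most clearly on a message that no rule recognizes. Below is a hypothetical sketch that takes already-computed rule hits and applies either default. The function name, weights, and threshold are illustrative, and 5.0 is SpamAssassin's default `required_score`.

```python
from typing import Dict, Iterable


def classify(hits: Iterable[str], weights: Dict[str, float],
             threshold: float, default_deny: bool) -> str:
    """Turn a list of rule hits into a verdict under either default.

    default_deny=False: weights measure spam evidence; mail is ham unless
    the total reaches the threshold (the conventional approach).
    default_deny=True: weights measure ham evidence; mail is spam unless
    the total reaches the threshold (the "Find the Ham" idea).
    """
    total = sum(weights.get(rule, 0.0) for rule in hits)
    if default_deny:
        return "ham" if total >= threshold else "spam"
    return "spam" if total >= threshold else "ham"


# A message that hits no rules at all:
print(classify([], {}, 5.0, default_deny=False))  # ham
print(classify([], {}, 5.0, default_deny=True))   # spam
```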