[SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Rob Winters
Maybe it's because I'm filtering my SPAM for later manual "safety" screening as opposed to deleting it completely, but it seems like some of the algorithmically generated scores just pussyfoot around too much. Here are the scores for a recent slip-through, a common pyramid scheme message: X-Spa

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Vivek Khera
> "RW" == Rob Winters <[EMAIL PROTECTED]> writes: RW> RAZOR_CHECK version=2.20 = 3.0 (a manual score, soon to be 5.0) Before you give razor the ability to block your mail all by itself, consider the false positives from mailing lists. Apparently there are some fools out there that are inten

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Kingsley G. Morse Jr.
On Wed:11:43, Rob Winters wrote: [...] > SA does not give any credit to the cumulative effect [...] It seems to me that properly weighted scores would avoid this problem. I'd like to think that a good optimization algorithm, such as a genetic algorithm, could do the job. Thanks, Kingsley __

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Tony L. Svanstrom
On Wed, 29 May 2002 the voices made Kingsley G. Morse Jr. write: > On Wed:11:43, Rob Winters wrote: > [...] > > SA does not give any credit to the cumulative effect > [...] > > It seems to me that properly weighted scores would > avoid this problem. I'd like to think that a good > optimization al

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread dman
On Wed, May 29, 2002 at 08:45:41PM +0200, Tony L. Svanstrom wrote: | On Wed, 29 May 2002 the voices made Kingsley G. Morse Jr. write: | > On Wed:11:43, Rob Winters wrote: | > [...] | > > SA does not give any credit to the cumulative effect | > [...] | > | > It seems to me that properly weighted sc

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Brian May
That's why in SpamAssassin you can set the scores yourself... fit them to your needs. Brian - Original Message - From: "Rob Winters" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, May 29, 2002 8:43 AM Subject: [SAtalk] large numbers of tiny score

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Kingsley G. Morse Jr.
Good point. Combinations of some rules may be more indicative of spam than others. It would be great if the GA could infer the boolean logic, as well as the scores. Thanks, Kingsley On Wed:20:45, Tony L. Svanstrom wrote: > On Wed, 29 May 2002 the voices made Kingsley G. Morse Jr. write: > > >

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Rob Winters
At 03:23 PM 5/29/2002, Brian May wrote: >Thats why in spam assassin you can set the scores yourself... fit them for >your needs.. Well, adjusting the scores won't necessarily make the tool better. I'm sure that the computationally-derived scores are excellent. In fact, I submit that you've pro

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Nathan Neulinger
Wouldn't it be possible to add this as just another test in the GA? A rule that looks at all the previous rules that matched. Just make sure the GA doesn't do anything with it until the other rules are calculated. Some percentage of the score could be the multiplication factor that is used. On

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread rODbegbie
Rob Winters wrote: > How about a "bonus" for cumulative effect? Why not do a second-level > analysis after scoring; something like: > > 3 positive score matches - add 1.0 > 4 positive score matches - add 2.0 > 5 positive score matches - add 4.0 > 6 positive score matches - add 8.0 This reminds me
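
The "bonus" table quoted above could be sketched roughly as follows. This is only an illustration of the proposal, not SpamAssassin code: the names `BONUS_BY_MATCH_COUNT`, `cumulative_bonus`, and `total_with_bonus` are hypothetical, and capping the bonus at the 6-match value is an assumption, since the quoted table stops at 6.

```python
# Bonus schedule copied from the quoted proposal:
# number of positive-scoring matches -> extra points.
BONUS_BY_MATCH_COUNT = {3: 1.0, 4: 2.0, 5: 4.0, 6: 8.0}


def cumulative_bonus(scores):
    """Return the extra score earned by the count of positive-scoring rules."""
    positive_matches = sum(1 for s in scores if s > 0)
    if positive_matches < 3:
        return 0.0
    # Assumption: beyond 6 matches, reuse the 6-match bonus
    # (the quoted table does not say what happens there).
    return BONUS_BY_MATCH_COUNT[min(positive_matches, 6)]


def total_with_bonus(scores):
    """Second-level analysis: ordinary sum of rule scores plus the bonus."""
    return sum(scores) + cumulative_bonus(scores)
```

For example, three rules scoring 0.5 each would total 1.5 on their own and 2.5 with the bonus, while negative-scoring rules do not count toward the bonus.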

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Michael C. Berch
On Wednesday, May 29, 2002, at 08:43 AM, Rob Winters wrote: > How about a "bonus" for cumulative effect? Why not do a second-level > analysis after scoring; something like: > > 3 positive score matches - add 1.0 > 4 positive score matches - add 2.0 > 5 positive score matches - add 4.0 > 6 positi

RE: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-29 Thread Michael Moncur
> I came up with the name "Five-Card Charlie", which is a reference to the > game of Blackjack, where under some rules the player wins if he has any > hand of five cards and does not bust (exceed 21). I figured if any > message tripped 5 positive tests, the chances of it being non-spam were > ve

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-30 Thread Matt Sergeant
Kingsley G. Morse Jr. wrote: > Good point. Combinations of some rules may be more > indicative of spam than others. > > It would be great if the GA could infer the boolean > logic, as well as the scores. It's possible that you could group the rules that matched, and feed it into the score gener

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-30 Thread Kingsley G. Morse Jr.
On Thu:10:01, Matt Sergeant wrote: > Kingsley G. Morse Jr. wrote: > > Good point. Combinations of some rules may be more > > indicative of spam than others. > > > > It would be great if the GA could infer the boolean > > logic, as well as the scores. > > It's possible that you could group the ru

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-30 Thread Michael C. Berch
On Wednesday, May 29, 2002, at 10:51 PM, Michael Moncur wrote: >> I came up with the name "Five-Card Charlie", which is a reference to >> the >> game of Blackjack, where under some rules the player wins if he has any >> hand of five cards and does not bust (exceed 21). I figured if any >> mess

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-30 Thread Duncan Findlay
On Thu, May 30, 2002 at 10:01:27AM +0100, Matt Sergeant wrote: > Kingsley G. Morse Jr. wrote: > >Good point. Combinations of some rules may be more > >indicative of spam than others. > > > >It would be great if the GA could infer the boolean > >logic, as well as the scores. > > It's possible that

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-05-31 Thread Matt Sergeant
Duncan Findlay wrote: > On Thu, May 30, 2002 at 10:01:27AM +0100, Matt Sergeant wrote: > > Clearly, we can not do this with EVERY combination, unless Craig has a > lot of CPU to spare. There are just under 400 rules right now. If we > ended up with 400 tests, there would be 79800 doubles and 1058
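
The pair and triplet counts quoted above are the binomial coefficients C(400, 2) and C(400, 3). A quick check, assuming Python 3.8 or later for `math.comb` (the variable names are illustrative only):

```python
from math import comb

n_rules = 400  # round figure used in the quoted message

pairs = comb(n_rules, 2)     # unordered pairs of rules
triplets = comb(n_rules, 3)  # unordered triples of rules

print(pairs, triplets)
```

The quick growth from pairs to triplets is why scoring every combination was considered impractical.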

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-06-01 Thread Craig R Hughes
Rob Winters wrote: RW> SA does not give any credit to the cumulative effect that would be obvious RW> to any human reading the "tests=" line, let alone the message itself. I RW> mean, look at *this* one!" This is currently true, and is basically a function of how the score-setting (and score eva

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-06-01 Thread Craig R Hughes
It's now on again in CVS -- still need to rescore the phrases against the updated corpus, but the scores in there now are decent. C rODbegbie wrote: r> This reminds me... Whatever happened to the discussion of turning Spam r> Phrases back on? I think Craig said it was something he'd be looking

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-06-01 Thread Craig R Hughes
Michael C. Berch wrote: MCB> No, because the GA (if I understand how it is used correctly) only MCB> considers rules individually, and not in combination (by number or MCB> specifically). What I and some others have argued is that in many cases MCB> tripping 5 low-scoring rules may be a better i

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-06-01 Thread Craig R Hughes
Duncan Findlay wrote: DF> Clearly, we can not do this with EVERY combination, unless Craig has a DF> lot of CPU to spare. There are just under 400 rules right now. If we DF> ended up with 400 tests, there would be 79800 doubles and 10586800 DF> triplets. We really don't care about *EVERY* combin

Re: [SAtalk] large numbers of tiny scores = SPAM!

2002-06-01 Thread Skip Montanaro
RW> How about a "bonus" for cumulative effect? Why not do a second-level RW> analysis after scoring; something like: RW> RW> 3 positive score matches - add 1.0 RW> 4 positive score matches - add 2.0 RW> 5 positive score matches - add 4.0 RW> 6 positive score matches -