Re: New Bayes like paradigm

2011-10-14 Thread darxus
On 10/13, Adam Katz wrote: PS: As an SA Committer, do I have access to those logs? Don't think so, but you can just ask for a regular masscheck account if you don't already have one, and with that account do: rsync --exclude '*~' -vaz rsync.spamassassin.org::corpus ./ -- I'd rather be happy

Re: New Bayes like paradigm

2011-10-13 Thread Marc Perkel
On 10/10/2011 9:16 AM, dar...@chaosreigns.com wrote: On 10/10, Marc Perkel wrote: On 9/28/2011 8:02 AM, dar...@chaosreigns.com wrote: On 09/28, Marc Perkel wrote: You would only have to test the rule combinations that the message actually triggered. So if it hit 10 rules then it would be

Re: New Bayes like paradigm

2011-10-13 Thread Adam Katz
On 9/28/2011 8:02 AM, dar...@chaosreigns.com wrote: You definitely have a good point that it would only be necessary to track the combinations that actually show up in emails, however 1024 is only the possible combinations from one set of 10 rules. The number of combinations in the actual

Re: New Bayes like paradigm

2011-10-10 Thread Marc Perkel
On 9/28/2011 8:02 AM, dar...@chaosreigns.com wrote: On 09/28, Marc Perkel wrote: You would only have to test the rule combinations that the message actually triggered. So if it hit 10 rules then it would be 1024 combinations. Seems not to be unreasonable to me. You definitely have a good

Re: New Bayes like paradigm

2011-10-10 Thread darxus
On 10/10, Marc Perkel wrote: On 9/28/2011 8:02 AM, dar...@chaosreigns.com wrote: On 09/28, Marc Perkel wrote: You would only have to test the rule combinations that the message actually triggered. So if it hit 10 rules then it would be 1024 combinations. Seems not to be unreasonable to me.
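A minimal sketch of the arithmetic quoted above, assuming "combinations" means every subset of the rules a message triggered (empty set included), which is where 2^10 = 1024 comes from. The rule names and the function name `rule_combinations` are hypothetical, not part of SpamAssassin.

```python
from itertools import combinations


def rule_combinations(triggered):
    """Yield every subset of the triggered rules, smallest first."""
    rules = sorted(triggered)
    for size in range(len(rules) + 1):
        for combo in combinations(rules, size):
            yield combo


# Hypothetical rule names standing in for ten rules hit by one message.
hits = [f"RULE_{i}" for i in range(10)]
total = sum(1 for _ in rule_combinations(hits))
print(total)  # 2**10 subsets for 10 rules
```

The count doubles with every extra rule hit, so this only stays cheap per message; the number of distinct combinations seen across a whole corpus is a separate question.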

Re: New Bayes like paradigm

2011-09-28 Thread Marc Perkel
On 9/27/2011 9:25 PM, dar...@chaosreigns.com wrote: On 09/27, Marc Perkel wrote: Here's the kind of thing I'm seeing. Spam talks about money - low score. Spam talks about Jesus - low score. Spam talks about money and Jesus and throw in a dear someone and it's spam. I'm hoping to detect

Re: New Bayes like paradigm

2011-09-28 Thread darxus
On 09/28, Marc Perkel wrote: You would only have to test the rule combinations that the message actually triggered. So if it hit 10 rules then it would be 1024 combinations. Seems not to be unreasonable to me. You definitely have a good point that it would only be necessary to track the

Re: New Bayes like paradigm

2011-09-28 Thread darxus
On 09/28, dar...@chaosreigns.com wrote: On 09/28, Marc Perkel wrote: You would only have to test the rule combinations that the message actually triggered. So if it hit 10 rules then it would be 1024 combinations. Seems not to be unreasonable to me. combinations in the actual corpora

Re: New Bayes like paradigm

2011-09-27 Thread Marc Perkel
On 9/25/2011 5:37 PM, RW wrote: On Sun, 25 Sep 2011 09:28:32 -0700 Marc Perkel wrote: Here's what I'd like to be able to do. I'd like a program of some sort where I could take word tokens - like the names of rules that were triggered - and look for rule combinations that indicate spam or ham. For

Re: New Bayes like paradigm

2011-09-27 Thread darxus
On 09/27, Marc Perkel wrote: Here's the kind of thing I'm seeing. Spam talks about money - low score. Spam talks about Jesus - low score. Spam talks about money and Jesus and throw in a dear someone and it's spam. I'm hoping to detect combinations automatically. You're not really talking about

Re: New Bayes like paradigm

2011-09-27 Thread darxus
Another possibility would be to generate meta rules from random sets of three rules. Some (actually random) examples: meta RANDOM_3_A = (MPART_ALT_DIFF GAPPY_SUBJECT URI_UNSUBSCRIBE) meta RANDOM_3_B = (RCVD_IN_MAPS_OPS WEIRD_PORT FSL_FAKE_GMAIL_RCVD) meta RANDOM_3_C = (FB_CAN_LONGER
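A rough sketch of how such random three-rule metas could be generated. Assumptions: the logical operators in the examples above appear to have been lost in the archive, so joining the names with `&&` is a guess at the intent; the function name `random_meta_rules` and the numeric suffixes are hypothetical. The output follows the usual `meta NAME (expression)` form of a SpamAssassin meta rule.

```python
import random


def random_meta_rules(rule_names, count, size=3, seed=None):
    """Build `count` meta rule lines, each ANDing `size` randomly chosen rules."""
    rng = random.Random(seed)  # seed only makes a run repeatable
    lines = []
    for i in range(count):
        picked = rng.sample(rule_names, size)  # no rule repeated within one meta
        # Assumption: every rule in the set must hit, hence "&&".
        lines.append(f"meta RANDOM_{size}_{i} ({' && '.join(picked)})")
    return lines


# Rule names taken from the examples quoted above.
names = [
    "MPART_ALT_DIFF", "GAPPY_SUBJECT", "URI_UNSUBSCRIBE",
    "RCVD_IN_MAPS_OPS", "WEIRD_PORT", "FSL_FAKE_GMAIL_RCVD",
]
metas = random_meta_rules(names, count=2, seed=1)
for line in metas:
    print(line)
```

Each generated meta would still need to be run against a corpus to see whether it separates spam from ham at all.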

New Bayes like paradigm

2011-09-25 Thread Marc Perkel
Here's what I'd like to be able to do. I'd like a program of some sort where I could take word tokens - like the names of rules that were triggered - and look for rule combinations that indicate spam or ham. For example, a message triggers 4 rules A B C and D. These rules are combined as follows: A

Re: New Bayes like paradigm

2011-09-25 Thread David F. Skoll
On Sun, 25 Sep 2011 09:28:32 -0700 Marc Perkel supp...@junkemailfilter.com wrote: Each rule combo is then looked up for how often it occurs in spam and how often it occurs in ham. Then the results are combined into some sort of likelihood of being spam or ham. We looked at (and even
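One way the quoted idea could be sketched: count how often each rule combination occurs in spam and in ham, then fold the per-combination ratios into a single score. Everything here is an assumption for illustration, not the method anyone in the thread used: the class name `ComboScorer`, the cap of three rules per combination, the add-one smoothing, and the naive summing of log odds. The rule names MONEY, JESUS and DEAR are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import exp, log


def combos(rules, max_size=3):
    """Yield every non-empty combination of up to max_size rules."""
    rules = sorted(set(rules))
    for size in range(1, min(max_size, len(rules)) + 1):
        yield from combinations(rules, size)


class ComboScorer:
    def __init__(self, max_size=3):
        self.max_size = max_size
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, rules, label):
        """Record the combinations hit by one message labelled 'spam' or 'ham'."""
        self.totals[label] += 1
        self.counts[label].update(combos(rules, self.max_size))

    def spam_probability(self, rules):
        """Sum smoothed log odds over the message's combinations."""
        log_odds = 0.0
        for combo in combos(rules, self.max_size):
            # Add-one smoothing so unseen combinations never give zero.
            p_spam = (self.counts["spam"][combo] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][combo] + 1) / (self.totals["ham"] + 2)
            log_odds += log(p_spam / p_ham)
        return 1 / (1 + exp(-log_odds))


scorer = ComboScorer()
for rules in (["MONEY"], ["JESUS"], ["DEAR"]):
    scorer.train(rules, "ham")
for _ in range(2):
    scorer.train(["MONEY", "JESUS", "DEAR"], "spam")

p_all = scorer.spam_probability(["MONEY", "JESUS", "DEAR"])
p_one = scorer.spam_probability(["MONEY"])
print(p_all > p_one)
```

Overlapping combinations are not independent, so summing their log odds overstates confidence; a real implementation would need a more careful combining step.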

Re: New Bayes like paradigm

2011-09-25 Thread Benny Pedersen
On Sun, 25 Sep 2011 09:28:32 -0700, Marc Perkel wrote: Hope you all understand what I'm saying here. How would someone do something like that? meta foo ((a + b + c + d) x) where x is how many of the rules that need to hit then make __a __b __c __d body header whatever you like to scan for

Re: New Bayes like paradigm

2011-09-25 Thread RW
On Sun, 25 Sep 2011 09:28:32 -0700 Marc Perkel wrote: Here's what I'd like to be able to do. I'd like a program of some sort where I could take word tokens - like the names of rules that were triggered - and look for rule combinations that indicate spam or ham. For example, a message triggers 4