Re: SA Concepts - plugin for email semantics

David Jones Tue, 31 May 2016 16:11:07 -0700

>From: RW <rwmailli...@googlemail.com>
>Sent: Tuesday, May 31, 2016 5:20 PM
>To: users@spamassassin.apache.org
>Subject: Re: SA Concepts - plugin for email semantics


>On Tue, 31 May 2016 15:20:56 -0400
>Bill Cole wrote:

>> On 29 May 2016, at 11:07, RW wrote:
>>

>> > Statistical filters are based on some statistical theory combined
>> > with pragmatic kludges and assumptions. Practical filters have been
>> > developed based on what's been found to work, not on what's more
>> > statistically correct.
>>
>> I'm not aware of any hard evidence that the SA Bayes pragmatic
>> kludges and assumptions perform better or worse than an
>> implementation that used fewer or different ones.

>It's not specific to SA. For example, there's no sound basis for
>assigning a probability to tokens that have zero ham or spam
>counts; many classifications turn on completely made-up probabilities.
>There's also no way of assigning meaningful probabilities to tokens
>that enter or re-enter the database while it's mature without making
>an assumption about the current spam/ham training ratio.

>The assumption that tokens are independent was never reasonable in the
>first place: there's plenty of natural duplication, e.g. IP address and
>RDNS, and there are strong correlations between important tokens.
>There's also a lot of inadvertent duplication, for example from
>metadata headers that are not primarily intended for Bayes.
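The independence problem can be illustrated with the simple product-of-probabilities combiner popularised by Paul Graham. SpamAssassin itself uses a chi-squared combiner, so this is only a stand-in, and the token probabilities below are hypothetical. Counting the same evidence twice, as a relay IP token and its RDNS token might, pushes the combined score sharply towards spam.

```python
from math import prod


def combine_naive(probs):
    """Combine per-token spam probabilities assuming they are independent.

    Simple product rule (Graham-style), used only to illustrate the
    independence assumption; not SpamAssassin's chi-squared combiner.
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical tokens: a spammy relay IP (0.9) and a hammy word (0.2).
print(round(combine_naive([0.9, 0.2]), 3))       # 0.692
# The RDNS name of the same relay adds a second 0.9 for the same fact.
print(round(combine_naive([0.9, 0.9, 0.2]), 3))  # 0.953
```

Nothing new was learned in the second case; the combiner simply counted one observation twice.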


>I don't think concepts is a particularly good idea, but I don't like to
>see someone's work dismissed on such paper-thin theoretical grounds.



>> > I think the OP is probably underselling it, in that it could be used
>> > to extract information that normal tokenization can't get, for example:
>> > ...
>> > The main problem is that you'd need a lot of rules to make a
>> > substantial difference.
>>
>> So: re-invent SpamAssassin v1 but without rule scores, using Bayes to
>> do half-assed dynamic score adjustment per site with rules that will
>> either evolve constantly or grow stale?

>I was thinking that it would be an alternative to local custom rules,
>particularly for spams that leave Bayes with little to work with and
>where individual body rules aren't worth much of a score.

I think it could be valuable in custom meta rules. That's how I would
like to try it out for a while anyway, with minuscule scores.

Dave
