Re: random low contrast text with bayes [Solved]

Eric Shubert Sat, 04 Oct 2014 09:18:32 -0700

On 09/03/2014 01:26 AM, Matus UHLAR - fantomas wrote:

On Sun, 31 Aug 2014, Eric Shubert wrote:

I've seen an uptick of spam lately with random low contrast (hidden)
text. This appears to be lowering bayes probabilities.

On 08/31/2014 10:26 PM, John Hardin wrote:

Learn them as spam. That will tend to eliminate that effect.


On 31.08.14 22:54, Eric Shubert wrote:

Been doing that (learning them) for quite a while. I've had that
mechanism set up for several years now, and it's working fairly well
(after I adjusted the scoring upwards for bayes rules).

It appears to me that the hidden text is being randomly generated.
Even saw a random function of some sort in there. I presume it's been
designed to 'poison' bayes by virtue of the random text (and a sizable
amount of it).


Note that even the code for low-contrast HTML may be caught as spam...

Bayes poisoning has been considered a myth. With good training, and using
hapaxes (enabled by default), it can even help detect the spam.
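
To illustrate the point about hapaxes, here is a toy sketch of a token scorer using Robinson-style smoothing. It is not SpamAssassin's actual Bayes code; the class name ToyBayes, the method names, and the parameters s and x are assumptions made for this example. It only shows why a random token seen once in a learned spam leans towards spam, while a token never seen before stays neutral.

```python
from collections import Counter


class ToyBayes:
    """Hypothetical toy scorer, not SpamAssassin's implementation."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.nspam = 0                # number of spam messages learned
        self.nham = 0                 # number of ham messages learned

    def learn(self, tokens, is_spam):
        # Count each token at most once per message.
        counts = self.spam_tokens if is_spam else self.ham_tokens
        counts.update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def token_prob(self, token, s=1.0, x=0.5):
        # s: strength given to the prior; x: assumed probability for
        # a token with no history. Both values are illustrative.
        spam_n = self.spam_tokens[token]
        ham_n = self.ham_tokens[token]
        n = spam_n + ham_n
        if n == 0:
            return x  # never seen: neutral, neither helps nor hurts
        spam_freq = spam_n / max(self.nspam, 1)
        ham_freq = ham_n / max(self.nham, 1)
        p = spam_freq / (spam_freq + ham_freq)
        # Smoothed estimate: pulls tokens with little history towards x.
        return (s * x + n * p) / (s + n)


bayes = ToyBayes()
bayes.learn(["pills", "xqzrtk"], is_spam=True)    # "xqzrtk" stands in for random hidden text
bayes.learn(["meeting", "agenda"], is_spam=False)

print(bayes.token_prob("xqzrtk"))      # hapax from a learned spam -> 0.75
print(bayes.token_prob("never_seen"))  # unknown random token -> 0.5
print(bayes.token_prob("meeting"))     # hapax from a learned ham -> 0.25
```

In this sketch, fresh random tokens are neutral at worst, and any that recur after being learned as spam count against the message, which matches the advice to keep learning these messages as spam.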

John Hardin was instrumental in helping me identify the problem. The rule for low contrast text wasn't firing with SA v3.3.4. I upgraded to 3.4.0, which appears to have fixed the problem.


Many thanks John!

P.S. I did have to apply a patch to 3.4.0 in order for spamd to function properly. Sorry I neglected to note the bug number (searching closed bugs throws an error at this time). The patch can be found here:

https://github.com/QMailToaster/spamassassin/blob/master/v340-util.patch

--
-Eric 'shubes'
