On 09/03/2014 01:26 AM, Matus UHLAR - fantomas wrote:
On Sun, 31 Aug 2014, Eric Shubert wrote:
I've seen an uptick of spam lately with random low contrast (hidden)
text. This appears to be lowering bayes probabilities.

On 08/31/2014 10:26 PM, John Hardin wrote:
Learn them as spam. That will tend to eliminate that effect.

On 31.08.14 22:54, Eric Shubert wrote:
Been doing that (learning them) for quite a while. I've had that
mechanism set up for several years now, and it's working fairly well
(after I adjusted the scoring upwards for bayes rules).

It appears to me that the hidden text is being randomly generated.
Even saw a random function of some sort in there. I presume it's been
designed to 'poison' bayes by virtue of the random text (and a sizable
amount of it).

note that even the code for low-contrast HTML may be caught as spam...
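
For illustration only, here is a minimal Python sketch of what a low-contrast check could look like. It is not SpamAssassin's actual rule logic; the function names and the threshold of 48 are assumptions made up for this example.

```python
def hex_to_rgb(color):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def is_low_contrast(fg, bg, threshold=48):
    """Hypothetical check: text counts as low contrast when the summed
    per-channel difference between foreground and background is small.
    The threshold is an arbitrary illustrative value."""
    diff = sum(abs(a - b) for a, b in zip(hex_to_rgb(fg), hex_to_rgb(bg)))
    return diff < threshold


# Near-white text on a white background is effectively hidden.
print(is_low_contrast("#fefefe", "#ffffff"))
# Black text on white is ordinary, readable text.
print(is_low_contrast("#000000", "#ffffff"))
```

Markup that trips a check like this is itself a distinctive feature of the message, which is why it can count against the sender.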

bayes poisoning has been considered a myth. With good training, and using
hapaxes (enabled by default), it can even help detect spam.
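
To make the hapax point concrete, here is a rough Python sketch of a Robinson-style smoothed token probability. It is a simplified illustration, not SpamAssassin's implementation; the function name, the toy counts, and the constants s=1.0 and x=0.5 are assumptions.

```python
from collections import Counter


def token_prob(token, spam_counts, ham_counts, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for one token (Robinson-style).

    s is the strength given to the prior and x is the assumed probability
    for a token with no history. Both values are illustrative.
    """
    spam_hits = spam_counts[token]
    ham_hits = ham_counts[token]
    n = spam_hits + ham_hits
    if n == 0 or n_spam == 0 or n_ham == 0:
        return x  # no usable history: the token stays neutral
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    p = spam_freq / (spam_freq + ham_freq)
    return (s * x + n * p) / (s + n)


# Toy training data: 5 spam and 5 ham messages (made-up counts).
spam_counts = Counter({"pills": 5, "xqzv": 1})  # "xqzv" is a random hapax
ham_counts = Counter({"meeting": 4})

for tok in ("xqzv", "brandnewgibberish", "meeting"):
    print(tok, token_prob(tok, spam_counts, ham_counts, 5, 5))
```

In this toy model a random token that was learned once as spam leans spammy, and a never-seen random token stays neutral. Neither pulls the score towards ham, which is what poisoning would require.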

John Hardin was instrumental in helping me identify the problem. The rule for low contrast text wasn't firing with SA v3.3.4. I upgraded to 3.4.0, which appears to have fixed the problem.

Many thanks John!

P.S. I did have to apply a patch to 3.4.0 in order for spamd to function properly. Sorry I neglected to note the bug number (searching closed bugs throws an error at this time). The patch can be found here:

-Eric 'shubes'
