New tests led me to the following rule. It works
and is now deployed on production servers.

full B_PLL /<p>(?:(?!<\/p>).){2000}/msi
describe B_PLL Paragraph Length Limit
score B_PLL 1.5
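
For anyone who wants to try the pattern outside SpamAssassin, here is a
minimal sketch in Python. It assumes Python's re module treats this
pattern the same way Perl does; the function name and the sample
messages are made up for illustration and are not from the corpus.

```python
import re

# Same pattern as the B_PLL rule. re.S lets "." cross newlines (/s),
# re.I ignores case (/i), re.M only mirrors the /m flag.
B_PLL = re.compile(r"<p>(?:(?!</p>).){2000}", re.M | re.S | re.I)

def hits_b_pll(message: str) -> bool:
    """True if some <p> is followed by 2000+ characters before any </p>."""
    return B_PLL.search(message) is not None

# Hypothetical samples, not taken from real mail.
short_msg = "<p>" + "x" * 100 + "</p>"
long_msg = "<P>" + "spam line\n" * 250 + "</p>"  # 2500 chars over 250 lines
split_msg = "<p>" + "x" * 1500 + "</p><p>" + "y" * 1500 + "</p>"

print(hits_b_pll(short_msg))  # short paragraph
print(hits_b_pll(long_msg))   # one long multi-line paragraph
print(hits_b_pll(split_msg))  # two paragraphs, each under the limit
```

The (?!</p>) lookahead is what keeps two medium paragraphs from being
counted as one long one: the match cannot step over a closing tag.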

"rawbody" has a hidden bug: it breaks the above rule too.
I am rewriting all local rules to use "full" until "rawbody" is fixed.

2000 is an arbitrary number that fits the local corpus at this time.
The long paragraph in the original spam had 10797 characters,
and was 192 lines long.
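
One way to pick a limit for a different corpus is to measure the
longest paragraph in each message and see where ham and spam separate.
The sketch below is only an assumption about how that could be done;
the helper name is hypothetical and it counts characters between a
<p> and the next </p> (or the end of the text if the tag is never
closed).

```python
import re

# Capture everything after a <p> up to, but not including, the next </p>.
PARAGRAPH = re.compile(r"<p>((?:(?!</p>).)*)", re.S | re.I)

def longest_paragraph(message: str) -> int:
    """Length in characters of the longest <p> paragraph, 0 if none."""
    lengths = [len(m.group(1)) for m in PARAGRAPH.finditer(message)]
    return max(lengths, default=0)

# Hypothetical sample, not taken from real mail.
sample = "<p>abc</p>\n<P>line one\nline two</P>"
print(longest_paragraph(sample))
```

Running this over known ham gives the largest value the limit must
stay above to avoid false positives.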

The rule's score is now 1.5/5.0. The original spam scored 1.6/5.0.
An additional rule scores 1.0 for any URI pointing to a PHP page,
and a third rule scores 1.0 when the From address contains numbers.
The resulting score for the original spam is now 5.0/5.0.

Please catch the bug in rawbody.
