On Fri, 2014-05-23 at 19:36 -0400, Alex wrote:
> Hi,
>
> On Thu, May 22, 2014 at 8:44 PM, John Hardin <jhar...@impsec.org> wrote:
>
> > On Thu, 22 May 2014, Karsten Bräckelmann wrote:
> >
> > On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:
> >>
> >>> I am clearly missing something with these rules but I lack the
> >>> experience to see what it is:
> >>>
> >>> score RAW_BLANK_LINES_05 0.5
> >>> rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
> >>>
> >>
> >> Why is everyone trying to match empty lines these days? Must be spam I'm
> >> missing out on. ;)
> >>
> >
> > Heh. Something similar just plopped into my spam quarantine.
> >
> > You might want to do this:
> >
> > rawbody MANY_BLANK_LINES /(?:(?:<br>)?\r?\n){9}/mi
>
>
> I tried this for a while in my corpus. Have you combined this into a meta?
> I'm finding this matches far too much ham to even remotely be considered.
> Was it the intention to only match fn's?
>
If you're going to write rules to reliably match HTML spam, it's a good
idea to start by reading enough of the HTML generated by the more
popular MUAs, especially the MS ones, to be familiar with the tag
sequences they generate, because a lot of them are quite unlike anything
you'd expect a rationally designed program to produce. IOW, you need
some familiarity with the tangled strings of tags that can be found in
*ham* so you can avoid matching them by mistake.
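
As a rough way to see how easily ham can trip a pattern like the one
quoted above, here is a minimal Python sketch. It makes several
assumptions: Python's re module stands in for Perl's regex engine, the
sample strings are made up for illustration and are not taken from any
real MUA, and each sample is searched as one string, which is not
necessarily how SpamAssassin feeds text to a rawbody rule.

```python
import re

# Pattern copied from the quoted rule; re.M and re.I stand in for /mi.
MANY_BLANK_LINES = re.compile(r"(?:(?:<br>)?\r?\n){9}", re.M | re.I)

# Hypothetical samples, invented for this sketch.
samples = {
    # Nine <BR> lines between a closing line and a name.
    "ham_signature_spacing": "<p>Regards,</p>\r\n" + "<BR>\r\n" * 9 + "<p>Bob</p>\r\n",
    # Only three <br> lines: too short a run to match.
    "ham_short_gap": "<p>Hi</p>\n" + "<br>\n" * 3 + "<p>Bye</p>\n",
    # Plain text with a long run of empty lines.
    "plain_blank_lines": "line one\n" + "\n" * 9 + "line two\n",
}

def matching_samples(pattern, texts):
    """Return the names of the samples the pattern matches, sorted."""
    return sorted(name for name, text in texts.items() if pattern.search(text))

hits = matching_samples(MANY_BLANK_LINES, samples)
print(hits)  # ['ham_signature_spacing', 'plain_blank_lines']
```

Running a candidate pattern over saved ham in this fashion, before it
goes anywhere near a live rule set, is one cheap way to spot false
positives early.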

Martin

> Thanks,
> Alex