On Fri, 2014-05-23 at 19:36 -0400, Alex wrote:
> Hi,
> 
> On Thu, May 22, 2014 at 8:44 PM, John Hardin <jhar...@impsec.org> wrote:
> 
> > On Thu, 22 May 2014, Karsten Bräckelmann wrote:
> >
> >  On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:
> >>
> >>> I am clearly missing something with these rules but I lack the
> >>> experience to
> >>> see what it is:
> >>>
> >>> score RAW_BLANK_LINES_05 0.5
> >>> rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
> >>>
> >>
> >> Why is everyone trying to match empty lines these days? Must be spam I'm
> >> missing out on. ;)
> >>
> >
> > Heh. Something similar just plopped into my spam quarantine.
> >
> > You might want to do this:
> >
> >   rawbody  MANY_BLANK_LINES  /(?:(?:<br>)?\r?\n){9}/mi
> 
> 
> I tried this for a while in my corpus. Have you combined this into a meta?
> I'm finding this matches far too much ham to even remotely be considered.
> Was it the intention to only match fn's?
> 
If you're going to write rules that reliably match HTML spam, it's a good
idea to start by reading enough of the HTML generated by the more
popular MUAs, especially the MS ones, to be familiar with the tag
sequences they generate. A lot of them are quite unlike anything
you'd expect a rationally designed program to produce. IOW you need some
familiarity with the tangled strings of tags that can be found in *ham*
so you can avoid matching them by mistake.
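
To illustrate the point, here is a rough Python sketch that runs the
MANY_BLANK_LINES pattern quoted above against a couple of made-up message
bodies. It exercises only the regex and does not reproduce how SpamAssassin
feeds rawbody text to a rule. The sample bodies and the matches_rule() helper
are assumptions for illustration, not taken from any real corpus.

```python
import re

# Pattern copied from the MANY_BLANK_LINES rule quoted in this thread;
# the Perl /mi flags are mapped to re.M | re.I.
MANY_BLANK_LINES = re.compile(r"(?:(?:<br>)?\r?\n){9}", re.M | re.I)


def matches_rule(raw_body: str) -> bool:
    """Return True if the pattern matches anywhere in the body (hypothetical helper)."""
    return MANY_BLANK_LINES.search(raw_body) is not None


# Hypothetical ham: an HTML reply whose signature is pushed down by a run
# of <br> lines, the kind of padding some mail clients emit.
ham_sample = (
    "<p>See you at the meeting.</p>\r\n"
    + "<br>\r\n" * 9
    + "<p>Regards,<br>\r\nPat</p>\r\n"
)

# Hypothetical spam-like body: text, a long run of empty lines, then a URL.
spam_like_sample = "Buy now" + "\n" * 12 + "http://example.invalid/"

# Ordinary compact HTML with no long run of blank or <br> lines.
compact_sample = "<p>Hello</p>\n<p>World</p>\n"

for name, body in [
    ("ham_sample", ham_sample),
    ("spam_like_sample", spam_like_sample),
    ("compact_sample", compact_sample),
]:
    print(name, matches_rule(body))
```

The pattern cannot tell the padded ham body from the spam-like one, which is
why a rule like this probably needs to be checked against real ham before it
gets a score of its own.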
  

Martin

> Thanks,
> Alex


