On Fri, 19 Nov 2010, Daniel McDonald wrote:

> On 11/19/10 2:51 PM, "Bowie Bailey" <bowie_bai...@buc.com> wrote:
>
> > rawbody  FR_3TAG_3TAG
> > m'<[abcefghijklmnoqstuvwxz]{3}></[abcefghijklmnoqstuvwxz]{3}>'i
> >
> > It looks for an html tag containing exactly three characters followed by
> > a closing tag which also contains exactly three characters.
>
> But no instances of d,p,r or y.  I'm sure that's a really clever trick for
> something, I just don't have a clue as to what it might be....

It was an attempt to find obfuscated HTML junk that spammers were
using to break up spammy words such as "male medications".

EG: via<sqz></sqz>gra
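To see what the rule catches, here is a small Python sketch. It assumes the Perl-style regex carries over unchanged to Python's `re` module, and it maps the rule's trailing `i` flag to `re.IGNORECASE`. SpamAssassin runs `rawbody` rules against the decoded body text with HTML tags still present. The helper name `fr_3tag_3tag_hits` is hypothetical and is not part of SpamAssassin.

```python
import re

# The FR_3TAG_3TAG regex as posted. The character class is the lowercase
# alphabet minus d, p, r and y. The rule's /i flag becomes re.IGNORECASE.
FR_3TAG_3TAG = re.compile(
    r"<[abcefghijklmnoqstuvwxz]{3}></[abcefghijklmnoqstuvwxz]{3}>",
    re.IGNORECASE,
)


def fr_3tag_3tag_hits(rawbody: str) -> bool:
    """Hypothetical helper: True if this raw body text would trip the rule."""
    return FR_3TAG_3TAG.search(rawbody) is not None


print(fr_3tag_3tag_hits("via<sqz></sqz>gra"))  # True: bogus empty tag pair splits the word
print(fr_3tag_3tag_hits("<div></div>"))        # False: 'd' is not in the class
```

The empty open/close pair renders as nothing in a mail client, so the reader sees the whole word while a plain word match on the body does not.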

The idea was that almost all legit 3-character HTML tags, such as '<div>',
contain at least one of those letters ([dpry]). So a purported
tag that had none of them was not legit and thus probably bogus spammer
spoor.
With the evolution of HTML (XML, etc.), that's no longer a safe
assumption, so that rule probably produces false positives (FPs).
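The character class is simply a-z with d, p, r and y removed. Rebuilding it makes the heuristic explicit and also shows where it breaks down. In the sketch below, `<nav>` and `<svg>` are HTML5 elements chosen for illustration. Neither name contains d, p, r or y, so an empty pair of either tag trips the rule. This is an assumed example, not a reported false positive.

```python
import re
import string

# Rebuild the rule's character class: lowercase a-z minus d, p, r and y.
EXCLUDED = set("dpry")
allowed = "".join(c for c in string.ascii_lowercase if c not in EXCLUDED)
print(allowed)  # abcefghijklmnoqstuvwxz

# Same shape as the rule: an empty 3-letter open tag followed by a 3-letter close tag.
pattern = re.compile(rf"<[{allowed}]{{3}}></[{allowed}]{{3}}>", re.IGNORECASE)

# Illustrative inputs: <div> is protected by its 'd'; <nav> and <svg> are not.
for sample in ("<div></div>", "<nav></nav>", "<svg></svg>"):
    print(sample, bool(pattern.search(sample)))
```

The regex also never checks that the closing tag name matches the opening one. For example, `<sqz></abc>` would match just as well.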


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
