"Mathew Hendry" <[EMAIL PROTECTED]> writes:

> Is there an easy way to detect fraudulent links like the following from
> a recent scamspam.
>
> <A=20
> href=3D"http://hyperiod.hypermart.net/fraud.html"><FONT =
> face=3DArial=20
> size=3D2>BestBuy.com/fraud_department.html</FONT></A>
>
> i.e. both the href and the visible text look like URLs, but don't come
> anywhere close (I guess that's the tricky part :) to matching one
> another?

I tried writing a rule to detect this. I used a "longest common
substring" function as the basis for the test (if I recall correctly, on
both the entire URL and the hostname). Basically, I compared the length
of the LCS to the length of the shorter of the two URLs and turned that
ratio into a percentage test (assuming both looked like URLs). You need
to work it into the HTML.pm code.

A lot of ham also triggered most of the rules I tried, but I think it's
probably possible to write a test that would work well enough. I'd
suggest starting with a basic analysis and then testing various
permutations of tests:

 - full URL | hostname only | domain only | local-part | filename | etc.
 - longest common substring | word distance algorithms
 - case-sensitive | case-insensitive

Thinking about it now, I think word distance might work better than
longest common substring, but the algorithm is a bit more complicated and
you might want to pull in an optional module at that point.

Another idea would be doing DNS lookups on both hosts to see if they
resolve to the same IP. Unfortunately, that's probably too expensive
compared to how often this obfuscation technique is used, but it might
work.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/  source consulting (looking for new work)

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
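
For illustration only, here is a minimal sketch of the hostname-only,
case-insensitive variant of the longest-common-substring test described
above. It is written in Python rather than as a patch to HTML.pm, and it is
not the rule that was actually tried. All of the function names, the
LinkCollector class, the crude "looks like a URL" check, and the 0.5
threshold are assumptions made up for the example. It also assumes the
message body has already been decoded from quoted-printable.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def hostname(text: str) -> str:
    """Lowercased hostname; tolerates text that lacks a scheme."""
    text = text.strip()
    if "://" not in text:
        text = "http://" + text
    try:
        return (urlsplit(text).hostname or "").lower()
    except ValueError:  # malformed input such as an unclosed "[" 
        return ""


def looks_like_url(text: str) -> bool:
    """Crude assumption: no spaces and a dotted hostname."""
    text = text.strip()
    return bool(text) and " " not in text and "." in hostname(text)


def similarity(href: str, visible: str) -> float:
    """LCS length as a fraction of the shorter hostname (0.0 to 1.0)."""
    h1, h2 = hostname(href), hostname(visible)
    shorter = min(len(h1), len(h2))
    if shorter == 0:
        return 0.0
    return longest_common_substring(h1, h2) / shorter


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def suspicious_links(html: str, threshold: float = 0.5):
    """Links whose visible text looks like a URL but has a dissimilar host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if looks_like_url(text) and similarity(href, text) < threshold]
```

Fed the decoded link from the quoted message, suspicious_links() returns
that one (href, text) pair, since "hyperiod.hypermart.net" and "bestbuy.com"
share no substring longer than one character. A link whose text is
"bestbuy.com" and whose href points at www.bestbuy.com is not flagged, and
anchor text such as "Click here" is skipped because it does not look like a
URL. The threshold would need tuning against real ham, since many legitimate
links also failed the rules I tried.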