"Mathew Hendry" <[EMAIL PROTECTED]> writes:

> Is there an easy way to detect fraudulent links like the following from
> a recent scamspam.
> 
> <A
> href="http://hyperiod.hypermart.net/fraud.html"><FONT face=Arial
> size=2>BestBuy.com/fraud_department.html</FONT></A>
> 
> i.e. both the href and the visible text look like URLs, but don't come
> anywhere close (I guess that's the tricky part :) to matching one
> another?

I tried writing a rule to detect this.  I used a "longest common
substring" function as the basis for the test (if I recall correctly, on
both the entire URL and the hostname).  Basically, I was trying to see
how long the LCS was compared to the shorter of the two URLs, and I
turned that into a percentage test (assuming both looked like URLs).  To
use it, you would need to work it into the HTML.pm code.
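
For illustration, here is a rough sketch of that kind of test in Python.
It is not the rule I wrote (that was Perl, and I no longer have it); the
function names and the 0.5 threshold are assumptions made up for the
example, and the threshold in particular would need tuning against a
real corpus.

```python
from urllib.parse import urlsplit


def longest_common_substring(a: str, b: str) -> int:
    """Return the length of the longest contiguous substring shared by a and b."""
    best = 0
    # prev[j] holds the length of the common run ending at the previous
    # character of a and at b[j-1].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def lcs_ratio(a: str, b: str) -> float:
    """LCS length as a fraction of the shorter string (0.0 if either is empty)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return longest_common_substring(a, b) / shorter


def hostname(url: str) -> str:
    """Hostname of a URL, tolerating a missing scheme such as 'BestBuy.com/x'."""
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def looks_mismatched(href: str, text: str, threshold: float = 0.5) -> bool:
    """Flag a link whose href and visible hostnames share too little.

    The default threshold is an untested guess, not a tuned value.
    """
    return lcs_ratio(hostname(href), hostname(text)) < threshold
```

Calling looks_mismatched() on the href and visible text from the quoted
example flags the link, since the two hostnames have almost nothing in
common.  The same lcs_ratio() can be run on the full URLs instead of the
hostnames.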

A lot of ham also triggered most of the rules I tried, but I think it's
probably possible to write a test that would work well enough.

I'd suggest starting with a basic analysis and then testing various
permutations of tests:

 - full URL | hostname only | domain only | local-part | filename | etc.
 - longest common substring | word distance algorithms
 - case-sensitive | case-insensitive
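
To try those permutations, the two URLs first have to be cut into
comparable pieces.  A minimal sketch, assuming Python's urllib.parse;
url_parts() is a hypothetical helper, "path" is my guess at the nearest
URL equivalent of "local-part", and the "domain" here is naively the
last two labels of the hostname, which is wrong for suffixes like co.uk:

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url: str, case_sensitive: bool = False) -> dict:
    """Split a URL-like string into pieces that can be compared separately."""
    original = url
    if "://" not in url:
        url = "http://" + url  # visible link text often omits the scheme
    parts = urlsplit(url)
    # Note: urlsplit() always lowercases the hostname, so the hostname and
    # domain pieces are case-insensitive regardless of the flag below.
    host = parts.hostname or ""
    labels = host.split(".")
    # Naive "domain": the last two labels. A real test would need a
    # public-suffix list to handle things like example.co.uk.
    domain = ".".join(labels[-2:])
    result = {
        "full": original,
        "hostname": host,
        "domain": domain,
        "path": parts.path,
        "filename": posixpath.basename(parts.path),
    }
    if not case_sensitive:
        result = {key: value.lower() for key, value in result.items()}
    return result
```

Each key in the returned dict can then be fed to whichever comparison
function is being evaluated, once with case_sensitive=True and once
without.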

Thinking about it now, I think word distance might work better than
longest common substring, but the algorithm is a bit more complicated
and you might want to pull in an optional module at that point.
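
As a sketch of what I mean, assuming "word distance" is taken to be plain
Levenshtein edit distance (other distance measures exist, and this is not
tested against any corpus), the whole thing fits in a few lines of
Python.  The similarity() scaling is my own assumption for turning the
distance into a 0-to-1 score:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        cur = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

Unlike longest common substring, this gives a high score to near-miss
hostnames that differ by a single character, so very high but imperfect
similarity may be worth treating as its own signal.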

Another idea would be to do DNS lookups on both hosts and see if
they resolve to the same IP.  Unfortunately, that's probably too
expensive relative to how often this obfuscation technique is used, but
it might work.
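
A minimal sketch of the DNS comparison, assuming Python's standard
socket module.  same_server() and the pluggable resolver argument are
hypothetical names for this example.  Keep in mind that shared hosting
can put unrelated sites on one IP, and one site can sit behind many IPs,
so the result is a hint at best:

```python
import socket
from typing import Callable, Set


def resolve_ips(host: str) -> Set[str]:
    """Forward-resolve a hostname to its set of IP addresses (empty on failure)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address as a string.
    return {info[4][0] for info in infos}


def same_server(host_a: str, host_b: str,
                resolver: Callable[[str], Set[str]] = resolve_ips) -> bool:
    """True if the two hostnames share at least one resolved address."""
    ips_a = resolver(host_a)
    ips_b = resolver(host_b)
    return bool(ips_a & ips_b)
```

The resolver is passed in so that results can be cached or faked in
tests; each real lookup is a blocking network call, which is where the
expense comes from.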

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/   source consulting (looking for new work)


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
