On Wednesday, September 15, 2004, 2:41:14 AM, Chr. Stuckrad wrote:
> On Wed, Sep 15, 2004 at 02:17:15AM -0700, Jeff Chan wrote:
>> On Wednesday, September 15, 2004, 1:38:30 AM, Julian Field wrote:
>> > ... Is it possible to detect where
>> > <A HREF="foo">bar</A>
>> > and foo and bar are unrelated domains?
>>
>> That could be a good idea for a rule.  It would be nice if it
>> could be determined canonically, without actually resolving
>> either location.
> IMHO this is near impossible.
> The trivial String Back-reference check can never
> determine whether 'foo' and 'bar' are un*related*.
> Just whether the text *in* the HREF is unequal to
> the text shown to the user highlighted as a link.
> In all cases, where the HREF is only 'semantically'
> *related* to the following link text, a string check
> will assume 'spam', while 'spam/scam' will sooner or
> later just obfuscate the text portion by javascript
> or encoding tricks.
> e.g.: <a HREF="www.eplus.de">imail.de</a>
> is 'related' (even if 'mis'constructed)
> because you find access to the 'imail.de'
> Mails via the 'www.eplus.de' webserver.
> Also many Mail-Texts of the kind
> ... to reach FOO click <a HREF="somedomain">here</a>
> would be very difficult to 'analyze correctly'.
> So I believe it to be an interesting idea for AI specialists,
> but alas not for inclusion in spamassassin as it works now.
> Stucki (postmaster at mi.fu-berlin.de using spamassassin 2.63)

Hmm, well there's always the brute force method of matching
phisher URI domains in our phishing SURBL using urirhsbl,
urirhssub or SpamCopURI:

  http://www.surbl.org/lists.html#ph

Jeff C.
--
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
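
For anyone who wants to experiment with the plain string comparison
discussed above, here is a minimal Python sketch. It is not SpamAssassin
code and not a proposed rule; the names (host_of, base_domain,
LinkCollector, mismatched_links) are made up for illustration, and the
"last two labels" notion of a domain is a deliberate simplification that
is wrong for suffixes such as co.uk. It only reports links whose visible
text contains something that looks like a domain differing from the
HREF's domain.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Rough pattern for "something that looks like a host name" in link text.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def host_of(href):
    """Return the lowercased host of an HREF, tolerating a missing scheme."""
    if "://" not in href:
        href = "http://" + href
    return (urlparse(href).hostname or "").lower()


def base_domain(host):
    """Naive assumption: the registered domain is the last two labels."""
    return ".".join(host.split(".")[-2:])


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return links whose text names a different domain than the HREF."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for href, text in parser.links:
        shown = DOMAIN_RE.search(text)
        if not shown:
            continue  # text such as "here" gives nothing to compare
        target = base_domain(host_of(href))
        if target and base_domain(shown.group(0).lower()) != target:
            hits.append((href, text))
    return hits
```

Run against the two examples from the quoted message, the sketch shows
both weaknesses Stucki describes: the eplus.de/imail.de link is reported
even though the two are related, and the "click here" link cannot be
judged at all because its text carries no domain.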