On Wednesday, September 15, 2004, 2:41:14 AM, Chr. Stuckrad wrote:
> On Wed, Sep 15, 2004 at 02:17:15AM -0700, Jeff Chan wrote:
>> On Wednesday, September 15, 2004, 1:38:30 AM, Julian Field wrote:
>> > ... Is it possible to detect where
>> > <A HREF="foo">bar</A>
>> > and foo and bar are unrelated domains?
>> 
>> That could be a good idea for a rule.  It would be nice if it
>> could be determined canonically, without actually resolving
>> either location.

> IMHO this is near impossible.

> The trivial String Back-reference check can never
> determine whether 'foo' and 'bar' are un*related*.
> Just whether the text *in* the HREF is unequal to
> the text shown to the user highlighted as a link.

> In all cases, where the HREF is only 'semantically'
> *related* to the following link text, a string check
> will assume 'spam', while 'spam/scam' will sooner or
> later just obfuscate the text portion by javascript
> or encoding tricks.

> e.g.:   <a HREF="www.eplus.de">imail.de</a>
>         is 'related' (even if 'mis'constructed)
>         because you find access to the 'imail.de'
>         Mails via the 'www.eplus.de' webserver.

>         Also many Mail-Texts of the kind
>          ... to reach FOO click <a HREF="somedomain">here</a>
>         would be very difficult to 'analyze correctly'.

> So I believe it to be an interesting idea for AI specialists,
> but alas not for inclusion in spamassassin as it works now.

> Stucki  (postmaster at mi.fu-berlin.de using spamassassin 2.63)
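
For illustration, here is a minimal Python sketch of the "trivial" string
check discussed above, showing where it breaks down. Everything in it is an
assumption made for the example and not part of SpamAssassin: the names
host_of, base_domain, LinkCollector and mismatched_links are hypothetical,
and "last two labels" is only a crude stand-in for a registered domain
(it is wrong for suffixes such as co.uk).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Rough pattern for something that looks like a host name in link text.
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE)


def host_of(href):
    """Return the lower-cased host part of an HREF value ('' if none)."""
    # urlsplit only recognises a host when a scheme or '//' is present.
    if "//" not in href:
        href = "//" + href
    return (urlsplit(href).hostname or "").lower()


def base_domain(host):
    """Naive assumption: the last two labels identify the domain."""
    return ".".join(host.split(".")[-2:])


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return links whose visible text names a different domain than the HREF."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for href, text in parser.links:
        match = DOMAIN_RE.search(text)
        if match is None:
            continue  # text such as "here" gives the check nothing to compare
        if base_domain(host_of(href)) != base_domain(match.group(1).lower()):
            flagged.append((href, text))
    return flagged
```

Run against the two examples from the quoted message, the sketch flags the
eplus.de/imail.de link even though the two are related, and it silently
skips the "click here" link because the visible text contains no domain.
Both outcomes are the weaknesses described above.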

Hmm, well there's always the brute force method of matching
phisher URI domains in our phishing SURBL using urirhsbl,
urirhssub or SpamCopURI:

  http://www.surbl.org/lists.html#ph
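
As a rough sketch of what such a lookup amounts to outside SpamAssassin,
the Python below builds the DNS name to query and treats any 127.x.x.x
answer as "listed". The zone name multi.surbl.org, the helper names
surbl_query_name and is_listed, and the decision to treat every resolver
error as "not listed" are assumptions made for this example; check the
list documentation at the URL above for the zone and return codes that
actually apply.

```python
import socket


def surbl_query_name(domain, zone="multi.surbl.org"):
    """Build the name to look up: <domain>.<zone> (zone is an assumption)."""
    return domain.strip(".").lower() + "." + zone


def is_listed(domain, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """Return True if the blocklist zone answers with a 127.x.x.x address.

    `resolve` is injectable so the logic can be exercised without network
    access. Any resolver error is treated as "not listed", which also hides
    timeouts and other DNS failures; a real checker should separate them.
    """
    try:
        answer = resolve(surbl_query_name(domain, zone))
    except OSError:
        return False
    return answer.startswith("127.")
```

The domain passed in would be the one extracted from the message's URIs,
not the sender's address.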

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
