http://bugzilla.spamassassin.org/show_bug.cgi?id=1375
------- Additional Comments From [EMAIL PROTECTED]  2004-03-02 14:35 -------
Sorry, but I disagree with most of the previous comment.

Before even getting into the arguments, there is a simple counterexample to your
proposal of ignoring just empty links. The example that was attached a few
comments ago shows a spammer already including an href to an innocent site
kai.com with an empty link area. Your proposal would result in the next spam
from that person including the same href with font size 1 text, making the test
useless. There is no reason to add a useless test.

We are not talking about a general open-ended AI problem. The browser solves the
problem already by interpreting the HTML and rendering pixels. If there are
enough pixels in contrasting foreground and background colors in an area that is
declared as a clickable hotspot, then the link is visible. The question is not
whether it is possible to do the same thing, but how close we can get to the same
determination using only a reasonable amount of processing. We already have code
to determine if text has been made invisible by being inside an HTML comment or
in an invisible color or in a very tiny font. We need that already to catch
attempts to make invisible non-spam content dominate the scoring.
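To illustrate the kind of check described above, here is a minimal sketch of an invisible-text heuristic: empty anchor text, font size 1, or foreground color equal to background color. The function name and the style-attribute parsing are hypothetical simplifications; SpamAssassin's actual checks live in its HTML parser and handle far more cases.

```python
import re

def link_text_is_invisible(style: str, text: str) -> bool:
    """Heuristic sketch (hypothetical helper, not SpamAssassin's code):
    treat a link's anchor text as invisible if it is empty, rendered in
    a tiny font, or colored the same as the background."""
    if not text.strip():
        return True                      # empty link area, like the kai.com example
    m = re.search(r"font-size:\s*(\d+)px", style)
    if m and int(m.group(1)) <= 1:
        return True                      # font size 1: effectively invisible
    fg = re.search(r"(?<!background-)color:\s*(#[0-9a-fA-F]{6})", style)
    bg = re.search(r"background-color:\s*(#[0-9a-fA-F]{6})", style)
    if fg and bg and fg.group(1).lower() == bg.group(1).lower():
        return True                      # foreground matches background
    return False
```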

That still leaves open the different problem that an image can be visible or
invisible and we cannot tell without downloading it from a website, possibly
triggering a webbug. I don't know how to get around that one, which means that
while I strongly disagree that this is an "AI problem" whose solution would give
us a place in history, I do agree that we may not be able to solve the general
problem.

Most importantly, I disagree with your conclusions:

"the distributed nature of dns would seem to defeat any attempts at dos by
looking up links"

If SA has to look up hundreds of legitimate domains to process each message,
that will slow down processing too much.

Spammers can create throwaway domains and host them on DNS servers that are
designed to slow down anything that queries them. The distributed nature of DNS
only helps to the degree that queries are cached, but spam can contain
variations of host names that will ensure caching doesn't help.

One way to avoid the DoS is to look up not every link but a random sample. But
that lets the spammer set their own probability of detection by choosing how
many invisible links they include for each visible link.
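To make that sampling point concrete, here is the arithmetic (illustrative only, not any actual SpamAssassin behavior): if the filter queries a uniform random sample of k links out of N, and only one of them is the link the spammer actually cares about, the chance the real link is ever looked up is k/N, which the spammer drives down simply by adding chaff.

```python
from fractions import Fraction

def p_real_link_sampled(total_links: int, sample_size: int) -> Fraction:
    """Probability that the one 'real' spam link is included in a uniform
    random sample of sample_size links drawn from total_links links."""
    return Fraction(min(sample_size, total_links), total_links)

# 1 real link padded with 99 invisible chaff links, filter samples 5:
print(p_real_link_sampled(100, 5))   # 1/20
```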

"the only thing spammers would achieve by loading up spams with bogus links, is
making it less likely that their spams would get through"

The links would only be "bogus" in the sense that they are not really links that
the spammer wants anybody to click on. They could point to real, innocent
websites that we would not want on any RBL, like the kai.com example. They would
not appear when someone reads the spam, so they will not be clicked on. The
only thing that might look up the domains of the hrefs would be spam filters,
which will find that they are innocent.

What _might_ work is a rule that is DoS-proof because it looks up only a limited
number of hrefs, and another rule that penalizes mail that has enough links that
it may be an attempt to introduce chaff to fool the first rule. Both of those
would be made more effective by ignoring links that have invisible text. I still
don't know what we would do about links that use images for their clickable 
area.
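The two-rule combination above can be sketched as follows. All names and thresholds here are hypothetical, and the blocklist lookup is stubbed out as a callback; this is only an outline of the proposal, not an implementation of any existing SpamAssassin rule.

```python
import random

def score_links(hrefs, max_lookups=5, chaff_threshold=20, lookup=None):
    """Sketch of the two proposed rules (hypothetical names and thresholds):
    rule 1 queries at most max_lookups hrefs, so it cannot be DoS'ed by a
    message stuffed with slow-resolving domains; rule 2 adds a penalty when
    the link count is high enough to look like chaff meant to dilute
    rule 1's random sample."""
    score = 0.0
    sample = random.sample(hrefs, min(max_lookups, len(hrefs)))
    for href in sample:
        if lookup and lookup(href):      # lookup() returns True if blocklisted
            score += 3.0                 # rule 1: a sampled href hit the blocklist
    if len(hrefs) >= chaff_threshold:    # rule 2: suspiciously many links
        score += 1.0
    return score
```

Ignoring links with invisible anchor text before this scoring, as argued above, would shrink the chaff pool a spammer can hide behind; links whose clickable area is an image remain the unsolved case.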
