On Thu, 3 Jul 2025 at 10:18, Mouse <mo...@rodents-montreal.org> wrote:
> Actually, most offenders of type (1) usually just go into the automated
> list, because I don't use the top and bottom addresses of my netblock
> for anything but scanner sentinels; anyone trying to access them goes
> into the automated list. Most address-range scanners hit this. Only
> the ones that are visible enough to get human handling ever go into the
> manually-maintained list.
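
For readers following along, the sentinel scheme described in the quoted
text could be sketched roughly as below. This is only an illustration,
not the poster's actual setup: the netblock is a documentation range
(203.0.113.0/24), and the names NETBLOCK, SENTINELS, banned and
note_connection are all hypothetical.

```python
import ipaddress

# Assumption: a documentation netblock stands in for the real one.
NETBLOCK = ipaddress.ip_network("203.0.113.0/24")

# The bottom and top addresses of the block are used only as sentinels.
SENTINELS = {NETBLOCK.network_address, NETBLOCK.broadcast_address}

# Stand-in for the automated ban list.
banned = set()


def note_connection(src, dst):
    """Record one connection attempt; return True if src is now banned."""
    src_ip = ipaddress.ip_address(src)
    if ipaddress.ip_address(dst) in SENTINELS:
        # Nothing legitimate lives on a sentinel, so any access is a scan.
        banned.add(src_ip)
    return src_ip in banned
```

A real deployment would presumably do this in the packet filter rather
than in application code; the sketch only shows the decision rule.
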

This is an interesting point: simple, yet something I've never seen
before! Using an otherwise unused address as an indicator of bot
activity is an interesting approach.

But doesn't this also leave you open to an attacker locking you out of
a legitimate network, by intentionally hitting the sentinel from a
network you'd rather not have blocked? E.g., they can go to a coffee
shop or university you frequent, or even your workplace, scan your
network in the most minimal way, and then you can no longer reach your
own network from there.

This is the same reason spamtrap email addresses can be misused: an
attacker can deliberately trip them to cut you off from some major mail
server, for example.

> Another possible reason is that I don't speak HTTPS; I consider it
> plausble the LLM scrapers have drunk the "HTTPS is the One True Way"
> koolaid and aren't even trying HTTP. Some of the port-80 connections
> that proceed to send me binary garbage may be attempts to initiate
> HTTPS (even though it's the HTTP port); whatever they are, they get
> dropped into the automated ban list along with anything else sending
> something I don't recognize in the position of an HTTP verb.

This is yet another good reason not to support HTTPS!

Do you also publish your source code through CVSweb? I think the whole
problem here is that it's very expensive for CVSweb to process these
requests, so it is easy to drown it in requests and cause a DoS.

C.
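
P.S. A rough sketch of the "unrecognized verb" check mentioned in the
quoted text, in case it helps the discussion. The verb list and the
function name classify_first_bytes are assumptions made for
illustration, not the poster's code. The one protocol fact relied on is
that a TLS handshake record begins with the bytes 0x16 0x03, which is
what an HTTPS attempt sent to port 80 would look like.

```python
# Assumption: a small allow-list of verbs this server recognizes.
KNOWN_VERBS = {b"GET", b"HEAD", b"POST", b"OPTIONS"}


def classify_first_bytes(data):
    """Classify the first bytes a client sends on the HTTP port."""
    # TLS handshake record: content type 0x16, major version 0x03.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"
    # Otherwise, take whatever sits where the HTTP verb should be.
    verb = data.split(b" ", 1)[0]
    return "http" if verb in KNOWN_VERBS else "unknown"
```

Under the policy described above, both "tls" and "unknown" would land
the sender in the automated ban list; the distinction is only useful
for counting how many of the rejects were HTTPS attempts.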