On 7/3/25 6:23 AM, Constantine A. Murenin wrote:
> Can you really blame kids for looking at all 5000 links from a single
> file, when you give them 5000 links to start with? Maybe start by not
> giving the 5000 unique links from a single file, and implement caching
> / throttling? How could you know there's nothing interesting in there
> if you don't visit it all for a few files first?
Are you intentionally misrepresenting the problem?
> These AIs literally behave the exact same way as humans; they're
> simply dumber and more persistent. The way CVSweb is designed, it's
> easily DoS'able with the default `wget -r` and `wget --recursive` from
> probably like 20 years ago?
This is complete BS. "wget -r" uses a single connection (at any point in
time). It uses a consistent source address. It actually honors
robots.txt by default. None of that applies to the current generation of
AI scrapers:
(1) They have no effective rate limiting mechanism on the origin side.
(2) They are intentionally distributing requests to avoid server-side
rate limits (see the sketch after this list).
(3) The combination of the two makes most caching useless.
(4) They (intentionally or maliciously) do not honor robots.txt.
(5) They are intentionally faking the user agent.
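To make (1)-(3) concrete, here is a minimal sketch (mine, purely for
illustration, not anything actually deployed) of the per-source-IP token
bucket an operator might put in front of CVSweb; the rates, names and
addresses are made up. A single greedy client runs out of tokens almost
immediately, while a fleet that spreads the same load over a hundred
addresses never trips the limit at all:

```
# Illustrative sketch only: a per-source-IP token bucket.
# RATE, BURST and the addresses below are made-up example values.
import time
from collections import defaultdict

RATE = 1.0        # tokens refilled per second, per source IP
BURST = 5.0       # maximum bucket size per source IP

_buckets = defaultdict(lambda: {"tokens": BURST, "stamp": time.monotonic()})

def allow(source_ip: str) -> bool:
    """Return True if a request from source_ip is within its per-IP budget."""
    b = _buckets[source_ip]
    now = time.monotonic()
    b["tokens"] = min(BURST, b["tokens"] + (now - b["stamp"]) * RATE)
    b["stamp"] = now
    if b["tokens"] >= 1.0:
        b["tokens"] -= 1.0
        return True
    return False

# A single greedy client is throttled after the initial burst...
print(sum(allow("192.0.2.1") for _ in range(100)))       # ~5 allowed

# ...but the same 100 requests spread over 100 addresses all pass,
# which is how a distributed scraper sidesteps the per-IP limit.
print(sum(allow(f"203.0.113.{i}") for i in range(100)))  # 100 allowed
```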
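And for contrast with (4): a polite recursive crawler, which is what
"wget -r" has been for decades, checks robots.txt before fetching
anything. A minimal sketch of that check, with a made-up host, a
made-up Disallow rule and a made-up user agent string:

```
# Illustrative sketch of the robots.txt check a polite crawler performs
# before fetching; "wget -r" does the equivalent by default.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/cvsweb.cgi
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())   # normally fetched once per host

for url in ("https://anoncvs.example.org/index.html",
            "https://anoncvs.example.org/cgi-bin/cvsweb.cgi/src/?rev=1.1"):
    if rp.can_fetch("polite-crawler/1.0", url):
        print("would fetch:", url)
    else:
        print("skip (disallowed by robots.txt):", url)
```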
Comparing AI scrapers to regular, non-criminal human behavior is ignorant
at best and intellectually dishonest at worst.
Joerg