On Tue, Jul 9, 2024 at 9:11 AM Dave Wreski
<[email protected]> wrote:
> Hi, I have the following rewrite rule in place on one of our staging sites
> to redirect bots and malicious scripts to our corporate page:
>
> RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
> RewriteCond %{HTTP_USER_AGENT} ^.*(<|>|'|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
> RewriteCond %{HTTP_USER_AGENT} ^.*(HTTrack|clshttp|archiver|loader|email|nikto|miner|python).* [NC,OR]
> RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|libwww\-perl|curl|wget|harvest|scan|grab|extract).* [NC,OR]
> RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|SemrushBot|PetalBot|Bytespider|bingbot).* [NC]
> RewriteRule (.*) https://guardiandigital.com$1 [L,R=301]
>
> However, it doesn't appear to always work properly:
>
> 66.249.68.6 - - [08/Jul/2024:11:43:41 -0400] "GET /robots.txt HTTP/1.1" 200 343 r:"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0/5493 1145/6615/343 H:HTTP/1.1 U:/robots.txt s:200
>
> Instead of changing my rules and then having to wait until the condition
> is met again (Googlebot re-scanning the site), I'd like to simulate the
> request above against my ruleset to see whether it matches. Is this
> possible?
>
> Thanks,
> Dave
>
>
>
For the user agent, you can install a browser extension to spoof the value
and then make an HTTP request. Alternatively, you can use curl.
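For example, curl's -A/--user-agent option can replay the exact request
from the log entry above. A minimal sketch, assuming your staging site
answers at staging.example.com (a placeholder hostname):

  # Replay the Googlebot request from the access log; -i prints the
  # response headers so you can see whether the 301 redirect to
  # guardiandigital.com actually fires.
  curl -i \
    -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
    https://staging.example.com/robots.txt

If you also want to watch the conditions being evaluated on the server
side, Apache 2.4 can trace mod_rewrite per request. In the staging vhost:

  # Verbose per-request rewrite tracing; trace3 logs each RewriteCond
  # test to the error log. Keep this off production, it is noisy.
  LogLevel alert rewrite:trace3

Then tail the error log while you replay the request with curl.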