Hi,
I have an apache-2.4.56 install on fedora37 and am trying to block
some bots from accessing the site, unless they're trying to access
our RSS feeds. How can I do this?
I'm blocking the bots with SetEnvIf lines in the .htaccess file in
the document root like:
SetEnvIf user-agent "(?i:libwww)" stayout=1
deny from env=stayout
<RequireAll>
Require all granted
Require not env stayout
</RequireAll>
However, creating an entry that explicitly allows access to the
XML files before or after doesn't seem to take effect:
RewriteRule linuxsecurity_features\.xml$ - [L]
It is still blocked by the user-agent setting above. I understood
the file was processed from the top down, and when a match is
made, it stops processing. Is that not the case? Shouldn't the
RewriteRule above, if placed before the env rule, be enough to
stop processing the htaccess file and allow access?
The [L] flag only stops later RewriteRule directives from being processed.
Every module still gets its configuration merged from every matching
config context, then decides what to do with its configuration when
passed control at various times.
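To illustrate with your own directives (a sketch, not a fix): the [L] below only ends the rewrite pass, while mod_authz_core still evaluates the Require lines afterwards against the merged config, so a libwww client is still denied:

```apache
SetEnvIf User-Agent "(?i:libwww)" stayout=1

# [L] stops further RewriteRule processing -- nothing more
RewriteEngine On
RewriteRule linuxsecurity_features\.xml$ - [L]

# ...but the authorization phase still runs later and sees stayout,
# so the request gets a 403 anyway
<RequireAll>
    Require all granted
    Require not env stayout
</RequireAll>
```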
SetEnvIf is processed very early, so if you can stick with it for
manipulating this variable, it will be much more intuitive.
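For instance, a second SetEnvIf can clear the variable again for the feed before the authorization phase ever looks at it (a sketch; note Request_URI is the URL path and begins with a slash):

```apache
# Set the flag for libwww clients...
SetEnvIf User-Agent "(?i:libwww)" stayout=1
# ...then clear it for the feed. SetEnvIf directives apply in order,
# and Request_URI starts with "/"
SetEnvIf Request_URI "/linuxsecurity_features\.xml$" !stayout
```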
I've also tried adding these RewriteRule entries to the server
config htaccess with an Include, but it appears the .htaccess in
the document root is always processed afterwards, even after
finding a match in the server config htaccess.
I'd suggest the following:
1. Ditch the "deny", the <RequireAll> wrapper, and "Require all granted",
leaving just "Require not env stayout"
2. Ditch the RewriteRule and do a second SetEnvIf for the exception:
SetEnvIf Request_URI "linuxsecurity_features\.xml$" !stayout
This doesn't fix it, assuming I'm implementing it as you've described.
Removing the RequireAll section produces a site-wide 500 error in error_log:
.htaccess: negative Require directive has no effect in <RequireAny>
directive
SetEnvIf user-agent "(?i:libwww-perl)" stayout=1
SetEnvIf Request_URI ^linuxsecurity_features\.*$ !stayout
RewriteRule linuxsecurity_features\.xml$ - [L]
198.74.49.155 - - [10/Apr/2023:10:32:33 -0400] "GET
/linuxsecurity_features.xml HTTP/1.1" 403 199 "-" "LWP::Simple/6.00
libwww-perl/6.05" X:"SAMEORIGIN" 0/9629 979/8868/199 H:HTTP/1.1
This is all designed to prevent bots from being able to easily mirror
our website. Even though I understand individuals could just change
their user agent, sites like yandex/Acunetix and other services won't.
dave
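For the archives: two things appear to be going on in that last config. A bare negative Require sits in the implicit <RequireAny> container, which is why removing <RequireAll> produced the 500; and the pattern ^linuxsecurity_features\.*$ can never match, because Request_URI begins with a "/". A sketch that addresses both:

```apache
SetEnvIf User-Agent "(?i:libwww-perl)" stayout=1
# Request_URI includes the leading slash, so don't anchor the
# filename to the start of the string with "^"
SetEnvIf Request_URI "/linuxsecurity_features\.xml$" !stayout

# a lone "Require not" is invalid inside the implicit <RequireAny>,
# so keep the <RequireAll> wrapper
<RequireAll>
    Require all granted
    Require not env stayout
</RequireAll>
```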