On 12/2/21 10:07 PM, JonGeorg SageLibrary via Evergreen-general wrote:
I tried that and still got the loopback address, after restarting
services. Any other ideas? And the robots.txt file seems to be doing
nothing, which is not much of a surprise. I've reached out to the people
who host our network and have control of everything on the other side of
the firewall.
The DorkBot queries I'm referring to look like this:
[02/Dec/2021:12:08:13 -0800] "GET
/eg/opac/results?do_basket_action=Go=1_record_view=1=Search_highlight=1=metabib_basket_action=1=keyword%27%22%3Amat_format=1=176=1
HTTP/1.0" 200 62417 "-" "UT-Dorkbot/1.0"
They vary after "metabib", but all are
Thank you!
-Jon
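Requests like the one above identify themselves with the "UT-Dorkbot/1.0"
User-Agent, so one stopgap (a sketch, assuming nginx is the front-end proxy
as discussed later in this thread) is to refuse them at the proxy before
they ever reach Apache:

```nginx
# Inside the server {} block that proxies to Apache/Evergreen.
# Assumption: nginx terminates client connections, as in the
# LP1913610 branch mentioned elsewhere in this thread.
if ($http_user_agent ~* "UT-Dorkbot") {
    return 403;  # refuse the request before it reaches the OPAC
}
```

This only helps against bots that report their User-Agent honestly; a
crawler that spoofs its agent string needs the rate-limiting or firewall
approaches discussed below.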
On Fri, Dec 3, 2021 at 8:10 AM Blake Henderson via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:
JonGeorg,
This reminds me of a similar issue that we had. We resolved it with
this change to NGINX. Here's the link:
https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/blake/LP1913610_nginx_request_limits
and the bug:
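The branch above adds request limiting in nginx; the general mechanism
looks roughly like this (a sketch of nginx's stock limit_req facility,
not the exact contents of the branch — the zone name, rate, and back-end
address are placeholders):

```nginx
# http {} context: define a shared zone keyed on client IP,
# allowing a sustained rate of 5 requests per second.
limit_req_zone $binary_remote_addr zone=opac:10m rate=5r/s;

server {
    location /eg/opac/ {
        # Allow short bursts, queue a little, reject overflow with 429.
        limit_req zone=opac burst=20 nodelay;
        limit_req_status 429;
        proxy_pass https://localhost:443;  # assumed Apache back-end
    }
}
```

Note that keying on $binary_remote_addr only works once the real client
address is visible to the proxy; behind another NAT or firewall every
request can appear to come from the same IP.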
I tried that and still got the loopback address, after restarting services.
Any other ideas? And the robots.txt file seems to be doing nothing, which
is not much of a surprise. I've reached out to the people who host our
network and have control of everything on the other side of the firewall.
The LONG STRING sometimes contains a word, but it's usually just a string
of numbers repeated, like this: $_78110$[$_78110$, $_78110$$_78110$),
$_78110$]$_78110$, $_78110$$_78110$. The numbers change, which is why I
suspect it's a SQL injection attempt.
I agree re blocking by IP's. I didn't set
Our robots.txt file (https://catalogue.libraries.coop/robots.txt)
throttles Googlebot and Bingbot to 60 seconds and disallows certain
other crawlers entirely. So even 10 seconds seems generous to me.
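That policy looks roughly like this (a sketch based on the description
above, not the literal file at catalogue.libraries.coop; the disallowed
agent shown is a hypothetical example):

```
# Sketch of a robots.txt along the lines described above.
User-agent: Googlebot
Crawl-delay: 60

User-agent: Bingbot
Crawl-delay: 60

# Disallowing one crawler entirely; the agents actually blocked
# are not named in this message.
User-agent: ExampleBadBot
Disallow: /
```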
Of course, robots.txt will only be respected by well-behaved crawlers;
there's nothing stopping a misbehaving bot from ignoring it entirely.
JonGeorg,
If you're using nginx as a proxy, the problem may be in the configuration
of Apache and nginx.
First, make sure that mod_remoteip is installed and enabled for Apache 2.
Then, in eg_vhost.conf, find the three lines that begin with
"RemoteIPInternalProxy 127.0.0.1/24" and uncomment them.
Next,
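The pieces being described fit together roughly like this (a sketch; the
exact directives in your eg_vhost.conf and nginx site configuration may
differ). On the Apache side, mod_remoteip has to be told which proxy to
trust and which header carries the real address:

```apache
# eg_vhost.conf: take the client IP from X-Forwarded-For, but only
# when the request arrives from the local nginx proxy.
# mod_remoteip must be enabled (a2enmod remoteip).
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 127.0.0.1/24
```

And nginx has to actually send that header to Apache:

```nginx
# nginx side: pass the original client address through to Apache.
location / {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_pass https://localhost:443;  # assumed Apache back-end
}
```

With both halves in place, Apache's access logs record the real client
address instead of 127.0.0.1.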
Because we're behind a firewall, all the addresses display as 127.0.0.1. I
can talk to the people who administer the firewall though about blocking
IP's. Thanks
-Jon
On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:
JonGeorg,
Check your Apache logs for the source IP addresses. If you can't find
them, I can share the correct configuration for Apache with Nginx so
that you will get the addresses logged.
Once you know the IP address ranges, block them. If you have a firewall,
I suggest you block them
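Once the offending ranges are known, the firewall rule itself is small;
an nftables sketch (the range below is an RFC 5737 documentation
placeholder, not an address actually identified in this thread):

```
# /etc/nftables.conf fragment: drop traffic from an offending range.
# 203.0.113.0/24 is a documentation placeholder, not a real range
# reported in this thread.
table inet filter {
    chain input {
        type filter hook input priority 0; policy accept;
        ip saddr 203.0.113.0/24 drop
    }
}
```

Dropping at the firewall is cheaper than returning 403 from nginx, since
the connection is refused before any application code runs.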
Question. We've been getting hammered by search engine bots, but they
seem to all query our system at the same time. Enough that it's crashing
the app servers. We have a robots.txt file in place. I've increased the
crawl delay from 3 to 10 seconds, and have explicitly disallowed
the