Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-07 Thread Jason Stephenson via Evergreen-general
On 12/2/21 10:07 PM, JonGeorg SageLibrary via Evergreen-general wrote: I tried that and still got the loopback address after restarting services. Any other ideas? And the robots.txt file seems to be doing nothing, which is not much of a surprise. I've reached out to the people who host our

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-03 Thread JonGeorg SageLibrary via Evergreen-general
The DorkBot queries I'm referring to look like this: [02/Dec/2021:12:08:13 -0800] "GET /eg/opac/results?do_basket_action=Go=1_record_view=1=Search_highlight=1=metabib_basket_action=1=keyword%27%22%3Amat_format=1=176=1 HTTP/1.0" 200 62417 "-" "UT-Dorkbot/1.0" they vary after metabib, but all are
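To pull requests like this one out of the access logs, a small parser helps. The sketch below assumes Apache's standard "combined" log format (host, ident, user, time, request, status, bytes, referer, user agent). The `SUSPECT_AGENTS` list and the function names are hypothetical, chosen only for illustration.

```python
import re
from typing import Iterable, Iterator, Optional

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, site-specific list of user-agent substrings to flag.
SUSPECT_AGENTS = ("dorkbot",)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def is_suspect(entry: dict) -> bool:
    """True if the user agent contains any flagged substring (case-insensitive)."""
    agent = entry["agent"].lower()
    return any(s in agent for s in SUSPECT_AGENTS)


def suspect_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed entries whose user agent is flagged, skipping unparseable lines."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and is_suspect(entry):
            yield entry
```

Matching on the user agent only catches bots that identify themselves honestly, so this is a triage aid rather than a defence.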

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-03 Thread JonGeorg SageLibrary via Evergreen-general
Thank you! -Jon On Fri, Dec 3, 2021 at 8:10 AM Blake Henderson via Evergreen-general <evergreen-general@list.evergreen-ils.org> wrote: > JonGeorg, This reminds me of a similar issue that we had. We resolved it with this change to NGINX. Here's the link:

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-03 Thread Blake Henderson via Evergreen-general
JonGeorg, This reminds me of a similar issue that we had. We resolved it with this change to NGINX. Here's the link: https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/blake/LP1913610_nginx_request_limits and the bug:
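The branch contents aren't quoted in this message, so as a general illustration only: nginx's `limit_req` module is documented as using the "leaky bucket" method, configured with `limit_req_zone` (a per-key `rate`) and `limit_req` (an optional `burst`). The Python model below mimics that idea per client address. The class name, the example rate and burst values, and the exact accept/reject rule are assumptions; this is neither nginx's code nor the patch from the branch.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LeakyBucketLimiter:
    """Per-client leaky bucket, loosely modelled on nginx's limit_req (not its actual code)."""

    rate: float     # units drained per second: the steady-state requests/sec allowed
    burst: int = 0  # extra requests tolerated on top of the steady rate
    # client key -> (current bucket level, time of that client's last request in seconds)
    _state: Dict[str, Tuple[float, float]] = field(default_factory=dict, init=False, repr=False)

    def allow(self, key: str, now: float) -> bool:
        """Return True to serve the request, False to reject it."""
        level, last = self._state.get(key, (0.0, now))
        # Drain the bucket for the time elapsed since this client's previous request.
        level = max(0.0, level - (now - last) * self.rate)
        capacity = self.burst + 1
        if level + 1 > capacity:
            # Bucket would overflow: reject, but remember the drained level.
            self._state[key] = (level, now)
            return False
        self._state[key] = (level + 1, now)
        return True
```

Passing `now` explicitly keeps the model testable; a real server would read a monotonic clock such as `time.monotonic()`. This model serves burst requests immediately, which corresponds to nginx's `nodelay` option; without it, nginx delays excess requests within the burst instead, and rejected requests get a 503 by default.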

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-02 Thread JonGeorg SageLibrary via Evergreen-general
I tried that and still got the loopback address after restarting services. Any other ideas? And the robots.txt file seems to be doing nothing, which is not much of a surprise. I've reached out to the people who host our network and have control of everything on the other side of the firewall.

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-01 Thread JonGeorg SageLibrary via Evergreen-general
The LONG STRING sometimes contains a word, but it's usually just a string of numbers repeated, like this: $_78110$[$_78110$, $_78110$$_78110$), $_78110$]$_78110$, $_78110$$_78110$. The numbers change, which is why I suspect it's a SQL injection attempt. I agree re blocking by IPs. I didn't set
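Strings like `$_78110$` have the shape of PostgreSQL dollar-quote tags (`$tag$ ... $tag$`), which fits the suspicion of an injection probe. A hypothetical filter that URL-decodes a query string and counts such tags might look like this; the threshold of 3 is an arbitrary assumption to tune against real traffic.

```python
import re
from urllib.parse import unquote_plus

# A PostgreSQL-style dollar-quote tag: $$ or $name$, where name starts with a letter or underscore.
DOLLAR_TAG = re.compile(r"\$(?:[A-Za-z_][A-Za-z_0-9]*)?\$")


def count_dollar_tags(raw_query: str) -> int:
    """URL-decode a query string and count dollar-quote-like tags in it."""
    return len(DOLLAR_TAG.findall(unquote_plus(raw_query)))


def looks_like_dollar_quote_probe(raw_query: str, threshold: int = 3) -> bool:
    """Hypothetical heuristic: flag queries containing `threshold` or more such tags."""
    return count_dollar_tags(raw_query) >= threshold
```

Decoding first matters because the tags usually arrive percent-encoded (`%24` for `$`) in the request line.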

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-01 Thread Jeff Davis via Evergreen-general
Our robots.txt file (https://catalogue.libraries.coop/robots.txt) throttles Googlebot and Bingbot to 60 seconds and disallows certain other crawlers entirely. So even 10 seconds seems generous to me. Of course, robots.txt will only be respected by well-behaved crawlers; there's nothing
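To check how a compliant crawler reads rules like these, Python's standard `urllib.robotparser` can parse a robots.txt and report `Crawl-delay` and per-path permissions. The file below is a hypothetical example in the spirit of the one described, not a copy of it; `build_parser` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modelled on the rules described above (not the real file).
ROBOTS_TXT = """\
User-agent: Googlebot
Crawl-delay: 60

User-agent: Bingbot
Crawl-delay: 60

User-agent: UT-Dorkbot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /eg/opac/results
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser
```

`Crawl-delay` is a non-standard extension, so support varies by crawler (Google, for example, documents that Googlebot does not support it), and a bot like UT-Dorkbot has no obligation to honour any of these rules.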

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-01 Thread Jason Stephenson via Evergreen-general
JonGeorg, If you're using nginx as a proxy, the cause may be how Apache and nginx are configured. First, make sure that mod_remoteip is installed and enabled for Apache 2. Then, in eg_vhost.conf, find the 3 lines that begin with "RemoteIPInternalProxy 127.0.0.1/24" and uncomment them. Next,
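Roughly, mod_remoteip trusts the `X-Forwarded-For` header only when the request arrives from a configured internal proxy, and then takes the rightmost address that is not itself a trusted proxy. The sketch below approximates that logic in Python; it is not Apache's implementation, and `client_ip` is a hypothetical helper. Note that `strict=False` is needed because a value like `127.0.0.1/24` has host bits set.

```python
import ipaddress
from typing import Iterable


def client_ip(peer: str, xff_header: str, trusted: Iterable[str]) -> str:
    """Approximate mod_remoteip-style resolution of the real client address (a sketch).

    peer       -- address of the TCP peer Apache sees (nginx, e.g. 127.0.0.1)
    xff_header -- value of X-Forwarded-For, e.g. "198.51.100.9, 203.0.113.7"
    trusted    -- internal proxy networks, e.g. ["127.0.0.1/24"]
    """
    # strict=False accepts values like 127.0.0.1/24 that have host bits set.
    nets = [ipaddress.ip_network(t, strict=False) for t in trusted]

    def is_trusted(addr: str) -> bool:
        ip = ipaddress.ip_address(addr)  # raises ValueError if malformed
        return any(ip in net for net in nets)

    # Only honour the header when the request really came through a trusted proxy.
    if not is_trusted(peer):
        return peer
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    candidate = peer
    # Walk right to left: the rightmost entries were appended by the proxies nearest to us,
    # while leftmost entries come from the client and can be forged.
    for hop in reversed(hops):
        try:
            hop_trusted = is_trusted(hop)
        except ValueError:
            break  # malformed entry: stop and keep the last good address
        candidate = hop
        if not hop_trusted:
            break
    return candidate
```

With an empty header the function falls back to the peer address, which is exactly the "everything is 127.0.0.1" symptom described in this thread.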

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-11-30 Thread JonGeorg SageLibrary via Evergreen-general
Because we're behind a firewall, all the addresses display as 127.0.0.1. I can talk to the people who administer the firewall about blocking IPs, though. Thanks -Jon On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general <evergreen-general@list.evergreen-ils.org> wrote:

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-11-30 Thread Jason Stephenson via Evergreen-general
JonGeorg, Check your Apache logs for the source IP addresses. If you can't find them, I can share the correct configuration for Apache with Nginx so that you will get the addresses logged. Once you know the IP address ranges, block them. If you have a firewall, I suggest you block them
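Once real client addresses are being logged, one quick way to see which ranges to block is to group them by network prefix. This sketch buckets IPv4 addresses into /24s and IPv6 addresses into /64s; both prefix lengths, and `top_ranges` itself, are illustrative assumptions.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple


def top_ranges(ips: Iterable[str], n: int = 10,
               v4_prefix: int = 24, v6_prefix: int = 64) -> List[Tuple[str, int]]:
    """Group client addresses into networks and return the n busiest, busiest first."""
    counts: Counter = Counter()
    for raw in ips:
        try:
            ip = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # skip hostnames or other non-IP values in the host field
        prefix = v4_prefix if ip.version == 4 else v6_prefix
        counts[str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))] += 1
    return counts.most_common(n)
```

The input would be the first whitespace-separated field of each access-log line (for example `line.split()[0]`); the busiest ranges are then candidates to hand to whoever manages the firewall.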

[Evergreen-general] Question about search engine bots & DB CPU spikes

2021-11-30 Thread JonGeorg SageLibrary via Evergreen-general
Question. We've been getting hammered by search engine bots [?], and they all seem to query our system at the same time, enough that it's crashing the app servers. We have a robots.txt file in place. I've increased the crawl delay from 3 to 10 seconds, and have explicitly disallowed the
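To confirm that the bots really arrive in bursts, one option is to count requests per minute from the access-log timestamps, which Apache writes like `02/Dec/2021:12:08:13 -0800`. A minimal sketch; `busiest_minutes` is a hypothetical name, and the month parsing assumes an English/C locale.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

# Apache's default access-log timestamp, e.g. "02/Dec/2021:12:08:13 -0800".
# %b (abbreviated month name) is locale-dependent; this assumes an English/C locale.
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"


def busiest_minutes(timestamps: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count requests per minute and return the n busiest minutes, busiest first."""
    per_minute: Counter = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, APACHE_TIME)
        per_minute[t.strftime("%Y-%m-%d %H:%M %z")] += 1
    return per_minute.most_common(n)
```

Lining the busiest minutes up against the database CPU graphs would show whether the spikes coincide with the crawler bursts.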