[Evergreen-general] Cover Image Scaling question

2024-05-03 Thread JonGeorg SageLibrary via Evergreen-general
In prior versions of Evergreen, cover images that were missing from
ContentCafe and had to be manually uploaded to the server were always square.

We're currently still on 3.7 [planning to upgrade soon], and the traditional
view shows them correctly. The new Angular search results, however, stretch
the images into the new rectangular, portrait-oriented format. The
documentation suggests that how those images are handled can be changed, but
it doesn't say how or where. We have approximately 1500 cover images that
I've uploaded, and I'd rather not resize and re-upload them all if there is
a way to avoid that.
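
If resizing turns out to be unavoidable, I assume a batch job along these
lines would do it (an untested sketch; the jacket directory and the 500x500
canvas are guesses, adjust both to your setup):

# pad every uploaded jacket onto a square canvas instead of stretching it
cd /path/to/uploaded/jackets   # wherever your manually uploaded covers live
mkdir -p squared
for img in *.jpg; do
    # -resize fits within 500x500; -extent pads the remainder with white
    convert "$img" -resize 500x500 -background white \
            -gravity center -extent 500x500 "squared/$img"
done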

Suggestions?
Thanks
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Permission reset question

2024-05-03 Thread JonGeorg SageLibrary via Evergreen-general
I have a question. Some of our staff accounts have individually set
permissions, or at least customized permissions, rather than all being
managed by groups. Is there an easy way to reset a user's permissions to
null, so I can then reassign them by group? I'm assuming that this would
have to be done on the database side?
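
If it does have to happen in the database, I imagine it would look roughly
like this sketch (assuming the individually granted permissions live in
permission.usr_perm_map; untested, so verify against the schema and try it
on a copy of production first):

# remove one user's directly granted permissions; 12345 is a hypothetical id
psql -U evergreen -d evergreen <<'SQL'
BEGIN;
-- review what would be removed before committing
SELECT perm, depth, grantable FROM permission.usr_perm_map WHERE usr = 12345;
DELETE FROM permission.usr_perm_map WHERE usr = 12345;
COMMIT;
SQL

If I'm reading the schema right, group-driven permissions then come from the
profile and any secondary groups (permission.usr_grp_map), which can be
reassigned in the staff client.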

Thanks
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] SMS Messages Not Being Received

2024-04-05 Thread JonGeorg SageLibrary via Evergreen-general
Elizabeth, what 3rd party service did you end up using and how well has it
been working for your libraries?
-Jon

On Fri, Apr 5, 2024 at 11:29 AM Elizabeth Davis via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> Hi Will
>
>
>
> We’ve had several issues with SMS delivery on most of the carriers.
> Sometimes SMS messages were throttled, and patrons would get them days or
> weeks later, if at all.  Some carriers just stopped sending them entirely.
> Sometimes the messages to a specific carrier would deliver for one library
> in the consortium but not others.  We tried changing from sending SMS to
> MMS with no luck.  We were also being asked to add new carriers that
> don’t have gateways, so we were unable to provide SMS service to those
> patrons. In the end we moved to a third-party option to deliver SMS
> notifications.
>
>
>
>
>
> *Elizabeth Davis* (she/her), *Support & Project Management Specialist*
>
> *Pennsylvania Integrated Library System **(PaILS) | SPARK*
>
> (717) 256-1627 | elizabeth.da...@sparkpa.org
> 
> support.sparkpa.org | supp...@sparkpa.org
>
>
>
> *From:* Evergreen-general <
> evergreen-general-boun...@list.evergreen-ils.org> *On Behalf Of *Szwagiel,
> Will via Evergreen-general
> *Sent:* Friday, April 5, 2024 2:13 PM
> *To:* Szwagiel, Will via Evergreen-general <
> evergreen-general@list.evergreen-ils.org>
> *Cc:* Szwagiel, Will 
> *Subject:* [Evergreen-general] SMS Messages Not Being Received
>
>
>
> Good afternoon,
>
>
>
> We have recently been receiving a number of reports from different
> libraries that patrons are not receiving SMS notifications, particularly
> those for holds.  Evergreen is sending the messages like it is supposed to,
> so we are thinking that some carriers may be flagging the messages as
> spam.  Based on the reports we have received, it appears to be most common
> with AT&T.
>
>
>
> For anyone else who might have experienced this in the past, did you have
> any direct interactions with the carrier/s, and if so, what were the steps
> that needed to be taken to prevent this from happening in the future?
>
>
>
> Thank you.
>
>
>
> *William C. Szwagiel*
>
> NC Cardinal Project Manager
>
> State Library of North Carolina
>
> william.szwag...@ncdcr.gov | 919.814.6721
>
> https://statelibrary.ncdcr.gov/services-libraries/nc-cardinal
> 
>
> 109 East Jones Street  | 4640 Mail Service Center
>
> Raleigh, North Carolina 27699-4600
>
> The State Library is part of the NC Department of Natural & Cultural
> Resources.
>
> *Email correspondence to and from this address is subject to the North
> Carolina Public Records Law and may be disclosed to third parties.*
>
>
>
>
> --
>
>
> Email correspondence to and from this address may be subject to the North
> Carolina Public Records Law and may be disclosed to third parties by an
> authorized state official.
> ___
> Evergreen-general mailing list
> Evergreen-general@list.evergreen-ils.org
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] SMS Messages Not Being Received

2024-04-05 Thread JonGeorg SageLibrary via Evergreen-general
According to patrons, we've had a lot of issues with SMS notifications not
going out.

Sometimes it's because a carrier was bought out by another carrier, like
when Straight Talk was purchased by Verizon out here in Oregon, so I updated
the SMS server settings.
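
For reference, that kind of update can be made in the SMS carrier table; a
sketch (the carrier name and the vtext.com gateway address here are
illustrative, so verify both before running anything):

# repoint an acquired carrier's email-to-SMS gateway
psql -U evergreen -d evergreen <<'SQL'
UPDATE config.sms_carrier
   SET email = '$number@vtext.com'  -- $number is Evergreen's substitution token
 WHERE name ILIKE '%straight talk%';
SQL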

Another was when US Cellular apparently stopped supporting email-to-text
altogether, at least according to their support. However, some users are
still receiving notifications while others aren't, leaving me completely
confused and unable to verify the issue.

For now we've been suggesting that staff recommend email notifications
instead of SMS, and I've set up email forwarders for all branches to avoid
SPF issues for the time being.
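
A quick sanity check on the SPF side, for anyone in the same spot (a
sketch; example.org stands in for a branch's sending domain):

# confirm the sending domain publishes an SPF record that covers the relay
dig +short TXT example.org | grep -i 'v=spf1'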

Looking forward to more responses on this topic to see what solutions
others are using. It's been mentioned before that the time to move to a 3rd
party application for email & text notifications is approaching, and I'd
like to hear about successes or issues from anyone who has gone that route.

Thanks for bringing this topic up.
-Jon

On Fri, Apr 5, 2024 at 11:13 AM Szwagiel, Will via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> Good afternoon,
>
> We have recently been receiving a number of reports from different
> libraries that patrons are not receiving SMS notifications, particularly
> those for holds.  Evergreen is sending the messages like it is supposed to,
> so we are thinking that some carriers may be flagging the messages as
> spam.  Based on the reports we have received, it appears to be most common
> with AT&T.
>
> For anyone else who might have experienced this in the past, did you have
> any direct interactions with the carrier/s, and if so, what were the steps
> that needed to be taken to prevent this from happening in the future?
>
> Thank you.
>
> *William C. Szwagiel*
>
> NC Cardinal Project Manager
>
> State Library of North Carolina
>
> william.szwag...@ncdcr.gov | 919.814.6721
>
> https://statelibrary.ncdcr.gov/services-libraries/nc-cardinal
>
> 109 East Jones Street  | 4640 Mail Service Center
>
> Raleigh, North Carolina 27699-4600
>
> The State Library is part of the NC Department of Natural & Cultural
> Resources.
>
> *Email correspondence to and from this address is subject to the North
> Carolina Public Records Law and may be disclosed to third parties.*
>
>
>
> --
>
> Email correspondence to and from this address may be subject to the North
> Carolina Public Records Law and may be disclosed to third parties by an
> authorized state official.
> ___
> Evergreen-general mailing list
> Evergreen-general@list.evergreen-ils.org
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Issues with Evergreen wiki site?

2023-10-24 Thread JonGeorg SageLibrary via Evergreen-general
I was looking for the database schema and am getting 404 errors for the
links to it, including the link on the wiki:
https://wiki.evergreen-ils.org/doku.php?id=dev:database_schemas
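
A quick way to confirm what the link currently returns (just prints the
HTTP status line):

curl -sI "https://wiki.evergreen-ils.org/doku.php?id=dev:database_schemas" | head -n 1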

-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-03 Thread JonGeorg SageLibrary via Evergreen-general
The DorkBot queries I'm referring to look like this:
[02/Dec/2021:12:08:13 -0800] "GET
/eg/opac/results?do_basket_action=Go=1_record_view=1=Search_highlight=1=metabib_basket_action=1=keyword%27%22%3Amat_format=1=176=1
HTTP/1.0" 200 62417 "-" "UT-Dorkbot/1.0"

They vary after metabib, but all use the basket feature, and they come from
different library branch URLs.
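
For sizing the problem, a one-liner sketch that tallies the Dorkbot hits
per hour (assuming the stock combined log format; the log path is a guess,
adjust for your vhosts):

awk -F'[][]' '/UT-Dorkbot/ {print substr($2, 1, 14)}' \
    /var/log/apache2/other_vhosts_access.log | sort | uniq -c | sort -rn
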
-Jon

On Fri, Dec 3, 2021 at 10:45 AM JonGeorg SageLibrary <
jongeorg.sagelibr...@gmail.com> wrote:

> Yeah, I'm not seeing any /opac/extras/unapi requests in the Apache logs.
> Is DorkBot used legitimately for querying the opac?
> -Jon
>
> On Fri, Dec 3, 2021 at 10:37 AM JonGeorg SageLibrary <
> jongeorg.sagelibr...@gmail.com> wrote:
>
>> Thank you!
>> -Jon
>>
>> On Fri, Dec 3, 2021 at 8:10 AM Blake Henderson via Evergreen-general <
>> evergreen-general@list.evergreen-ils.org> wrote:
>>
>>> JonGeorg,
>>>
>>> This reminds me of a similar issue that we had. We resolved it with
>>> this change to NGINX. Here's the link:
>>>
>>>
>>> https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/blake/LP1913610_nginx_request_limits
>>>
>>> and the bug:
>>> https://bugs.launchpad.net/evergreen/+bug/1913610
>>>
>>> I'm not sure that it's the same issue though, as you've shared a search
>>> SQL query and this solution addresses external requests to
>>> "/opac/extras/unapi"
>>> But you might be able to apply the same nginx rate limiting technique
>>> here if you can detect the URL they are using.
>>>
>>> There is a tool called "apachetop" which I used in order to see the
>>> URLs that were being requested.
>>>
>>> apt-get -y install apachetop && apachetop -f
>>> /var/log/apache2/other_vhosts_access.log
>>>
>>> and another useful command:
>>>
>>> cat /var/log/apache2/other_vhosts_access.log | awk '{print $2}' | sort |
>>> uniq -c | sort -rn
>>>
>>> You have to ignore (not limit) all the requests to the Evergreen gateway
>>> as most of that traffic is the staff client and should (probably) not be
>>> limited.
>>>
>>> I'm just throwing some ideas out there for you. Good luck!
>>>
>>> -Blake-
>>> Conducting Magic
>>> Can consume data in any format
>>> MOBIUS
>>>
>>> On 12/2/2021 9:07 PM, JonGeorg SageLibrary via Evergreen-general wrote:
>>>
>>> I tried that and still got the loopback address, after restarting
>>> services. Any other ideas? And the robots.txt file seems to be doing
>>> nothing, which is not much of a surprise. I've reached out to the people
>>> who host our network and have control of everything on the other side of
>>> the firewall.
>>> -Jon
>>>
>>>
>>> On Wed, Dec 1, 2021 at 3:57 AM Jason Stephenson  wrote:
>>>
>>>> JonGeorg,
>>>>
>>>> If you're using nginx as a proxy, that may be the configuration of
>>>> Apache and nginx.
>>>>
>>>> First, make sure that mod_remote_ip is installed and enabled for Apache
>>>> 2.
>>>>
>>>> Then, in eg_vhost.conf, find the 3 lines that begin with
>>>> "RemoteIPInternalProxy 127.0.0.1/24" and uncomment them.
>>>>
>>>> Next, see what header Apache checks for the remote IP address. In my
>>>> example it is "RemoteIPHeader X-Forwarded-For"
>>>>
>>>> Next, make sure that the following two lines appear in BOTH "location /"
>>>> blocks in the nginx configuration:
>>>>
>>>>  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>>>>  proxy_set_header X-Forwarded-Proto $scheme;
>>>>
>>>> After reloading/restarting nginx and Apache, you should start seeing
>>>> remote IP addresses in the Apache logs.
>>>>
>>>> Hope that helps!
>>>> Jason
>>>>
>>>>
>>>> On 12/1/21 12:53 AM, JonGeorg SageLibrary wrote:
>>>> > Because we're behind a firewall, all the addresses display as
>>>> 127.0.0.1.
>>>> > I can talk to the people who administer the firewall though about
>>>> > blocking IPs. Thanks
>>>> > -Jon
>>>> >
>>>> > On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via
>>>> Evergreen-general
>>>> >

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-03 Thread JonGeorg SageLibrary via Evergreen-general
Thank you!
-Jon

On Fri, Dec 3, 2021 at 8:10 AM Blake Henderson via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> JonGeorg,
>
> This reminds me of a similar issue that we had. We resolved it with this
> change to NGINX. Here's the link:
>
>
> https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/blake/LP1913610_nginx_request_limits
>
> and the bug:
> https://bugs.launchpad.net/evergreen/+bug/1913610
>
> I'm not sure that it's the same issue though, as you've shared a search
> SQL query and this solution addresses external requests to
> "/opac/extras/unapi"
> But you might be able to apply the same nginx rate limiting technique here
> if you can detect the URL they are using.
>
> There is a tool called "apachetop" which I used in order to see the URLs
> that were being requested.
>
> apt-get -y install apachetop && apachetop -f
> /var/log/apache2/other_vhosts_access.log
>
> and another useful command:
>
> cat /var/log/apache2/other_vhosts_access.log | awk '{print $2}' | sort |
> uniq -c | sort -rn
>
> You have to ignore (not limit) all the requests to the Evergreen gateway
> as most of that traffic is the staff client and should (probably) not be
> limited.
>
> I'm just throwing some ideas out there for you. Good luck!
>
> -Blake-
> Conducting Magic
> Can consume data in any format
> MOBIUS
>
> On 12/2/2021 9:07 PM, JonGeorg SageLibrary via Evergreen-general wrote:
>
> I tried that and still got the loopback address, after restarting
> services. Any other ideas? And the robots.txt file seems to be doing
> nothing, which is not much of a surprise. I've reached out to the people
> who host our network and have control of everything on the other side of
> the firewall.
> -Jon
>
>
> On Wed, Dec 1, 2021 at 3:57 AM Jason Stephenson  wrote:
>
>> JonGeorg,
>>
>> If you're using nginx as a proxy, that may be the configuration of
>> Apache and nginx.
>>
>> First, make sure that mod_remote_ip is installed and enabled for Apache 2.
>>
>> Then, in eg_vhost.conf, find the 3 lines that begin with
>> "RemoteIPInternalProxy 127.0.0.1/24" and uncomment them.
>>
>> Next, see what header Apache checks for the remote IP address. In my
>> example it is "RemoteIPHeader X-Forwarded-For"
>>
>> Next, make sure that the following two lines appear in BOTH "location /"
>> blocks in the nginx configuration:
>>
>>  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>>  proxy_set_header X-Forwarded-Proto $scheme;
>>
>> After reloading/restarting nginx and Apache, you should start seeing
>> remote IP addresses in the Apache logs.
>>
>> Hope that helps!
>> Jason
>>
>>
>> On 12/1/21 12:53 AM, JonGeorg SageLibrary wrote:
>> > Because we're behind a firewall, all the addresses display as
>> 127.0.0.1.
>> > I can talk to the people who administer the firewall though about
>> > blocking IPs. Thanks
>> > -Jon
>> >
>> > On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general <
>> > evergreen-general@list.evergreen-ils.org> wrote:
>> >
>> > JonGeorg,
>> >
>> > Check your Apache logs for the source IP addresses. If you can't
>> find
>> > them, I can share the correct configuration for Apache with Nginx so
>> > that you will get the addresses logged.
>> >
>> >     Once you know the IP address ranges, block them. If you have a
>> > firewall,
>> > I suggest you block them there. If not, you can block them in Nginx
>> or
>> > in your load balancer configuration if you have one and it allows
>> that.
>> >
>> > You may think you want your catalog to show up in search engines,
>> but
>> > bad bots will lie about who they are. All you can do with
>> misbehaving
>> > bots is to block them.
>> >
>> > HtH,
>> > Jason
>> >
>> > On 11/30/21 9:34 PM, JonGeorg SageLibrary via Evergreen-general
>> wrote:
>> >  > Question. We've been getting hammered by search engine bots [?],
>> but
>> >  > they seem to all query our system at the same time. Enough that
>> it's
>> >  > crashing the app servers. We have a robots.txt file in place.
>> I've
>> >  > increased the crawling delay speed from 3 to 10 seconds, and have
>> >  > explicitly disallowed the specif

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-02 Thread JonGeorg SageLibrary via Evergreen-general
I tried that and still got the loopback address after restarting services.
Any other ideas? And the robots.txt file seems to be doing nothing, which
is not much of a surprise. I've reached out to the people who host our
network and have control of everything on the other side of the firewall.
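
In the meantime, here is a sketch of the nginx rate-limiting idea from
Blake's LP1913610 link earlier in the thread (the zone name, rate, and
location match below are illustrative, not taken from that branch):

# per-IP request limiting for the OPAC search URL
cat > /etc/nginx/conf.d/eg_ratelimit.conf <<'EOF'
limit_req_zone $binary_remote_addr zone=opac_search:10m rate=1r/s;
EOF
# then, inside the existing "location /" (or a dedicated
# "location /eg/opac/results") block of the Evergreen site config, add:
#     limit_req zone=opac_search burst=5 nodelay;
nginx -t && systemctl reload nginx
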
-Jon


On Wed, Dec 1, 2021 at 3:57 AM Jason Stephenson  wrote:

> JonGeorg,
>
> If you're using nginx as a proxy, that may be the configuration of
> Apache and nginx.
>
> First, make sure that mod_remote_ip is installed and enabled for Apache 2.
>
> Then, in eg_vhost.conf, find the 3 lines that begin with
> "RemoteIPInternalProxy 127.0.0.1/24" and uncomment them.
>
> Next, see what header Apache checks for the remote IP address. In my
> example it is "RemoteIPHeader X-Forwarded-For"
>
> Next, make sure that the following two lines appear in BOTH "location /"
> blocks in the nginx configuration:
>
>  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>  proxy_set_header X-Forwarded-Proto $scheme;
>
> After reloading/restarting nginx and Apache, you should start seeing
> remote IP addresses in the Apache logs.
>
> Hope that helps!
> Jason
>
>
> On 12/1/21 12:53 AM, JonGeorg SageLibrary wrote:
> > Because we're behind a firewall, all the addresses display as 127.0.0.1.
> > I can talk to the people who administer the firewall though about
> > blocking IPs. Thanks
> > -Jon
> >
> > On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general <
> > evergreen-general@list.evergreen-ils.org> wrote:
> >
> > JonGeorg,
> >
> > Check your Apache logs for the source IP addresses. If you can't find
> > them, I can share the correct configuration for Apache with Nginx so
> > that you will get the addresses logged.
> >
> > Once you know the IP address ranges, block them. If you have a
> > firewall,
> > I suggest you block them there. If not, you can block them in Nginx
> or
> > in your load balancer configuration if you have one and it allows
> that.
> >
> > You may think you want your catalog to show up in search engines, but
> > bad bots will lie about who they are. All you can do with misbehaving
> > bots is to block them.
> >
> > HtH,
> > Jason
> >
> > On 11/30/21 9:34 PM, JonGeorg SageLibrary via Evergreen-general
> wrote:
> >  > Question. We've been getting hammered by search engine bots [?],
> but
> >  > they seem to all query our system at the same time. Enough that
> it's
> >  > crashing the app servers. We have a robots.txt file in place. I've
> >  > increased the crawling delay speed from 3 to 10 seconds, and have
> >  > explicitly disallowed the specific bots, but I've seen no change
> > from
> >  > the worst offenders - Bingbot and UT-Dorkbot. We had over 4k hits
> > from
> >  > Dorkbot alone from 2pm-5pm today, and over 5k from Bingbot in the
> > same
> >  > timeframe. All a couple hours after I made the changes to the
> robots
> >  > file and restarted apache services. Which out of 100k entries in
> the
> >  > vhosts files in that time frame doesn't sound like a lot, but the
> > rest
> >  > of the traffic looks normal. This issue has been happening
> >  > intermittently [last 3 are 11/30, 11/3, 7/20] for a while, and
> > the only
> >  > thing that seems to work is to manually kill the services on the
> DB
> >  > servers and restart services on the application servers.
> >  >
> >  > The symptom is an immediate spike in the Database CPU load. I
> start
> >  > killing all queries older than 2 minutes, but it still usually
> >  > overwhelms the system causing the app servers to stop serving
> > requests.
> >  > The stuck queries are almost always ones along the lines of:
> >  >
> >  > -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
> >  > from_metarecord(*/BIB_RECORD#/*) core_limit(10)
> >  > badge_orgs(1,138,151) estimation_strategy(inclusion) skip_check(0)
> >  > check_limit(1000) sort(1) filter_group_entry(1) 1
> >  > site(*/LIBRARY_BRANCH/*) depth(2)
> >  > WITH w AS (
> >  >   WITH */STRING/*_keyword_xq AS (SELECT
> >  >

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-01 Thread JonGeorg SageLibrary via Evergreen-general
The LONG STRING sometimes contains a word, but it's usually just a string
of repeated numbers, like this: $_78110$[$_78110$, $_78110$$_78110$),
$_78110$]$_78110$, $_78110$$_78110$. The numbers change, which is why I
suspect it's a SQL injection attempt.

I agree re blocking by IPs. I didn't set the robots file crawl delay any
higher because I wanted to see what effect, if any, the initial change had
during an attack.
-Jon

On Wed, Dec 1, 2021 at 11:27 AM Jeff Davis via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> Our robots.txt file (https://catalogue.libraries.coop/robots.txt)
> throttles Googlebot and Bingbot to 60 seconds and disallows certain
> other crawlers entirely.  So even 10 seconds seems generous to me.
>
> Of course, robots.txt will only be respected by well-behaved crawlers;
> there's nothing preventing a bot from ignoring it (in which case, as
> Jason says, your best bet may be to block the offending IP).
>
> Is the "LONG_STRING" in your examples a legitimate search -- i.e, no
> unusual characters or obvious SQL injection attempts?  Does it contain
> complex nesting of search terms?
>
> Jeff
>
>
> On 2021-11-30 6:34 p.m., JonGeorg SageLibrary via Evergreen-general wrote:
> > Question. We've been getting hammered by search engine bots [?], but
> > they seem to all query our system at the same time. Enough that it's
> > crashing the app servers. We have a robots.txt file in place. I've
> > increased the crawling delay speed from 3 to 10 seconds, and have
> > explicitly disallowed the specific bots, but I've seen no change from
> > the worst offenders - Bingbot and UT-Dorkbot. We had over 4k hits from
> > Dorkbot alone from 2pm-5pm today, and over 5k from Bingbot in the same
> > timeframe. All a couple hours after I made the changes to the robots
> > file and restarted apache services. Which out of 100k entries in the
> > vhosts files in that time frame doesn't sound like a lot, but the rest
> > of the traffic looks normal. This issue has been happening
> > intermittently [last 3 are 11/30, 11/3, 7/20] for a while, and the only
> > thing that seems to work is to manually kill the services on the DB
> > servers and restart services on the application servers.
> >
> > The symptom is an immediate spike in the Database CPU load. I start
> > killing all queries older than 2 minutes, but it still usually
> > overwhelms the system causing the app servers to stop serving requests.
> > The stuck queries are almost always ones along the lines of:
> >
> > -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
> > from_metarecord(*/BIB_RECORD#/*) core_limit(10)
> > badge_orgs(1,138,151) estimation_strategy(inclusion) skip_check(0)
> > check_limit(1000) sort(1) filter_group_entry(1) 1
> > site(*/LIBRARY_BRANCH/*) depth(2)
> > WITH w AS (
> >   WITH */STRING/*_keyword_xq AS (SELECT
> >     (to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> >       btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
> >       */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), '')) ||
> >     to_tsquery('simple', COALESCE(NULLIF( '(' ||
> >       btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
> >       */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), ''))) AS tsq,
> >     (to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> >       btrim(regexp_replace(split_date_range(search_normalize
> >   [runtime 00:02:17.319491 | */STRING/*]
> >
> > And the queries by DorkBot look like they could be starting the query
> > since it's using the basket function in the OPAC.
> >
> > "GET
> >
> /eg/opac/results?do_basket_action=Go=1_record_view=*/LONG_STRING/*=Search_highlight=1=metabib_basket_action=1=keyword%3Amat_format=1=112=1
>
> > HTTP/1.0" 500 16796 "-" "UT-Dorkbot/1.0"
> >
> > I've anonymized the output just to be cautious. Reports are run off the
> > backup database server, so it cannot be an auto generated report, and it
> > doesn't happen often enough for that either. At this point I'm tempted
> > to block the IP addresses. What strategies are you all using to deal
> > with crawlers, and does anyone have an idea what is causing this?
> > -Jon
> >
> > ___
> 

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-11-30 Thread JonGeorg SageLibrary via Evergreen-general
Because we're behind a firewall, all the addresses display as 127.0.0.1. I
can talk to the people who administer the firewall, though, about blocking
IPs. Thanks
-Jon

On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> JonGeorg,
>
> Check your Apache logs for the source IP addresses. If you can't find
> them, I can share the correct configuration for Apache with Nginx so
> that you will get the addresses logged.
>
> Once you know the IP address ranges, block them. If you have a firewall,
> I suggest you block them there. If not, you can block them in Nginx or
> in your load balancer configuration if you have one and it allows that.
>
> You may think you want your catalog to show up in search engines, but
> bad bots will lie about who they are. All you can do with misbehaving
> bots is to block them.
>
> HtH,
> Jason
>
> On 11/30/21 9:34 PM, JonGeorg SageLibrary via Evergreen-general wrote:
> > Question. We've been getting hammered by search engine bots [?], but
> > they seem to all query our system at the same time. Enough that it's
> > crashing the app servers. We have a robots.txt file in place. I've
> > increased the crawling delay speed from 3 to 10 seconds, and have
> > explicitly disallowed the specific bots, but I've seen no change from
> > the worst offenders - Bingbot and UT-Dorkbot. We had over 4k hits from
> > Dorkbot alone from 2pm-5pm today, and over 5k from Bingbot in the same
> > timeframe. All a couple hours after I made the changes to the robots
> > file and restarted apache services. Which out of 100k entries in the
> > vhosts files in that time frame doesn't sound like a lot, but the rest
> > of the traffic looks normal. This issue has been happening
> > intermittently [last 3 are 11/30, 11/3, 7/20] for a while, and the only
> > thing that seems to work is to manually kill the services on the DB
> > servers and restart services on the application servers.
> >
> > The symptom is an immediate spike in the Database CPU load. I start
> > killing all queries older than 2 minutes, but it still usually
> > overwhelms the system causing the app servers to stop serving requests.
> > The stuck queries are almost always ones along the lines of:
> >
> > -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
> > from_metarecord(*/BIB_RECORD#/*) core_limit(10)
> > badge_orgs(1,138,151) estimation_strategy(inclusion) skip_check(0)
> > check_limit(1000) sort(1) filter_group_entry(1) 1
> > site(*/LIBRARY_BRANCH/*) depth(2)
> > WITH w AS (
> >   WITH */STRING/*_keyword_xq AS (SELECT
> >     (to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> >       btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
> >       */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), '')) ||
> >     to_tsquery('simple', COALESCE(NULLIF( '(' ||
> >       btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
> >       */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), ''))) AS tsq,
> >     (to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> >       btrim(regexp_replace(split_date_range(search_normalize
> >   [runtime 00:02:17.319491 | */STRING/*]
> >
> > And the queries by DorkBot look like they could be starting the query
> > since it's using the basket function in the OPAC.
> >
> > "GET
> >
> /eg/opac/results?do_basket_action=Go=1_record_view=*/LONG_STRING/*=Search_highlight=1=metabib_basket_action=1=keyword%3Amat_format=1=112=1
>
> > HTTP/1.0" 500 16796 "-" "UT-Dorkbot/1.0"
> >
> > I've anonymized the output just to be cautious. Reports are run off the
> > backup database server, so it cannot be an auto generated report, and it
> > doesn't happen often enough for that either. At this point I'm tempted
> > to block the IP addresses. What strategies are you all using to deal
> > with crawlers, and does anyone have an idea what is causing this?
> > -Jon
> >
> > ___
> > Evergreen-general mailing list
> > Evergreen-general@list.evergreen-ils.org
> > http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
> >
> ___
> Evergreen-general mailing list
> Evergreen-general@list.evergreen-ils.org
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Question about search engine bots & DB CPU spikes

2021-11-30 Thread JonGeorg SageLibrary via Evergreen-general
Question. We've been getting hammered by search engine bots [?], and they
seem to all query our system at the same time, enough that it's crashing
the app servers. We have a robots.txt file in place. I've increased the
crawl delay from 3 to 10 seconds and have explicitly disallowed the
specific bots, but I've seen no change from the worst offenders, Bingbot
and UT-Dorkbot. We had over 4k hits from Dorkbot alone from 2pm-5pm today,
and over 5k from Bingbot in the same timeframe, all a couple of hours after
I made the changes to the robots file and restarted Apache services. Out of
100k entries in the vhost logs in that time frame that doesn't sound like a
lot, and the rest of the traffic looks normal. This issue has been
happening intermittently [the last 3 were 11/30, 11/3, 7/20] for a while,
and the only thing that seems to work is to manually kill the services on
the DB servers and restart services on the application servers.
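
For reference, the robots.txt shape described above, as a sketch (the web
root path is an assumption, and misbehaving bots are free to ignore all of
it):

cat > /openils/var/web/robots.txt <<'EOF'
User-agent: bingbot
Crawl-delay: 10

User-agent: UT-Dorkbot
Disallow: /

User-agent: *
Crawl-delay: 10
EOF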

The symptom is an immediate spike in the database CPU load. I start killing
all queries older than 2 minutes, but it still usually overwhelms the
system, causing the app servers to stop serving requests. The stuck queries
are almost always ones along the lines of:

-- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
from_metarecord(*BIB_RECORD#*) core_limit(10) badge_orgs(1,138,151)
estimation_strategy(inclusion) skip_check(0) check_limit(1000) sort(1)
filter_group_entry(1) 1 site(*LIBRARY_BRANCH*) depth(2)
WITH w AS (
  WITH *STRING*_keyword_xq AS (SELECT
    (to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
      btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
      *LONG_STRING*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), '')) ||
    to_tsquery('simple', COALESCE(NULLIF( '(' ||
      btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
      *LONG_STRING*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), ''))) AS tsq,
    (to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
      btrim(regexp_replace(split_date_range(search_normalize
  [runtime 00:02:17.319491 | *STRING*]
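
The kill-queries-older-than-two-minutes step, as a sketch (assuming
PostgreSQL 9.2+, where pg_stat_activity exposes pid and state; run as a
superuser, and pg_cancel_backend() is the gentler first resort):

psql -U postgres -d evergreen <<'SQL'
-- look before terminating
SELECT pid, now() - query_start AS runtime, left(query, 60) AS query_head
  FROM pg_stat_activity
 WHERE state = 'active' AND now() - query_start > interval '2 minutes';

-- forcibly end anything still running after two minutes
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'active'
   AND now() - query_start > interval '2 minutes'
   AND pid <> pg_backend_pid();
SQL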

And the DorkBot requests look like they could be what kicks off the stuck
query, since they use the basket function in the OPAC.

"GET /eg/opac/results?do_basket_action=Go=1_record_view=
*LONG_STRING*=Search_highlight=1=metabib_basket_action=1=keyword%3Amat_format=1=112=1
HTTP/1.0" 500 16796 "-" "UT-Dorkbot/1.0"

I've anonymized the output just to be cautious. Reports are run off the
backup database server, so it cannot be an auto-generated report, and it
doesn't happen often enough for that either. At this point I'm tempted to
block the IP addresses. What strategies are you all using to deal with
crawlers, and does anyone have an idea what is causing this?
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general