Re: [Evergreen-general] Dealing with significant traffic increase caused by AI bots

2024-04-19 Thread Linda Jansová via Evergreen-general
Thank you for sharing the link to the Dark Visitors website - it looks 
very useful, indeed!


Linda

Re: [Evergreen-general] Dealing with significant traffic increase caused by AI bots

2024-04-19 Thread Lolis, John via Evergreen-general
There's been quite a conversation on the CODE4LIB listserv about this
lately...

Scott Prater <007dd2c67ad2-dmarc-requ...@lists.clir.org>

Thu, 11 Apr, 10:43 (8 days ago)

to CODE4LIB
We've also been seeing some traffic from inconsiderate AI bots.

One of my colleagues came across this site, which tracks and documents AI
bots:

https://darkvisitors.com/

-- Scott

--
Scott Prater
Digital Library Architect
UW Digital Collections Center
University of Wisconsin - Madison




From: Code for Libraries on behalf of Lolis, John

Sent: Wednesday, April 10, 2024 12:15 PM
To: code4...@lists.clir.org
Subject: Re: [CODE4LIB] blocking GPTBot?

This *sounds* as if it should help:
https://searchengineland.com/google-extended-crawler-432636

John Lolis
Coordinator of Computer Systems

100 Martine Avenue
White Plains, NY  10601
tel: 1.914.422.1497
fax: 1.914.422.1452

https://whiteplainslibrary.org/

*“I would rather have questions that can’t be answered than answers that
can’t be questioned.”*
— Richard Feynman, theoretical physicist and recipient of the Nobel Prize
in Physics in 1965


On Mon, 8 Apr 2024 at 16:31, Jason Casden wrote:

> Thanks for bringing this up, Eben. We've been having a horrible time with
> these bots, including those from previously fairly well-behaved sources
> like Google. They've caused issues ranging from slow response times and
> high system load all the way up to outages for some older systems. So far,
> our systems folks have been playing whack-a-mole with a combination of IP
> range blocks and increasingly detailed robots.txt statements. A group is
> being convened to investigate more comprehensive options so I will be
> watching this thread closely.
>
> Jason
>
> On Mon, Apr 8, 2024 at 4:18 PM Eben English wrote:
>
> > Hi all,
> >
> > I'm wondering if other folks are seeing AI and/or ML-related crawlers
> > like GPTBot accessing your library's website, catalog, digital
> > collections, or other sites.
> >
> > If so, are you blocking or disallowing these crawlers? Has anyone come
> > up with any policies around this?
> >
> > We're debating whether to allow these types of bots to crawl our digital
> > collections, many of which contain large amounts of copyrighted or "no
> > derivatives"-licensed materials. On one hand, these materials are
> > available for public view, but on the other hand the type of use that
> > GPTBot and the like are after (integrating the content into their
> > models) could be characterized as creating a derivative work, which is
> > expressly discouraged.
> >
> > Thanks,
> >
> > Eben English (he/him/his)
> > Digital Repository Services Manager
> > Boston Public Library
> >
>
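
As a rough illustration of the robots.txt approach Jason mentions above, here
is a minimal Python sketch that checks which crawlers a given set of rules
would turn away. It uses the standard library's urllib.robotparser. The
user-agent tokens are the crawler names that came up in these threads; the
rules themselves, the /private/ path, and the catalog.example.org URLs are
made-up assumptions, not anyone's actual configuration. Keep in mind that
robots.txt is only a request: a bot that ignores it is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The user-agent tokens are crawler names
# mentioned in this thread; the rules are illustrative assumptions only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    page = "https://catalog.example.org/record/1"  # hypothetical URL
    for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(agent, "allowed" if is_allowed(rules, agent, page) else "disallowed")
```

A check like this can be handy before deploying a longer robots.txt, since it
shows how a standards-following parser would read the rules.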

John Lolis
Coordinator of Computer Systems

100 Martine Avenue
White Plains, NY  10601
tel: 1.914.422.1497
fax: 1.914.422.1452

https://whiteplainslibrary.org/

*“I would rather have questions that can’t be answered than answers that
can’t be questioned.”*
— Richard Feynman, theoretical physicist and recipient of the Nobel Prize
in Physics in 1965



[Evergreen-general] 2024 Evergreen Project Board Election Results and Congratulations

2024-04-19 Thread Frasur, Ruth via Evergreen-general
Thank you to everyone who served as candidates for this year's Evergreen
Project Board election. As a note for all newcomers, the Evergreen Project
Board election is an annual event with generally three seats available to
fill. This year saw four seats open. Even so, we had an abundance of
willing and engaged candidates and the only problem was choosing amongst
such a strong field. Thankfully, we also saw a highly engaged electorate
with a significant number of community members registering to vote and
then actually "showing up." Thanks to all of you who participated in this
aspect of the election as well. In the end, four individuals were elected.

3-year term
Katie Greenleaf Martin, PAILS/SparkPA
Susan Morrison, GA PINES
Lindsay Stratton*, Westchester Library System

1-year term
Andrea Buntz Neiman*, Equinox Open Library Initiative

*Decided by a coin toss to break a tie, as agreed to by both parties

Once again, thank you to all who participated in the election and 
congratulations to those elected.


Ruth Frasur Davis (she/they)
President
Evergreen Project Board

Coordinator
Evergreen Indiana Library Consortium
Evergreen Community Development Initiative
Indiana State Library
140 N. Senate Ave.
Indianapolis, IN 46204
(317) 232-3691


___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] Dealing with significant traffic increase caused by AI bots

2024-04-19 Thread Linda Jansová via Evergreen-general

Thank you very much, Jane!

We will certainly give fail2ban a try, though - as we use Apache - some 
implementation details will probably be a bit different :-).


Linda



Re: [Evergreen-general] Dealing with significant traffic increase caused by AI bots

2024-04-19 Thread Jane Sandberg via Evergreen-general
Hi Linda,

It's not for Evergreen, but my colleague recently blocked claudebot using
fail2ban on our load balancer.
Essentially, fail2ban is configured to watch Nginx's access log, and if
more than 10 claudebot requests appear within the past minute from a
particular IP, it automatically blocks all requests from that IP for the
next 24 hours.  I would think that something similar could work for
Apache's access log.
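
The rule described above (more than 10 claudebot requests from one IP within
a minute leads to a 24-hour block) can be sketched in plain Python to show
the logic. This is not fail2ban configuration and not the actual setup on
that load balancer; fail2ban's own jail settings for this are maxretry,
findtime, and bantime. The regular expression assumes the common "combined"
access-log format, which both Nginx and Apache can write, and the class and
constant names are hypothetical.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumed "combined" log format:
# IP - user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

MAX_REQUESTS = 10               # more than 10 matching requests ...
WINDOW = timedelta(minutes=1)   # ... within one minute ...
BAN_TIME = timedelta(hours=24)  # ... blocks the IP for 24 hours
BOT_MARKER = "claudebot"        # matched case-insensitively in the user agent


class BotLimiter:
    """Tracks bot requests per IP and decides when an IP should be blocked."""

    def __init__(self):
        self.hits = defaultdict(deque)  # ip -> timestamps of recent bot requests
        self.banned_until = {}          # ip -> time at which the block expires

    def is_banned(self, ip, now):
        until = self.banned_until.get(ip)
        return until is not None and now < until

    def observe(self, line):
        """Feed one access-log line; return the IP if this line triggers a block."""
        match = LINE_RE.match(line)
        if match is None or BOT_MARKER not in match["agent"].lower():
            return None
        ip = match["ip"]
        now = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        if self.is_banned(ip, now):
            return None
        recent = self.hits[ip]
        recent.append(now)
        # Forget requests that have fallen out of the one-minute window.
        while recent and now - recent[0] > WINDOW:
            recent.popleft()
        if len(recent) > MAX_REQUESTS:
            self.banned_until[ip] = now + BAN_TIME
            recent.clear()
            return ip
        return None
```

In a real deployment the returned IP would be handed to a firewall or to the
web server's deny list; that step is left out here because it depends on the
local setup.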

Good luck with the bots!

  -Jane



[Evergreen-general] Dealing with significant traffic increase caused by AI bots

2024-04-19 Thread Linda Jansová via Evergreen-general

Dear all,

Have any of you encountered extensive crawling by Bytespider and
Bytedance (see, e.g.,
https://wordpress.org/support/topic/psa-bytedance-and-bytespider-bots-recommend-blocking/),
Claudebot, or other AI bots?


If so, do you have any secret recipe for keeping such crawlers from
accessing the site?


Thank you very much for sharing your experience!

Linda
