<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">Thank you for sharing the link to the
Dark Visitors website - it looks very useful, indeed!</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Linda</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 4/19/24 20:21, Lolis, John via
Evergreen-general wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJiSQLA6j_2nQ6cRGENrnQ-Qh7B+YHzSddtcC7fETdr0FuPxpA@mail.gmail.com">
<div dir="ltr">
<div class="gmail_default">There's been quite a conversation on
the CODE4LIB listserv about this lately...<br>
</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default">Scott Prater &lt;<a
href="mailto:0000007dd2c67ad2-dmarc-request@lists.clir.org"
moz-do-not-send="true" class="moz-txt-link-freetext">0000007dd2c67ad2-dmarc-request@lists.clir.org</a>&gt;<br>
<br>
Thu, 11 Apr, 10:43<br>
<br>
to CODE4LIB<br>
We've also been seeing some traffic from inconsiderate AI
bots.<br>
<br>
One of my colleagues came across this site, which tracks and
documents AI bots:<br>
<br>
<a href="https://darkvisitors.com/" moz-do-not-send="true"
class="moz-txt-link-freetext">https://darkvisitors.com/</a><br>
<br>
-- Scott<br>
<br>
--<br>
Scott Prater<br>
Digital Library Architect<br>
UW Digital Collections Center<br>
University of Wisconsin - Madison<br>
<br>
<br>
<br>
________________________________________<br>
From: Code for Libraries &lt;<a
href="mailto:CODE4LIB@LISTS.CLIR.ORG" moz-do-not-send="true"
class="moz-txt-link-freetext">CODE4LIB@LISTS.CLIR.ORG</a>&gt;
on behalf of Lolis, John &lt;<a
href="mailto:jlolis@WHITEPLAINSNY.GOV"
moz-do-not-send="true" class="moz-txt-link-freetext">jlolis@WHITEPLAINSNY.GOV</a>&gt;<br>
Sent: Wednesday, April 10, 2024 12:15 PM<br>
To: <a href="mailto:CODE4LIB@LISTS.CLIR.ORG"
moz-do-not-send="true" class="moz-txt-link-freetext">CODE4LIB@LISTS.CLIR.ORG</a><br>
Subject: Re: [CODE4LIB] blocking GPTBot?<br>
<br>
This *sounds* as if it should help:<br>
<a
href="https://urldefense.com/v3/__https://searchengineland.com/google-extended-crawler-432636__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtPPtfncyM$"
moz-do-not-send="true" class="moz-txt-link-freetext">https://urldefense.com/v3/__https://searchengineland.com/google-extended-crawler-432636__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtPPtfncyM$</a><br>
<br>
John Lolis<br>
Coordinator of Computer Systems<br>
<br>
100 Martine Avenue<br>
White Plains, NY 10601<br>
tel: 1.914.422.1497<br>
fax: 1.914.422.1452<br>
<br>
<a
href="https://urldefense.com/v3/__https://whiteplainslibrary.org/__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtPwb7-RSk$"
moz-do-not-send="true" class="moz-txt-link-freetext">https://urldefense.com/v3/__https://whiteplainslibrary.org/__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtPwb7-RSk$</a><br>
<br>
*“I would rather have questions that can’t be answered than
answers that<br>
can’t be questioned.”*<br>
— Richard Feynman<br>
&lt;<a
href="https://urldefense.com/v3/__https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtP3X91XJ0$"
moz-do-not-send="true" class="moz-txt-link-freetext">https://urldefense.com/v3/__https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtP3X91XJ0$</a>
&gt;,<br>
theoretical physicist and recipient of the Nobel Prize in
Physics in 1965<br>
<br>
<br>
On Mon, 8 Apr 2024 at 16:31, Jason Casden &lt;<a
href="mailto:casden@gmail.com" moz-do-not-send="true"
class="moz-txt-link-freetext">casden@gmail.com</a>&gt;
wrote:<br>
<br>
> Thanks for bringing this up, Eben. We've been having a
horrible time with<br>
> these bots, including those from previously fairly
well-behaved sources<br>
> like Google. They've caused issues ranging from slow
response times and<br>
> high system load all the way up to outages for some older
systems. So far,<br>
> our systems folks have been playing whack-a-mole with a
combination of IP<br>
> range blocks and increasingly detailed robots.txt
statements. A group is<br>
> being convened to investigate more comprehensive options
so I will be<br>
> watching this thread closely.<br>
><br>
> Jason<br>
><br>
> On Mon, Apr 8, 2024 at 4:18 PM Eben English &lt;<a
href="mailto:eben.english@gmail.com" moz-do-not-send="true"
class="moz-txt-link-freetext">eben.english@gmail.com</a>&gt;<br>
> wrote:<br>
><br>
> > Hi all,<br>
> ><br>
> > I'm wondering if other folks are seeing AI and/or
ML-related crawlers<br>
> like<br>
> > GPTBot accessing your library's website, catalog,
digital collections, or<br>
> > other sites.<br>
> ><br>
> > If so, are you blocking or disallowing these
crawlers? Has anyone come up<br>
> > with any policies around this?<br>
> ><br>
> > We're debating whether to allow these types of bots
to crawl our digital<br>
> > collections, many of which contain large amounts of
copyrighted or "no<br>
> > derivatives"-licensed materials. On one hand, these
materials are<br>
> available<br>
> > for public view, but on the other hand the type of
use that GPTBot and<br>
> the<br>
> > like are after (integrating the content into their
models) could be<br>
> > characterized as creating a derivative work, which
is expressly<br>
> > discouraged.<br>
> ><br>
> > Thanks,<br>
> ><br>
> > Eben English (he/him/his)<br>
> > Digital Repository Services Manager<br>
> > Boston Public Library<br>
> ><br>
></div>
<div>
<div dir="ltr" class="gmail_signature"
data-smartmail="gmail_signature">
<div>John Lolis</div>
<div>Coordinator of Computer Systems</div>
<div><br>
</div>
<div>100 Martine Avenue</div>
<div>White Plains, NY 10601</div>
<div>tel: 1.914.422.1497</div>
<div>fax: 1.914.422.1452</div>
<div><br>
</div>
<div><a
href="https://whiteplainslibrary.org/" target="_blank"
moz-do-not-send="true" class="moz-txt-link-freetext">https://whiteplainslibrary.org/</a></div>
<div><br>
</div>
<div><i>“I would rather have questions that can’t be answered than
answers that can’t be questioned.”</i><br>
— <a
href="https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu"
rel="noopener noreferrer" target="_blank" moz-do-not-send="true">Richard
Feynman</a>, theoretical physicist and recipient of the Nobel Prize in
Physics in 1965</div>
</div>
</div>
<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, 19 Apr 2024 at 07:05,
Jane Sandberg via Evergreen-general &lt;<a
href="mailto:evergreen-general@list.evergreen-ils.org"
moz-do-not-send="true" class="moz-txt-link-freetext">evergreen-general@list.evergreen-ils.org</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote">
<div dir="ltr">Hi Linda,
<div><br>
</div>
<div>It's not for Evergreen, but my colleague <a
href="https://github.com/pulibrary/princeton_ansible/commit/6f9009249a168442391d90e2b75028d40a8a9e91"
target="_blank" moz-do-not-send="true">recently blocked
claudebot using fail2ban on our load balancer</a>.
Essentially, fail2ban is configured to watch Nginx's
access log, and if more than 10 claudebot requests appear
within the past minute from a particular IP, it
automatically blocks all requests from that IP for the
next 24 hours. I would think that something similar could
work for Apache's access log.</div>
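<div><br>
</div>
<div>For anyone who wants to see the logic spelled out, here is a
minimal Python sketch of the rule described above. It is not the
fail2ban configuration from the linked commit, and it does not use
fail2ban at all: the class name BotLimiter and its methods are made
up for illustration, and the only values taken from the description
are the thresholds (more than 10 matching requests within a minute
from one IP, then a 24-hour block).</div>
<pre>
```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60          # "within the past minute"
MAX_REQUESTS = 10            # block once an IP exceeds this many bot hits
BAN_SECONDS = 24 * 60 * 60   # "for the next 24 hours"


class BotLimiter:
    """Hypothetical helper; not part of fail2ban or any real library."""

    def __init__(self, needle="claudebot"):
        self.needle = needle             # substring looked for in the User-Agent
        self.hits = defaultdict(deque)   # ip -> timestamps of recent bot hits
        self.banned_until = {}           # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record(self, ip, user_agent, now):
        """Record one request; return True if the IP is banned afterwards."""
        if self.is_banned(ip, now):
            return True
        if self.needle not in user_agent.lower():
            return False
        window = self.hits[ip]
        window.append(now)
        # Drop hits that have fallen out of the one-minute window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_REQUESTS:
            self.banned_until[ip] = now + BAN_SECONDS
            window.clear()
            return True
        return False
```
</pre>
<div>In this sketch, each parsed access-log line would be passed to
record() with the client IP, the User-Agent string, and a timestamp
in seconds; a True result means requests from that IP should be
rejected until the ban expires.</div>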
<div><br>
</div>
<div>Good luck with the bots!</div>
<div><br>
</div>
<div> -Jane</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, 19 Apr 2024 at
3:42 a.m., Linda Jansová via Evergreen-general (<a
href="mailto:evergreen-general@list.evergreen-ils.org"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">evergreen-general@list.evergreen-ils.org</a>)
wrote:<br>
</div>
<blockquote class="gmail_quote">Dear all,<br>
<br>
Have any of you encountered extensive crawling by
Bytespider and <br>
Bytedance (see e.g. <br>
<a
href="https://wordpress.org/support/topic/psa-bytedance-and-bytespider-bots-recommend-blocking/"
rel="noreferrer" target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://wordpress.org/support/topic/psa-bytedance-and-bytespider-bots-recommend-blocking/</a>),
<br>
Claudebot, or other AI bots?<br>
<br>
If so, do you have a secret recipe for blocking these
crawlers from <br>
accessing the site?<br>
<br>
Thank you very much for sharing your experience!<br>
<br>
Linda<br>
<br>
_______________________________________________<br>
Evergreen-general mailing list<br>
<a href="mailto:Evergreen-general@list.evergreen-ils.org"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">Evergreen-general@list.evergreen-ils.org</a><br>
<a
href="http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general"
rel="noreferrer" target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general</a><br>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
<p><br>
</p>
</body>
</html>