<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">Thank you for sharing the link to the
      Dark Visitors website - it looks very useful, indeed!</div>
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">Linda</div>
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">On 4/19/24 20:21, Lolis, John via
      Evergreen-general wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAJiSQLA6j_2nQ6cRGENrnQ-Qh7B+YHzSddtcC7fETdr0FuPxpA@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_default">There's been quite a conversation on
          the CODE4LIB listserv about this lately...<br>
        </div>
        <div class="gmail_default"><br>
        </div>
        <div class="gmail_default">Scott Prater &lt;<a
            href="mailto:0000007dd2c67ad2-dmarc-request@lists.clir.org"
            moz-do-not-send="true" class="moz-txt-link-freetext">0000007dd2c67ad2-dmarc-request@lists.clir.org</a>&gt;<br>
          <br>
          Thu, 11 Apr, 10:43 (8 days ago)<br>
          <br>
          to CODE4LIB<br>
          We've also been seeing some traffic from inconsiderate AI
          bots.<br>
          <br>
          One of my colleagues came across this site, which tracks and
          documents AI bots:<br>
          <br>
          <a href="https://darkvisitors.com/" moz-do-not-send="true"
            class="moz-txt-link-freetext">https://darkvisitors.com/</a><br>
          <br>
          -- Scott<br>
          <br>
          --<br>
          Scott Prater<br>
          Digital Library Architect<br>
          UW Digital Collections Center<br>
          University of Wisconsin - Madison<br>
          <br>
          <br>
          <br>
          ________________________________________<br>
          From: Code for Libraries &lt;<a
            href="mailto:CODE4LIB@LISTS.CLIR.ORG" moz-do-not-send="true"
            class="moz-txt-link-freetext">CODE4LIB@LISTS.CLIR.ORG</a>&gt;
          on behalf of Lolis, John &lt;<a
            href="mailto:jlolis@WHITEPLAINSNY.GOV"
            moz-do-not-send="true" class="moz-txt-link-freetext">jlolis@WHITEPLAINSNY.GOV</a>&gt;<br>
          Sent: Wednesday, April 10, 2024 12:15 PM<br>
          To: <a href="mailto:CODE4LIB@LISTS.CLIR.ORG"
            moz-do-not-send="true" class="moz-txt-link-freetext">CODE4LIB@LISTS.CLIR.ORG</a><br>
          Subject: Re: [CODE4LIB] blocking GPTBot?<br>
          <br>
          This *sounds* as if it should help:<br>
          <a
href="https://searchengineland.com/google-extended-crawler-432636"
            moz-do-not-send="true" class="moz-txt-link-freetext">https://searchengineland.com/google-extended-crawler-432636</a><br>
          <br>
          John Lolis<br>
          Coordinator of Computer Systems<br>
          <br>
          100 Martine Avenue<br>
          White Plains, NY  10601<br>
          tel: 1.914.422.1497<br>
          fax: 1.914.422.1452<br>
          <br>
          <a
href="https://whiteplainslibrary.org/"
            moz-do-not-send="true" class="moz-txt-link-freetext">https://whiteplainslibrary.org/</a><br>
          <br>
          *“I would rather have questions that can’t be answered than
          answers that<br>
          can’t be questioned.”*<br>
          — Richard Feynman<br>
          &lt;<a
href="https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu"
            moz-do-not-send="true" class="moz-txt-link-freetext">https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu</a>&gt;,<br>
          theoretical physicist and recipient of the Nobel Prize in
          Physics in 1965<br>
          <br>
          <br>
          On Mon, 8 Apr 2024 at 16:31, Jason Casden &lt;<a
            href="mailto:casden@gmail.com" moz-do-not-send="true"
            class="moz-txt-link-freetext">casden@gmail.com</a>&gt;
          wrote:<br>
          <br>
          > Thanks for bringing this up, Eben. We've been having a
          horrible time with<br>
          > these bots, including those from previously fairly
          well-behaved sources<br>
          > like Google. They've caused issues ranging from slow
          response times and<br>
          > high system load all the way up to outages for some older
          systems. So far,<br>
          > our systems folks have been playing whack-a-mole with a
          combination of IP<br>
          > range blocks and increasingly detailed robots.txt
          statements. A group is<br>
          > being convened to investigate more comprehensive options
          so I will be<br>
          > watching this thread closely.<br>
          ><br>
          > Jason<br>
          ><br>
          > On Mon, Apr 8, 2024 at 4:18 PM Eben English &lt;<a
            href="mailto:eben.english@gmail.com" moz-do-not-send="true"
            class="moz-txt-link-freetext">eben.english@gmail.com</a>&gt;<br>
          > wrote:<br>
          ><br>
          > > Hi all,<br>
          > ><br>
          > > I'm wondering if other folks are seeing AI and/or
          ML-related crawlers<br>
          > like<br>
          > > GPTBot accessing your library's website, catalog,
          digital collections, or<br>
          > > other sites.<br>
          > ><br>
          > > If so, are you blocking or disallowing these
          crawlers? Has anyone come up<br>
          > > with any policies around this?<br>
          > ><br>
          > > We're debating whether to allow these types of bots
          to crawl our digital<br>
          > > collections, many of which contain large amounts of
          copyrighted or "no<br>
          > > derivatives"-licensed materials. On one hand, these
          materials are<br>
          > available<br>
          > > for public view, but on the other hand the type of
          use that GPTBot and<br>
          > the<br>
          > > like are after (integrating the content into their
          models) could be<br>
          > > characterized as creating a derivative work, which
          is expressly<br>
          > > discouraged.<br>
          > ><br>
          > > Thanks,<br>
          > ><br>
          > > Eben English (he/him/his)<br>
          > > Digital Repository Services Manager<br>
          > > Boston Public Library<br>
          > ><br>
          ></div>
        <div>
          <div dir="ltr" class="gmail_signature"
            data-smartmail="gmail_signature">
            <div><br>
            </div>
            <div><span>John Lolis</span><br>
            </div>
            <div>Coordinator of Computer Systems</div>
            <div><br>
            </div>
            <div><span>100 Martine Avenue</span><br>
            </div>
            <div><span>White Plains, NY  10601</span></div>
            <div>tel: 1.914.422.1497</div>
            <div>fax: 1.914.422.1452</div>
            <div><br>
            </div>
            <div><a
href="https://whiteplainslibrary.org/" target="_blank"
moz-do-not-send="true" class="moz-txt-link-freetext">https://whiteplainslibrary.org/</a></div>
            <div><br>
            </div>
            <div><span><i>“I would rather have questions that can’t be
                  answered than answers that can’t be questioned.”</i><br>
              </span><span>— </span><a
href="https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu"
rel="noopener noreferrer" target="_blank" moz-do-not-send="true">Richard
                Feynman</a><span>, theoretical physicist and recipient
                of the Nobel Prize in Physics in 1965</span><br>
            </div>
          </div>
        </div>
        <br>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Fri, 19 Apr 2024 at 07:05,
          Jane Sandberg via Evergreen-general &lt;<a
            href="mailto:evergreen-general@list.evergreen-ils.org"
            moz-do-not-send="true" class="moz-txt-link-freetext">evergreen-general@list.evergreen-ils.org</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote">
          <div dir="ltr">Hi Linda,
            <div><br>
            </div>
            <div>It's not for Evergreen, but my colleague <a
href="https://github.com/pulibrary/princeton_ansible/commit/6f9009249a168442391d90e2b75028d40a8a9e91"
                target="_blank" moz-do-not-send="true">recently blocked
                claudebot using fail2ban on our load balancer</a>. 
              Essentially, fail2ban is configured to watch Nginx's
              access log, and if more than 10 claudebot requests appear
              within the past minute from a particular IP, it
              automatically blocks all requests from that IP for the
              next 24 hours.  I would think that something similar could
              work for Apache's access log.</div>
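            <div><br>
            </div>
            <div>As a rough illustration of the rule described above
              (not the configuration from the linked commit, and not how
              fail2ban works internally), the logic can be sketched in
              Python. The class name <code>BotBanTracker</code>, its
              methods, and the sample IP addresses are hypothetical; only
              the thresholds (more than 10 requests, one minute, 24
              hours) come from the description.</div>
<pre>
```python
from collections import defaultdict, deque

MAX_REQUESTS = 10        # more than this many matching requests triggers a ban
WINDOW_SECONDS = 60      # "within the past minute"
BAN_SECONDS = 24 * 3600  # "for the next 24 hours"


class BotBanTracker:
    """Toy model of the ban rule; not fail2ban's actual implementation."""

    def __init__(self):
        self.hits = defaultdict(deque)  # ip -> timestamps of recent matching requests
        self.banned_until = {}          # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record(self, ip, user_agent, now):
        """Record one request; return True if the IP is banned afterwards."""
        if self.is_banned(ip, now):
            return True
        # Only requests whose user agent mentions the bot are counted.
        if "claudebot" not in user_agent.lower():
            return False
        window = self.hits[ip]
        window.append(now)
        # Drop timestamps that have fallen out of the one-minute window.
        while now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_REQUESTS:
            self.banned_until[ip] = now + BAN_SECONDS
            window.clear()
            return True
        return False


if __name__ == "__main__":
    tracker = BotBanTracker()
    # Eleven matching requests, one per second, from a documentation-range IP.
    results = [tracker.record("203.0.113.7", "ClaudeBot/1.0", now=t) for t in range(11)]
    print(results)
```
</pre>
            <div>In this sketch, the eleventh matching request from one
              IP inside a minute triggers the ban, while matching
              requests spaced ten seconds apart never cross the
              threshold.</div>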
            <div><br>
            </div>
            <div>Good luck with the bots!</div>
            <div><br>
            </div>
            <div>  -Jane</div>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Fri, 19 Apr 2024 at
              3:42 a.m., Linda Jansová via Evergreen-general (<a
                href="mailto:evergreen-general@list.evergreen-ils.org"
                target="_blank" moz-do-not-send="true"
                class="moz-txt-link-freetext">evergreen-general@list.evergreen-ils.org</a>)
              wrote:<br>
            </div>
            <blockquote class="gmail_quote">Dear all,<br>
              <br>
              Have any of you encountered extensive crawling by
              Bytespider and <br>
              Bytedance (see, e.g., <br>
              <a
href="https://wordpress.org/support/topic/psa-bytedance-and-bytespider-bots-recommend-blocking/"
                rel="noreferrer" target="_blank" moz-do-not-send="true"
                class="moz-txt-link-freetext">https://wordpress.org/support/topic/psa-bytedance-and-bytespider-bots-recommend-blocking/</a>),
              <br>
              Claudebot, or other AI bots?<br>
              <br>
              If so, do you have a recipe for blocking these crawlers
              from <br>
              accessing the site?<br>
              <br>
              Thank you very much for sharing your experience!<br>
              <br>
              Linda<br>
              <br>
              _______________________________________________<br>
              Evergreen-general mailing list<br>
              <a href="mailto:Evergreen-general@list.evergreen-ils.org"
                target="_blank" moz-do-not-send="true"
                class="moz-txt-link-freetext">Evergreen-general@list.evergreen-ils.org</a><br>
              <a
href="http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general"
                rel="noreferrer" target="_blank" moz-do-not-send="true"
                class="moz-txt-link-freetext">http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general</a><br>
            </blockquote>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>