<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    All,<br>
    <br>
    I almost replied with the Ars Technica article that Josh linked
    when the thread was started, but I decided not to put it out there
    until I had set up a test system to see if I could get that code
    working. A tarpit, I think, serves them right. And, of course, the
    whole issue is destined to receive the fate of spam and spam
    filters forever and ever.<br>
    <br>
    It was a serendipitously timed article. Its existence at this
    moment signals to me that this isn't a "just us" problem. It's the
    entire planet.<br>
    <br>
    <pre class="moz-signature" cols="72">-Blake-
Conducting Magic
Will consume any data format
MOBIUS

</pre>
    <div class="moz-cite-prefix">On 2/13/2025 3:10 PM, Josh Stompro via
      Evergreen-dev wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAGOQQftvG72nky1Okk-_-Gee3tVSW6kQFRE=NA5E_9BMMmFoyw@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">Jeff, thanks for bringing this up on the list.
          <div><br>
          </div>
          <div>We are seeing a lot of requests like</div>
          <div> "GET
            /eg/opac/mylist/delete?anchor=record_184821&record=184821"
            from never-before-seen IPs; they make 1-12 requests and
            then stop.</div>
          <div><br>
          </div>
          <div>They usually seem to have a random, out-of-date Chrome
            version in the user-agent string:</div>
          <div><span style="color:rgb(0,0,0)">Chrome/88.0.4324.192</span></div>
          <div><span style="color:rgb(0,0,0)">Chrome/86.0.4240.75</span><font
              color="#000000"><br>
            </font>
            <div><br>
            </div>
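For anyone who wants to flag these in their own logs, here is a minimal Python sketch of that heuristic. The cutoff major version (110) and the function name are my own assumptions for the demo, not anything from Josh's setup; tune the cutoff to your traffic.

```python
import re

# Matches the major version in a Chrome user-agent token, e.g.
# "Chrome/88.0.4324.192" -> 88.
CHROME_RE = re.compile(r"Chrome/(\d+)\.")

def looks_like_stale_chrome(user_agent: str, min_major: int = 110) -> bool:
    """Return True if the UA claims a Chrome major version below min_major."""
    m = CHROME_RE.search(user_agent)
    return bool(m) and int(m.group(1)) < min_major

print(looks_like_stale_chrome(
    "Mozilla/5.0 ... Chrome/88.0.4324.192 Safari/537.36"))  # True
print(looks_like_stale_chrome(
    "Mozilla/5.0 ... Chrome/131.0.0.0 Safari/537.36"))      # False
```

A real deployment would run this over the access log and feed the offending IPs into whatever blocklist mechanism you already use.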
            <div>I've been trying to slow down the bots by
              collecting logs, grabbing all the obvious patterns, and
              blocking netblocks for non-US ranges.  <a
                href="http://ipinfo.io" moz-do-not-send="true">ipinfo.io</a>
              offers a free country & ASN database download that
              I've been using to look up the ranges and countries. (<a
                href="https://ipinfo.io/products/free-ip-database"
                moz-do-not-send="true" class="moz-txt-link-freetext">https://ipinfo.io/products/free-ip-database</a>) 
              I would be happy to share a link to our current blocklist,
              which has 10K non-US ranges.  </div>
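As a sketch of the lookup side of such a netblock blocklist, Python's standard ipaddress module can test membership in a set of CIDR ranges. The ranges below are documentation examples, not real bot sources, and stand in for entries pulled from a database like ipinfo.io's free download.

```python
import ipaddress

# Stand-in for ranges loaded from a country/ASN database.
BLOCKED = [ipaddress.ip_network(c) for c in ("203.0.113.0/24", "198.51.100.0/25")]

def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside any blocked netblock."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

print(is_blocked("203.0.113.7"))  # True  (inside 203.0.113.0/24)
print(is_blocked("192.0.2.1"))    # False (not in any blocked range)
```

With 10K ranges a linear scan like this is slow per request; for production use you would precompile the ranges into a radix tree or hand them to the firewall/proxy layer instead.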
            <div><br>
            </div>
            <div>I've also been reporting the non-US bot activity to <a
                href="https://www.abuseipdb.com/" moz-do-not-send="true"
                class="moz-txt-link-freetext">https://www.abuseipdb.com/</a>
              just to bring some visibility to these bad bots.  I
              noticed initially that many of the IPs we were
              getting hit from didn't seem to be listed on any
              blocklists, so I figured some reporting might
              help.  I'm curious whether Evergreen sites are getting
              hit from the same IPs; an Evergreen-specific blocklist
              would be useful.  If you look up your bot IPs on <a
                href="http://abuseipdb.com" moz-do-not-send="true">abuseipdb.com</a>
              you can see if I've already reported any of them.</div>
            <div><br>
            </div>
            <div>I've also been making use of block lists from <a
                href="https://iplists.firehol.org/"
                moz-do-not-send="true" class="moz-txt-link-freetext">https://iplists.firehol.org/</a></div>
            <div>Such as </div>
            <div><a
href="https://iplists.firehol.org/files/cleantalk_30d.ipset"
                moz-do-not-send="true" class="moz-txt-link-freetext">https://iplists.firehol.org/files/cleantalk_30d.ipset</a><br>
              <div><span
style="color:rgb(51,51,51);font-family:Roboto,sans-serif;font-size:14px"><a
href="https://iplists.firehol.org/files/botscout_7d.ipset"
                    moz-do-not-send="true" class="moz-txt-link-freetext">https://iplists.firehol.org/files/botscout_7d.ipset</a></span></div>
              <div><a
href="https://iplists.firehol.org/files/firehol_abusers_1d.netset"
                  moz-do-not-send="true" class="moz-txt-link-freetext">https://iplists.firehol.org/files/firehol_abusers_1d.netset</a></div>
              <div><br>
              </div>
            </div>
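Those FireHOL lists are plain text, one IP or CIDR per line, with '#' comment lines. A small Python sketch for loading one (the sample data here is made up); ipaddress.collapse_addresses() merges overlapping or adjacent ranges so the loaded set stays compact:

```python
import ipaddress

def load_netset(text: str):
    """Parse a FireHOL-style .ipset/.netset file into collapsed networks."""
    nets = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        # strict=False tolerates host bits set in a CIDR entry.
        nets.append(ipaddress.ip_network(line, strict=False))
    return list(ipaddress.collapse_addresses(nets))

sample = "# example list\n192.0.2.0/25\n192.0.2.128/25\n198.51.100.9\n"
print([str(n) for n in load_netset(sample)])  # ['192.0.2.0/24', '198.51.100.9/32']
```

The two adjacent /25s collapse into one /24, and the bare IP becomes a /32.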
            <div>We are using HAProxy, so I looked into the
              CrowdSec HAProxy Bouncer (<a
                href="https://docs.crowdsec.net/u/bouncers/haproxy/"
                moz-do-not-send="true" class="moz-txt-link-freetext">https://docs.crowdsec.net/u/bouncers/haproxy/</a>),
              but I'm not sure it would help, since these IPs don't
              seem to be on blocklists.  Then again, I may not quite
              understand how CrowdSec is supposed to work.</div>
            <div><br>
            </div>
            <div>HAProxy Enterprise has a reCAPTCHA module that I think
              would allow us to feed any non-US connections that haven't
              connected before through a CAPTCHA, but the price of
              HAProxy Enterprise is out of our budget.  <a
href="https://www.haproxy.com/blog/announcing-haproxy-enterprise-3-0#new-captcha-and-saml-modules"
                moz-do-not-send="true" class="moz-txt-link-freetext">https://www.haproxy.com/blog/announcing-haproxy-enterprise-3-0#new-captcha-and-saml-modules</a></div>
            <div><br>
            </div>
            <div>There is also a fairly up-to-date project for adding
              CAPTCHAs through HAProxy at </div>
            <div><a href="https://github.com/ndbiaw/haproxy-protection"
                moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/ndbiaw/haproxy-protection</a>.
              This looks promising: it requires new connections to
              perform a JavaScript proof-of-work calculation before
              allowing access, which could be a good transparent way of
              handling it.  </div>
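To illustrate the proof-of-work idea in general terms (this is a hashcash-style sketch, not haproxy-protection's actual scheme): the client must burn CPU finding a nonce whose hash meets a difficulty target, while the server verifies it with a single cheap hash.

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int = 3) -> int:
    """Client side: find a nonce whose SHA-256 starts with `difficulty` zero hex digits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int = 3) -> bool:
    """Server side: one hash to check the submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce = solve("session-abc123")
print(verify("session-abc123", nonce))  # True
```

Each extra zero digit multiplies the client's expected work by 16, so the difficulty knob trades bot deterrence against latency for real first-time visitors.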
            <div><br>
            </div>
            <div>We were taken out by ChatGPT bots back in December;
              their netblocks were a bit easier to block since they
              were not as spread out.  I recently saw this article about
              how some people are fighting back against bots that ignore
              robots.txt: <a
href="https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/"
                moz-do-not-send="true" class="moz-txt-link-freetext">https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/</a></div>
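The tarpit idea from that article can be sketched as a stateless generator of junk pages whose links only lead to more junk pages, so a scraper that ignores robots.txt wanders forever. The /maze/ URL scheme and page text here are invented for the demo; real tarpits also throttle responses to waste the bot's time.

```python
import hashlib

def maze_page(path: str, links: int = 5) -> str:
    """Build a deterministic junk page whose links point deeper into the maze."""
    out = [f"<html><body><p>Archive node {path}</p>"]
    for i in range(links):
        # Child paths are derived by hashing, so no state is kept server-side.
        child = hashlib.sha256(f"{path}/{i}".encode()).hexdigest()[:12]
        out.append(f'<a href="/maze/{child}">more records</a>')
    out.append("</body></html>")
    return "\n".join(out)

page = maze_page("maze/start")
print(page.count("<a href="))  # 5
```

Because the pages are derived from a hash, the same URL always yields the same content, yet the link graph is effectively endless.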
          </div>
          <div><br>
          </div>
          <div>Josh</div>
        </div>
        <br>
        <div class="gmail_quote gmail_quote_container">
          <div dir="ltr" class="gmail_attr">On Mon, Jan 27, 2025 at
            6:33 PM Jeff Davis via Evergreen-dev <<a
              href="mailto:evergreen-dev@list.evergreen-ils.org"
              moz-do-not-send="true" class="moz-txt-link-freetext">evergreen-dev@list.evergreen-ils.org</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi
            folks,<br>
            <br>
            Our Evergreen environment has been experiencing a
            higher-than-usual volume of unwanted bot traffic in recent
            months. Much of this traffic looks like webcrawlers hitting
            Evergreen-specific URLs from an enormous number of different
            IP addresses. Judging from discussion in IRC last week, it
            sounds like other EG admins have been seeing the same thing.
            Does anyone have any recommendations for managing this
            traffic and mitigating its impact?<br>
            <br>
            Some solutions that have been suggested/implemented so far:<br>
            - Geoblocking entire countries.<br>
            - Using Cloudflare's proxy service. There's some trickiness
            in getting this to work with Evergreen.<br>
            - Putting certain OPAC pages behind a captcha.<br>
            - Deploying publicly available blocklists of "bad bot"
            IPs/useragents/etc. (good but limited, and not EG-specific).<br>
            - Teaching EG to identify and deal with bot traffic itself
            (but arguably this should happen before the traffic hits
            Evergreen).<br>
            <br>
            My organization is currently evaluating CrowdSec as another
            possible solution. Any opinions on any of these approaches?<br>
            -- <br>
            Jeff Davis<br>
            BC Libraries Cooperative<br>
            _______________________________________________<br>
            Evergreen-dev mailing list<br>
            <a href="mailto:Evergreen-dev@list.evergreen-ils.org"
              target="_blank" moz-do-not-send="true"
              class="moz-txt-link-freetext">Evergreen-dev@list.evergreen-ils.org</a><br>
            <a
href="http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-dev"
              rel="noreferrer" target="_blank" moz-do-not-send="true"
              class="moz-txt-link-freetext">http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-dev</a><br>
          </blockquote>
        </div>
      </div>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
      <pre wrap="" class="moz-quote-pre">_______________________________________________
Evergreen-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Evergreen-dev@list.evergreen-ils.org">Evergreen-dev@list.evergreen-ils.org</a>
<a class="moz-txt-link-freetext" href="http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-dev">http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-dev</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>