[Evergreen-dev] Problematic bot traffic

Blake Graham-Henderson blake at mobiusconsortium.org
Thu Feb 13 16:46:42 EST 2025


All,

When this thread started, I almost replied with the Ars Technica article 
that Josh linked. But I decided not to put it out there until I had set 
up a test system to see if I could get that code working. A tarpit, I 
think, serves them right. And, of course, the whole issue is destined to 
share the fate of spam and spam filters, forever and ever.

It was a serendipitously timed article. Its existence at this moment in 
time signals to me that this isn't a "just us" problem. It's the entire 
planet.

-Blake-
Conducting Magic
Will consume any data format
MOBIUS

On 2/13/2025 3:10 PM, Josh Stompro via Evergreen-dev wrote:
> Jeff, thanks for bringing this up on the list.
>
> We are seeing a lot of requests like
>  "GET /eg/opac/mylist/delete?anchor=record_184821&record=184821" from 
> never-before-seen IPs, and they make 1-12 requests and then stop.
>
> They usually seem to have a random, out-of-date Chrome version in the 
> user agent string, for example:
> Chrome/88.0.4324.192
> Chrome/86.0.4240.75
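
The pattern Josh describes above could be matched in log processing with something like the following. This is only a minimal sketch: the function name and the version cutoff (STALE_CHROME_MAJOR) are assumptions made for illustration, not anything from Josh's setup, and the cutoff would need tuning against real traffic.

```python
import re

# Assumed cutoff for illustration: Chrome major versions below this count as stale.
STALE_CHROME_MAJOR = 100

# Captures the major version from a token such as "Chrome/88.0.4324.192".
CHROME_RE = re.compile(r"Chrome/(\d+)\.")

# URL path taken from the requests quoted above.
SUSPECT_PATH = "/eg/opac/mylist/delete"


def is_suspect(request_line: str, user_agent: str) -> bool:
    """Flag requests to the mylist delete URL that claim an old Chrome version."""
    if SUSPECT_PATH not in request_line:
        return False
    match = CHROME_RE.search(user_agent)
    if match is None:
        # Not a Chrome-style user agent, so this heuristic does not apply.
        return False
    return int(match.group(1)) < STALE_CHROME_MAJOR
```

A real deployment would probably combine this with the "1-12 requests from a new IP" behavior rather than rely on the user agent alone, since legitimate patrons can also run old browsers.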
>
> I've been trying to slow down the bots by collecting logs, grabbing 
> all the obvious patterns, and blocking netblocks for non-US ranges. 
> ipinfo.io offers a free country & ASN database download that I've been 
> using to look up the ranges and countries 
> (https://ipinfo.io/products/free-ip-database). I would be happy to 
> share a link to our current blocklist, which has 10K non-US ranges.
>
> I've also been reporting the non-US bot activity to 
> https://www.abuseipdb.com/ just to bring some visibility to these bad 
> bots.  I noticed initially that many of the IPs that we were getting 
> hit from didn't seem to be listed on any blocklists already, so I 
> figured some reporting might help.  I'm curious whether Evergreen 
> sites are getting hit from the same IPs; if so, an Evergreen-specific 
> blocklist would be useful.  If you look up your bot IPs on 
> abuseipdb.com you can see if I've already reported any of them.
>
> I've also been making use of blocklists from https://iplists.firehol.org/
> such as:
> https://iplists.firehol.org/files/cleantalk_30d.ipset
> https://iplists.firehol.org/files/botscout_7d.ipset
> https://iplists.firehol.org/files/firehol_abusers_1d.netset
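
For anyone wanting to check addresses against lists like these outside of the proxy, here is a rough sketch. It assumes the files contain one IP or CIDR range per line with "#" comment lines; the function names are made up for this example, and the sample addresses are documentation ranges, not real bot IPs.

```python
import ipaddress


def parse_netset(text: str) -> list:
    """Parse list text into network objects, skipping blanks and # comments."""
    networks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # strict=False accepts entries whose host bits are set; a bare IP
        # becomes a single-address network (/32 or /128).
        networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def is_blocked(ip: str, networks: list) -> bool:
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


sample = """# example list using documentation ranges
192.0.2.0/24
198.51.100.7
"""
nets = parse_netset(sample)
print(is_blocked("192.0.2.55", nets))
print(is_blocked("203.0.113.9", nets))
```

The linear scan is fine for spot checks. With tens of thousands of ranges on every request, a proxy-native mechanism (an ipset, or an ACL file loaded by HAProxy) would be the better fit.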
>
> We are using HAProxy so I did some looking into the CrowdSec HAProxy 
> Bouncer (https://docs.crowdsec.net/u/bouncers/haproxy/) but I'm not 
> sure that would help since these IPs don't seem to be on blocklists.  
> But I may just not quite understand how CrowdSec is supposed to work.
>
> HAProxy Enterprise has a reCAPTCHA module that I think would allow us 
> to feed any non-US connections that haven't connected before through a 
> reCAPTCHA, but the price for HAProxy Enterprise is out of our budget. 
> https://www.haproxy.com/blog/announcing-haproxy-enterprise-3-0#new-captcha-and-saml-modules
>
> There is also a fairly up-to-date project for adding captchas through 
> HAProxy at https://github.com/ndbiaw/haproxy-protection. It looks 
> promising: it requires new connections to perform a JavaScript 
> proof-of-work calculation before allowing access, which could be a 
> good transparent way of handling it.
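
To make the proof-of-work idea concrete, here is a generic hashcash-style sketch. It is not the haproxy-protection implementation (that project runs its challenge in the browser). The function names, the "challenge:nonce" format, and the difficulty value are all assumptions for illustration.

```python
import hashlib
import secrets


def new_challenge() -> str:
    """Server side: issue a random challenge string for a new connection."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose hash has enough leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash confirms the client did the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


challenge = new_challenge()
nonce = solve(challenge, 12)  # about 2**12 hashes on average
print(verify(challenge, nonce, 12))
```

The asymmetry is the point: solving costs the client thousands of hashes, while checking costs the server one. That is negligible for a single patron but adds up for a crawler opening connections from thousands of IPs.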
>
> We were taken out by ChatGPT bots back in December. Their netblocks 
> were a bit easier to block since they were not as spread out.  I 
> recently saw this article about how some people are fighting back 
> against bots that ignore robots.txt: 
> https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/
>
> Josh
>
> On Mon, Jan 27, 2025 at 6:33 PM Jeff Davis via Evergreen-dev 
> <evergreen-dev at list.evergreen-ils.org> wrote:
>
>     Hi folks,
>
>     Our Evergreen environment has been experiencing a
>     higher-than-usual volume of unwanted bot traffic in recent months.
>     Much of this traffic looks like webcrawlers hitting
>     Evergreen-specific URLs from an enormous number of different IP
>     addresses. Judging from discussion in IRC last week, it sounds
>     like other EG admins have been seeing the same thing. Does anyone
>     have any recommendations for managing this traffic and mitigating
>     its impact?
>
>     Some solutions that have been suggested/implemented so far:
>     - Geoblocking entire countries.
>     - Using Cloudflare's proxy service. There's some trickiness in
>     getting this to work with Evergreen.
>     - Putting certain OPAC pages behind a captcha.
>     - Deploying publicly-available blocklists of "bad bot"
>     IPs/useragents/etc. (good but limited, and not EG-specific).
>     - Teaching EG to identify and deal with bot traffic itself (but
>     arguably this should happen before the traffic hits Evergreen).
>
>     My organization is currently evaluating CrowdSec as another
>     possible solution. Any opinions on any of these approaches?
>     -- 
>     Jeff Davis
>     BC Libraries Cooperative
>     _______________________________________________
>     Evergreen-dev mailing list
>     Evergreen-dev at list.evergreen-ils.org
>     http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-dev