[Evergreen-dev] Problematic bot traffic
Josh Stompro
stomproj at gsuite.larl.org
Thu Feb 13 16:10:38 EST 2025
Jeff, thanks for bringing this up on the list.
We are seeing a lot of requests like
"GET /eg/opac/mylist/delete?anchor=record_184821&record=184821" from
never-before-seen IPs, which make 1-12 requests and then stop.
They usually seem to have a random out-of-date Chrome version in the
user agent string, for example:
Chrome/88.0.4324.192
Chrome/86.0.4240.75
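A rough sketch of how one might flag this pattern when scanning logs. It assumes combined-format access log lines with the request and user agent in double quotes; the MIN_CHROME_MAJOR cutoff and the is_suspicious helper are hypothetical names chosen for illustration, not anything from our actual setup.

```python
import re

# Assumption: the request line and user agent appear in double quotes,
# as in the Apache "combined" log format. Adjust for your own logs.
REQUEST_RE = re.compile(r'"GET (/eg/opac/mylist/delete\?[^" ]*)')
CHROME_RE = re.compile(r"Chrome/(\d+)\.")

# Hypothetical cutoff: Chrome major versions below this count as stale.
MIN_CHROME_MAJOR = 100


def is_suspicious(line: str) -> bool:
    """True for a mylist/delete request sent with an old Chrome user agent."""
    if not REQUEST_RE.search(line):
        return False
    match = CHROME_RE.search(line)
    return bool(match) and int(match.group(1)) < MIN_CHROME_MAJOR
```

A user agent check alone will produce false positives (some real patrons run old browsers), so this is better suited to collecting candidate IPs for review than to blocking outright.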
I've been trying to slow down the bots by collecting logs, picking out
the obvious patterns, and blocking netblocks for non-US ranges. ipinfo.io
offers a free country & ASN database download that I've been using to look
up the ranges and countries (https://ipinfo.io/products/free-ip-database).
I would be happy to share a link to our current blocklist, which has 10K
non-US ranges.
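As an illustration of turning a country database into a blocklist, here is a minimal sketch. It assumes a CSV export with start_ip, end_ip and country columns; those column names are an assumption and should be checked against the file you actually download. The non_us_cidrs function name is made up for this example.

```python
import csv
import io
import ipaddress


def non_us_cidrs(csv_text: str, home_country: str = "US") -> list[str]:
    """Return CIDR blocks for every range not assigned to home_country.

    Assumes columns named start_ip, end_ip and country (hypothetical;
    verify against the real database header).
    """
    cidrs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["country"] == home_country:
            continue
        first = ipaddress.ip_address(row["start_ip"])
        last = ipaddress.ip_address(row["end_ip"])
        # An arbitrary start-end range may need several CIDR blocks.
        for net in ipaddress.summarize_address_range(first, last):
            cidrs.append(str(net))
    return cidrs
```

The resulting CIDR list can then be written out in whatever form the proxy or firewall expects.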
I've also been reporting the non-US bot activity to
https://www.abuseipdb.com/ just to bring some visibility to these bad
bots. I noticed initially that many of the IPs that we were getting hit
from didn't seem to be listed on any blocklists already, so I figured some
reporting might help. I'm curious whether Evergreen sites are getting
hit from the same IPs; if so, an Evergreen-specific blocklist would be
useful. If you look up your bot IPs on abuseipdb.com you can see if I've
already reported any of them.
I've also been making use of block lists from https://iplists.firehol.org/,
such as:
https://iplists.firehol.org/files/cleantalk_30d.ipset
https://iplists.firehol.org/files/botscout_7d.ipset
https://iplists.firehol.org/files/firehol_abusers_1d.netset
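To check whether the IPs in your own logs already appear on lists like these, something along the following lines could work. It is a sketch that assumes the files are plain text with one IP or CIDR per line and "#" comment lines; load_netset and is_listed are hypothetical helper names.

```python
import ipaddress


def load_netset(text: str) -> list:
    """Parse list text: one IP or CIDR per line, '#' starts a comment."""
    networks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            # A bare IP becomes a single-address network (/32 or /128).
            networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def is_listed(ip: str, networks) -> bool:
    """True if ip falls inside any of the loaded networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

The linear scan is fine for spot checks; for large lists and busy logs, a prefix-tree lookup or the proxy's own ACL matching would be a better fit.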
We are using HAProxy, so I looked into the CrowdSec HAProxy Bouncer
(https://docs.crowdsec.net/u/bouncers/haproxy/), but I'm not sure
that would help since these IPs don't seem to be on blocklists. But I may
just not quite understand how CrowdSec is supposed to work.
HAProxy Enterprise has a reCAPTCHA module that I think would allow us to
send any non-US connections that haven't connected before through a
reCAPTCHA, but the price for HAProxy Enterprise is out of our budget.
https://www.haproxy.com/blog/announcing-haproxy-enterprise-3-0#new-captcha-and-saml-modules
There is also a fairly up-to-date project for adding captchas through
HAProxy at https://github.com/ndbiaw/haproxy-protection. This looks
promising: it requires new connections to perform a JavaScript proof
of work calculation before allowing access, so it could be a good
transparent way of handling it.
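For anyone unfamiliar with the proof of work idea, here is a generic hash-based sketch. It is not that project's actual scheme (I have not checked what hash or difficulty it uses); the function names and the SHA-256 "leading zero bits" rule are assumptions for illustration only.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: a random challenge handed to a new client."""
    return secrets.token_hex(16)


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Cheap server-side check: hash must start with difficulty_bits zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force search for a nonce that passes verify()."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

The point is the asymmetry: the client has to try about 2**difficulty_bits hashes on average, while the server verifies with a single hash. That is barely noticeable for one patron's browser but adds up for a bot opening connections from thousands of fresh IPs.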
We were taken out by ChatGPT bots back in December, but those were a bit
easier to block by netblock since they were not as spread out. I recently
saw this article about how some people are fighting back against bots that
ignore robots.txt:
https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/
Josh
On Mon, Jan 27, 2025 at 6:33 PM Jeff Davis via Evergreen-dev <
evergreen-dev at list.evergreen-ils.org> wrote:
> Hi folks,
>
> Our Evergreen environment has been experiencing a higher-than-usual volume
> of unwanted bot traffic in recent months. Much of this traffic looks like
> webcrawlers hitting Evergreen-specific URLs from an enormous number of
> different IP addresses. Judging from discussion in IRC last week, it sounds
> like other EG admins have been seeing the same thing. Does anyone have any
> recommendations for managing this traffic and mitigating its impact?
>
> Some solutions that have been suggested/implemented so far:
> - Geoblocking entire countries.
> - Using Cloudflare's proxy service. There's some trickiness in getting
> this to work with Evergreen.
> - Putting certain OPAC pages behind a captcha.
> - Deploying publicly-available blocklists of "bad bot" IPs/useragents/etc.
> (good but limited, and not EG-specific).
> - Teaching EG to identify and deal with bot traffic itself (but arguably
> this should happen before the traffic hits Evergreen).
>
> My organization is currently evaluating CrowdSec as another possible
> solution. Any opinions on any of these approaches?
> --
> Jeff Davis
> BC Libraries Cooperative