
Thank you to everyone who responded. We're working with our vendor to see what can be done. I appreciate the responses.
-Jon

On Wed, Mar 19, 2025 at 10:55 AM Kev Woolley <kev.woolley@bc.libraries.coop> wrote:
Hi Jon,
We use CrowdSec: https://www.crowdsec.net/
It lets you define your own scenarios for making decisions about incoming traffic and automatically mitigating it: banning via the firewall or other measures, throwing up a CAPTCHA, and more.
Note that CrowdSec doesn't work in any time slice above 48 hours -- all of its mitigations are very short-lived. We are combining this with a substantial long-term blocklist (implemented as an ipset block in Linux iptables) that subsumes the functionality of both geo and provider blocks for longer-term mitigations. This is, of course, a labour-heavy endeavour, but we've tried several alternatives, and this is what's working best so far.
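For anyone unfamiliar with ipset, here's a minimal sketch of the long-term block mechanism (the set name and netblock are hypothetical):

    # Create a hash:net set, which holds CIDR netblocks rather than single IPs
    ipset create longterm-block hash:net -exist
    # Add an offending netblock (a documentation range, for illustration)
    ipset add longterm-block 203.0.113.0/24 -exist
    # Drop anything whose source address matches the set, early in INPUT
    iptables -I INPUT -m set --match-set longterm-block src -j DROP

A single iptables rule then covers the whole list, and netblocks can be added to or removed from the set without touching the firewall rules again.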
We have scenarios defined to catch user-agent traits and block user agents that look bad. After some initial learning ("oh, so this version of MS Office says it's MSIE 7.0, so a library just blocked themselves -- oops!" and similar situations), it was pretty easy to get most bot traffic caught that way. As I get time (and more familiarity with writing the scenarios) I'll be designing scenarios that look for specific behaviours (such as grabbing the links on a page in order, too quickly) and improving our defence that way.
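To give a flavour of what a scenario looks like, here's a minimal sketch of a user-agent scenario -- the name, regex, and pattern here are made up for illustration, so check the CrowdSec scenario docs for the full syntax before using anything like this:

    # Drop a local scenario into CrowdSec's scenarios directory
    cat > /etc/crowdsec/scenarios/local-bad-useragent.yaml <<'EOF'
    # Hypothetical scenario: flag any request whose user agent matches
    # a known-bad pattern; 'trigger' fires on a single matching event.
    type: trigger
    name: local/bad-useragent
    description: "Requests with a known-bad user agent"
    filter: "evt.Parsed.http_user_agent matches '(?i)(python-requests|go-http-client)'"
    groupby: evt.Meta.source_ip
    labels:
      remediation: true
    EOF
    # Restart the agent so it picks up the new scenario
    systemctl restart crowdsec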
CS offers reasonably good visualisation and reporting tools. These are useful both for keeping track of who's doing what and for spotting the persistent threats and creating entries in the long-term blocklist for them.
My observation, even very recently as I've been working on the long-term blocklist and not updating it on our servers (working with ~10k rules takes a while), is that there really doesn't seem to be a point where one can take their eyes off the issue entirely and forget about it -- new traffic comes out of the woodwork. With a substantial enough long-term blocklist this can reduce the time spent to a reasonable amount, but there doesn't seem to be an "okay, we're done here" point.
My gut feel is that 30-50k long-term blocklist rules is where we may end up eventually (with some years of building them).
I'm happy to share what I've got in the LTB. It's been built over the last several months, based on the attacks we've received.
Resources I've found helpful include:
https://www.qurium.org/ -- their digital forensics and investigations pages have a lot of good info on the methods and actors for some types of attacks -- we experienced this flavour, in particular:
https://www.qurium.org/weaponizing-proxy-and-vpn-providers/fineproxy-rayobyt...
Finding this site helped confirm a lot of information I'd found over the previous couple of years, studying these things on my own.
https://www.radb.net/ -- you can query this for free, and it's a good way to look up network information without having to bounce around between ARIN, RIPE, APNIC, and other RIRs (Regional Internet Registries). You can do advanced queries against it with a Whois client, as well:
whois -h whois.radb.net -- '-i origin AS714'
The above command will give a list of everything originating from one of Apple's ASNs (Autonomous System Numbers; these are used to help manage routing). As another example:
whois -h whois.radb.net -- '-i origin AS55185'
Gives:
route:          209.87.62.0/24
origin:         AS55185
descr:          750 - 555 Seymour Street Vancouver BC V6B-3H6 Canada
admin-c:        HOSTM458-ARIN
tech-c:         NOC33711-ARIN
mnt-by:         MNT-BC-Z
created:        2023-12-07T21:58:41Z
last-modified:  2023-12-07T21:58:41Z
source:         ARIN
rpki-ov-state:  valid

route6:         2607:f8f0:6a0::/48
origin:         AS55185
descr:          750 - 555 Seymour Street Vancouver BC V6B-3H6 Canada
admin-c:        HOSTM458-ARIN
tech-c:         NOC33711-ARIN
mnt-by:         MNT-BC
created:        2023-12-07T22:00:06Z
last-modified:  2023-12-07T22:00:06Z
source:         ARIN
rpki-ov-state:  valid
With a bit of scripting, it's not difficult to pull out the route: and route6: lines, run them through aggregate (a tool that removes duplication and shadowing of lists of netblocks, giving you the shortest possible list of netblocks that cover all of the provided addresses), and output them to a file for validation and addition to whatever solution you're using.
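A rough sketch of that pipeline, assuming aggregate is installed and using the ASN from the example above:

    #!/bin/sh
    # Pull the IPv4 routes registered for an ASN out of RADB, collapse
    # duplicates and shadowed netblocks with aggregate, and save the
    # result for manual review before it goes anywhere near a firewall.
    ASN="AS55185"
    whois -h whois.radb.net -- "-i origin ${ASN}" \
      | awk '/^route:/ {print $2}' \
      | aggregate > "${ASN}-v4.txt"
    # route6: lines are IPv6 and need aggregate6 (a separate tool) instead.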
It's a huge topic, and I've already babbled long enough. I'm happy to give info or lend a hand, though. It's a hard problem.
Thank you,
Kev
-- Kev Woolley (they/them)
Gratefully acknowledging that I live and work in the unceded traditional territories of the Səl̓ílwətaɬ (Tsleil-Waututh) and Sḵwx̱wú7mesh Úxwumixw.
________________________________________
From: JonGeorg SageLibrary via Evergreen-general <evergreen-general@list.evergreen-ils.org>
Sent: 19 March 2025 08:52
To: Evergreen Discussion Group
Cc: JonGeorg SageLibrary
Subject: [Evergreen-general] Bot issues
We've been dealing with a lot of bots crawling our catalog and overwhelming our app servers.
Are any of you having the same issue, and if so what tools are you using to remedy the situation?
We've already implemented geoblocking to limit traffic to the US and Canada, after being overwhelmed by queries from overseas.
I've been looking at bad bot blocker as an option.
-Jon