
Thank you to everyone who responded. We're working with our vendor to see what can be done. I appreciate the responses.
-Jon

On Wed, Mar 19, 2025 at 10:55 AM Kev Woolley <kev.woolley@bc.libraries.coop> wrote:
Hi Jon,
We use CrowdSec: https://www.crowdsec.net/
It lets you define your own scenarios for making decisions about incoming traffic and automatically mitigating it: banning via the firewall or other measures, throwing up a CAPTCHA, and more.
Note that CrowdSec doesn't work in any time slice above 48 hours -- all of its mitigations are very short-lived. We are combining this with a substantial long-term blocklist (implemented as an ipset block in Linux iptables) that subsumes the functionality of both geo and provider blocks for longer-term mitigations. This is, of course, a labour-heavy endeavour, but we've tried several alternatives, and this is what's working best so far.
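For anyone unfamiliar with ipset, here's a minimal sketch of the long-term block mechanism (the set name and netblock are hypothetical):

    # Create a hash:net set, which holds CIDR netblocks rather than single IPs
    ipset create longterm-block hash:net -exist
    # Add an offending netblock (a documentation range, for illustration)
    ipset add longterm-block 203.0.113.0/24 -exist
    # Drop anything whose source address matches the set, early in INPUT
    iptables -I INPUT -m set --match-set longterm-block src -j DROP

A single iptables rule then covers the whole list, and netblocks can be added to or removed from the set without touching the firewall rules again.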
We have scenarios defined to catch user-agent traits and block user agents that look bad. After some initial learning ("oh, so this version of MS Office says it's MSIE 7.0, so a library just blocked themselves -- oops!" and similar situations), it was pretty easy to get most bot traffic caught that way. As I get time (and more familiarity with writing the scenarios) I'll be designing scenarios that look for specific behaviours (such as grabbing the links on a page in order, too quickly) and improving our defence that way.
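To give a flavour of what a scenario looks like, here's a minimal sketch of a user-agent scenario -- the name, regex, and pattern here are made up for illustration, so check the CrowdSec scenario docs for the full syntax before using anything like this:

    # Drop a local scenario into CrowdSec's scenarios directory
    cat > /etc/crowdsec/scenarios/local-bad-useragent.yaml <<'EOF'
    # Hypothetical scenario: flag any request whose user agent matches
    # a known-bad pattern; 'trigger' fires on a single matching event.
    type: trigger
    name: local/bad-useragent
    description: "Requests with a known-bad user agent"
    filter: "evt.Parsed.http_user_agent matches '(?i)(python-requests|go-http-client)'"
    groupby: evt.Meta.source_ip
    labels:
      remediation: true
    EOF
    # Restart the agent so it picks up the new scenario
    systemctl restart crowdsec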
CS offers reasonably good visualisation and reporting tools. These are useful both for keeping track of who's doing what and for spotting the persistent threats and creating entries in the long-term blocklist for them.
My observation, even very recently as I've been working on the long-term blocklist and not updating it on our servers (working with ~10k rules takes a while), is that there really doesn't seem to be a point where one can take their eyes off the issue entirely and forget about it -- new traffic comes out of the woodwork. With a substantial enough long-term blocklist this can reduce the time spent to a reasonable amount, but there doesn't seem to be an "okay, we're done here" point.
My gut feel is that 30-50k long-term blocklist rules is where we may end up eventually (with some years of building them).
I'm happy to share what I've got in the LTB. It's been built over the last several months, based on the attacks we've received.
Resources I've found helpful include:
https://www.qurium.org/ -- their digital forensics and investigations pages have a lot of good info on the methods and actors for some types of attacks -- we experienced this flavour, in particular:
https://www.qurium.org/weaponizing-proxy-and-vpn-providers/fineproxy-rayobyt...
Finding this site helped confirm a lot of information I'd found over the previous couple of years, studying these things on my own.
https://www.radb.net/ -- you can query this for free, and it's a good way to look up network information without having to bounce around between ARIN, RIPE, APNIC, and other RIRs (Regional Internet Registries). You can do advanced queries against it with a Whois client, as well:
whois -h whois.radb.net -- '-i origin AS714'
The above command will give a list of everything originating from one of Apple's ASNs (Autonomous System Numbers; these are used to help manage routing). As another example:
whois -h whois.radb.net -- '-i origin AS55185'
Gives:
route:          209.87.62.0/24
origin:         AS55185
descr:          750 - 555 Seymour Street Vancouver BC V6B-3H6 Canada
admin-c:        HOSTM458-ARIN
tech-c:         NOC33711-ARIN
mnt-by:         MNT-BC-Z
created:        2023-12-07T21:58:41Z
last-modified:  2023-12-07T21:58:41Z
source:         ARIN
rpki-ov-state:  valid

route6:         2607:f8f0:6a0::/48
origin:         AS55185
descr:          750 - 555 Seymour Street Vancouver BC V6B-3H6 Canada
admin-c:        HOSTM458-ARIN
tech-c:         NOC33711-ARIN
mnt-by:         MNT-BC
created:        2023-12-07T22:00:06Z
last-modified:  2023-12-07T22:00:06Z
source:         ARIN
rpki-ov-state:  valid
With a bit of scripting, it's not difficult to pull out the route: and route6: lines, run them through aggregate (a tool that removes duplication and shadowing of lists of netblocks, giving you the shortest possible list of netblocks that cover all of the provided addresses), and output them to a file for validation and addition to whatever solution you're using.
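A rough sketch of that pipeline, assuming aggregate is installed and using the ASN from the example above:

    #!/bin/sh
    # Pull the IPv4 routes registered for an ASN out of RADB, collapse
    # duplicates and shadowed netblocks with aggregate, and save the
    # result for manual review before it goes anywhere near a firewall.
    ASN="AS55185"
    whois -h whois.radb.net -- "-i origin ${ASN}" \
      | awk '/^route:/ {print $2}' \
      | aggregate > "${ASN}-v4.txt"
    # route6: lines are IPv6 and need aggregate6 (a separate tool) instead.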
It's a huge topic, and I've already babbled long enough. I'm happy to give info or lend a hand, though. It's a hard problem.
Thank you,
Kev
-- Kev Woolley (they/them)
Gratefully acknowledging that I live and work in the unceded traditional territories of the Səl̓ílwətaɬ (Tsleil-Waututh) and Sḵwx̱wú7mesh Úxwumixw.
________________________________________
From: JonGeorg SageLibrary via Evergreen-general <evergreen-general@list.evergreen-ils.org>
Sent: 19 March 2025 08:52
To: Evergreen Discussion Group
Cc: JonGeorg SageLibrary
Subject: [Evergreen-general] Bot issues
We've been dealing with a lot of bots crawling our catalog and overwhelming our app servers.
Are any of you having the same issue, and if so what tools are you using to remedy the situation?
We've already implemented geoblocking to limit traffic to the US and Canada, after being overwhelmed by queries from overseas.
I've been looking at bad bot blocker as an option.
-Jon