Greetings. We've been slammed by bot traffic and had to take countermeasures. We geoblocked international traffic at the host firewall level, and recently added an nginx bot blocker to handle bots running on servers in the US and Canada. I then scraped bot IPs out of the Apache logs and began adding the ones that were still coming through to the blocklist. Yes, I've updated the robots.txt file; the bots are ignoring it.
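For reference, here's a rough sketch of the kind of script I used to pull candidate IPs out of the Apache logs (the log path and the combined log format are just examples, adjust for your setup):

#!/usr/bin/env python3
# Tally client IPs from an Apache combined-format access log so the
# heaviest hitters can be reviewed before adding them to the blocklist.
import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # example path, not necessarily yours
ip_re = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")  # first field is the client IP

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        m = ip_re.match(line)
        if m:
            counts[m.group(1)] += 1

# Print the heaviest hitters for manual review.
for ip, hits in counts.most_common(25):
    print(f"{hits:8d}  {ip}")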
The issue is that after a day or two of reprieve, we started getting a ton of 404s from loopback addresses. I've reverted the blacklist config file back to blank and restarted all services on all servers, but we're still seeing a large volume of traffic that appears to be internally generated.
I don't see anything obvious in crontab. Since the traffic appears to be internally generated, the OPAC stays up longer than it normally would given the number of sessions showing on the load balancer.
Is there an Evergreen or Apache service that indexes the entire catalog? We have our external IP whitelisted. Do internal VLAN IP addresses need to be whitelisted as well?
Here's an example of the traffic I'm seeing. It's all on port 80, too; external traffic all comes in on 443.
our_domain:80 127.0.0.1 - - [16/Jun/2025:08:18:31 -0700] "HEAD /opac/extras/ac/anotes/html/r/2621889 HTTP/1.1" 404 159 "-" "-"
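In case it helps, this is roughly how I've been tallying the loopback traffic from vhost-style log lines like the one above (the log path and field layout are assumptions on my end):

#!/usr/bin/env python3
# Count requests from 127.0.0.1 by vhost:port, status, and method to see
# how much of the traffic is these loopback 404s.
import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # example path, not necessarily yours
# vhost:port, client IP, ident, user, [timestamp], "METHOD /path PROTO", status
line_re = re.compile(
    r'^(?P<vhost>\S+)\s+(?P<ip>\S+)\s+\S+\s+\S+\s+\[[^\]]+\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)[^"]*"\s+(?P<status>\d{3})'
)

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        m = line_re.match(line)
        if m and m.group("ip") == "127.0.0.1":
            counts[(m.group("vhost"), m.group("status"), m.group("method"))] += 1

for (vhost, status, method), hits in counts.most_common(20):
    print(f"{hits:8d}  {vhost}  {status}  {method}")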
-Jon