Bots and scrapers
-
dima
- Posts: 1885
- Joined: Wed Feb 12, 2014 1:35 am
- Location: Los Angeles
We're having more overloading issues with people's poorly-behaved scripts, and a dumb geoblocker (like the one I had before) is no longer enough. I just installed fail2ban to kill the worst offenders, and it seems to be doing the job right now. It's possible I tuned it too aggressively: if you see any issues (your browser says "site cannot be reached", or something along those lines), please tell me.
-
tekewin
- Posts: 1399
- Joined: Thu Apr 11, 2013 5:07 pm
Unfortunately, we are in the Age of Ultron Agents.
I am one of the offenders (not on this site): I was sending agents out to scour the world for information and getting blocked with 429 errors everywhere. I stopped doing that, apart from limited exceptions, and I now have my own throttles in place. There will soon be far more agents on the Internet than people. That may already be the case.
Peakbagger.com and Bob Burd's site are now gated with Cloudflare. We might need to do something similar if it is not cost prohibitive. Cloudflare has a free plan with DDoS protection, and an unintentional DDoS is effectively what the agents are doing.
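The client-side throttling mentioned above can be as simple as enforcing a minimum gap between requests to the same host, plus a longer pause after a 429. Below is a minimal Python sketch under those assumptions. The `Throttle` class, its methods, and its parameters are hypothetical illustrations, not part of any library. It also assumes a Retry-After value has already been converted to seconds.

```python
import time


class Throttle:
    """Hypothetical per-host throttle: keeps at least `min_interval_s`
    seconds between consecutive requests to the same host."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self.clock = clock    # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self._last = {}       # host -> time of the last (or next allowed) request

    def wait(self, host):
        """Block until a request to `host` is allowed, then record it."""
        now = self.clock()
        last = self._last.get(host)
        if last is not None:
            remaining = self.min_interval_s - (now - last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last[host] = now

    def back_off(self, host, seconds):
        """Call after a 429 (e.g. with Retry-After in seconds): the next
        wait() for this host sleeps for `seconds` plus the normal interval."""
        self._last[host] = self.clock() + seconds


# Usage (fetch() is a placeholder for whatever HTTP client you use):
# throttle = Throttle(min_interval_s=5.0)
# for url in urls:
#     throttle.wait("example.com")
#     fetch(url)
```

Injecting the clock and sleep functions is a design choice to make the timing testable. In real use, the defaults (`time.monotonic`, `time.sleep`) are what you want.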
-
dima
Yeah, Cloudflare or something like it would solve it, but I REALLY don't want to go there yet. We're a location-specific, niche, old-school forum about the mountains. We shouldn't NEED such big hammers to be able to operate. I'm wondering if the recent influx was related to the thread about Monica, which received a lot of outside attention and brought lots of additional traffic with it (both human and robot). In any case, the storm seems to have died down for now (maybe because I blocked everybody and they went home, or maybe not). The current blocking settings may be close enough now. To see the current settings, look at:
/etc/fail2ban/jail.d/defaults-debian.conf
/etc/fail2ban/filter.d/apache-eispiraten.conf
To see who's banned right now, run this (and the same for the -misc jail):
fail2ban-client status apache-eispiraten-hammer
-
tekewin
Wow. TIL that fail2ban can secure more than SSH.
To try to understand the config, I fed the .conf files into a friendly AI who gave me this unsolicited comment. Do with it what you will. Your current config seems to be working.
A maxretry of 20 combined with a findtime of 20 seconds is quite "loose." This configuration allows a bot to make 1 request per second indefinitely without ever getting banned.
Tip: Usually, for aggressive scrapers, you want a longer findtime (like 600 for 10 minutes) or a much lower maxretry (like 5) to catch bots that pace their requests to stay under the radar.
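To see why 20/20 is loose, here is a simplified sliding-window model of the maxretry/findtime rule in Python. It is an illustrative assumption, not fail2ban's actual implementation: a host is banned as soon as `maxretry` matching requests fall within the last `findtime` seconds. The function `first_ban_time` and its parameters are hypothetical. The bot is modeled as making one matching request every `interval_ms` milliseconds. Integer milliseconds avoid floating-point drift.

```python
from collections import deque


def first_ban_time(interval_ms, maxretry, findtime_s, duration_s=3600):
    """Return the time (seconds) of the first ban for a bot making one
    matching request every `interval_ms` ms, or None if it is never banned
    within `duration_s` seconds. Simplified sliding-window model."""
    findtime_ms = findtime_s * 1000
    window = deque()  # timestamps (ms) of requests still inside findtime
    for t in range(0, duration_s * 1000 + 1, interval_ms):
        window.append(t)
        # Expire requests that are findtime or more in the past.
        while window[0] <= t - findtime_ms:
            window.popleft()
        if len(window) >= maxretry:
            return t / 1000
    return None


# A bot pacing itself at one request every 1.1 seconds:
print(first_ban_time(1100, maxretry=20, findtime_s=20))   # None (never banned)
print(first_ban_time(1100, maxretry=20, findtime_s=600))  # 20.9
print(first_ban_time(1100, maxretry=5, findtime_s=20))    # 4.4
```

In this model the cutoff for 20/20 sits right around one request per second. Exactly where it falls depends on how the window edges are counted. A bot pacing itself slightly slower than that slips through indefinitely, while either suggested change catches it within seconds.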
-
dima
Oh man. It's totally right. Previously I had problems with it being too aggressive and banning confirmed humans. I detuned it, but I also adjusted the filter regex. Now that the regex is more specific, I can probably tighten the limits again, but I haven't bothered to do that yet. Feel free to play with it. For what it's worth, the onslaught seems to have subsided for now, so maybe we can leave it alone.
-
Nate U
- Posts: 642
- Joined: Wed Apr 05, 2023 7:38 pm
Off-trail Los Angeles Mtn explorers and true crime enthusiasts are 2 WILDLY different-sized demographics... this site is not designed to handle the latter.
