Bots and scrapers

Comments & questions about this site.
dima
Posts: 1906
Joined: Wed Feb 12, 2014 1:35 am
Location: Los Angeles

Post by dima »

We're having more overloading issues with people's poorly-behaved scripts, and a dumb geoblocker (like I had before) is no longer enough. I just installed fail2ban to kill the worst offenders, and it seems to be doing the job right now. It's possible I tuned it too aggressively: if you see any issues (browser says "site cannot be reached", or something along those lines), please tell me.
tekewin
Posts: 1409
Joined: Thu Apr 11, 2013 5:07 pm

Post by tekewin »

Unfortunately, we are in the Age of Ultron Agents.

I am one of the offenders (not on this site), but I was sending agents out to scour the world for information and getting blocked with 429 errors everywhere. I stopped doing that with limited exceptions and with my own throttles in place. There will soon be far more agents on the Internet than people. That may already be the case.

Peakbagger.com and Bob Burd's site are now gated with Cloudflare. We might need to do something similar if it is not cost prohibitive. They have a free plan with DDoS protection, and a DDoS is effectively what the agents are unintentionally doing.
dima
Posts: 1906
Joined: Wed Feb 12, 2014 1:35 am
Location: Los Angeles

Post by dima »

Yeah, Cloudflare or something like it would solve it, but I REALLY don't want to go there yet. We're a location-specific, niche, old-school forum about the mountains. We shouldn't NEED such big hammers to be able to operate. I'm wondering if the recent influx was related to the thread about Monica receiving a lot of outside attention, which brought with it lots of additional traffic (both human and robot). In any case, the storm seems to have died down for now (maybe because I blocked everybody and they went home, or maybe not :) ). The current blocking settings are maybe close enough now. Look at

/etc/fail2ban/jail.d/defaults-debian.conf
and

/etc/fail2ban/filter.d/apache-eispiraten.conf
to see the current settings. To see who's banned right now:

fail2ban-client status apache-eispiraten-hammer
and -misc.
tekewin
Posts: 1409
Joined: Thu Apr 11, 2013 5:07 pm

Post by tekewin »

Wow. TIL that fail2ban can secure more than SSH.

To try to understand the config, I fed the .conf files into a friendly AI who gave me this unsolicited comment. Do with it what you will. Your current config seems to be working.
A maxretry of 20 combined with a findtime of 20 is quite "loose." This configuration allows a bot to make 1 request per second indefinitely without ever getting banned.

Tip: Usually, for aggressive scrapers, you want a longer findtime (like 600 for 10 minutes) or a much lower maxretry (like 5) to catch bots that pace their requests to stay under the radar.
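
For anyone who wants to poke at those numbers, here is a rough Python sketch. It assumes a simplified sliding-window model of the maxretry/findtime rule (ban once maxretry matching hits land within findtime seconds). It is not fail2ban's actual code, and the names would_ban and paced are made up for illustration. What happens exactly at the boundary (say, precisely 1 request per second with 20/20) depends on fail2ban internals that this does not try to reproduce.

```python
from collections import deque


def would_ban(timestamps, maxretry, findtime):
    """Simplified model: True if any window of `findtime` seconds
    contains at least `maxretry` hits. Not fail2ban's real logic."""
    window = deque()
    for t in sorted(timestamps):
        window.append(t)
        # Drop hits that are older than findtime relative to the newest hit.
        while window and window[0] < t - findtime:
            window.popleft()
        if len(window) >= maxretry:
            return True
    return False


def paced(interval, count=1000):
    """Hypothetical bot: `count` requests, one every `interval` seconds."""
    return [i * interval for i in range(count)]


slow_bot = paced(2.0)   # one request every 2 seconds
fast_bot = paced(0.5)   # two requests per second

print("20/20,  slow bot banned:", would_ban(slow_bot, maxretry=20, findtime=20))
print("20/20,  fast bot banned:", would_ban(fast_bot, maxretry=20, findtime=20))
print("20/600, slow bot banned:", would_ban(slow_bot, maxretry=20, findtime=600))
print("5/20,   slow bot banned:", would_ban(slow_bot, maxretry=5, findtime=20))
```

Under this model a bot that paces itself at one request every 2 seconds slips past the 20/20 setting, while either suggested change (longer findtime or lower maxretry) catches it. Whether the tighter settings also catch real humans is a separate question that the sketch says nothing about.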
dima
Posts: 1906
Joined: Wed Feb 12, 2014 1:35 am
Location: Los Angeles

Post by dima »

Oh man. It's totally right. Previously I had problems with it being too aggressive, banning confirmed humans. I detuned it, but I also adjusted the filter regex. After the more specific regex I can probably tighten it again, but I haven't bothered to do that yet. Feel free to play with it. For what it's worth, the onslaught seems to have subsided for now, so maybe we can leave it alone.
Nate U
Posts: 658
Joined: Wed Apr 05, 2023 7:38 pm

Post by Nate U »

off-trail Los Angeles Mtn explorers and true crime enthusiasts are 2 WILDLY different-sized demographics... this site is not designed to handle the latter.
dima
Posts: 1906
Joined: Wed Feb 12, 2014 1:35 am
Location: Los Angeles

Post by dima »

The board is super slow right now; we're being bombarded again.

We should see if tightening the fail2ban settings would alleviate it. tekewin: feel free to fix it before I get to it :)
GoalHiking
Posts: 44
Joined: Sun Feb 18, 2024 10:58 am

Post by GoalHiking »

Whatever you do, please don't use Cloudflare since they're pro-censorship. First they came for, etc etc.
tekewin
Posts: 1409
Joined: Thu Apr 11, 2013 5:07 pm

Post by tekewin »

I've taken a look at the custom fail2ban configs, the fail2ban logs, and the apache2 logs.

A five-minute sample of the access log showed 2000 requests from 1904 unique IPs, or about 1.05 requests per IP on average. This is a distributed botnet, not a few abusers. 1527 of the 2000 requests (76%) hit /app.php/thankslist, the "Thanks for posts" extension's public list page, with each IP hitting one page. None of these were being blocked by fail2ban.
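
If anyone wants to repeat that kind of tally, something like the following Python sketch would do it. It assumes the access log is in Apache's common/combined format. The function name summarize and the sample lines (with documentation-range IPs) are invented for illustration, not taken from the real log.

```python
import re
from collections import Counter

# Client IP, then the request target from the quoted request line.
# Assumes Apache common/combined log format.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+)[^"]*"')


def summarize(lines):
    """Return (total requests, unique IP count, Counter of paths)."""
    ips = Counter()
    paths = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not look like access-log entries
        ip, target = m.groups()
        ips[ip] += 1
        paths[target.split("?", 1)[0]] += 1  # ignore the query string
    return sum(ips.values()), len(ips), paths


# Made-up sample lines, only to show the shape of the input.
sample = [
    '192.0.2.1 - - [22/May/2026:10:00:00 -0700] "GET /app.php/thankslist?x=1 HTTP/1.1" 200 512 "-" "bot"',
    '192.0.2.2 - - [22/May/2026:10:00:01 -0700] "GET /app.php/thankslist HTTP/1.1" 200 512 "-" "bot"',
    '198.51.100.7 - - [22/May/2026:10:00:02 -0700] "GET /app.php/thankslist?x=2 HTTP/1.1" 200 512 "-" "bot"',
    '198.51.100.7 - - [22/May/2026:10:00:03 -0700] "GET /index.php HTTP/1.1" 200 900 "-" "browser"',
]

total, unique_ips, paths = summarize(sample)
print(f"{total} requests from {unique_ips} IPs, {total / unique_ips:.2f} per IP")
for path, hits in paths.most_common(3):
    print(f"{hits / total:.0%} of requests hit {path}")
```

With roughly one request per IP, a per-IP counter like fail2ban's never sees enough hits from any single address to trip, which fits the observation that none of this traffic was being blocked.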

I turned off Guest access to the Thanks list. It shouldn't affect users.

In the phpBB ACP: Permissions → Group permissions → Guests → Advanced Permissions → Misc → Can view list of all thanks: No.

I think the custom fail2ban config in apache-eispiraten-hammer.conf is probably catching more users than bots. The one for file downloads looks good. I can improve the apache-eispiraten-hammer.conf patterns after more research on the access log. I'll be out of town until the middle of next so I don't want to make any serious tweaks to it. The only thing changed was the Guest access to the Thanks list.
dima
Posts: 1906
Joined: Wed Feb 12, 2014 1:35 am
Location: Los Angeles

Post by dima »

GoalHiking wrote: Fri May 22, 2026 11:04 am Whatever you do, please don't use Cloudflare since they're pro-censorship. First they came for, etc etc.
I haven't been following too closely. Do you know if the open-source cloudflare flavors are as effective? Anubis and whatever bugs.debian.org uses and such.
tekewin
Posts: 1409
Joined: Thu Apr 11, 2013 5:07 pm

Post by tekewin »

I'm not familiar with the open source equivalents. Have no idea.
dima
Posts: 1906
Joined: Wed Feb 12, 2014 1:35 am
Location: Los Angeles

Post by dima »

tekewin wrote: Fri May 22, 2026 11:16 am I've taken a look at the custom fail2ban configs, the fail2ban logs, and the apache2 logs.
Take your sweet time, and thanks for looking at it! Do you see the slowness? Maybe 1/4 of the time when I try to load the board, it takes ~20-30sec for it to come up. Do you see that? Would be interesting to look at the logs during one of those events.
tekewin
Posts: 1409
Joined: Thu Apr 11, 2013 5:07 pm

Post by tekewin »

dima wrote: Fri May 22, 2026 12:31 pm Take your sweet time, and thanks for looking at it! Do you see the slowness? Maybe 1/4 of the time when I try to load the board, it takes ~20-30sec for it to come up. Do you see that? Would be interesting to look at the logs during one of those events.
Yes, I've experienced it myself. I got banned for a while last night while I was gathering the log data.

I'll look for one of those events happening to a user.

Mainly, I don't want to make things worse.
Sean
Cucamonga
Posts: 4364
Joined: Wed Jul 27, 2011 12:32 pm

Post by Sean »

Yeah, it's kind of annoying when I can't get on the site or can't upload a file because of these attacks.