Bots Are Eating My Blog for Lunch
by Peter Rukavina
After moving his blog to Hetzner, Peter noticed strange CPU spikes and heavy bandwidth use. A bit of sleuthing pointed to bots, especially Scrapy and other AI crawlers, hoovering up his content.
I read this post while enjoying my first coffee this morning, and it piqued my interest. I don't monitor web stats on this site, so I was in the dark. However, server logs are a thing, so I decided to dump my last few months of access logs and do a similar analysis.
Here's a very high-level breakdown of bots vs humans:
- Total bot/tool traffic: ~453,703 (≈ 29.5%)
- Likely human browser traffic: ~555,042 (≈ 36.1%)
- Unknown/empty User-Agent: ~527,765 (≈ 34.3%)
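For anyone wanting to try a similar split, here is a minimal sketch in Python, assuming plain keyword matching on the User-Agent string. The marker lists and the `classify` function are hypothetical and are not the rules behind the numbers above; real logs will need their own tuning.

```python
# Hypothetical keyword lists; tune them against your own logs.
TOOL_MARKERS = (
    "bot", "crawler", "spider", "scrapy", "python-requests",
    "go-http-client", "fasthttp", "curl", "wget",
    # Feed readers count as tools here, not as browsers.
    "netnewswire", "freshrss", "reeder", "miniflux", "nextcloud-news",
)
BROWSER_MARKERS = ("firefox/", "chrome/", "safari/", "edg/")


def classify(user_agent: str) -> str:
    """Return 'unknown', 'bot/tool', or 'human' for a raw User-Agent string."""
    ua = user_agent.strip().lower()
    if ua in ("", "-"):
        return "unknown"  # empty or missing User-Agent
    if ua == "node" or any(marker in ua for marker in TOOL_MARKERS):
        return "bot/tool"
    if ua.startswith("mozilla/") and any(m in ua for m in BROWSER_MARKERS):
        return "human"
    return "unknown"  # anything that matched no rule
```

Substring matching like this is crude: a scraper that sends a stock Chrome string lands in the "human" bucket, so the human figure is best read as an upper bound.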
So unlike Peter, I'm not getting hammered by bots. But if I assume that the 34.3% of traffic with no identifiable user agent is also bots, that's still roughly 64% of my total traffic.
Bloody hell...
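The percentages are easy to double-check. A quick sketch using only the three counts from the breakdown above (the variable names are just for illustration):

```python
# Request counts taken from the breakdown above.
counts = {"bot/tool": 453_703, "human": 555_042, "unknown": 527_765}

total = sum(counts.values())

# Share of each bucket, as a percentage rounded to one decimal place.
shares = {name: round(100 * hits / total, 1) for name, hits in counts.items()}

# Worst case: treat every unknown/empty User-Agent request as a bot.
worst_case = round(100 * (counts["bot/tool"] + counts["unknown"]) / total, 1)

print(total)       # 1536510
print(shares)      # {'bot/tool': 29.5, 'human': 36.1, 'unknown': 34.3}
print(worst_case)  # 63.9
```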
Here's a look at the top 20 user agents, like Peter did:
Rank | User-Agent | Hits | Notes |
---|---|---|---|
1 | N/A | 266,544 | No User-Agent 😒 |
2 | NetNewsWire (RSS Reader; https://netnewswire.com/) | 59,475 | Legit Mac/iOS RSS client |
3 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36 | 42,789 | Standard Chrome browser |
4 | node | 40,010 | Likely scripted bot/automation using Node.js |
5 | FreshRSS/1.26.2 (Linux; https://freshrss.org) | 31,913 | Self-hosted RSS reader |
6 | Mozilla/5.0 (compatible; Reeder/2025.5) | 22,112 | iOS/macOS RSS reader |
7 | fasthttp | 20,121 | Go-based HTTP client — usually scripts or scrapers |
8 | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:139.0) Gecko/20100101 Firefox/139.0 | 19,867 | Standard Firefox browser |
9 | FreshRSS/1.26.1 (Linux; https://freshrss.org) | 18,747 | Slightly older FreshRSS |
10 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36 | 17,071 | Older Chrome version |
11 | Mozlila/... (misspelled, Android) | 15,746 | Misspelling of "Mozilla", likely a poorly-coded bot |
12 | Reeder/5050102 CFNetwork/... Darwin/24.5.0 | 14,911 | Reeder's native Apple app (CFNetwork is Apple's networking stack) |
13 | NextCloud-News/1.0 | 14,088 | News reader built into NextCloud |
14 | Mozilla/5.0 (compatible; Miniflux/2.2.9; +https://miniflux.app) | 13,961 | Minimalist self-hosted feed reader |
15 | Go-http-client/1.1 | 13,099 | Generic Go client, common in automation and scrapers |
16 | python-requests/2.32.3 | 13,065 | Python script traffic — could be bots or utilities |
17 | Mozilla/5.0 (...) Chrome/132.0.0.0 Safari/537.36 | 12,975 | Normal Chrome browser |
18 | Mozilla/5.0 (compatible; Miniflux/2.2.8; +https://miniflux.app) | 12,062 | Older Miniflux |
19 | Mozilla/5.0 (...) PetalBot | 11,145 | Huawei search engine crawler |
20 | FreshRSS/1.26.3 (Linux; https://freshrss.org) | 10,967 | Latest FreshRSS version |
There are a number of different types of bots in this list, but more importantly, there's a whole heap of RSS feed readers in here too, which brings me joy as RSS is great!
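A top-20 table like the one above can be tallied with a few lines of Python. This is a minimal sketch that assumes the common "combined" access log format, where the User-Agent is the last quoted field on each line; the post doesn't say which log format or tooling was actually used. The `top_user_agents` helper and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the last two quoted fields (referer, user agent) of a combined-format line.
TAIL = re.compile(r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"\s*$')


def top_user_agents(lines, n=20):
    """Count User-Agent strings and return the n most common as (ua, hits) pairs."""
    counts = Counter()
    for line in lines:
        match = TAIL.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ua = match.group("ua").strip()
        # Group empty and "-" placeholders under a single "N/A" label.
        counts["N/A" if ua in ("", "-") else ua] += 1
    return counts.most_common(n)


# Made-up sample lines with already-truncated IP addresses.
sample = [
    '203.0.0.0 - - [01/Jan/2025:00:00:00 +0000] "GET /feed HTTP/1.1" 200 512 "-" "-"',
    '203.0.0.0 - - [01/Jan/2025:00:00:01 +0000] "GET /feed HTTP/1.1" 200 512 "-" "-"',
    '203.0.0.0 - - [01/Jan/2025:00:00:02 +0000] "GET /feed HTTP/1.1" 200 512 "-" ""',
    '198.51.0.0 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 2048 "-" "node"',
    '198.51.0.0 - - [01/Jan/2025:00:00:04 +0000] "GET / HTTP/1.1" 200 2048 "-" "node"',
    '192.0.0.0 - - [01/Jan/2025:00:00:05 +0000] "GET / HTTP/1.1" 200 2048 "-" "fasthttp"',
]

for ua, hits in top_user_agents(sample):
    print(f"{hits:>6}  {ua}")
```

To run it over a real file, pass the open file object instead of `sample`, since the function only needs an iterable of lines.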
Unlike Peter, I don't think I'm going to take any action, mainly because I have no user agent to match against for a good chunk of the traffic. I can't even look at IPs, as my web server strips the last 2 octets of all IP addresses before they're logged. Privacy, yo!
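To show why truncated addresses are no use for spotting individual bots, here is a rough sketch of that kind of anonymisation in Python. It assumes IPv4 and that "stripping" means zeroing the last two octets; how the web server here actually does it isn't stated, and `truncate_ipv4` is a hypothetical helper.

```python
import ipaddress


def truncate_ipv4(address: str) -> str:
    """Zero the last two octets of an IPv4 address (keep only the /16 prefix)."""
    # strict=False lets ip_network accept an address with host bits set.
    network = ipaddress.ip_network(f"{address}/16", strict=False)
    return str(network.network_address)


print(truncate_ipv4("203.0.113.42"))  # 203.0.0.0
print(truncate_ipv4("203.0.200.7"))   # 203.0.0.0
```

Both addresses collapse to the same value, as would every other host among the 65,536 addresses in that /16 block, so a single noisy client can't be told apart from its neighbours.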
None of this is impacting the site, but it's still annoying that the majority of this site's traffic seems to be bots of some kind.
Anyway, Peter's post is an interesting one - go check it out.