Facebook’s Fascination with My Robots.txt · Random Notes

For the past 4 days (and probably longer, since I don’t have logs beyond that),
Facebook has been hitting the /robots.txt of my self-hosted Forgejo instance
several times per second. The user-agent is
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php). I
expected the UA header might be spoofed, but all the requests are also
coming from Meta’s IP address ranges.
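One way to check a claim like “all the requests come from Meta’s IP ranges” is to test each client address against a list of CIDR blocks. The sketch below uses Python’s standard ipaddress module. The two networks in META_RANGES are placeholder documentation ranges (RFC 5737 and RFC 3849), not Meta’s real prefixes, and the helper name is hypothetical. In practice you would fill the list from Meta’s announced prefixes, for example by querying a route registry for AS32934.

```python
import ipaddress

# ASSUMPTION: placeholder documentation ranges, NOT Meta's real prefixes.
# Replace them with the prefixes announced by Meta (AS32934).
META_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_from_ranges(addr: str, ranges=META_RANGES) -> bool:
    """Return True if addr falls inside any of the given networks (hypothetical helper)."""
    ip = ipaddress.ip_address(addr)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply
    # not contained, so a mixed list of v4 and v6 networks is safe.
    return any(ip in net for net in ranges)
```

Running every client IP from the log through a check like this and counting the misses shows whether any request fell outside the expected networks.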

The interesting thing is that no other file is being accessed. Just robots.txt
over and over and over again.
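To confirm that robots.txt really is the only path being requested, you can tally the requested paths per user-agent directly from the access log. This is a minimal sketch. It assumes the reverse proxy in front of Forgejo writes the standard “combined” log format (the nginx and Apache default). The regex and the paths_for_agent helper are illustrative and are not part of Forgejo.

```python
import re
from collections import Counter

# ASSUMPTION: access log lines use the "combined" format, e.g.
# ip - - [time] "GET /path HTTP/1.1" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def paths_for_agent(lines, agent_substring="facebookexternalhit"):
    """Count requested paths for log lines whose user-agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        # Skip lines that don't parse or belong to a different client.
        if m and agent_substring in m.group("ua"):
            counts[m.group("path")] += 1
    return counts
```

For traffic like that described here, calling `paths_for_agent(open("access.log"))` would be expected to return a Counter with a single `/robots.txt` key.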

Facebook’s documentation states:

The primary purpose of FacebookExternalHit is to crawl the content of an app
or website that was shared on one of Meta’s family of apps, such as Facebook,
Instagram, or Messenger. The link might have been shared by copying and
pasting or by using the Facebook social plugin. This crawler gathers, caches,
and displays information about the app or website such as its title,
description, and thumbnail image.

Now, as tempting as it is to think that I’ve suddenly reached unfathomable
levels of popularity on Meta’s platforms, I find it difficult to believe, as the
only other traffic on my instance comes from the
AI bots consistently crawling the qmk_firmware repository, the
very occasional user of one of my Hex packages, and myself. Not even
Facebook themselves are requesting any other path at the moment, just
robots.txt.

Here are the accesses I’m getting, visualised in two ways for your convenience:

A chart of accesses to robots.txt, ranging from 4000 to over 7000 requests per hour.

This chart provided by my extreme LibreOffice Calc skillz. Data is grouped
by hour.
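The same hourly grouping can also be done outside a spreadsheet. The sketch below is a hypothetical alternative, not the author’s workflow. It assumes combined-format timestamps such as [01/Jan/2025:13:05:01 +0000], buckets hits on a single path by the start of each hour, and emits CSV that LibreOffice Calc (or any other tool) can chart. The function names are illustrative.

```python
import re
from collections import Counter
from datetime import datetime

# ASSUMPTION: combined-format log lines, e.g. [01/Jan/2025:13:05:01 +0000] "GET /robots.txt ...
TIME_AND_PATH = re.compile(r'\[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+) ')


def hourly_counts(lines, path="/robots.txt"):
    """Return a Counter mapping hour-start datetimes to the number of hits on `path`."""
    buckets = Counter()
    for line in lines:
        m = TIME_AND_PATH.search(line)
        if not m or m.group("path") != path:
            continue
        # %b expects English month abbreviations (default C locale).
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        buckets[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return buckets


def to_csv(buckets):
    """Render the buckets as 'hour,count' rows, oldest hour first."""
    rows = ["hour,count"]
    for hour in sorted(buckets):
        rows.append(f"{hour:%Y-%m-%d %H:00},{buckets[hour]}")
    return "\n".join(rows)
```

Writing `to_csv(hourly_counts(open("access.log")))` to a file gives one row per hour, ready to chart.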

So what’s going on at Meta? Why are they so obsessed with my very bog-standard
robots.txt file? I’m a nobody and surely not interesting enough that they’d
only be targeting me specifically, so how much bandwidth and energy are they
using globally to mass-request robots.txt files in a never-ending loop?
Perhaps someone at their end screwed up a loop conditional, but you’d think some
monitoring dashboard somewhere would have popped up a warning about this.
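For contrast, a crawler does not need to refetch robots.txt on every request. RFC 9309 allows crawlers to cache it and says they SHOULD NOT use a cached copy for more than 24 hours. The sketch below is a hypothetical per-host cache that illustrates this behaviour. It says nothing about how Meta’s crawler is actually built. The `fetch` callable is an assumption that stands in for whatever HTTP client a crawler would use.

```python
import time
from typing import Callable, Dict, Tuple

# RFC 9309: crawlers SHOULD NOT use a cached robots.txt for more than 24 hours.
TTL_SECONDS = 24 * 60 * 60


class RobotsCache:
    """Hypothetical per-host robots.txt cache for a well-behaved crawler."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch    # ASSUMPTION: downloads robots.txt for a host
        self._clock = clock    # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, host: str) -> str:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[0] < TTL_SECONDS:
            return entry[1]             # still fresh: no network request
        body = self._fetch(host)        # missing or stale: fetch once
        self._entries[host] = (now, body)
        return body
```

With a 24-hour TTL, thousands of lookups for the same host within an hour trigger a single fetch. That is roughly the opposite of the pattern in the chart above.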

Anyway, compared to the earlier AI bot onslaught, this traffic is mostly benign
for me, just interesting, as long as it doesn’t keep picking up speed.

**Posted on**: 2026-02-23 (13:18 UTC)