If you don’t opt out by Apr 24 GitHub will train on your private repos

💥 Explore this must-read post from Hacker News 📖

📂 **Category**:

💡 **What You’ll Learn**:

Pro tip: sign up for the business/enterprise version when reasonable in price.

I do this with Google Workspace. You can also do it with GitHub.

(Google doesn’t train on Workspace, Github doesn’t train on business customers, etc)

No we won’t. Details here https://github.blog/news-insights/company-news/updates-to-gi…

For users of Free, Pro and Pro+ Copilot, if you don’t opt out then we will start collecting usage data of Copilot for use in model training.

If you are a subscriber for Business or Pro we do not train on usage.

The blog post covers more details but we do not train on private repo data at rest, just interaction data with Copilot. If you don’t use Copilot this will not affect you. However you can still opt out now if you wish and that preference will be retained if you decide to start using Copilot in the future.

Hope that helps.

This headline is false; it will not go take your private repos and dump them into a training dataset. Rather, GitHub will train on your copilot interactions with your private repos. If you do not use copilot, this makes no difference to you, though you should probably still turn it off.

That is the charitable way to interpret what they intend to train on but there are many more uncharitable ways to interpret it that involve finding ways to force copilot on you to get access to more private repo training data.

I’m looking forward to the class action lawsuit, even if only to establish a precedent!

I don’t have much hope, but I wish that ignoring software licensing and attribution at scale becomes harder than it currently seems.

To be precise: the opt-out is for GitHub Copilot training specifically, which has always required opt-in for public repos under their policy. The change Apr 24 is about private repos being included by default unless you opt out. If you’re using Copilot in your private repos, definitely opt out unless you’re comfortable with that. The setting is at github.com/settings/copilot — takes 30 seconds.

To Github’s credit, they have been showing a banner consistently.
To my discredit – I never bothered to read that banner until I saw this HN headline

How do I opt out of this for my own private repos? I don’t see anything related to this as I’ve got a ton of settings for Copilot itself (I have access to Copilot through my work org)

I believe it is under:

Settings->Copilot->Features->Privacy=>[
Allow GitHub to use my data for AI model training

Allow GitHub to collect and use my Inputs, Outputs, and associated context to train and improve AI models. Read more in the Privacy Statement.
]

I’ve recently started hosting my own forgejo instance. It works so well! Free tailscale for connectivity. I expose mine over fly.io proxy, also free, but not to be done without caution.

Just spitballing, don’t use these tools myself, but isn’t this something that should be encrypted to really prevent them from training? I personally don’t trust anyone with my data when they pivot to building AI products yet claim my data wasn’t a part of that strategy. It’s too easy to hide/lie.

But it always seemed to me that the UI should run locally with encryption keys that are shared and the service just manages encrypted blobs of diffs that can roll from version to version of encrypted data and that’s about it. Granted I probably don’t know the full workflow, i typically am a single dev on simple projects where I don’t need 99% of the overhead these introduce.

It’s a fair question, but if you need private repos, I think you need to start considering a paid option, or self-host.

If it’s really important to you that the repo is private, I’d self-host.

I would’ve recommended codeberg but codeberg isn’t the finest to be recommended for free private repos.

I definitely feel like more can be done within this space and that there is space for more competitors (even forgejo instances for that matter)

I wonder how effective it would be to sabotage the training by publishing deliberately bad code. A FizzBuzz with O(n^2) complexity. A function named “quicksort” that actually implements bogosort. A “filter_xss” function that’s a no-op or just does something else entirely.

The possibilities are endless. I thought of this after remembering seeing a post a couple months ago about how it doesn’t take a significant amount of bad data to poison an LLM’s training.

Context: https://github.com/orgs/community/discussions/188488

TLDR: As long as you aren’t using Copilot, your code should be safe (according to GitHub).

  What data are you collecting?

  When an individual user has this setting enabled, the interaction data we may collect includes:

  - Outputs accepted or modified by the user
  - Inputs sent to GitHub Copilot, including code snippets shown to the model
  - Code context surrounding the user’s cursor position
  - Comment and documentation that the user wrote
  - File names, repository structure, and navigation patterns
  - Interactions with Copilot features including Chat and inline suggestions

It is the feature “Allow GitHub to use my data for AI model training” that needs to be disabled. Right?

Or am I missing some trick / dark GUI pattern? Just want to make sure.

When Louis Rossmann started describing tech leadership as having a “rapist mentality” I brushed him off as being sensationalist. But actions like this make me think more and more he’s right. The product managers pushing for changes like this are despicable scum.

Even the way modern software phrases questions is rapey.

Imagine a man asking a woman “want to have sex? Or maybe later?” out of the blue, then asking her again every 3 days until she says “yes”

The situation you describe has dynamics that don’t apply when your windows laptop is trying to get you to install an update. A woman can’t have 100% confidence that saying no won’t trigger a man into rage, so just the question being asked at all is already a bit unpleasant. WinRAR trying to get me to buy a license is not as offensive because I know it won’t beat me up for saying no.

🔥 **What’s your take?**
Share your thoughts in the comments below!

#️⃣ **#dont #opt #Apr #GitHub #train #private #repos**

🕒 **Posted on**: 1774647162

🌟 **Want more?** Click here for more info! 🌟

If you don’t opt out by Apr 24 GitHub will train on your private repos

By

Leave a Reply Cancel reply