Some uncomfortable truths about AI coding agents – Standup for Me

💥 Check out this trending post from Hacker News 📖

📂 **Category**:

📌 **What You’ll Learn**:

I’ve been following the development of generative AI closely for several years now. Early on, like most people, I was absolutely blown away by what OpenAI accomplished based on a relatively niche deep learning research paper from Google and a bit of reinforcement learning from human feedback. When it worked, it was absolutely incredible, an illusion so convincing that it made you believe it could do anything. It inspired me to perform considerable experimentation on my own and to develop some proof of concept apps using these large language models. And it continues to be reliable fodder for lively debate between nearly anyone with even a basic understanding of it. I’ve tried really hard not to jump to conclusions about generative AI, one way or the other, but after much contemplation, I think I’m finally ready to render my verdict. I know, I know. You’ve been dying to find out (nearly everyone reading this right now: “Wait, who is this guy?”).

This post comes amidst the seemingly meteoric rise of the AI coding agent, which takes your favourite prone-to-hallucinations LLM and adds a feedback loop that allows for it to generate some truly impressive results. Entire companies are being built from the ground up that are all in on AI coding agents, and even established and well-regarded companies like Notion, Spotify and Stripe seem to be fully onboard – after all, why let humans labour away for ages when an AI coding agent can do it faster and cheaper than they ever could? Depending on who you ask, the AI coding agent has either made the process of manually writing code completely obsolete and worthless or it’s an affront to everything that the software development lifecycle stands for. I’ve decided to wade into that environment to say, definitively, that LLM-based AI coding agents have no place now, or ever, in generating production code for any software I build professionally. And I think you should seriously consider taking that stance, too.

Are AI coding agents powerful? Absolutely, they are. Anyone who has been paying attention and who is being honest with themselves can see that plainly. And do LLMs in general have their uses? Yes (as long as you never, ever trust what they have to say). Right now I want to focus on LLM-based AI coding agents, though. We’ll talk about where LLMs in general are useful for software engineering later, if you’re still with me.

There are four main issues contributing to the blanket ban on AI coding agents in my professional work: skill atrophy, artificially low cost, prompt injections and copyright/licensing.

Skill atrophy

The easiest to comprehend and also the squishiest of those issues is skill atrophy. It’s becoming clear that the software engineer’s job is changing dramatically. The role change has been described by some as becoming a sort of software engineering manager, where one writes little or no code oneself but instead supervises a team of AI coding agents as if they are a team of human junior software engineers. Yes, AI coding agents make mistakes, we are told, but not to worry; the intermediate and senior software engineers will use their years of experience and review every line of code the agents produce to make sure every change is up to snuff. Even if you believe that claim is valid now, I’m here to tell you that the software engineers that have been relegated to code review duty will become rusty over time. Their coding and software design skills will atrophy and they will become worse software engineers as a result. Even if they set out fully intending to provide the highest level of scrutiny to all generated code, they will gradually lose the ability to tell a good change from a bad one because they’ve stopped writing code themselves. Practice and receiving feedback from others are critical to the upkeep and advancement of one’s coding knowledge, but engineers in this position will get none of that.

In reality, though, the code review load for software engineers will gradually increase as fewer and fewer of them are expected to supervise an ever-growing number of coding agents, and they will inevitably learn to become complacent over time, out of pure necessity for their sanity. I’m a proponent of code review for finding room for improvement and to propagate understanding between software engineers, but even I often consider it a slog to do my due diligence for a large code review (just because I think it’s important doesn’t mean I think it’s fun). If it’s your full-time job to review a swarm of agents’ work, and experience tells you they are good enough 95%+ of the time, you’re not going to pay as much attention as you should and bad changes will get through. That’s true of all code reviews, but at least you can mostly trust that your human coworkers mean well and that they can learn from their mistakes. And, what’s more, you can actually walk over (or start a video call) and talk to your human coworker face to face to ask them why they implemented something the way they did. There’s no telling where the LLM got the inspiration for that tricky block of code. Go ahead and ask it; it will only make up a plausible-sounding response because it doesn’t actually know.

I’m fully aware that my views on this particular issue may turn out to be a case of Old Man Yells at Cloud. In particular, I recognize that it is reminiscent of a few decades ago when old timers complained about the proliferation of high level programming languages and insisted they would lead to a generation of programmers lacking a proper understanding of how the system behaves beneath all that syntactic sugar and automatic garbage collection. They won’t have the foundational skills necessary to design and build quality software. And, for the most part, they turned out to be wrong. Practically speaking, plenty of competent software engineers today don’t really understand how their language runtime allocates and frees up the memory they use, but that hasn’t stopped them from building useful and valuable things. At its core, the only defense I’ve got for that response is… this time feels different? Not a particularly rigorous defense, I admit, but I did warn you that this was the squishiest of the issues at hand. Also, I will point out that I have two decades of professional software engineering experience to bolster my argument, for what it’s worth.

Artificially low cost

The artificially low cost of using generative AI is the area I feel least qualified to dissect, but I think I know enough to recognize that the problem is there and I am confident in saying that no one has a solution for it.

The idea that we are in the middle of a generative AI bubble, ready to pop at any moment, is a popular argument for AI naysayers. Essentially, the argument goes that the technology behind LLMs does not live up to the hype and the money spigot will eventually dry up (or worse) as reality catches up. Big tech companies are pouring a truly staggering amount of cash into buying GPUs, memory, storage and warehouses to build data centers to train and run these models, but they’ve been struggling to justify those huge expenses as the initial shine has worn off and the practical limits of LLMs have become clear. Freeing you from having to write filler emails to your boss and coworkers may be handy, but it’s not the killer feature that justifies the astronomical costs these companies incur. But, in the past year or so, they’ve locked onto AI coding agents as the “proof” they were right and that their massive investments will pay off.

However, the cold, hard reality is that generative AI models are wildly unprofitable for the companies that train and operate them, and AI coding agents don’t change that one bit; in fact, they actually compound the problem by encouraging significantly more usage. Big tech companies are following the typical Silicon Valley model of build it first, figure out how to make money later. To that end, they seem to be crossing their fingers and hoping for some incredible innovations to come along on the order of the original Attention Is All You Need paper from Google. They desperately need some miraculous new invention to appear that improves the cost efficiency of their AI models or, even better, to discover the secret sauce that gets them to the elusive and ill-defined end state of artificial general intelligence. Absent the timely arrival of such innovations (I guess it could happen, but I wouldn’t count on it), the bubble will undoubtedly pop and a major restructuring of the industry will follow. We’re at the point where giants like Google and Meta have such massive war chests that they will likely weather that storm, but other players that don’t have other lines of business to fall back on will have to suddenly find a way to actually make money or perish.

Right now the end user prices for generative AI models are completely disconnected from the actual cost of training and running those models. We’re talking the need for orders-of-magnitude price hikes and dramatically lower usage caps in the best case scenarios. It’s not just the companies that build and train foundation AI models, either; all those middlemen that have built their businesses offering tools powered by generative AI models with artificially low prices will find themselves in a pickle when their costs suddenly soar. The blast radius will be large, to put it mildly. Many will not survive.

Some might argue that, even if that time comes eventually, that’s no reason not to make use of the tools that are available right now. But it should come as no surprise that I disagree. Better not to become overly dependent on AI coding agents in the first place so you’ll be better situated to weather the storm (and maybe even thrive) when it comes.

If you want to dive into the questionable economics of generative AI and the incredibly shady “creative accounting” that companies like OpenAI and Anthropic engage in to make it seem to the public like they aren’t lighting mountains of cash on fire every day, Ed Zitron’s blog has you thoroughly covered.

Prompt injection

Prompt injection is a well known issue in LLMs by now, but here’s a brief summary: LLMs are inherently gullible, so much so that a sufficiently motivated individual can trick an LLM into saying or doing something it’s not “supposed” to by carefully crafting inputs to that LLM. An LLM’s job is simply to predict the next token in an expanding string of text, and their lack of true reasoning ability means they struggle to differentiate legitimate prompts from instructions that are surreptitiously buried in context. All of the big LLM providers have made improvements over the years to address this behaviour whenever new prompt injection approaches are found, but those fixes have all amounted to little more than covering leaks in a sinking ship with duct tape. All signs indicate that the class of vulnerability known as prompt injection exploits behaviour that is foundational to how LLMs work. In other words, an LLM’s fundamental inability to reliably distinguish instructions from data is a problem that is very unlikely ever to be fully fixed.

So far the industry has only really had to deal with prompt injections from the perspective of a user in a one-on-one chat session with an LLM who is trying to break the LLM free from its system prompts so that it will do and say things the LLM’s operator didn’t intend to allow. But with the rapid spread of AI coding agents, it’s only a matter of time before malicious outsiders begin launching coordinated campaigns to poison websites under their control and inundate your email inbox with rogue LLM instructions. Your coding agent will then stumble upon these malicious instructions while searching the web for examples of how to solve a particular coding problem or while digging through your inbox for any emails that are relevant to the task at hand.

And it’s not just coding agents either; every type of AI agent with the ability to pull in context from untrusted sources is vulnerable. If an agent is running with sufficiently loose restrictions (as they all do these days), the damage could be truly catastrophic; in many cases, expect nothing less than full compromise of your system and any accounts you’re logged into. Bruce Schneier has helpfully deemed this nascent class of attack “promptware“. Expect to see a lot of it in the coming years as bad actors come up with increasingly novel ways to surreptitiously inject their malicious prompts into an agent’s context. It will certainly make things interesting.

Even ignoring the threat of malicious actors trying to poison your AI agent, you still run the risk of your AI agent hallucinating or losing the thread and deciding it’s a good idea to delete all of your files, databases or emails. It’s not worth the risk.

Copyright/licensing

I’m not a lawyer! I’m a legal layperson offering my unqualified assessment of some tricky legal questions. Let’s get to it.

I don’t know why, but this is an issue that very few people seem to be talking about. In the U.S. at least, the output of a generative AI model is not copyrightable.

That point is critically important to understand, so let me put that another way: when you spend all that time and effort crafting the perfect specifications and then feed them into an AI model, the code that it spits out is, at best, part of the public domain. You don’t own it. No one does. Or perhaps it’s more accurate to say that everyone in the world owns it.

Seriously.

The U.S. Copyright Office has ruled that the output of a generative AI cannot be copyrighted. Since then, the question has also been brought before the U.S. courts and the courts have agreed with that ruling, to the point that the case made its way all the way to the Supreme Court, which elected not to intervene, letting the decisions of lower courts stand: any output of a generative AI is not protected by copyright. Granted, that’s just in the U.S., but as the country with the largest population in the western world and despite repeated attempts by the current administration to thoroughly alienate its allies, it remains a very powerful political and economic force. What’s more, I don’t think it’s a stretch to assume that at least some other countries with similar definitions of copyright will come to the same conclusion in the foreseeable future.

I mentioned above that Notion is going hard on using AI coding agents. Let’s imagine a hypothetical world in which they had used AI coding agents from the very beginning of their business and that those agents had produced all of the code in their productivity software. All of that hypothetical code would technically belong to everyone now. If an employee of a company like that were to decide to dump that source code somewhere online, the whole world would suddenly have totally unrestricted access to do whatever they wanted to do with it. It’s not clear that the company would even have a legal mechanism to enforce a prior limit on the employee’s fundamental right to distribute something that is legally in the public domain (Notion’s AI-generated code in this example) to the rest of the world. NDAs protect trade secrets, but can they be used to prevent an employee from sharing something that, by rights, belongs to everyone?

In the real world, up until recently, presumably all (or nearly all) of Notion’s code was written by humans. If an employee were to leak Notion’s proprietary code online, that code would be highly radioactive, legally speaking. If a competitor incorporated any of that proprietary code into their own products, they would do so risking catastrophic legal and financial penalties. Copyright to the rescue. Add in the fact that an employee would have to violate their NDA with Notion to leak the code in the first place and it works as a pretty effective deterrent.

But in the hypothetical world we’re imagining, anyone could spin up some servers running the published code (swapping out legally protected assets like images and trademarks, of course) and start a direct competitor; they’d be able to completely skip over the years and years that the company spent carefully guiding their AI coding agents to build a highly capable and competitive productivity suite. Yes, this new competitor would then have to maintain their own branch of the code to fix bugs, add new features, etc., but that’s what AI coding agents are for, right?

In response some would argue that, if our hypothetical company had used AI coding agents to cheaply produce all the code for a modern, full-featured note-taking app, then the code itself isn’t that important or valuable anyway; it would be no big deal for someone else to use the AI coding agents of their choice to build their own competitor, whether or not they have access to the original source code. But that completely ignores all of the work that must still go into researching a problem, discussing with peers and writing specifications so that an AI coding agent can turn them into functioning code and then all the refinement and revisions that come afterwards. Software engineering will continue to be a time-consuming process even if writing code is virtually a non-factor. Ironically, the specifications that engineers spend all that time writing and feeding into AI coding agents would likely be covered by copyright, but it would be completely irrelevant to protecting the company’s interests. Thanks to this hypothetical company’s use of AI coding agents, a competitor would be able to legally skip all the time and hard work that is traditionally necessary to get established!

The question of copyright becomes significantly more complex when a codebase is a mix of human-generated and AI-generated code, which may give companies enough legal cover as long as they maintain a certain baseline (whatever that may be) of human-generated code. Or maybe governments around the world will gradually amend their definitions of copyright to assign ownership of the output of a generative AI to the person or organization that wrote the prompts; that will be no small feat as long as the Berne Convention and the TRIPS Agreement are around! I still contend it’s best just to let the humans write most or all of the code to begin with and skip all the legal uncertainty that comes from having the bots do it. At the very least, it’s probably a good idea to consult some copyright lawyers before you decide to lean too heavily on AI coding agents for your business.

So, what are LLMs good for?

Research! But only as long as you carefully verify every single thing an LLM tells you. Whenever you encounter a new package or API you have little or no experience with, being able to ask an LLM to generate an example block of code is absolutely brilliant. That said, they make mistakes often enough that you won’t be able to directly copy and paste in many cases.

Also, as AI coding agents mature and actually implement comprehensive isolation guarantees to prevent them from going wild and deleting all of your emails, they could be a great enabler for non-coders to bring their ideas to life, as long as those users don’t intend to build a business off of the results. Or even for experienced coders who don’t have time to manually build out all of their recreational side projects.

Conclusion

Whew, you made it! Thanks for reading this far.

Despite my unease with them, it’s clear to me that LLMs in general and AI coding agents in particular are here to stay, in one form or another. But, if this post convinces some people at least to think more critically about the pros and cons of the tech, then I have succeeded.

Okay, this is the part of the blog with the obligatory shameful product plug: please consider giving Standup for Me a try. It’s a standup bot for Slack that is lovingly crafted and maintained by yours truly, without a single line of AI-generated code. Its headline feature is the ability to integrate with the services you use (Google Calendar and GitHub right now) to fetch a list of the things you did yesterday and the things assigned to you for today to help you write useful responses for your daily standups. It’s free for small teams!

🔥 **What’s your take?**
Share your thoughts in the comments below!

#️⃣ **#uncomfortable #truths #coding #agents #Standup**

🕒 **Posted on**: 1774643513

🌟 **Want more?** Click here for more info! 🌟