🚀 Discover this awesome post from Hacker News 📖
📂 **Category**:
✅ **What You’ll Learn**:
January 8, 2026
Neural Networks at Buildtime, Software at Runtime
Over the last 6 months and the last 6 weeks in particular, AI coding agents have shown to be incredibly capable at writing software. Tasks that traditionally required weeks of human labor can now be done in days if not hours. Even more incredibly, software systems that are designed from the start to harness AI coding agents exhibit many of the characteristics of the neural nets that were integral to their creation in the first place. These AI-native software systems are learned, not designed. Code is the policy, deployment is the episode, and the bug report is the reward signal – well-architected coding agents can drive this loop with little human intervention. Unlike traditional reinforcement learning architectures, they are encoded in CPU instruction sets instead of neural network weights, but they are learned just the same.
The success of coding agents and the software systems built thereon carry lessons about where to apply AI agents in general as well. Coding, like many other creative tasks, requires judgment. How best to implement some function with input A and output B; how to name some variable; whether to share some function or implement a new version; etc. Neural networks excel at judgment (more on why below). Yet many of the agentic deployments we are seeing in the wild are against tasks that can be fully specified as explicit instructions. Of course, traditional software excels at executing explicit instructions. Any programming language can be executed on today’s machinery at billions of instructions per second.
Coding agents get this exactly right, since by definition they are making a series of judgments when writing code at buildtime and leaving the execution of such code to machines operating at runtime. The best performing architectures follow suit, delegating judgment to neural networks and execution to traditional software, even when the executable artifacts are produced entirely by AI.
Some Agents in Practice #
Many agentic AI projects are failing — agentic drift, opaque debugging, brittle autonomy. Meanwhile, Claude Code has driven significant productivity gains by doing something different: it writes code that humans review and deploy, producing artifacts that are durable, version-controlled, and deterministic.
These failures and successes reflect a fundamental architectural difference.
Judgment and Execution Historically #
Humans have historically done two different types of jobs for different reasons, and AI changes each differently.
Judgment is fuzzy classification that cannot be specified as explicit rules. This variable should be made private, not public; this handwritten letter is a “B”, not a “P”; this customer complaint is about a refund, not fraud; this image contains a receipt; this element on some unfamiliar page is “the login button.” Humans did these tasks because traditional CPU-based Von Neumann machines simply could not. The rules could not be written down, and even today exist only as learned boundaries in high-dimensional space. Minimization of a loss function via gradient descent in a vastly dimensional space draws these boundaries inside neural networks without confinement to the nouns and verbs of English, C, or even Rust (lol).
Execution is discrete logic that can be specified as explicit rules. If complaint type is refund and days since purchase is less than 30, approve; if machine type is CPAP and facility code is X, the SKU is ABC-123; click the element with selector a[href="/login"]. Humans did these tasks, even though Von Neumann machines theoretically could and are more reliable and faster, because writing and operating software systems that encode these rules was expensive. The investment was not worth the savings not because of any fuzziness inherent to the task.
Common Conflations Today #
Dominant agent architectures conflate judgment and execution, frequently using neural networks for both. The consensus definition of an agent — “an LLM runs tools in a loop to achieve a goal” — clarifies the mechanism but not the problem space.
Frameworks like browser-use and Stagehand embody this conflation. Consider browser-use:
agent = Agent(task="Find the top HN post", llm=llm, browser=browser)
await agent.run()
Or Stagehand:
await stagehand.act("click on the stagehand repo");
await agent.execute("Get to the latest PR");
In both cases, the LLM performs judgment (which element is “the stagehand repo”?) and execution (click it, figure out the next step, click that). The entire loop is neural. No durable artifact emerges. The LLM is the runtime.
Why Execution Requires Traditional Software #
Neural networks lack the properties that execution requires: determinism, auditability, and precision on edge cases.
Consider this business logic from a system that processes medical equipment orders (from Docflow Labs, my startup):
const scriptedMachineCode = extractMachineCodeFromScripted(scriptedMachine);
if (scriptedMachineCode && machineType) 💬
if (!machineSku || machineSku.trim() === "") {
const facilityMachineCode = extractMachineCodeFromFacility(facility);
if (facilityMachineCode && machineType) 💬
}
This code handles combinations that may occur once a year — a rare facility, an unusual machine type, a specific classification. The code provides 100% precision even for edge cases. When a billing dispute arises and someone asks why the system chose rental versus purchase for a particular patient, the logic can be traced line by line. It lives in version control and is semantically transparent, deterministic, and auditable.
A neural network approximating this function cannot provide these properties. Sparse training data will never cover the combinatorial space. Moreover, it blurs boundaries that business requires to be sharp. And it fails opaquely — gradients and activations offer no affordance for debugging. Decisions in this substrate are semantically opaque, non-deterministic, and untraceable.
The Stagehand Example: Half Right #
Stagehand’s act("click on the stagehand repo") correctly implements judgment via a neural network in some sense. Which element on any dynamically chosen page corresponds to the “stagehand repo” cannot be represented in traditional software. There are too many permutations of page layout. The fuzziness of these boundaries is best approached by neural networks in massively multidimensional space minimizing some loss function against many examples.
In another sense, however, Stagehand’s architecture is limited. We may know ahead of time which webpage we are attempting a click against and it may change infrequently, requiring only a one-time (or few-time) judgment.
Yet Stagehand produces no executable artifact by design. Instead, the LLM returns a selector, which gets cached opaquely outside version control. On cache miss, the LLM re-engages at runtime to re-interpret the instruction, invoking a neural net.
A better architecture might still allow the LLM to make a judgment and return a selector, but afford positioning this judgment squarely at buildtime. The selector gets emitted as code into a Playwright script. The script is committed to version control, reviewed, and deployed. On failure — because the site changed and the selector broke — the development process re-engages. An AI agent rewrites the script. Same judgment, different artifact. The selector becomes a semantically transparent piece of the underlying software system, not ephemeral runtime state.
A Better Architecture #
Neural nets may remain at runtime when tackling judgments that can only be made dynamically at runtime. Every other LLM agent belongs at buildtime accelerating the production of executable software.
complaint_text = get_complaint()
complaint_type = llm.classify(
complaint_text,
categories=["refund", "fraud", "shipping", "other"]
)
if complaint_type == "refund":
if days_since_purchase < 30 and item_condition == "unopened":
approve_refund()
else:
escalate_to_human()
elif complaint_type == "fraud":
flag_for_review()
The workflow orchestrator is traditional software. It calls out to neural networks for judgment tasks: classification, extraction, interpretation — the fuzzy pattern matching that cannot be specified as rules. Then it executes business logic itself, deterministically. The execution paths are explicit, auditable, and version-controlled.
This is not a new pattern. Production ML systems already work this way: a model classifies, code acts. What’s new is that AI agents can write the code, dissolving the apparent tradeoff between RPA (deterministic but brittle) and AI agents (adaptive but unpredictable).
Development Time Approaching Runtime #
Software systems have historically maintained a clear separation between two domains: development (humans writing code, days or weeks) and execution (CPUs running code, nanoseconds). AI coding agents close this gap. The theoretical limit as development time approaches zero is runtime:
Even if AI never achieves nanosecond times for writing software, timescales of hours, minutes, and perhaps even seconds allow software systems to adapt to feedback as it arrives.
As AI agents get more capable, the distinction between “writing code” and “running code” may dissolve. What emerges resembles reinforcement learning with a different substrate. In traditional RL, a neural network observes state, outputs an action, receives a reward signal, and updates its weights. The network is the adaptive element.
Substitute software for the neural network and the structure remains identical. The system observes data — requests, errors, metrics, user complaints. Code executes a response. Feedback arrives. An AI agent updates the code. Same adaptive loop, different computable substrate.
The difference in representation matters. Neural networks encode behavior in opaque weight matrices. Software encodes behavior in symbolic, human-readable form. Software can be audited, debugged, and surgically modified if necessary. A single fallback chain can be altered without retraining an entire model and hoping it generalizes correctly. The symbolic substrate preserves the properties that production systems often require: interpretability, debuggability, auditability, and surgical modifiability. When the learned update mechanism provides adaptability, you get the benefits of RL without the costs.
Adaptable Software Systems #
Ironically, software is still mostly all you need at runtime.
Neural networks are best reserved for judgment — the fuzzy tasks we cannot otherwise specify in language — and for buildtime acceleration. Neural networks will not replace traditional software, but rather enable its proliferation into corners of the economy that could benefit from reliable discrete logical execution at a fraction of historical costs.
An architecture where neural networks handle runtime judgment, software handles execution, and AI agents accelerate buildtime creates a symbolic substrate that is nonetheless adaptable — auditability, determinism, and precision alongside the adaptability of learned systems.
This is what we’re building at Docflow Labs: adaptive systems with a symbolic substrate. If this resonates, say hello!
{💬|⚡|🔥} **What’s your take?**
Share your thoughts in the comments below!
#️⃣ **#Software**
🕒 **Posted on**: 1769733441
🌟 **Want more?** Click here for more info! 🌟
