The Friction That Stopped Waste

One observation from using agentic engineering with Brighter is that the old adage that “work expands to fill the resources available” is definitely true. In an OSS context, where I am paying for tokens out of pocket, that is my call; in commercial settings, the trade-offs need more thought.

Quality From The Loop

The cause, I think, is the loop of generate => evaluate => repeat. It drives quality, typically higher than we would have reached through manual effort alone.

I set this up so a sub-agent (or new agent) with a fresh context can review our last milestone. The review agent assigns a score derived from its evaluation. We want to ignore findings below a certain score as “noise” so we don’t get too many “false positives.” We break the loop when all of the evaluation findings fall below that threshold. Typically, the review process is run after gathering requirements, creating the program design, building the task list, and generating the code.
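That loop can be sketched in a few lines. This is a minimal illustration, not Brighter's actual tooling: `generate` and `review` are hypothetical stand-ins for whatever agent and sub-agent calls you use, and the threshold value is illustrative.

```python
# A minimal sketch of the generate => evaluate => repeat loop.
# `generate` and `review` are hypothetical agent calls; the loop
# logic (score threshold, break condition) is the point.

NOISE_THRESHOLD = 3  # findings scored at or below this are treated as noise
MAX_ITERATIONS = 5   # cap the loop so we don't burn tokens forever

def run_review_loop(generate, review, milestone):
    """Regenerate until every finding falls below the noise threshold."""
    artifact = generate(milestone)
    for _ in range(MAX_ITERATIONS):
        # The reviewer gets a fresh context and is instructed to be adversarial.
        findings = review(artifact)  # -> list of (score, description) pairs
        actionable = [f for f in findings if f[0] > NOISE_THRESHOLD]
        if not actionable:
            break  # everything left is noise; accept the artifact
        artifact = generate(milestone, feedback=actionable)
    return artifact
```

The same loop runs after each milestone: requirements, design, task list, and code.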

In essence, it helps to prevent the slop that a first generation may create. Much of that comes from the evaluator having a fresh context: it sidesteps context rot, and it sidesteps the generating agent’s tendency to assume that earlier work is right, whereas the sub-agent is instructed to be adversarial.

While those iterations increase cost, the result is higher quality, which is what we want. Right?

How Much Quality?

This is our first trade-off. What quality threshold do we need? Well, it's OSS, right? I want folks to be able to rely on it. So, we set a low-ish threshold, meaning we address most of what the reviewer finds.

That is the first cost issue. In the past, some of those items might have been skipped: I would have weighed how important the finding was against my time, the effort involved, and the value of shipping the feature to get feedback.

But I also find myself more inclined to take the harder path. Some features involve choices about what we offer: typically, which edge cases we support.

A Brighter Example

_My current example: I am working on a feature to add DB migration for our Outbox and Inbox. At startup, we will check that you have the latest version, and if not, migrate you. We will lock the producers and consumers during an update, so that it works in a distributed environment.

But what about existing databases? Do we just assume that you are on the DDL we shipped with V10, and only upgrade you from there? Perhaps you are stuck on V9 because the cost of a DB migration is a pain point? Maybe you are on an older, now unsupported version, because of this.

One answer is to go back and figure out all the versions we have shipped from the DDL change history in Brighter. In that way, it doesn’t matter which version you are on; we can upgrade you. (There is a little trade-off in that we can’t switch you from text to binary content as part of that, but you probably don’t want that during an upgrade, as it’s a choice.)_
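The startup flow described above (check the schema version, lock, migrate) could be sketched as follows. Brighter itself is C#, so treat this Python as an illustration of the flow only; the `store` methods, the migration map, and the version numbers beyond V9/V10 are hypothetical.

```python
# A sketch of migration-on-startup; all names here are hypothetical,
# not Brighter's actual API.

LATEST_VERSION = 10

# Hypothetical map of schema version -> DDL to reach the next version,
# reconstructed from the DDL change history, so any shipped version
# can be upgraded step by step.
MIGRATIONS = {
    8: "ALTER TABLE outbox ...",  # v8 -> v9 (illustrative DDL only)
    9: "ALTER TABLE outbox ...",  # v9 -> v10
}

def migrate_on_startup(store):
    """Check the schema version at startup and migrate if behind."""
    current = store.schema_version()
    if current == LATEST_VERSION:
        return current
    # Lock producers and consumers so the upgrade is safe when several
    # nodes start at once in a distributed environment.
    with store.distributed_lock("schema-migration"):
        current = store.schema_version()  # re-check under the lock
        while current < LATEST_VERSION:
            store.apply(MIGRATIONS[current])
            current += 1
    return current
```

Re-checking the version after acquiring the lock matters: another node may have completed the migration while this one was waiting.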

Now, that is quite a lot of research to go through the git history across multiple DBs we support, and it carries a high risk of getting it wrong if we do it manually. But an agent is good at this kind of research. So, before I know it, I am asking the agent to investigate, burning tokens to assess the feasibility of something I would probably have rejected if I had to do it by hand.

_If I had to do this by hand, I would have favored just getting it out and assuming folks are on the V10 baseline, or perhaps V9, as we support that.

But now, I am burning tokens, and the agent has answers. And now that I have spent tokens on the answers, well, wasn’t that the hard part? Why not just work with the agent on the requirements and design?_

_And before I know it, we are burning tokens on the design; after all, it’s quick to see what this will entail.

And having burned those tokens investigating and designing, well… it would be a shame not to spend tokens implementing it._

It’s seductive. I can make this better than I would have when the friction of the time commitment to OSS held me back. I can make my dreams real. I just need to pay for the tokens.

But token costs have always been subsidized… the first hit is always free, kids… and soon the choices may be harder.

Cost Pressure Lowers Cognitive Load

And perhaps, for OSS that many will use, where I feel the token cost because it comes out of my own pocket, I can easily make this call.

But in a commercial setting? If friction is low, I may feel pressure to hit the high bar; I don’t want my colleagues to think I ship AI slop, and I don’t want to produce unreliable software. And so the token cost goes up.

But perhaps, as importantly, the software’s cognitive load is increasing. It handles more edge cases, includes paths for very specific circumstances, and may not opt for simplifications that might have been forced upon us by friction.

When we talk about cognitive debt, it’s not simply about failing to observe the loop or to appreciate that we are still programming, just not coding. It’s also about our ability to add software we might have previously rejected due to friction.

We have been burned in the past, when we made something hard easy (for example, when we made it easy to write a new service via FaaS and ended up with nanoservice sprawl). It’s hard to believe that we won’t get fooled again.

Rising Token Costs To The Rescue?

But perhaps rising token costs will actually help. Maybe they become the new friction, the new “is this worth it?” Once, it was my time, or, commercially, the team’s time, when there were so many other things to build. Now it’s the token spend. Will this be the best use of our token budget this month?

The free lunch may be over…soon…but maybe some friction will help us keep cognitive load lower again.

Copyright © Ian Cooper 2024+ All Rights Reserved.