I rebuilt a fifteen-year-old Mac audio app with an agent fleet
There is an error code I will remember for the rest of my career: 4099.
It appeared, disappeared, and reappeared in an oscillation between a music player and a plugin process that were supposed to be talking to each other. Some days it cost an editor window. Some days it cost nothing. Reproducing it took longer than fixing it. That bug, and a few dozen of its cousins, are what this essay is actually about. But first, some history.
A survivor from a different era
Fidelia is a Mac music player for people who still own their files. Macworld covered the original release in 2011, back when there was a whole shelf of serious Mac audio software from small shops. Most of that shelf is gone now. The companies consolidated, the apps went unmaintained, the 32-bit deprecations and notarization deadlines and architecture transitions picked off the stragglers.
Fidelia survived, barely. When I acquired Audiofile Engineering in 2017, the app worked, customers loved it, and the codebase was aging out of the platform underneath it. The honest choices were: let it die with dignity, or rebuild it. Completely. New engine, new UI, new DSP, App Store sandbox, the works.
I am one engineer. The previous era’s answer to “rebuild a commercial audio app catalog” was a team and a couple of years. I did not have either. What I had was two decades of shipping products, a working knowledge of where every fiddly bit was buried in the old code, and a new kind of leverage: a fleet of coding agents.
What “agent fleet” actually means here
Not autocomplete. Not a chatbot that writes functions on request. A pipeline.
Agents in this shop write code against the same gates a human team would face: every change goes through a merge request, continuous integration runs lint and the full test suite, and nothing merges red. The agents do the bulk of implementation, test-writing, and regression archaeology. I do architecture, taste, and verification on real hardware with real plugins and real ears.
Two disciplines turned out to matter more than anything else.
First: evidence over plausibility. An agent will happily produce a fix that sounds right. The rule here is that no fix lands without the failure pinned first: the crash symbolicated to a line, the race observed in a log, the bad behavior reproduced in a harness. We built deterministic replay harnesses for the nastiest subsystems so that a bug observed once on hardware could be replayed forever in CI. Agents are spectacular at building those harnesses and terrible at knowing they need one unless you make it law.
Second: distrust green. CI auto-retries crashed tests, which means a green pipeline can hide a crash that happened and then “passed” on retry. One of the standing rules in this shop is that someone greps the job trace for restart markers before anything ships. The someone is usually an agent. The rule exists because of the one time it wasn’t.
The mountain: third-party plugins in a sandboxed App Store app
The rebuilt Fidelia hosts Audio Units: real, third-party, commercial plugins, inside a sandboxed app distributed through the Mac App Store. Very few apps even attempt this, and the reason is simple. Audio Units are a contract from 20 years ago. The sandbox concept arrived a decade later. Nobody renegotiated.
Every plugin is secretly its own application. It has its own UI toolkit opinions, its own copy protection, its own assumptions about what it is allowed to read, write, install, and phone home to. A music player that hosts plugins is really a tiny operating system for other people’s software, and other people’s software does not care about your sandbox entitlements.
Some scars, with the recipes left out:
A plugin that loaded perfectly on first launch and died if you relaunched the app too quickly, because the previous host process had not finished letting go. A plugin window that came up dead, every time, until we understood the exact order in which the plugin expected to be asked for things, an order documented nowhere. Copy protection that wanted to install helpers a sandbox will never permit, producing four permission dialogs in a row on first encounter. Four. And the 4099 oscillation, which I will not explain in detail except to say that interprocess communication between a host and a crashed plugin is a haunted house, and every door creaks.
The architecture that survived all of this is isolation: plugins run in separate processes, supervised, with the app built to outlive them. When a plugin gives up, the music keeps playing. That sentence took the better part of a release cycle to make true, and it is the proudest boring sentence in the product.
Why bother? Because every one of those misbehaving plugins is somebody’s favorite. The insert slots exist for sober work: headphone EQ, crossfeed, room correction. But racks are personal. If an app says it hosts Audio Units, it should host yours, including the weird ones, including the one with the wood-railed UI that you bought because it sounds like 1976.
What the agents were good at, and what they were not
Good: endurance. The four-hundredth lint violation gets the same attention as the first. Breadth: a fleet can sweep every call site, every platform directory, every test target, where a tired human samples three and hopes. Harnesses: given a failing recording, agents built replay infrastructure that I would never have justified building alone. Archaeology: tracing a regression through eighteen months of merge history is exactly the kind of patient, unglamorous work that agents do without resentment.
Not good: knowing when fixed is not fixed. An agent will see the test pass and believe. It has never heard a click at a buffer boundary, never had a customer’s session ruined by a crash that only happens on the second Tuesday of a sample-rate change. It does not know that the real acceptance test for an audio app is a human listening, slightly annoyed, on hardware the developer does not own. And agents never once told me an idea was bad. Taste does not delegate.
The honest summary is that agents amplify whatever discipline already exists. With gates, evidence rules, and a human who owns the judgment calls, they are the difference between maintaining a catalog and abandoning it. Without those things, they are a way to produce impressive wreckage very fast.
Why this matters beyond one music player
The economics of small software changed this year, quietly. A catalog that once required a funded team to keep alive can now be kept alive, and advanced, by one person who knows what they are doing. That is not a threat to craft. It is the thing that lets craft survive in a market whose gravity is consolidation. The strange little tools, the opinionated players, the software with wood rails: those only exist if small shops can afford to keep building them.
The soul of this work is unchanged. It is in the listening, the restraint, and the stubborn belief that software for musicians should respect both the music and the person. The agents are infrastructure, like the buried fiber that brings the internet to my studio in the forest. Nobody confuses the fiber for the work.
Fidelia is on the Mac App Store with a free trial, three insert slots, and no subscription. If you want to hear your own library through your own rack, bring the weird plugins.
And if your team is staring at its own haunted house, a stalled release, a plugin that will not load, an app the sandbox keeps eating: that is the day job. audiofile.engineering/engineering.