Most founders treat their primary API as a fixed variable. The ones who have been through a forced migration know better.
Does this sound familiar? A founder spent months fine-tuning prompts around a specific model’s behaviour: its formatting quirks, its tendencies under long context, the exact phrasing that kept it on-task. Then the provider shipped an update. Nothing broke visibly, but the outputs were different enough that downstream parsing started failing, and clinical tone checks that had passed reliably now flagged errors. The model they had built around no longer quite existed.
I heard this story in several different forms across our research. The details change—sometimes it is a cost wall, sometimes an enterprise client’s infrastructure demands, sometimes a provider update that silently shifts model behavior—but the structure is the same. A founder builds something carefully calibrated to a specific model, and then something external forces a reckoning with how deeply that dependency runs.
The question of when to leave a frontier model is one that most early-stage founders defer as long as possible. That deferral is usually rational. But the founders who have been through a forced migration (unplanned, under pressure) consistently say the same thing: the cost of switching is set by decisions made months before the switch becomes necessary.
Why Founders Stay Longer Than They Should
The attachment to a primary frontier model is not irrational. OpenAI, Anthropic, and Google have built APIs that are genuinely fast to integrate, well-documented, and capable of handling a remarkable range of tasks without specialisation. For a founder in Phase 1 (focused entirely on finding product-market fit before the runway expires) this is exactly what the situation demands. Pick the model that works, build around it, move fast.
The problem is that “building around it” accumulates in ways that are not always visible. Prompt engineering is the most obvious form: a team invests weeks crafting system prompts, few-shot examples, and output constraints calibrated to a specific model’s idiosyncrasies. Less obvious is the architectural debt—parsing logic, downstream workflow assumptions, evaluation baselines, and customer-facing behaviour—all of which quietly encode assumptions about how a particular model responds.
Even an update that changes just 5% of a model’s outputs can break downstream formatting wholesale. This is not hypothetical: Apple ML Research documented it formally in 2024. When pretrained base models are updated, downstream fine-tuned adapters experience what the researchers called “negative flips”: inputs previously handled correctly that are now handled incorrectly. Providers optimise for aggregate performance gains and give less attention to behavioural compatibility with prior versions.
Several founders we spoke with had experienced exactly this. One described it as “building on sand,” not because the model was unreliable in any obvious sense, but because the ground could shift without warning, and the system had no way to detect the shift before it reached users.
The Four Signals That It’s Time to Move
In practice, the decision to leave a frontier model (or at least to stop relying on it exclusively) is almost never philosophical. It is triggered by something concrete. Across our interviews, four distinct triggers surfaced repeatedly.
- The cost wall arrives
Startup credits expire and real inference bills appear. Average monthly AI spend across scaling teams rose to over $85,000 in 2025, with high-volume workloads pushing well beyond that. When a frontier model is handling tasks that a cheaper, smaller model could manage equally well, the economics become impossible to ignore.
- An enterprise client has infrastructure requirements
A client requires deployment inside their VPC. Or they mandate a specific cloud provider. Or they will only accept an open-weights model they can audit themselves. These are not edge cases—66% of IT leaders in a 2025 Broadcom survey reported being very or extremely concerned about storing data on public cloud environments.
- A provider update changes behaviour without warning
No announcement, no changelog entry that maps to your specific failure. Just outputs that are subtly different in ways your test suite may not catch. The founders most exposed to this are those who have invested heavily in prompt engineering without building an evaluation baseline to detect regressions.
- The task no longer needs frontier-level reasoning
This one is easy to miss. A task that genuinely required a frontier model at launch—because the product was still being defined and the edge cases were unknown—may be narrow and well-understood enough, a year later, to be handled reliably by a fine-tuned smaller model at a fraction of the cost.
The Migration Tax Nobody Talks About
When I ask founders who have been through a model migration what they wish they had done differently, the answers cluster around one theme: documentation.
Not documentation of the model itself, but documentation of their own system. Which prompts depend on which behavioural assumptions. Which downstream parsers are sensitive to output formatting. Which evaluation tests exist, and what baseline they were run against. Which enterprise client requirements are already in the pipeline, even if they haven’t been formally scoped yet.
One founder described deliberately avoiding deep integration with specific agent frameworks in the early days, not because he had a clear migration plan, but because he had seen what lock-in looked like at a previous company and wanted to preserve his options. His system is model-agnostic by design. When a better model becomes available, or when a client demands a specific one, swapping it in is a configuration change rather than architectural surgery.
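What that boundary looks like in practice varies, but the core idea is a single seam between the product and the provider. The sketch below assumes a hypothetical `complete()` interface and a stub adapter; in a real system each adapter would wrap one provider's SDK, and only the adapter layer would ever know which model is in use.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMClient(Protocol):
    """The only model interface the rest of the codebase sees."""
    def complete(self, system: str, user: str) -> str: ...


@dataclass
class StubClient:
    """Stand-in adapter; a real one would wrap a provider SDK."""
    model: str

    def complete(self, system: str, user: str) -> str:
        return f"[{self.model}] response"


# One entry per provider adapter; adding a provider means adding one class.
ADAPTERS = {"stub": StubClient}


def client_from_config(config: dict) -> LLMClient:
    # Swapping models becomes a config change, not architectural surgery.
    return ADAPTERS[config["provider"]](model=config["model"])


client = client_from_config({"provider": "stub", "model": "model-a"})
print(client.complete("You are terse.", "Say hi."))
```

The design choice worth noting is that prompts and parsing can still be model-specific; what matters is that those specifics live behind the seam, where they can be versioned and replaced together.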
“Instead of just changing the model and hoping it works, we run our entire codebase against the new model first. We ask: are we happy with these results?”
– Founding engineer, Series A HR automation startup
That evaluation infrastructure—running a full suite against any candidate model before committing—is what separates a migration that takes a week from one that takes three months. It also changes the emotional calculus. A founder who can validate a new model against a comprehensive baseline can make the migration decision based on data. A founder who cannot is guessing, and knows it, which is why the decision gets deferred.
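The gate that founder describes can be reduced to a small comparison: run the same evaluation cases against the incumbent and the candidate, and approve the switch only if the candidate holds the baseline pass rate. This is a minimal sketch; `run_case` is a hypothetical stand-in for whatever calls a model and checks one case.

```python
from typing import Callable

EvalCase = dict
Runner = Callable[[EvalCase], bool]  # returns True if the case passes


def pass_rate(cases: list[EvalCase], run_case: Runner) -> float:
    passed = sum(1 for case in cases if run_case(case))
    return passed / len(cases)


def approve_migration(cases: list[EvalCase],
                      baseline_runner: Runner,
                      candidate_runner: Runner,
                      tolerance: float = 0.0) -> tuple[bool, float, float]:
    """Run the full suite against both models before committing."""
    baseline = pass_rate(cases, baseline_runner)
    candidate = pass_rate(cases, candidate_runner)
    # "Are we happy with these results?" becomes a number, not a guess.
    return candidate >= baseline - tolerance, baseline, candidate
```

A real suite would also compare per-case failures, not just aggregate rates, since a candidate can match the pass rate while failing a different (and possibly more important) subset of cases.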
When Self-Hosting Is (and Isn’t) the Answer
The most common alternative founders reach for when a frontier API stops working for them is self-hosting an open-weights model. The appeal is obvious: no usage-based pricing, no behavioural updates outside your control, no data leaving your infrastructure. For regulated industries in particular, it can be the only viable path.
But self-hosting is not a guaranteed cost saver, and several founders we spoke with had learned this the hard way. One described building his initial architecture around self-hosted models, only to discover that the economics only worked at very high, very consistent traffic volumes. During periods of lower usage, he was paying for idle compute at a rate that exceeded what the API would have cost. He eventually moved to a multi-provider API setup — not because self-hosting was philosophically wrong, but because his traffic patterns didn’t justify the fixed infrastructure cost.
The calculus, per Deloitte research, tips in favour of private infrastructure when cloud-based AI costs start to exceed roughly 60 to 70 percent of the equivalent on-premises hardware cost. Below that threshold, APIs almost always win on unit economics, even accounting for the control and portability advantages of self-hosting.
Before committing to self-hosting, model your traffic at three intervals: current, 6-month projection, and 18-month projection. If the infrastructure cost is justified at your current volume, the decision is easy. If it only becomes justified at projections you are not yet confident in, the API is probably the right call for now, with a deliberate plan to revisit when the volume is real.
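That modelling exercise is simple enough to do in a few lines. The sketch below applies the rough 60 to 70 percent threshold from the Deloitte figure to three traffic projections; every number in it (per-request pricing, on-prem cost, volumes) is an illustrative assumption, not a benchmark.

```python
def api_cost(monthly_requests: int, cost_per_1k: float) -> float:
    """Monthly API spend for a given request volume."""
    return monthly_requests / 1000 * cost_per_1k


def should_consider_self_hosting(monthly_api_cost: float,
                                 monthly_onprem_cost: float,
                                 threshold: float = 0.65) -> bool:
    # Self-hosting starts to pay off roughly when API spend exceeds
    # 60-70% of the equivalent on-prem hardware cost.
    return monthly_api_cost > threshold * monthly_onprem_cost


# Model traffic at three intervals, per the advice above (illustrative volumes).
ONPREM_MONTHLY = 40_000  # assumed amortised hardware + ops cost
for label, volume in [("current", 2_000_000),
                      ("6-month", 8_000_000),
                      ("18-month", 30_000_000)]:
    cost = api_cost(volume, cost_per_1k=2.50)
    verdict = should_consider_self_hosting(cost, ONPREM_MONTHLY)
    print(f"{label}: ${cost:,.0f}/mo -> self-host? {verdict}")
```

With these assumed numbers, only the 18-month projection clears the threshold, which is exactly the situation where the right call is to stay on the API now and revisit when the volume is real.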
What the Most Resilient Founders Are Doing Differently
The founders in our research who had navigated model transitions most cleanly shared a set of practices that, looking back, seem obvious—but that most teams only adopt after a painful migration, not before one.
They treat every model as a dependency, not a platform. The same way a software team documents which library versions a system depends on and maintains tests that would catch a breaking change, these founders maintain evaluation baselines that describe how their system behaves against a known model version. When a provider ships an update, they run the baseline before it reaches production.
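One lightweight way to implement that baseline is to snapshot output fingerprints for a fixed set of canary prompts and re-check them whenever the provider ships an update. The sketch below assumes deterministic (temperature-zero) outputs; `model_fn` is a hypothetical stand-in for a call to the provider.

```python
import hashlib


def fingerprint(output: str) -> str:
    """Short, stable hash of a model output for snapshot comparison."""
    return hashlib.sha256(output.encode()).hexdigest()[:12]


def take_snapshot(prompts: list[str], model_fn) -> dict[str, str]:
    """Record the behavioural baseline for a known model version."""
    return {p: fingerprint(model_fn(p)) for p in prompts}


def detect_drift(prompts: list[str], model_fn,
                 snapshot: dict[str, str]) -> list[str]:
    """Return prompts whose output no longer matches the baseline."""
    return [p for p in prompts if fingerprint(model_fn(p)) != snapshot.get(p)]
```

Exact-match fingerprints are deliberately strict: they catch formatting shifts that semantic checks would forgive, which is precisely the kind of silent change that breaks downstream parsers.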
They experiment with alternatives before they need to switch. Several described maintaining a second or third model in low-stakes parts of their product specifically to build familiarity with its behaviour, cost profile, and failure modes. When a forced migration arrived, they were not starting from zero.
And they design their architecture to make swapping cheap. This does not require building a complex model-routing layer from day one. It requires being deliberate about where model-specific assumptions are encoded—prompt logic, output parsing, evaluation criteria — and keeping those assumptions in places that can be changed without touching the core product.
None of this is complicated. Most of it is just the discipline of treating a frontier API like the external dependency it is, rather than the fixed foundation it feels like when everything is working.
The question is not really whether you will eventually need to change your primary model. You will. The question is whether that change will be a planned evolution or an unplanned emergency. The answer to that question is mostly set by what you do now, while the pressure is off.