For most of the last decade, Kubernetes was treated like the final answer to modern infrastructure. If you wanted to look serious, you adopted it. If you wanted to hire top engineers, you put it in the stack. If you wanted to signal that your company could scale, you showed off your cluster diagrams.
I understand why. Kubernetes solved a real problem at a real moment in time. It gave fast-growing teams a way to standardize deployments, isolate workloads, and build a clean abstraction layer over messy infrastructure. For companies operating at massive scale with highly fragmented engineering teams, it was, and still is, an extraordinary piece of engineering.
But somewhere along the way, the industry made a very expensive mistake. We stopped asking whether we needed Kubernetes and started assuming that any serious company should have it by default.
That assumption is breaking.
Over the past few years, I’ve watched more technical leaders quietly ask the same question: what if the orchestrator has become the bottleneck? What if the platform built to simplify operations is now adding more operational complexity than it removes?
That is the heart of the great re-platforming. Not a rejection of containers. Not nostalgia for “old infrastructure.” Just a growing recognition that many teams climbed too far up the abstraction ladder and are now paying for it every single day.
Kubernetes solved a scaling problem, then became a default religion
The strongest tools in infrastructure often create the strongest cargo cults. Kubernetes is a perfect example.
At very large scale, it makes sense. If you have dozens of services, multiple environments, complex scheduling constraints, frequent deploys, strict multi-tenancy requirements, and a platform team capable of operating the control plane properly, Kubernetes gives you leverage. It turns infrastructure into a programmable substrate. That is powerful.
But most companies don’t actually live in that world.
Most companies have a much simpler reality:
- A handful of critical services
- Predictable traffic patterns
- A small engineering team
- Limited platform expertise
- A business that cares more about reliability and shipping speed than orchestration elegance
In that environment, Kubernetes often creates an inversion. The business thinks it bought scalability, but what it really bought was a second product to maintain: the platform itself.
Suddenly you are not just running your application. You are running networking overlays, ingress controllers, Helm charts, admission policies, node groups, autoscalers, service meshes, CRDs, observability pipelines, and a permission model that only three people fully understand.
That is not simplification. That is an organizational tax.
The hidden cost is not compute, it’s cognitive overhead
When people debate Kubernetes, they usually talk about infrastructure cost. That’s the wrong argument. The real cost is cognitive overhead.
Every layer of abstraction creates a translation problem. A developer sees a deployment fail. The root cause might be a container image issue, a secret mismatch, a storage class, a policy constraint, a DNS quirk, an ingress timeout, or an unhealthy node pool. Nothing is impossible, which means everything must be considered.
That ambiguity slows teams down in ways dashboards rarely capture.
Here’s what the tax looks like in practice:
- Deploy pipelines become fragile because they depend on multiple controllers and environment-specific manifests
- Incidents take longer to triage because the failure domain is bigger than the application itself
- Onboarding slows down because engineers must learn the platform before they can ship useful work
- Security review gets harder because the actual runtime behavior is spread across many moving parts
- Ownership becomes blurred because no one knows whether the problem belongs to the app team or the platform team
Every one of those issues is survivable. Together, they create drag. And drag is deadly in infrastructure because it compounds quietly. Teams don’t notice it all at once. They notice it as friction: one more failed deploy, one more confusing outage, one more week lost to platform debugging instead of product progress.
Why boring VMs are winning again
This is why I’m seeing a renewed appreciation for boring virtual machines.
A VM is not fashionable. It doesn’t give conference talks. It doesn’t come with an ecosystem of YAML-powered self-importance. But it has one enormous advantage: it is legible.
When something breaks on a VM, the path to understanding is usually shorter. The service is down because the process died, the disk filled up, the package update broke something, the port is blocked, or the machine ran out of resources. None of that is pleasant, but it is concrete. Engineers can reason about it quickly.
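That shorter path can even be written down. The sketch below encodes the first-pass checks as a tiny stdlib-only script; the service name, port, and disk threshold are hypothetical placeholders, not a standard:

```python
import shutil
import socket
import subprocess

# Minimal first-pass triage for one service on one VM.
# The unit name, port, and thresholds are hypothetical placeholders.
SERVICE = "myapp"   # assumed systemd unit name
PORT = 8080         # assumed listening port

def triage(service: str = SERVICE, port: int = PORT) -> dict:
    checks = {}
    # Is the process running? (assumes a systemd-managed host)
    try:
        r = subprocess.run(["systemctl", "is-active", service],
                           capture_output=True, text=True)
        checks["process_running"] = r.stdout.strip() == "active"
    except FileNotFoundError:
        checks["process_running"] = False  # no systemctl on this host
    # Is the root disk nearly full?
    usage = shutil.disk_usage("/")
    checks["disk_ok"] = usage.used / usage.total < 0.95
    # Is the port accepting connections?
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=2):
            checks["port_open"] = True
    except OSError:
        checks["port_open"] = False
    return checks

if __name__ == "__main__":
    for name, ok in triage().items():
        print(f"{name}: {'OK' if ok else 'FAIL'}")
```

The point is not the script itself but that the whole failure domain fits in thirty lines an on-call engineer can read.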
That matters more than most people admit.
Legibility is one of the most undervalued properties in infrastructure design. The easier a system is to understand under pressure, the more reliable it tends to be in the real world. Not because it is theoretically superior, but because humans can actually operate it.
A small fleet of well-managed VMs with good automation, clear deploy scripts, strong monitoring, and disciplined rollback paths is often more resilient than a “modern” platform whose complexity exceeds the team’s operational maturity.
That is the core lesson. Simpler infrastructure is not anti-scale. It is often pro-reliability.
The question is not “Is Kubernetes good?”
The better question is: where does it pay for itself?
I use a simple framework to evaluate this.
Kubernetes is usually justified when all four of the following are true:
- You operate many independently deployed services with different scaling profiles
- You have enough deployment frequency that automation sophistication materially reduces risk
- You have a team with genuine platform engineering capability, not just interest
- Your business benefits from scheduling flexibility, multi-environment consistency, or tenancy controls that are hard to replicate more simply
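The four conditions above can be made explicit as a gate. A toy sketch follows; every field name and numeric threshold here is my own assumption, not an established benchmark:

```python
from dataclasses import dataclass

# Toy encoding of the four conditions; names and thresholds are illustrative.
@dataclass
class OrgProfile:
    independent_services: int          # independently deployed services
    deploys_per_week: int              # across the whole org
    has_platform_team: bool            # genuine platform engineering capability
    needs_tenancy_or_scheduling: bool  # scheduling/tenancy needs that are hard
                                       # to replicate more simply

def kubernetes_pays_for_itself(org: OrgProfile) -> bool:
    """All four conditions must hold; any single one is not enough."""
    return (
        org.independent_services >= 10   # "many" is a judgment call; 10 is a guess
        and org.deploys_per_week >= 20   # likewise a guessed threshold
        and org.has_platform_team
        and org.needs_tenancy_or_scheduling
    )

print(kubernetes_pays_for_itself(OrgProfile(4, 5, False, False)))   # → False
print(kubernetes_pays_for_itself(OrgProfile(30, 60, True, True)))   # → True
```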
If those conditions are not present, the platform is often aspirational, not practical.
That distinction matters. Infrastructure should be shaped by current operating reality, not by the architecture diagram you hope to deserve in three years.
Too many teams build for hypothetical future scale and, in the process, make present-day execution dramatically worse.
The industry is maturing past performative complexity
I think this shift is healthy.
For a while, the market rewarded visible complexity because it looked like sophistication. A custom platform implied seriousness. A dense stack implied technical depth. But eventually reality catches up. CFOs start asking why productivity is flat. CTOs notice that every deploy feels risky. CEOs realize they are funding an infrastructure identity project, not a business advantage.
That is when the re-platforming begins.
Not as a dramatic public confession, usually. More often it happens quietly:
- A team removes the service mesh
- Then collapses non-critical workloads back onto VMs
- Then simplifies CI/CD around direct artifact deploys
- Then discovers that shipping gets faster, incidents get smaller, and the team suddenly has more time for work that customers actually notice
This is not moving backward. It is moving toward fit.
What I’d do if I were designing the stack today
If I were building a company from scratch today, I would start much lower on the abstraction ladder than the average startup did in 2021.
I’d use containers, but sparingly. I’d favor a small number of clearly owned services. I’d deploy onto VMs with reproducible automation. I’d invest early in observability, backups, secrets hygiene, rollback, and incident discipline. I’d make every system understandable by a strong engineer at 3am without a platform specialist on standby.
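The rollback discipline, in particular, can stay tiny. Here is a sketch of the classic symlink-switch deploy pattern, with every path and name hypothetical:

```python
import os

# Classic symlink-switch deploy for a VM: each release lives in its own
# directory, and a `current` symlink is flipped atomically between them.
# All paths and names here are hypothetical.

def switch(releases_dir: str, current_link: str, version: str) -> None:
    """Atomically repoint the `current` symlink at a release directory."""
    target = os.path.join(releases_dir, version)
    tmp = current_link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, current_link)  # atomic rename on POSIX

def deploy(releases_dir: str, current_link: str,
           new_version: str, old_version: str, healthy) -> bool:
    """Switch to the new release; roll back if the health check fails."""
    switch(releases_dir, current_link, new_version)
    if healthy():                  # e.g. an HTTP probe against a health endpoint
        return True
    switch(releases_dir, current_link, old_version)  # rollback = same operation
    return False
```

Rollback being the exact same one-line operation as deploy is the point: the system stays legible at 3am.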
Then, and only then, I’d add orchestration complexity when the pain became real and recurring.
That order matters. The biggest infrastructure mistakes usually come from importing complexity before earning the operational need for it.
Best practice is contextual, not universal
There is a broader lesson here beyond Kubernetes.
In technology, “best practice” is often just the most socially validated answer, not the most economically rational one. The most copied stack is not automatically the most effective stack. It is simply the one that won the narrative.
Strong technical leaders have to resist narrative gravity.
Your job is not to look modern. Your job is to build systems that are reliable, understandable, secure, and proportionate to the business you are actually running.
Sometimes that means adopting the new abstraction early. Sometimes it means ripping it out when the abstraction starts demanding more attention than the problem it was supposed to solve.
That is not failure. That is engineering judgment.
The real competitive advantage is operational clarity
In the end, infrastructure should create leverage, not theater.
The teams that win over the next decade will not be the ones with the most complex platforms. They will be the ones with the clearest operational models. The ones that can explain how their systems work, recover quickly when they fail, and evolve the stack without worshipping it.
Kubernetes still has its place. But the great re-platforming is a reminder that no tool stays a universal answer for long.
Sometimes the mature move is not adding another layer. It is having the confidence to remove one.
Follow the journey
Subscribe to Lynk for daily insights on AI strategy, cybersecurity, and building in the age of AI.