Internal Developer Platform Failures: 5 Reasons Your DIY Platform Is Dying
Why 80% of internal developer platforms fail — and what to do instead.
The Numbers
- 80% of internal developer platforms fail (platformengineering.org)
- 12-18 months typical Backstage implementation timeline with 3-5 FTEs for maintenance
- ~10% Backstage adoption outside Spotify, despite Spotify claiming 99% internally
The IDP Promise vs. Reality
The pitch is seductive: build an internal developer platform, a self-service layer between your infrastructure and your developers. Developers get golden paths. Platform engineers get control. Everyone ships faster. The community calls this the "Field of Dreams" fallacy: "if we build it, they will come." Developers rarely come on their own, and mandating adoption doesn't work either.
So your team spends months evaluating tools. You settle on Backstage for the portal, Crossplane for provisioning, ArgoCD for GitOps, and custom Terraform modules to wire it together. As one r/devops user put it, adopting Backstage is like receiving "all the parts of a Chevy on your desk" rather than a finished vehicle. Twelve to eighteen months in, you've got something that works — mostly — for one team's workflow.
Then the maintenance starts. Plugin upgrades break things. New teams have different requirements. The two engineers who built the platform leave, and nobody knows how the custom auth middleware works. Catalog metadata deteriorates rapidly as personnel change. What was supposed to eliminate toil has become a second full-time job — and teams measure deploys per day while ignoring real developer friction.
This isn't a hypothetical. One organization hastily implemented an IDP without involving developers in the process — it resulted in confusion, lower productivity, and ultimately a return to legacy systems. The failure mode isn't dramatic — it's slow. The platform drifts into irrelevance while developers quietly go back to doing things the old way. As the community puts it: "platform engineering IS change management" — yet teams consistently neglect the cultural aspects.
Here are the five failure modes we see again and again.
1. The Platform Team Becomes a Bottleneck
The whole point of building a platform was to eliminate the "file a ticket and wait" workflow. But here's what actually happens: instead of developers waiting on DevOps for a staging environment, they now wait on the platform team to add a new template, fix a broken golden path, or grant access to a new cluster. Every request gets labeled "urgent" to bypass the platform queue.
You didn't eliminate the bottleneck. You renamed it.
This happens because most teams treat the platform as infrastructure, not a product. The team builds what they need right now, declares victory, and moves on. But a platform is a product with users — and those users have requests, bugs, and edge cases. Developers who lose the freedom to choose their own tools will find ways to circumvent the platform. When the platform team consists of three infrastructure engineers who never built software products before, the backlog grows faster than they can work through it.
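A rough way to see why the backlog outruns the team is to compare incoming requests with weekly throughput. The sketch below is illustrative only: the function name and both rates are made-up assumptions, not measurements from any real platform team.

```python
def backlog_over_time(arrivals_per_week, completed_per_week, weeks, start=0):
    """Return the open-request count at the end of each week.

    Both rates are hypothetical inputs; the backlog never drops below zero.
    """
    backlog = start
    history = []
    for _ in range(weeks):
        backlog = max(0, backlog + arrivals_per_week - completed_per_week)
        history.append(backlog)
    return history


# Assumed example: 12 requests arrive each week and the team closes 9.
history = backlog_over_time(arrivals_per_week=12, completed_per_week=9, weeks=8)
print(history)
```

As long as arrivals exceed completions, the queue grows linearly and never recovers, which is why self-service extensibility matters more than headcount.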
One pattern we see repeatedly: a platform team builds a Backstage instance with a service catalog and a few custom plugins. As one developer on r/devops noted: "The idea of Backstage is super cool, but the fact that I need to write a lot of React code instead of GitHub workflows and Terraform files made me leave the project." When the mobile team needs iOS build support, or the data team needs Spark job templates, the requests pile up. The platform team becomes the gatekeeper for every new workflow — the exact dynamic the IDP was supposed to prevent.
The fix isn't "hire more platform engineers." Forcing adoption via mandates just creates shadow IT. Successful teams learned to make the right way the easiest way — compliance through convenience, not coercion. The fix is choosing a platform that ships with extensibility built in, where adding a new template or workflow doesn't require a platform engineer to write a custom plugin.
2. Maintenance Eats All Your Capacity
This is the silent killer of DIY platforms. Your platform works on day one. By month six, you're spending most of your time keeping it alive. At scale, as one engineer warned, "Backstage is not an 'other duties as assigned' sort of tool. It will require dedicated resources" — typically 3 to 5 full-time engineers just for ongoing maintenance.
Platform Team Time Allocation — DIY IDP (typical)
- Maintaining CI/CD glue — 35%
- Fixing broken env configs — 25%
- Upgrading Kubernetes / Helm — 15%
- Answering developer tickets — 15%
- Actually building new features — 10%
The numbers aren't exaggerated. When you build a platform from open-source components (Backstage, Crossplane, ArgoCD, custom Terraform modules, homegrown CLI tools), each component has its own release cycle, breaking changes, and security patches. Backstage typically requires 12 to 18 months for full implementation, ships a main-line release every month with weekly pre-releases in between, and its plugin ecosystem frequently introduces compatibility issues.
The commercial alternatives aren't immune either. One team's experience with Port.io: "POC in several days and we were super excited," but it "turned out to be insanely expensive" at scale. Another team noted that Cortex "pricing was expensive... We left Cortex for OpsLevel for half the price." The community consensus is clear: "You can't really buy an IDP, you can only build one. For portals, there are off-the-shelf offerings." Meanwhile, Backstage adoption outside Spotify averages around 10% of engineers despite heavy investment, often because the platform team burns out on maintenance before delivering the features developers actually want.
The math is brutal. If you have a three-person platform team, and two of them are spending their time on Kubernetes upgrades, Helm chart fixes, and CI/CD pipeline debugging, you have one engineer left for net-new platform capabilities. With the typical allocation above, where only 10% of time goes to new features, that shrinks to roughly a third of one engineer. At that rate, you'll never outpace the feature requests coming in from developer teams.
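To make the capacity arithmetic concrete, here is a small sketch that converts the time-allocation percentages listed earlier into full-time-equivalent engineers. The percentages come from that breakdown; the three-person team size comes from the paragraph above, and the function name is just an illustrative choice.

```python
# Percentages copied from the "Platform Team Time Allocation" breakdown.
ALLOCATION = {
    "Maintaining CI/CD glue": 35,
    "Fixing broken env configs": 25,
    "Upgrading Kubernetes / Helm": 15,
    "Answering developer tickets": 15,
    "Actually building new features": 10,
}


def fte_by_activity(team_size, allocation):
    """Convert percentage shares into full-time-equivalent engineers."""
    total = sum(allocation.values())
    if total != 100:
        raise ValueError(f"allocation must sum to 100, got {total}")
    return {name: team_size * pct / 100 for name, pct in allocation.items()}


fte = fte_by_activity(team_size=3, allocation=ALLOCATION)
for name, value in fte.items():
    print(f"{name}: {value:.2f} FTE")
```

For a three-person team, the new-features line works out to 0.30 FTE, with the remaining 2.70 FTE absorbed by upkeep and support.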
This is the core build-vs-buy calculation that most teams get wrong. They estimate the build cost but forget the carry cost — the ongoing maintenance that compounds every quarter. If deployment takes 5 minutes but debugging takes 4 hours due to opaque abstractions, the platform has failed. You're not saving time — you're redistributing it to places that are harder to measure.
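The gap between build cost and carry cost can be sketched with a toy model. Everything here is an assumption for illustration: the units (say, engineer-weeks), the starting figures, and the idea that carry cost grows by a fixed percentage each quarter. Plug in your own numbers before drawing conclusions.

```python
def cumulative_cost(build_cost, carry_per_quarter, growth_rate, quarters):
    """Total spend after each quarter.

    One-off build cost plus a carry (maintenance) cost that grows by
    growth_rate every quarter. All inputs are hypothetical.
    """
    totals = []
    total = build_cost
    carry = carry_per_quarter
    for _ in range(quarters):
        total += carry
        totals.append(round(total, 2))
        carry *= 1 + growth_rate  # maintenance compounds quarter over quarter
    return totals


# Assumed example: 100 units to build, 20 units of upkeep in the first
# quarter, growing 10% per quarter.
totals = cumulative_cost(build_cost=100.0, carry_per_quarter=20.0,
                         growth_rate=0.10, quarters=8)
print(totals)
```

Under these assumed inputs, cumulative upkeep overtakes the original build cost within the fifth quarter, which is the part of the estimate that usually goes missing.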
3. It Only Works for One Team's Workflow
Here's a pattern that plays out at nearly every company that builds a DIY platform: the platform team builds golden paths based on their own stack. If the team runs Next.js on Kubernetes with Postgres, the templates work beautifully for that exact combination. Then the payments team shows up with Spring Boot, RDS, and a completely different deployment model, and nothing fits.
This is one of the most common mistakes: over-engineering for theoretical future needs and attempting an all-in-one platform without validating. Teams focus only on Day 1 — app creation and scaffolding — while Day 2 through Day 50 operations have far greater impact on developer productivity. The industry mantra of "start with ONE critical bottleneck" gets ignored in favor of boiling the ocean.
The fundamental tension is between opinionated defaults and flexibility, what the community calls the "Golden Cage Syndrome." You need opinionated defaults to move fast, but you also need escape hatches. When something breaks behind an abstraction that has no escape hatch, developers can't see what's happening underneath, and trust erodes immediately. Most DIY platforms nail the defaults and completely miss the escape hatches.
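One way to picture "opinionated defaults plus escape hatches" is a template that ships sensible defaults, lets a team override any field, and returns the fully resolved config so nothing stays hidden. This is a minimal sketch; the field names and the `render_service` helper are hypothetical and not taken from any particular platform.

```python
import copy
import json

# Hypothetical golden-path defaults; the field names are illustrative only.
GOLDEN_PATH_DEFAULTS = {
    "runtime": "nodejs",
    "replicas": 2,
    "database": {"engine": "postgres", "size": "small"},
}


def deep_merge(base, overrides):
    """Return a new dict where overrides win; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_service(overrides=None):
    """Apply team overrides on top of the defaults and return the full result."""
    return deep_merge(GOLDEN_PATH_DEFAULTS, overrides or {})


# A team on a different stack overrides only what differs.
payments = render_service({"runtime": "java", "database": {"engine": "rds-mysql"}})
print(json.dumps(payments, indent=2))
```

The team that fits the defaults passes nothing, while the team that doesn't changes two fields instead of abandoning the platform, and both can inspect exactly what will be deployed.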
When a team hits an edge case the platform can't handle, they have two options: (1) wait months for the platform team to add support, or (2) work around the platform entirely. Most choose option two, creating shadow IT. And once a team goes around the platform, it rarely comes back.
Successful teams learned to start with a Minimum Viable Platform — demonstrate value within weeks, not months. Allow inner sourcing so teams can contribute back. The platform should solve the most painful bottleneck first, not try to be everything. When it tries to be everything, it solves 80% of one team's problems and 20% of everyone else's — which means everyone else ignores it.
4. No One Documents It (Bus Factor of 1)
Every DIY platform starts as a hero project. One or two senior engineers who deeply understand the infrastructure decide to "build a proper platform." They work nights and weekends, stitching together custom controllers, webhook handlers, and deployment pipelines. They know every quirk of the system because they built it.
Then one of them takes a new job. The other goes on parental leave. Now you have a critical piece of infrastructure — the system that manages how code gets to production — and nobody understands how it works.
This is the bus factor problem, and it's endemic to DIY platforms. Catalog metadata in tools like Backstage deteriorates rapidly as personnel change — service ownership records become stale, API docs go out of date, and the platform slowly fills with ghost entries that nobody maintains. The platform is treated as infrastructure — something that should "just work" — and therefore doesn't get the same engineering rigor as application code.
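Catalog rot is at least detectable. The sketch below flags entries whose owner is no longer in the organization or whose last review is older than a threshold. The `CatalogEntry` shape, the 180-day default, and the sample data are all hypothetical; this is not the schema or API of Backstage or any other catalog tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    # Hypothetical record shape, not the schema of any specific catalog tool.
    name: str
    owner: str
    last_reviewed: date


def find_stale_entries(entries, active_owners, today, max_age_days=180):
    """Flag entries whose owner has left or whose review date is too old."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for entry in entries:
        reasons = []
        if entry.owner not in active_owners:
            reasons.append("owner not found")
        if entry.last_reviewed < cutoff:
            reasons.append("review overdue")
        if reasons:
            stale.append((entry.name, reasons))
    return stale


entries = [
    CatalogEntry("billing-api", "alice", date(2024, 5, 1)),
    CatalogEntry("legacy-cron", "bob", date(2022, 1, 15)),
    CatalogEntry("search", "carol", date(2023, 2, 1)),
]
stale = find_stale_entries(entries, active_owners={"alice", "carol"},
                           today=date(2024, 6, 1))
print(stale)
```

Running a check like this on a schedule turns silent decay into a visible list of entries someone has to claim or delete.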
The tribal knowledge problem compounds with time. Custom Helm charts reference internal conventions nobody wrote down. The CI pipeline has a conditional branch that handles a specific edge case from two years ago — but the comment just says // don't remove this. The authentication middleware uses an undocumented API from your identity provider that worked in v2.3 but breaks in v3.0.
When something breaks at 2 AM — and it will — the on-call engineer is reading through uncommented Go code trying to figure out why the custom admission webhook is rejecting deployments. This is not an efficient use of engineering talent. It's an organizational risk.
5. You Can't Keep Up with the Ecosystem
The cloud-native ecosystem moves fast. Kubernetes ships about three minor releases per year. Helm, Terraform, and ArgoCD all have their own release cycles. New patterns emerge: GitOps, policy-as-code, eBPF networking, AI-powered infrastructure. Your DIY platform needs to keep pace with all of it.
But it won't. Because your platform team is busy fighting the fires from failures #1 through #4. They don't have time to evaluate whether the Kubernetes Gateway API should replace your custom ingress controller, or whether Karpenter would save 40% on your node costs compared to the Cluster Autoscaler you configured eighteen months ago.
Every time a company adds new applications, services, and clusters, the IDP needs changes. New deployment targets, new cloud regions, new compliance requirements — each one requires platform work. The Stack Overflow 2023 survey found that 47% of developers experience anxiety about job security when new technologies are introduced. In a fast-growing company, the platform is always six months behind what teams actually need — and the people using it are already stressed about the constant churn.
The ecosystem problem also hits developer experience. Your developers see that competitor companies have ephemeral preview environments for every PR, AI-assisted debugging, and automated cost optimization. Your DIY platform still requires developers to SSH into a shared staging server and tail logs manually. The gap between what's possible and what your platform provides widens every quarter.
This is the fundamental disadvantage of building in-house: you're competing with companies whose entire business is building developer platforms. They have dedicated teams for each capability — environment management, cost optimization, observability, security. Your three-person platform team can't match that investment, no matter how talented they are.
The Alternative: Buy the Platform, Own the Infrastructure
The build-vs-buy debate for developer platforms has shifted. In 2022, building your own IDP was defensible: the commercial options were immature and expensive. In 2026, after watching 80% of DIY platforms fail according to platformengineering.org, the industry has learned a hard lesson: voluntary adoption beats mandates every time. The platform has to be good enough that developers choose it, and your engineering team's innovation budget is better spent on that experience than on rebuilding the plumbing underneath it.
The modern approach is BYOC — Bring Your Own Cloud. You keep full ownership of your infrastructure: your AWS account, your Kubernetes clusters, your data. The platform vendor provides the orchestration layer — environment provisioning, templates, RBAC, cost controls, and developer experience — without ever touching your production data.
This model solves each of the five failures directly:
- Bottleneck eliminated: Developers get self-service from day one, with a template catalog that the platform team curates but doesn't gatekeep
- Maintenance offloaded: The vendor handles upgrades, security patches, and ecosystem compatibility. Your team focuses on defining golden paths, not fixing infrastructure
- Multi-workflow by default: Support for Docker Compose, Helm, Terraform, and Kubernetes manifests in the same environment, not just the stack your platform team happens to know
- Documentation built in: A commercial platform has documentation, support, and a community. No tribal knowledge, no bus factor of 1
- Ecosystem keeps pace: The vendor's engineering team tracks Kubernetes releases, adds new integrations, and ships features continuously
Bunnyshell is built on this model. You connect your existing EKS, GKE, or AKS clusters. Define your environments in a single bunnyshell.yaml file. Publish templates to a service catalog. Set RBAC policies and cost limits. Developers get self-service environments — full-stack, production-like, ephemeral — without filing a single ticket.
Most teams are productive within days, not the 12-18 months a Backstage implementation typically requires. No need for 3-5 dedicated FTEs just to keep the platform running. And because Bunnyshell handles the platform layer, your engineers can focus on what they were hired to do: building your product.
Ship faster starting today.
14-day full-feature trial. No credit card required. Pay-as-you-go from $0.007/min per environment.
Frequently Asked Questions
What percentage of internal developer platforms fail?
According to platformengineering.org, after auditing dozens of enterprise initiatives, 80% of internal developer platforms fail. Roughly 70% of platform engineering initiatives struggle with adoption. The primary causes are the Field of Dreams fallacy (assuming developers will adopt the platform simply because it exists), measuring the wrong metrics, maintenance burden, and treating the platform as infrastructure rather than a product. The community consensus is clear: platform engineering IS change management, yet teams consistently neglect the cultural aspects.
Why do DIY internal developer platforms fail?
The five most common failure modes are: (1) the platform team becomes a bottleneck instead of an enabler, (2) maintenance consumes all capacity, leaving no time for new features, (3) the platform only works for one team's workflow and cannot scale to others, (4) tribal knowledge and zero documentation create a bus factor of 1, and (5) the platform cannot keep pace with the rapidly evolving cloud-native ecosystem.
How long does it take to build an internal developer platform?
Backstage alone typically requires 12-18 months for full implementation, with ongoing maintenance requiring 3-5 dedicated full-time engineers. But the build phase is only the beginning. Many teams underestimate the carry cost and end up spending 60-80% of their platform engineering capacity just keeping the lights on.
Is Backstage enough to build an internal developer platform?
Backstage is a developer portal framework, not a complete IDP. While Spotify claims 99% internal adoption, Backstage adoption outside Spotify averages around 10% of engineers despite heavy investment. Catalog metadata deteriorates rapidly as personnel change, and the community compares adopting Backstage to receiving all the parts of a Chevy on your desk rather than a finished vehicle.
What is the BYOC model for developer platforms?
BYOC (Bring Your Own Cloud) means you keep full ownership of your infrastructure — your cloud accounts, Kubernetes clusters, and data — while the platform vendor provides the orchestration layer. This gives you the benefits of a commercial IDP (self-service, templates, RBAC, cost controls) without vendor lock-in or data leaving your infrastructure. Bunnyshell uses this model: you connect your existing EKS, GKE, or AKS clusters, and Bunnyshell provisions environments as namespaces within them.
How is buying a platform different from building one?
When you build, you own the entire stack but also the maintenance, security patches, upgrades, documentation, and on-call burden. When you buy with a BYOC model, the vendor handles the platform layer while you retain infrastructure ownership. The key difference is time-to-value (days vs. 12-18 months) and no need for 3-5 FTEs dedicated solely to platform maintenance.