Feature Flags Are a Code Smell (Eventually)

Feature Flags Taste Good, But...

Every codebase that has been around for a while has feature flags. Open yours up and count them. There are probably more than you think, and a few you would struggle to explain. They show up in every deployment talk and every DevOps playbook, and for good reason. They let you ship code without releasing it, try things in production, and kill a broken feature in seconds instead of scrambling for a hotfix.

But here is what those talks tend to leave out. Spend a day in any mature codebase and you will find flags that are years old. Flags nobody remembers creating. Flags that are on for some users and off for others because of a decision that lives in one engineer's memory or a Slack thread that scrolled away two years ago.

That is the tension I want to sit with. Feature flags are recommended practice, and flag debt is everywhere. Teams reach for them to make deployments safer and slowly end up with code that is harder to read, harder to test, and harder to change than what they started with.

The flags are not the villain. The mistake is treating them as temporary when, more often than not, they turn out to be permanent.

“Feature flags are scaffolding. Good scaffolding gets removed when the building is finished. Bad scaffolding stays up for years, blocking windows and confusing architects.”

The Appeal: Why They Seem Like the Right Choice

It is worth being honest about why teams reach for flags, because the reasons are good ones. Nobody adopts them by accident.

The big one is that flags let you separate deploying code from releasing a feature. You can push something to production and keep it dark until you are ready. That sounds minor until you have worked without it. It means you can merge often, deploy continuously, and validate in the real environment without booking a launch window with half the company. And when something is obviously broken thirty seconds after it goes live, you flip a switch instead of cutting a hotfix.
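To make the deploy-versus-release split concrete, here is a minimal sketch. The `FlagLookup` type, the `new-checkout` key, and the `checkoutFlow` function are all made up for illustration; a real flag client will have its own API.

```typescript
// Hypothetical lookup type: given a flag key, report whether the flag is on.
// A real flag SDK will expose something different; this is only a stand-in.
type FlagLookup = (flagKey: string) => boolean;

// Both paths are deployed. The flag alone decides which one users reach.
function checkoutFlow(isEnabled: FlagLookup): string {
  if (isEnabled("new-checkout")) {
    return "new-checkout-flow";
  }
  return "legacy-checkout-flow";
}

// Deployed but dark: every flag is off, so users stay on the legacy path.
const allOff: FlagLookup = () => false;

// Released: the same code, with the one flag switched on.
const released: FlagLookup = (flagKey) => flagKey === "new-checkout";

console.log(checkoutFlow(allOff));
console.log(checkoutFlow(released));
```

Switching between the two lookups is the "release" and the "rollback". No new build is involved in either direction.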

Flags also give you a dial instead of an on/off button. You can send a change to one percent of traffic, watch what happens, and back out if the graphs look wrong. You can run a real experiment, let the data pick the winner, and stop arguing about it in design reviews. And you can let three teams build three features at once without anyone stepping on anyone else's release.
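The dial usually comes down to deterministic bucketing: hash the user into a stable bucket and compare it to the rollout percentage. The sketch below assumes a simple multiply-and-add string hash and hypothetical names (`bucketFor`, `isInRollout`); a production flag service would likely use a stronger hash and its own targeting rules.

```typescript
// Simple non-cryptographic string hash (multiply by 31, add the char code).
// Kept within unsigned 32-bit range on every step. Chosen for clarity only.
function simpleHash(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Maps a (flag, user) pair to a stable bucket in the range 0..99.
// Including the flag key means a user lands in different buckets per flag.
function bucketFor(flagKey: string, userId: string): number {
  return simpleHash(flagKey + ":" + userId) % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function isInRollout(flagKey: string, userId: string, percent: number): boolean {
  return bucketFor(flagKey, userId) < percent;
}
```

Because the bucket depends only on the flag key and the user ID, the same user gets the same answer on every request, and raising the percentage only ever adds users to the rollout; it never removes anyone who was already in.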

In the first week, all of this feels like a cheat code. You ship fast, you roll back instantly, you measure everything. It is genuinely hard to argue with.

“In week one, feature flags feel like a superpower: ship fast, roll back instantly, measure in production.”

The Debt Accumulation: Hidden Complexity

The bill for all that convenience does not arrive in week one. It arrives slowly, over months and years, as the flags pile up.

Start with the arithmetic, because it is worse than it looks. One flag gives you two code paths. Two flags give you four. Three give you eight. It doubles every time. With ten independent flags in play, that is 2^10 = 1,024 possible ways your code can run. You are not testing all of those. Most exist only on paper, but some are quietly live in production because a flag got switched on in one region and left off in another.
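If you want to see the doubling for yourself, a few lines will enumerate every on/off combination. This is a sketch with a made-up helper name (`allFlagStates`) and placeholder flag names, and it assumes the flags are independent booleans.

```typescript
// One possible assignment of on/off values to a set of flags.
type FlagState = Record<string, boolean>;

// Enumerates every on/off assignment for the given flag names.
// Each additional flag doubles the number of states.
function allFlagStates(flagNames: string[]): FlagState[] {
  let states: FlagState[] = [{}];
  for (const name of flagNames) {
    const next: FlagState[] = [];
    for (const state of states) {
      next.push({ ...state, [name]: false });
      next.push({ ...state, [name]: true });
    }
    states = next;
  }
  return states;
}

console.log(allFlagStates(["a"]).length); // 2
console.log(allFlagStates(["a", "b", "c"]).length); // 8

const tenFlags = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"];
console.log(allFlagStates(tenFlags).length); // 1024
```

The same list can feed a parameterized test for small flag sets. Past a handful of flags it mostly serves as a reminder of how much you are not covering.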

Testing is where you feel it first. The combinations outrun your test suite almost immediately, so you do the sensible thing and cover the main paths plus whatever edge cases you can imagine. The ones that bite you are the ones nobody imagined. The flag that was supposed to be off for new users but was on for the legacy cohort. The rollout that was supposed to finish months ago, except the cleanup ticket never got picked up.

Then there is the mental tax. Reading a function stops being about what the function does. Now it is about what the function does under this flag, and that one, and the two of them together. "Is this even live for this user?" turns into a small investigation through a configuration system that may or may not match what is actually deployed.

And refactoring quietly grinds to a halt. The old path is still there. Someone might still be on it. You cannot delete it until you are certain the flag is off everywhere, and you are rarely certain, so the old code stays. Every cleanup now means touching two versions of everything. Over time the codebase gets heavier, not lighter.

None of this is a discipline problem. It is just how flags behave. They solve one visible problem in the moment and tuck the resulting complexity out of sight for the next two years.

“Every feature flag is a branch in your code tree. Ten flags aren't ten branches. They're more than a thousand possible paths you will never fully test.”

The Hidden Costs: Why Code Smell Means Real Debt

When I call feature flags a code smell, I do not mean it as an aesthetic complaint. I mean they carry real costs, and those costs compound.

The first thing to go is velocity, and it goes in a sneaky way. Adding a flag is always easier than removing one, so teams drift toward adding. A change that could have shipped and been watched gets wrapped in a flag instead, and the flag stays on. Everyone agrees it is temporary. It has been temporary for two years.

Onboarding slows down too. A new engineer has to learn which path is the old one, which is the current one, and who actually sees what. That knowledge is scattered across config dashboards, half a comment in the code, and the memory of whoever set the flag up. It is almost never written down in one place.

Refactoring carries more risk than it should. Want to speed up a hot loop? Now you are fixing it in two code paths, or you fix the one you trust and accept that some users get the improvement and some do not. That is not consistency. That is a fresh set of edge cases.

It even shows up in hiring. Strong engineers do not enjoy spending a third of their day reasoning about branches that may or may not be active. They want a system they can change with confidence, and a codebase buried in flags is the opposite of that.

There is a runtime cost as well. Every check, every lookup against a flag service, every decision about which way to go has a price. A check against a value already in memory is close to free, but a remote lookup on a hot path is not, and costs that are tiny per request become very real in aggregate.

And then there is the audit problem. Which flags touch customer data? Which ones are still in use versus quietly dead? Which configs changed last week, and why? Those should be easy questions. In most places, they are not.

“A codebase with unchecked flag debt is harder to change than one without flags. You've optimized for deployment safety and optimized against velocity.”

When Flags Actually Make Sense

None of this adds up to "never use flags." They solve real problems, and there are times I would reach for one without hesitation. The trouble starts when a flag is allowed to live forever.

The cleanest case is a feature you have not launched yet. Build it on the main branch, keep it behind a flag, flip it on when it is ready, and then delete the flag and the dead path. Start to finish, that is a few weeks, maybe a couple of months.

Flags also earn their place on the risky paths: payments, authentication, the core of the business. Being able to try the new version on a small slice of traffic before everyone touches it is worth a lot. The catch is that it only pays off if you actually retire the flag once the new version proves itself. Run both paths forever and you have just bought permanent complexity.

Experiments are another honest use. You pit one variant against another, let the data settle it, commit to the winner, and remove the flag. That is the rare situation where the information you get back is clearly worth more than the flag costs you.

Migrations fit the pattern too. You stand up the new system next to the old one, move traffic over gradually, and once you trust it, you tear out the old system and the flags that guarded it. There is a finish line.

Notice what all of these have in common. Each one has an ending built in. A flag is fine when you can name the thing that will let you delete it: a date, a metric, a milestone. A flag with no exit plan is not a tool. It is technical debt wearing a nicer name.

“Feature flags work when you treat them like debt: acknowledge the cost, commit to paying it back, and set a payoff date.”

Fixing the Problem: Flag Hygiene Strategies

Teams that keep flag debt under control do not manage it on willpower. They build the cleanup into the process so it happens whether or not anyone is feeling disciplined that quarter.

The most effective habit I have seen is giving every flag an expiry from the moment it is born. Not "we will get to it," but an actual date on the calendar. When that date arrives, the flag either gets removed or someone has to stand up and re-justify it with a new date attached. That re-justification is the whole point. It forces a real conversation, and that conversation tends to surface flags that should have died months ago.
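One way to make the expiry real is to store it next to the flag and check it mechanically. The shape below is an assumption for illustration: the `FlagDefinition` fields and the `findExpired` helper are hypothetical, not the schema of any particular flag service.

```typescript
// Hypothetical flag record. Dates are ISO strings such as "2024-03-01".
interface FlagDefinition {
  key: string;
  owner: string;
  createdOn: string;
  expiresOn: string;
}

// Returns every flag whose expiry date is strictly before `today`.
function findExpired(flags: FlagDefinition[], today: Date): FlagDefinition[] {
  return flags.filter(
    (flag) => new Date(flag.expiresOn).getTime() < today.getTime()
  );
}

const flags: FlagDefinition[] = [
  { key: "new-checkout", owner: "payments", createdOn: "2024-01-10", expiresOn: "2024-03-01" },
  { key: "search-v2", owner: "discovery", createdOn: "2024-02-01", expiresOn: "2024-09-01" },
];

// Only "new-checkout" is past its date on 2024-06-01.
const expired = findExpired(flags, new Date("2024-06-01"));
console.log(expired.map((flag) => flag.key));
```

Run something like this in CI or on a schedule and the re-justification conversation gets triggered by the calendar, not by someone remembering.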

It helps enormously to make the cleanup itself cheap. When a flag passes its expiry with nobody defending it, the system should start making noise: comments on any PR that touches the flag, a nudge in your alerting, ideally tooling that can strip the dead branch and collapse the conditional for you. The gap between "removing this is a chore" and "removing this is one click" is the same gap between flags that get cleaned up and flags that do not.

A budget works wonders too. Decide how many active flags your team is willing to carry, pick an actual number, and hold to it. Once you are at the limit, adding a new flag means retiring an old one. The question quietly shifts from "should we add a flag?" to "which one are we willing to give up for it?" and that changes how people behave.
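A budget is easy to enforce once it is a number a script can read. Here is a rough sketch; the `checkFlagBudget` name, the result shape, and the limit of three are all placeholders, and where the list of active flags comes from is left to your setup.

```typescript
// Result of comparing the active flag count against the team's limit.
interface BudgetCheck {
  active: number;
  limit: number;
  withinBudget: boolean;
  mustRetire: number; // how many flags have to go to get back under the limit
}

function checkFlagBudget(activeFlagKeys: string[], limit: number): BudgetCheck {
  const active = activeFlagKeys.length;
  const mustRetire = Math.max(0, active - limit);
  return { active, limit, withinBudget: mustRetire === 0, mustRetire };
}

// Proposing a fourth flag against a budget of three.
const current = ["new-checkout", "search-v2", "dark-mode"];
const proposal = checkFlagBudget(current.concat(["inline-help"]), 3);
console.log(proposal.withinBudget, proposal.mustRetire);
```

Failing the build when `withinBudget` is false is what turns "should we add a flag?" into "which one are we giving up?".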

Underneath all of it, keep an inventory. A plain, boring, automatic report of every live flag: who owns it, when it was created, when it is meant to go, how many users it touches. Nothing fancy. Just visible. Visibility is what creates accountability.
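The inventory really can be this boring. The sketch below assumes a hypothetical `FlagRecord` shape carrying the fields named above and produces one plain-text line per flag, soonest expiry first.

```typescript
// Hypothetical inventory record. Dates are ISO strings such as "2024-03-01".
interface FlagRecord {
  key: string;
  owner: string;
  createdOn: string;
  expiresOn: string;
  usersAffected: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Builds one report line per flag, sorted by expiry date (earliest first).
function inventoryReport(flags: FlagRecord[], today: Date): string[] {
  return flags
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => (a.expiresOn < b.expiresOn ? -1 : a.expiresOn > b.expiresOn ? 1 : 0))
    .map((flag) => {
      const ageDays = Math.floor(
        (today.getTime() - new Date(flag.createdOn).getTime()) / MS_PER_DAY
      );
      const status =
        new Date(flag.expiresOn).getTime() < today.getTime() ? "EXPIRED" : "active";
      return (
        flag.key + " | owner: " + flag.owner + " | age: " + ageDays + "d" +
        " | expires: " + flag.expiresOn + " | users: " + flag.usersAffected +
        " | " + status
      );
    });
}

const report = inventoryReport(
  [
    { key: "search-v2", owner: "discovery", createdOn: "2024-01-01", expiresOn: "2024-09-01", usersAffected: 1200 },
    { key: "new-checkout", owner: "payments", createdOn: "2024-01-01", expiresOn: "2024-01-15", usersAffected: 50000 },
  ],
  new Date("2024-01-31")
);
report.forEach((line) => console.log(line));
```

Post the output somewhere the whole team sees it. The formatting matters far less than the fact that it shows up without anyone asking for it.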

But the thing that matters most is not any single process. It is which way the default points. In a healthy codebase, writing the clean version is the path of least resistance and reaching for a flag is the thing you have to argue for. Most teams have that backwards. They add the flag on instinct and then need a reason to take it out. Flip the default, and most of the debt never gets created in the first place.

“Make clean code the default. Adding a flag should take more justification than not adding one.”

The Right Mental Model

If there is one shift to take away, it is this: stop thinking of feature flags as free, and start thinking of them as debt.

They are debt from the moment you write them. The cost is small at first, when the flag is new and the branch is simple. It grows as conditions accrete, as flags start interacting with each other, as the people who knew why it existed move on.

What makes flag debt sneaky is that it is invisible. A messy module looks messy. Dead code is obvious once you go looking. But a flag that has been quietly on for two years and now governs every user looks exactly like a flag you added yesterday. The complexity is real and almost impossible to see.

Good teams do not respond to this by swearing off flags. Flags are useful and worth keeping in the toolbox. What they do is treat each one as a deliberate choice rather than a reflex. They decide on the retirement date when they decide to create it. They automate the removal. They track flag density the same way they track any other form of debt, because that is what it is.

The health of a codebase was never about the raw number of flags. It is about the ratio: how many live flags you are carrying relative to the size of the system and the number of decisions they force. Fifty flags spread across a large platform can be in better shape than ten tangled together in one small service.

The teams that scale well are not the ones that somehow avoid complexity. They are the ones that notice when they have taken some on and pay it down on purpose. Feature flags are no different from any other debt. Borrow when it is worth it, and keep a plan for paying it back.

“Feature flags are scaffolding. The difference between scaffolding and a permanent structure is the commitment to take it down.”

