Feature Toggles are one of the most interesting developments for me in the last few years. It started as a simple experiment a few months back, just one step towards continuous delivery. Now, I wouldn’t know how to survive without this mechanism.
What are feature toggles? Basically, a feature toggle is a mechanism that lets you switch features on or off through configuration. In that sense feature toggles are not new; they have been around for as long as software has. What is new is that people now call them feature toggles and use them as an alternative to feature branches. They are sometimes called feature switches, feature flips, feature bits or feature tags, but I believe that ever since Martin Fowler wrote about them in 2010, the name feature toggles has stuck.
This is what a feature toggle looks like in our PHP code:
Is it that simple? Yes, it’s that simple: just an if-statement. The isActivated() function in the collector class checks the configuration and decides whether the feature is toggled on or off. If it’s on, you can do all the cool stuff for the new feature. If it’s off, you just do what the software was already doing.
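As a rough illustration of that if-statement, here is a minimal sketch assuming a hypothetical `FeatureToggleCollector` class that reads its configuration from a plain array. The feature name `new_invoice_layout` and the `renderInvoiceTotal()` function are made up for this example.

```php
<?php

// Hypothetical collector: answers "is this feature on?" from a plain array.
// Class, feature and function names are illustrative only.
class FeatureToggleCollector
{
    /** @var array<string, bool> feature name => toggled on? */
    private array $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    public function isActivated(string $feature): bool
    {
        // Features missing from the configuration count as off.
        return (bool) ($this->config[$feature] ?? false);
    }
}

function renderInvoiceTotal(FeatureToggleCollector $toggles, float $amount): string
{
    if ($toggles->isActivated('new_invoice_layout')) {
        // New feature: only reached when the toggle is on.
        return 'Total: EUR ' . number_format($amount, 2);
    }
    // Toggle off: keep doing what the software already did.
    return 'Total: ' . $amount;
}
```

Treating unknown features as off is a deliberate choice in this sketch: a missing configuration entry then never exposes new code by accident.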
So, when do you use feature toggles? Only for features, or also for other code changes such as bug fixes and refactorings? I’m not sure yet. In theory any code change can be put behind a feature toggle, but we’re still experimenting with this. Sometimes a bug fix is a single, obvious line of code, and a toggle would only add overhead. We try to use feature toggles for most refactorings, but when a refactoring touches a lot of files this can be awkward. In practice, I think it’s a matter of common sense.
Now that you know what feature toggles are, why would you use them? In my view, they have several advantages:
- It helps decouple deployment and release
- It avoids branching/merging
- It helps you achieve zero-downtime releases
- It enables you to easily roll back a feature in case of issues, without having to do a new deployment
- It enables phased rollout, thereby minimizing the impact if bugs are found
- It allows you to put unfinished code in Production, so unfinished stories don’t mean you can’t deploy.
Let’s start with the first one. Why would you want to decouple deployment and release? Mainly because it brings you closer to continuous delivery. If you want to deliver software to your users continuously, you don’t want to wait for the moment when all the features your team is working on happen to be done at the same time, before development on new stories has started. Raet has a policy that all products are deployed once a month, at roughly the same time. This is partly to avoid the risk of disruptions during the rest of the month, partly because it makes it very clear to our users when new functionality is delivered, and partly because it helps the development teams find a rhythm, the ‘heartbeat’. Everyone knows when the release is and how much time there is to deliver what was promised. However, all of this is about the release, not the deployment. We should be able to deploy whenever a feature is done, and then toggle it on separately when the time is right. Bug fixes and refactorings can be toggled on immediately; other features can be toggled on once they have been communicated to the users.
Because of the feature toggles, code changes can now be done in the main branch, so you don’t need to create separate branches and you don’t need to merge everything back when you’re done. You avoid ‘merge hell’ and all the new code is immediately under continuous integration, so all the unit tests and functional tests are automatically run in the build (which is not always the case in branches).
Releasing a feature is as simple as toggling it on in the configuration. No deployment is needed at that moment, so releases cause zero downtime.
Rolling back a feature is also zero-downtime: if a bug is discovered in your Production environment, you can simply toggle the feature off. The issue is solved instantly, unless the bug has already caused data corruption, and even then, toggling the feature off will probably prevent worse. Of course you still need to fix the bug, but at least you don’t need to panic, rush out a fix and deploy it in a hurry, only to find out that you introduced more bugs because you were too rash.
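Both releasing and rolling back are configuration changes, not code changes. A minimal sketch, assuming a hypothetical `ToggleStore` that keeps one state per environment in memory; `greeting()` and `new_greeting` are illustrative names:

```php
<?php

// Hypothetical in-memory store: one on/off state per environment and feature.
// In the setup described in this post the states live in a database instead.
class ToggleStore
{
    /** @var array<string, array<string, bool>> environment => (feature => on?) */
    private array $states = [];

    public function set(string $environment, string $feature, bool $on): void
    {
        $this->states[$environment][$feature] = $on;
    }

    public function isActivated(string $environment, string $feature): bool
    {
        return $this->states[$environment][$feature] ?? false;
    }
}

// Deployed once; this code does not change when the feature is released or rolled back.
function greeting(ToggleStore $store, string $environment): string
{
    return $store->isActivated($environment, 'new_greeting') ? 'Hello, new world' : 'Hello';
}

// Deployment: the new code is everywhere, but the feature is only on in Test.
$store = new ToggleStore();
$store->set('Test', 'new_greeting', true);
$store->set('Production', 'new_greeting', false);

// Release: flip the Production state; no new deployment is needed.
$store->set('Production', 'new_greeting', true);

// Rollback after a bug is found in Production: flip it back off.
$store->set('Production', 'new_greeting', false);
```

Between release and rollback only the stored state changes; the deployed `greeting()` function stays exactly the same.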
A big advantage of feature toggles that I haven’t discussed yet is phased rollout. In the configuration you can specify not only whether a feature is toggled on or off, but also under which conditions. For example, the feature can be activated only for people in a certain geographical location, or only for users in a specific user group. This also makes closed user group testing easy. When companies like Facebook or Google say they will roll out a feature gradually to an increasing number of users, they are referring to their feature toggling mechanism. Facebook uses a tool called ‘Gatekeeper’, which they built specifically for this purpose. It can filter on browser, geographical location, number of friends, age, IP address and many other parameters.
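Gatekeeper’s internals aren’t described here, so the sketch below shows a different, commonly used technique for gradual rollout rather than Facebook’s implementation: hash each user into a stable bucket from 0 to 99 and enable the feature when the bucket is below the rollout percentage. The function name `inRollout()` is hypothetical.

```php
<?php

// Sketch of percentage-based gradual rollout (a common technique, not
// Gatekeeper's actual implementation). Names are hypothetical.
function inRollout(string $feature, string $userId, int $percentage): bool
{
    // Hash feature + user into a stable bucket 0..99. crc32() returns a
    // non-negative integer on 64-bit PHP builds.
    $bucket = crc32($feature . ':' . $userId) % 100;

    // Users in buckets below the percentage get the feature. Raising the
    // percentage only adds buckets, so nobody loses the feature.
    return $bucket < $percentage;
}
```

Because the bucket depends only on the feature name and user ID, a user who sees the feature at 10% keeps seeing it at 50%, so widening the rollout never takes the feature away from anyone.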
Currently we only filter on customer name, that is, on the name of the user’s company. Just because of that, our Product Owner is now the biggest supporter of feature toggles: it allows him to work closely with customers and involve them in testing new features. Product Owner happy, customer happy, we happy. I guess this is what Raet means by ‘customer intimacy’.
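A filter on customer name could be sketched like this, assuming a hypothetical `CustomerToggle` class in which an empty customer list means the feature is on for everyone. It is an illustration of the idea, not the implementation described in this post.

```php
<?php

// Hypothetical toggle that is off, on for everyone, or on only for a list
// of customers (company names). Class and method names are illustrative.
class CustomerToggle
{
    private bool $enabled;
    /** @var string[] lower-cased customer names; empty means "all customers" */
    private array $customers;

    public function __construct(bool $enabled, array $customers = [])
    {
        $this->enabled = $enabled;
        // Normalise names so 'Acme BV' and 'acme bv' match.
        $this->customers = array_map('strtolower', $customers);
    }

    public function isActivatedFor(string $customerName): bool
    {
        if (!$this->enabled) {
            return false;
        }
        if ($this->customers === []) {
            return true; // no filter: on for every customer
        }
        return in_array(strtolower($customerName), $this->customers, true);
    }
}
```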
The last advantage I’d like to mention is that you can put unfinished code in Production. In itself this is obviously a bad thing, and many people will not see it as an advantage… but I do. At Raet we are supposed to deliver software once a month, so every two or three sprints (we use two-week sprints) we have a short time window in which we can create a ‘release build’ that brings all the new functionality to our users. In the past we sometimes underestimated a story, so that at the end of the sprint its code was only halfway done. It was too much work to remove the code again, but it wasn’t finished either, so we ultimately decided that continuing was too risky and didn’t deploy or release any feature at all. That will not happen again! A few months ago we ran into the same situation, but this time we were able to go ahead with the deployment and release only the finished stories.
Of course there are also downsides to feature toggles, although in my view most of them come down to common sense and knowing which situations to avoid:
- More testing is required
- It introduces a new kind of technical debt
- The number of feature toggles will grow faster than you think
The first one is a real disadvantage. Testing may take (much) more time, depending on the number of toggles in the parts of the software you’re testing. For each feature toggle you now need to test both the situation where it is toggled on and the situation where it is toggled off. Worse, with multiple toggles you would need to test all combinations, so the testing effort grows exponentially: n toggles give 2^n combinations. In practice, however, you know which feature toggles influence each other and which combinations need extra testing. A rule of thumb is that you only need to test two situations: the one where all feature toggles are on, and the one in which exactly the toggles you want to release after your next deployment are on. My estimate is that this costs us 5-10% more time.
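To see why exhaustive testing does not scale, the sketch below enumerates every on/off configuration of a set of toggles and compares it with the two configurations from the rule of thumb. The function names and toggle names are hypothetical.

```php
<?php

// Every on/off configuration of the given toggles: 2^n arrays of
// toggle name => bool. Names are illustrative.
function allCombinations(array $toggles): array
{
    $combinations = [[]];
    foreach ($toggles as $toggle) {
        $next = [];
        foreach ($combinations as $combo) {
            // Each toggle doubles the number of configurations: off and on.
            $next[] = $combo + [$toggle => false];
            $next[] = $combo + [$toggle => true];
        }
        $combinations = $next;
    }
    return $combinations;
}

// Rule of thumb: test "everything on" plus the configuration that will be
// live after the next deployment.
function ruleOfThumbConfigurations(array $toggles, array $onAfterNextDeployment): array
{
    $allOn = array_fill_keys($toggles, true);
    $nextRelease = [];
    foreach ($toggles as $toggle) {
        $nextRelease[$toggle] = in_array($toggle, $onAfterNextDeployment, true);
    }
    return [$allOn, $nextRelease];
}
```

With five toggles that is 32 configurations versus 2, and each additional toggle doubles the first number.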
Feature toggles are a kind of technical debt. In my view this is less a disadvantage than a warning. As soon as your feature is in Production and has proven itself, you should remove the if-statements from your code and the toggle from the configuration. If you don’t clean up as soon as possible, the list of feature toggles will keep growing and your code will become a mess. Development will also become more complex, because older feature toggles can no longer be toggled off without causing others to fail through dependencies. Even with a central administration interface you will lose track, and you may end up releasing features that are not ready, or forgetting to release features that were promised to your users long ago.
So good life cycle management of feature toggles is crucial. You need a clear process that tells you when to toggle features on and when to remove them from the code. Some people have mentioned the amount of administration as a disadvantage. I disagree: creating a feature toggle, moving it through your staging environments and removing it afterwards takes maybe 10-15 minutes per feature. Branching and merging may be quicker, but it may also take a lot more time if you end up in merge hell. On average, the ‘extra’ time is about the same as with feature branches, but you gain a lot of flexibility.
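One way to make such a process explicit is to model each toggle’s life cycle as an ordered list of stages that cannot be skipped. The stage names below are assumptions based on the steps mentioned in this post (toggle on, remove from the code, then remove from the configuration after the next deployment), not a documented Raet process.

```php
<?php

// Hypothetical life cycle for one toggle; stage names are illustrative.
final class ToggleLifecycle
{
    const STAGES = [
        'created',            // toggle added to code and config, off everywhere
        'testing',            // on in Development/Test
        'released',           // on in Production
        'removed_from_code',  // if-statements deleted, waiting for deployment
        'removed_from_config' // config entry deleted after that deployment
    ];

    private int $index = 0;

    public function stage(): string
    {
        return self::STAGES[$this->index];
    }

    // Move to the next stage; stages cannot be skipped.
    public function advance(): string
    {
        if ($this->index >= count(self::STAGES) - 1) {
            throw new LogicException('Toggle is already fully removed.');
        }
        $this->index++;
        return $this->stage();
    }
}
```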
Now, let’s have a look at our implementation. We decided not to use an existing framework, partly because there were no broadly supported frameworks for PHP (apart from the feature toggles in Symfony, which we didn’t want to adopt just for the toggles), and partly because it wasn’t much work to build something ourselves that was completely tailored to our needs. It took me two days to create a first proof of concept, and perhaps one or two more days since then to improve it.
Feature toggles are usually configured in local configuration files. We decided to put the configuration in the database instead, because that made it a lot easier to create a proof of concept. It also made it easier to distribute the configuration to our Development, Test, Acceptance and Production environments. The biggest disadvantage of this choice is that we can’t use feature toggles when we want to refactor the database logic, because the toggle lookup itself depends on that logic.
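A database-backed lookup might look roughly like the sketch below. The `feature_toggles` table, its columns and the class name are assumptions for illustration; it uses PDO and loads all toggles for the current environment once per request.

```php
<?php

// Hypothetical database-backed toggle lookup using PDO.
// Table and column names are assumptions, not an actual schema.
class DatabaseToggleCollector
{
    private PDO $db;
    private string $environment;
    /** @var array<string, bool>|null toggles cached for this request */
    private ?array $cache = null;

    public function __construct(PDO $db, string $environment)
    {
        $this->db = $db;
        $this->environment = $environment;
    }

    public function isActivated(string $feature): bool
    {
        if ($this->cache === null) {
            // Load all toggles for this environment once, instead of
            // running one query per if-statement.
            $stmt = $this->db->prepare(
                'SELECT feature, active FROM feature_toggles WHERE environment = ?'
            );
            $stmt->execute([$this->environment]);
            $this->cache = [];
            foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
                $this->cache[$row['feature']] = (bool) $row['active'];
            }
        }
        // Unknown features count as off.
        return $this->cache[$feature] ?? false;
    }
}
```

Caching per request means a toggle change takes effect on the next request rather than halfway through one. The sketch also makes the trade-off above visible: the lookup goes through the database layer, so that layer is hard to put behind a toggle itself.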
Although not required, we found it a huge benefit to have a central administration page showing the state of all feature toggles across all environments. As you can see in the screenshot below, we grouped the features by deployment. This improves the overview and makes it immediately clear which feature toggles have been in Production for a while and need to be removed soon. Once a feature toggle has been removed from the code, we have to wait until that code is in Production before we can also remove the toggle from the database. The last field records when that will be: after removing the toggle from the code, we enter the date of the next deployment in this field. After each deployment, we check this administration interface to see which features need to be toggled on and which toggles can be removed.
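The post-deployment check could be automated along these lines. The sketch assumes each toggle is a hypothetical array with `feature`, `production` (on in Production?), `remove_after` (the date of the deployment after which the configuration entry can go, or `null`) and `remarks` keys, and that dates are ISO `Y-m-d` strings so they can be compared as text.

```php
<?php

// Hypothetical post-deployment check. The array keys are assumptions,
// not a real schema.
function postDeploymentReport(array $toggles, string $deploymentDate): array
{
    $report = ['remove_from_config' => [], 'still_off_in_production' => []];

    foreach ($toggles as $toggle) {
        $removeAfter = $toggle['remove_after'];

        if ($removeAfter !== null && strcmp($removeAfter, $deploymentDate) <= 0) {
            // The code without this toggle is now in Production, so the
            // configuration entry can be deleted.
            $report['remove_from_config'][] = $toggle['feature'];
        } elseif (!$toggle['production']) {
            // Still off in Production: show the remark explaining why.
            $report['still_off_in_production'][$toggle['feature']] = $toggle['remarks'];
        }
    }

    return $report;
}
```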
We also found that people often didn’t know exactly when a feature could be moved to the next environment, or why a toggle was temporarily toggled off. We therefore added a ‘remarks’ field that gives extra information on the status and says what needs to be done before moving on with that feature.
In conclusion, we are very happy with this addition to our release process. It gives us a lot of extra flexibility with almost no extra effort. I can imagine that it doesn’t work for everyone, but I recommend it nonetheless.