Reducing risk of release day

Large changes do not have to fill you with fear

Saturday March 2nd, 2019

Ideally, changes of behaviour are released incrementally to base your next steps on feedback. But sometimes you think this just isn't possible. Maybe it had been decided that large public behaviour changes must all go live at once; or sometimes planned changes are so large, you might initially think that there is no alternative other than for it all to go live at once, a few days, weeks, months [or even longer!] down the road. Ironically, it is exactly for the sorts of changes where you might have difficulty seeing how it could be split up that are the most important to split up: they are complex and so high risk.

Three common classes of such changes are:

a user-facing feature that is seemingly small, but actually needs large scale internal code changes;
a larger user-facing feature-set that needs lots of UI and backend changes [often released at once for marketing reasons];
an internal property of an application that is seemingly all-or-nothing, such as bumping the version of a framework through a breaking change that requires large scale changes in the codebase.

You can automated test, QA, or load test these large changes all you like, but these checks are rarely a replacement for feedback from production: in terms of bugs but especially for performance. The performance characteristics of large swathes of code that has not run in production by large numbers of users can rarely be predicted.

However, it is often still possible to split up the changes to get production feedback on a lot of code before release day. As is typical for many [all?] changes, each change should be:

small: ~1 day's work;
safe: reviewed/tested/QA-ed as you deem appropriate;
released: so the code actually runs in production, and its result used.

To split the work, on what seem to be "all at once" problems, into such pieces, there are two classes of techniques, detailed below.

Long-running feature branch [but keeping it as small as possible]

A long-running feature branch is often the traditional way of doing large changes, periodically merging in any changes from the master branch, and releasing by merging to master at the end. This is by far the worst technique. Not only are you essentially maintaining two versions of the application [similar to The Big Rewrite], but come release day, it will all hit production at once!

However, a variation of this can often have great properties that reduce risk on release day. Instead of just periodically taking changes from master into the feature branch, you also do it other way around. In more detail...

As you're working on a feature branch, always think about what could be released [and used in production] first, without changing the public behaviour. Put another way, try to refactor* the production code so it can run both the original behaviours, and the new ones, and release the refactoring before the actual feature is released.

Ideally, you can plan this: do the refactoring up-front, and then move onto the new behaviour that uses the refactored code. However this is often not possible: not until having written some of the feature do you then realise some parts of the code need to be refactored, and this should/could be released first. It can be tricky to reorder this: the refactoring done second is to be released first.

There is a technique that is especially helpful here: keep each commit small, releasable, and put test additions/changes in the same commit as the corresponding production code change. This requires a bit of diligence, and often use of git rebase -i to edit and squash commits. But if you do this, then it's far easier to re-order them safely, again using git rebase -i: the safety is achieved by running the tests* for each re-ordered commit to make sure everything works. So, if you do refactoring at the end of a feature branch, with small, releasable commits, you can then re-order them so the refactoring is put at the beginning.

There may be conflicts when you re-order, but if you fix them, run tests* on each commit, and it all works, then you know

the refactoring doesn't break any of the existing code or the feature;
the refactoring can be usefully released first.

So you do exactly that: you separate the refactoring commit(s) into a separate branch [git cherry-pick can be used for this], and go through your process to merge this into master and get it released. You then rebase your feature branch onto master, and ensuring to fix conflicts and making sure tests pass for each commit. At this point, master will contain the refactoring for the change, and the feature branch will just contain the [non refactoring] change. Repeat for as many refactorings as needed.

The rebase onto master, as opposed to a merge: this replays [the diff of] each commit from the feature branch, one by one, and with each you fix conflicts and can run tests. Here is where the benefit of rebase [over merge] with small releasable commits is revealed: you get smaller conflicts which are usually more managable.

Another benefit to this comes if you're not the only person working on this code. Getting each refactoring into master as soon as it's known to be required, means the rest of the team can rebase onto these changes, and deal with any conflicts sooner [and so typically with less or simpler code] rather than later. If each person in the team has been writing small, releasable commits on their own branches, then any conflicts can often be safely and quickly sorted out.

The most extreme example of this that I've done was for Django upgrade of a fairly complex application that was coupled to a lot of subtle/semi-private Django behaviour. To limit the risk of the Django upgrade + application code changes all going live at once, I repeatedly refactored the application code so the same code worked in both versions. Ultimately, there were about ~20 refactoring branches, each reviewed, QA-ed, and released, before the actual Django upgrade itself.

Note: it was crucial that the actual code changes are run in production to get value from this technique. If I had code like the below,

if DJANGO_VERSION == NEW:
  ...
else:
  ...

I would have been effectively incrementally adding a feature flag with lots of code behind it, which would have all been switched on on release day. This may be acceptable for an initial investigation, but before the actual production switch of Django version, it should be refactored to not have this conditional behaviour, as discussed below.

Feature flag [minimising code behind flag]

An alternative to a feature branch is a feature flag. This is in many ways similar to a feature branch, as in the feature code is kept from running by most users. However, with a feature flag the code is deployed to the production application, but it's only run by certain users with a certain flag enabled, or perhaps by hitting a certain URL.

This is often acceptable at the beginning of a feature [say, to get feedback on the UI of a feature before it's fully designed or working], but always consider what will happen down the line on release day. On release day, the feature flag will be effectivly enabled for all users, and potentially lots of code will be run by lots of users that wasn't before.

For this reason, don't enable the feature flag for everyone until the code behind the feature flag is minimised. Often it's possible to have not much more than just the UI that enables the particular feature. The rest of the code was run by large number of users long before the public release.

A great benefit of this technique is that often it is possible to incrementally refactor* a lot of the code out from behind the feature flag. Even if at the beginning there might be a fair bit of code behind it, you can go through several release/QA cycles, moving more and more of the code into production before release day.

A frequent example of this is changing a feature that uses 1 of "something", to using many. You can have a feature flag that enables the UI for this, but often to get this working requires changes through multiple layers of code, and possibly even the database.

What you can often do is to make a configuration option that limits the maximum of the "something", and configure all users to have a maximum of 1. Then you can work, layer at a time, making it work with N, but because each user is configured to have a maximum of 1, they will see no difference. Each layer can be tested*, QA-ed, and released. If there are problems, you have a lot of chances to get feedback before release day. Only when all the layers of the code are done, can you change the configuration option for production users.

In this particular case, once you know of nothing more to be done to the code, you don't even have to release the feature to all users all at once. You can slowly bump up the maximum for some users, in order to be able to get feedback from production, in terms of behaviour of the UI, from any reported bugs, and from performance. If you don't have a production use-case for each user to have a different maximum, then you can then set the maximum globally and remove some of the per-user configuration.

When to use which technique?

I don't know of a hard-and-fast rule: each case is different. However, some considerations:

A long running [but kept small] feature branch may not be feasible when more than one person needs to work on that feature branch. A feature flag is probably more helpful here: multiple people can more easily work on branches changing code behind the flag / refactoring it out from behind the flag.
Long-running feature branches, even if as small as possible, are harder to work with when there are lots of conflicting master branch changes: you may hit The Big Rewrite issues of maintaining two applications. A feature flag may be better here: impact on the "other" application will be visible in the code exactly when making changes, rather than making it a problem down the line when merging/rebasing.
A feature flag may not be helpful for a change that needs a lot of refactoring, but which is itself small in terms of final change that will be released on release day, and can/should be done by one person. A long-running [small] feature branch will likely be suitable in this case.

Addressing common critiscms

There are a a few common critiscms of the frequent-release techniques:

Each release introduces risk, and so releasing often is more risky than less.

At first glance, this argument makes sense. However, these techniques push you to split up what you will be releasing anyway, rather than releasing more things [which could be more risky].

Frequent releasing == testing less. This is risky! We don't do risk here.

Make sure each release is tested/QA-ed as you would any other change. This is not "move fast and break things", this is "do what you need to do to minimise the risk of each change that is going to happen anyway".

This is wasting QA time: you'll testing the same thing multiple times

If a change is deemed "large", then I think it's more appropriate to spend longer on QA: this is time spent reducing risk.

* It's important to have high-level feature/integration tests to support refactoring and safe rebasing. Low-level tests [often called unit tests] are less helpful.