Avoid stateful variables

How I gently prod myself to write purer code

Sunday February 12th, 2017

Coming from an imperative programming background, or working in an imperative code base, it can be tricky to write purer code. However, there is one guideline I try to follow that gently prods me in the "right" direction.

Avoid stateful variables

Following this guideline leads to a number of things that I think make the code "better".

Stateful variables

What I mean by a stateful variable is a variable whose value changes after it has been set. This includes simple re-assignment,

a = f()
a = g(a)

but it also includes calls that change the internal state of an object after it has been created.

a = MyClass()
a.setFoo('bar')

Both of these mean there is state that changes over time in the code. It might be short-lived, confined to a single function, but it can still make the code harder to reason about. There is no single, easily determined expression that can replace a stateful variable wherever it is used. You have to think about the entire history of the variable, and what caused it to change. To use a more technical term, stateful variables cannot be used in a referentially transparent way.
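To make the referential transparency point a bit more concrete, here is a minimal sketch. The bodies of `f` and `g` are made up for illustration, and both are assumed to be pure.

```python
# Hypothetical pure functions, standing in for f and g above.
def f():
    return 2

def g(x):
    return x * 10

def stateful():
    a = f()
    first_use = a + 1   # at this point `a` means f()
    a = g(a)
    second_use = a + 1  # at this point `a` means g(f())
    return first_use, second_use

def stateless():
    a = f()
    b = g(a)
    # Each name can be swapped for its defining expression wherever it is used.
    return a + 1, b + 1
```

In the stateful version, what `a` stands for depends on where you are in the function. In the stateless version, `a` is always `f()` and `b` is always `g(f())`.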

Once I have identified a stateful variable in code I am about to write or change, I consider alternatives.

Alternative 1: Declare more stateless variables

A way to avoid stateful variables is to declare more stateless variables. These are variables that don't change once set. For example,

a = f()
a = g(a)

could become as below.

a = f()
b = g(a)

This might be a bit more complex if you have branching involved.

a = f()
if test():
  a = g(a)

The above could become as below.

a = f()
b = g(a) if test() else a

Or, if you can change g, it might make sense for it to handle a boolean,

a = f()
b = g(a, test())

or even for it to call a function that returns a boolean.

a = f()
b = g(a, test)
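As a rough sketch of that last variant, here is one way g could accept a function. The bodies are invented for illustration, and `should_transform` is a hypothetical name playing the role of the test function above.

```python
# Hypothetical stand-ins: f produces a value, g optionally transforms it.
def f():
    return [3, 1, 2]

def should_transform():
    return True

def g(value, condition):
    # `condition` is a zero-argument callable returning a boolean,
    # so g decides when (and whether) to evaluate it.
    return sorted(value) if condition() else value

a = f()
b = g(a, should_transform)
```

Neither `a` nor `b` is re-assigned, and the branching lives inside g rather than in the calling code.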

Alternative 2: Declare fewer variables by inlining

A simple technique is just to not use as many variables. The code

a = f()
a = g(a)

could just be as below.

a = g(f())

A more horrible example is if you have a stateful variable that depends on another stateful variable.

condition = condition_1()
if condition and condition_2():
  condition = condition_3() 
a = f()
if condition:
  a = g(a)

This abomination is equivalent to the below, assuming all functions have no side-effects.

if (condition_1() and not condition_2()) or (condition_1() and condition_2() and condition_3()):
  a = g(f())
else:
  a = f()

Yes, there is a little bit of duplication, but I wouldn't fear this, especially for such a tiny amount. It's case-dependent, but it may well be worth seeing exactly the cases in which certain code is run, even if that means a little duplication.

duplication is far cheaper than the wrong abstraction
Sandi Metz
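If you want some reassurance that a rewrite of nested conditions like the one above really is equivalent, one option is to check every combination of inputs. This is only a sketch: it passes the three conditions in as plain booleans, which assumes the condition functions have no side-effects, and the names `stateful` and `stateless` are mine.

```python
from itertools import product

def stateful(c1, c2, c3):
    # Mirrors the original: `condition` is re-assigned part-way through.
    condition = c1
    if condition and c2:
        condition = c3
    return condition

def stateless(c1, c2, c3):
    # Mirrors the rewritten single expression.
    return (c1 and not c2) or (c1 and c2 and c3)

# All 8 combinations of three booleans.
all_cases = list(product([False, True], repeat=3))
mismatches = [case for case in all_cases if stateful(*case) != stateless(*case)]
```

An empty `mismatches` list means the two versions agree on every input.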

Alternative 3: Create objects as needed rather than changing them

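Rather than creating an object and then changing it, you can often pass everything it needs when it is created. For example, the code

a = MyClass()
if test():
  a.setFoo('mary')

could become as below.

a = MyClass(foo='mary' if test() else None)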

This code might not be strictly equivalent, but I would put in the effort to make sure MyClass works with foo being None.

As a sidebar, the None case above makes foo look like it might be an optional argument, which isn't great, for the reasons outlined in the post about optional arguments. Ideally, you would be able to factor out the code from MyClass so that it doesn't need to treat None as a special case.
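One way to enforce "create, don't change" in Python is a frozen dataclass. This is a sketch under assumptions rather than how MyClass is actually written: the single `foo` field and the `make` helper are hypothetical, and `foo` is deliberately a required argument so that None has to be passed explicitly.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional

@dataclass(frozen=True)
class MyClass:
    # Required on construction; there is no setter to call afterwards.
    foo: Optional[str]

def make(use_mary):
    return MyClass(foo='mary' if use_mary else None)

a = make(True)

# Attempting to change the object after creation raises an error.
try:
    a.foo = 'bar'
    mutated = True
except FrozenInstanceError:
    mutated = False
```

With this in place, any code that tries to set foo later fails loudly instead of quietly introducing state.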

Alternative 4: Use higher order functions

In some cases you have a "pipeline" where a value is repeatedly modified for various reasons. For example, a list of filters that are each applied conditionally.

a = f()

if condition_1():
  a = filter_1(a)

if condition_2():
  a = filter_2(a)

if condition_3():
  a = filter_3(a)

if condition_4():
  a = filter_4(a)

filtered = a

In this case, you can construct a structure of functions, and run them using a reduce.

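from functools import reduce

a = f()

# reduce passes the value accumulated so far first, then the next list item
def conditional_filter(value, condition_filter):
  return condition_filter['filter'](value) if condition_filter['condition']() else value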

condition_filters = [
  {'condition': condition_1, 'filter': filter_1, },
  {'condition': condition_2, 'filter': filter_2, },
  {'condition': condition_3, 'filter': filter_3, },
  {'condition': condition_4, 'filter': filter_4, },
]

filtered = reduce(conditional_filter, condition_filters, a)
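For a version you can run as-is, here is a small sketch with made-up conditions and filters over a list of numbers. The name `apply_if`, the lambdas, and the use of tuples rather than dictionaries are all my own choices for illustration. Note that in Python 3 reduce has to be imported from functools, and it calls its function with the accumulated value first and the next item second.

```python
from functools import reduce

def apply_if(value, condition_filter):
    # `value` is the result so far; `condition_filter` is the next pair.
    condition, filter_ = condition_filter
    return filter_(value) if condition() else value

# Hypothetical (condition, filter) pairs.
condition_filters = [
    (lambda: True, lambda xs: [x for x in xs if x % 2 == 0]),  # keep evens
    (lambda: False, lambda xs: [x for x in xs if x > 100]),    # skipped
    (lambda: True, lambda xs: [x for x in xs if x > 2]),       # keep > 2
]

filtered = reduce(apply_if, condition_filters, [1, 2, 3, 4, 5, 6])
```

Here `filtered` ends up as `[4, 6]`: the first filter keeps the even numbers, the second is skipped because its condition is false, and the third drops 2.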

Going overboard

I don't follow the above alternatives in all cases. There are times when making stateless code would be less clear, would be too much of a project, or would make the code too rigid.

For example, in the case of Alternative 4 above, there are reasons why the final pure code is not better than the original. It has removed some state, and made it really easy to make the list of conditions and filters dynamic, but at the cost of making it harder to do something different/hacky in the middle of the filtering process, say, saving a file or making an API request. This would of course make the code slightly horrible, but in a real-world situation with time constraints and other priorities, this might be the best option.

Also, you should consider who else will be working on this code, and how familiar they are with higher order functions like reduce. Of course, this should be covered in programming 101, but for me at least, it really wasn't.

In cases where a stateful variable is to be used, I usually try to make sure its lifetime is as limited as possible, and that it is modified in consistent ways and for consistent reasons.
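As a small sketch of what I mean by a limited lifetime, a stateful accumulator can be confined to one short function. The function `total_length` and its input are hypothetical.

```python
def total_length(words):
    # `total` is stateful, but it lives only inside this function,
    # is modified in one place, and for one reason.
    total = 0
    for word in words:
        total += len(word)
    return total

lengths = total_length(['avoid', 'stateful', 'variables'])
```

From the outside, `total_length` still behaves like a pure function, and `lengths` is never re-assigned.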

As with everything, it's a trade-off.