Type simplicity

If you can't have type safety, have the next best thing

Thursday January 4th, 2018

There is a "feature" of type-safe languages: using union types often takes some effort. As a consequence, the developer is given a little push to avoid such types and to consider alternatives.

In this post I argue that simpler, non-union types, and the corresponding code, are often easier to reason about and less likely to have bugs. Therefore, even in a type-unsafe environment, you should give yourself this push towards simpler types.

What are union types?

They are types that have two (or more) cases. In Haskell, for example, you have to be explicit, with a bit of effort, about "which" case a value belongs to. A basic situation is using Either, which allows a Left type and a Right type:

a = if condition then Left 3 else Right "three"

In this case, a would have the type Either Int String. [This is slightly inaccurate due to typeclasses, but this isn't relevant here.]

In a type-unsafe environment, variables don't have types; only the data they refer to does. However, it is often useful to mentally note all the types of a variable's data, and construct "what the type of the variable would be" if the environment had the concept of variables having types.

For example, in Python, you can write:

if condition:
  a = 3
else:
  a = "three"

where the type of a would be Union[int, str] (using Python's type annotation syntax). The similar JavaScript code:

var a;
if (condition) {
  a = 3;
} else {
  a = "three";
}

would result in the type of a being the union type number | string (using Flow's type annotation syntax).
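The practical cost of a union type shows up in the code that consumes it, because every case has to be handled. The sketch below assumes Python's typing module. The functions make_value and describe are hypothetical names used only for illustration.

```python
from typing import Union

def make_value(condition: bool) -> Union[int, str]:
    # Mirrors the example above: an int in one case, a str in the other
    if condition:
        return 3
    else:
        return "three"

def describe(value: Union[int, str]) -> str:
    # Every consumer of the union has to branch on which case it received
    if isinstance(value, int):
        return f"the number {value}"
    return f"the string {value!r}"

print(describe(make_value(True)))   # the number 3
print(describe(make_value(False)))  # the string 'three'
```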

A more complex, but more realistic, example is an object with different keys:

var obj = {};
if (condition_1) {
  obj.key_1 = "value_1";
}
if (condition_2) {
  obj.key_2 = "value_2";
}
if (condition_3) {
  obj.key_3 = "value_3";
}

Slightly shockingly, this is a union of 8 types:

{}
| {key_1: string}
| {key_1: string, key_2: string}
| {key_1: string, key_2: string, key_3: string}
| {key_1: string, key_3: string}
| {key_2: string}
| {key_2: string, key_3: string}
| {key_3: string}

[You can write a similar type more concisely in Flow. The types are written out here to make clear all the different cases.]
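The count of 8 follows from each of the three conditions independently adding or omitting one key, giving 2 × 2 × 2 = 8 combinations. Below is a rough Python translation that enumerates every combination of conditions and counts the distinct sets of keys. It assumes that condition_1, condition_2 and condition_3 set key_1, key_2 and key_3 respectively, as in the list of types above. The name build_obj is hypothetical.

```python
from itertools import product

def build_obj(condition_1: bool, condition_2: bool, condition_3: bool) -> dict:
    # Each condition adds its own key, or leaves it out
    obj = {}
    if condition_1:
        obj["key_1"] = "value_1"
    if condition_2:
        obj["key_2"] = "value_2"
    if condition_3:
        obj["key_3"] = "value_3"
    return obj

# Collect the distinct "shapes" (sets of keys) over all 2 ** 3 inputs
shapes = {
    frozenset(build_obj(*conditions))
    for conditions in product([False, True], repeat=3)
}
print(len(shapes))  # 8
```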

Special mention: Maybe/Optional/None/null

This is a common union type that crops up in code. In Haskell, again, you have to be explicit, adding a Just to wrap the value:

a = if condition then Just 3 else Nothing

However, in Python you can just set the value as desired:

a = None
if condition:
  a = 3

where the type would be Union[int, None], or more concisely Optional[int]. Similarly, in JavaScript you can write:

var a = null;
if (condition) {
  a = 3;
}

where the Flow type annotation would be ?number.
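In Python's typing module, Optional[int] is literally the same type as Union[int, None]. As with any union, the None case has to be handled before the value can be used, and it often spreads to the result type too. The function add_one in the minimal sketch below is hypothetical.

```python
from typing import Optional, Union

# Optional[X] is shorthand for Union[X, None]
assert Optional[int] == Union[int, None]

def add_one(a: Optional[int]) -> Optional[int]:
    # `a + 1` would raise TypeError if a were None, so the None case must
    # be handled explicitly, and the result is itself Optional
    if a is None:
        return None
    return a + 1

print(add_one(3))     # 4
print(add_one(None))  # None
```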

Special mention: optional arguments with default of None

It is common for functions to have optional arguments with a default of None:

def my_func(a=None):
  ...

You should be conscious that, unless None is the only value ever passed, the argument will be of a union type.
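To make the two cases visible, here is a small sketch with the argument annotated explicitly. It assumes an int argument, and the function name describe_count is hypothetical. Omitting the argument and passing None explicitly both land in the same branch.

```python
from typing import Optional

def describe_count(count: Optional[int] = None) -> str:
    # Case 1: the argument was omitted, or None was passed explicitly
    if count is None:
        return "no count given"
    # Case 2: an int was passed
    return f"{count} items"

print(describe_count())   # no count given
print(describe_count(3))  # 3 items
```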

Special mention: variable only sometimes defined

In imperative languages, it's often possible to define a variable only some of the time. For example, the_filter below is only defined when condition is true:

if condition:
  the_filter = filter_function

In Python, accessing the variable when it's not defined causes a runtime error. However, it's useful to keep in mind that there are 2 cases, so the type of the variable can usefully be thought of as a union of the defined and not-defined cases.
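A minimal sketch of the not-defined case inside a function. The name get_filter and the lambda are hypothetical stand-ins for the filter_function above. Inside a function, reading a local that was never assigned raises UnboundLocalError, which is a subclass of NameError.

```python
def get_filter(condition: bool):
    if condition:
        the_filter = lambda x: x > 0  # hypothetical stand-in for filter_function
    # If condition was False, the_filter was never assigned, so this line
    # raises UnboundLocalError (a subclass of NameError)
    return the_filter

print(get_filter(True)(5))  # True
try:
    get_filter(False)
except NameError as error:
    print(type(error).__name__)  # UnboundLocalError
```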

When and why should they be avoided?

"Avoided" is perhaps too strong. However, since they result in a number of different cases the code has to deal with, choosing an alternative that reduces the amount of code that has to deal with multiple cases may result in code that is easier to reason about, and may even improve the effectiveness of tests and manual QA.

Alternative: Single simple type

Often, types are simple in an initial version of the code. They are then transformed into union types when adding a feature, where the original case should behave as before. Wonderfully, it is often possible to refactor the old case to be a special case of the new.

For example, if you're adding a feature where an existing list is filtered by a filter based on some conditions, you may have written:

selected_filter = None
if condition_1:
  selected_filter = filter_function_1
elif condition_2:
  selected_filter = filter_function_2

if selected_filter is not None:
  data_filtered = list(filter(selected_filter, data))
else:
  data_filtered = data

The type of selected_filter would be Union[function, None]. You can rewrite this to always set a filter, using a constant function for the original case:

selected_filter = \
  filter_function_1 if condition_1 else \
  filter_function_2 if condition_2 else \
  lambda x: True

data_filtered = list(filter(selected_filter, data))

In this version, the type of selected_filter would just be function. There are fewer cases in the code and, all things being equal, this makes the code easier to reason about.
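As a rough check that the refactor preserves behaviour, the sketch below runs both versions for every combination of conditions and compares the results. The filters is_even and is_positive are hypothetical stand-ins for filter_function_1 and filter_function_2, and the sample data is made up.

```python
from itertools import product
from typing import Callable, List, Optional

def is_even(x: int) -> bool:
    # Hypothetical stand-in for filter_function_1
    return x % 2 == 0

def is_positive(x: int) -> bool:
    # Hypothetical stand-in for filter_function_2
    return x > 0

def filter_with_optional(data: List[int], condition_1: bool, condition_2: bool) -> List[int]:
    # Before: selected_filter is Optional, so there are two code paths
    selected_filter: Optional[Callable[[int], bool]] = None
    if condition_1:
        selected_filter = is_even
    elif condition_2:
        selected_filter = is_positive
    if selected_filter is not None:
        return list(filter(selected_filter, data))
    return data

def filter_always(data: List[int], condition_1: bool, condition_2: bool) -> List[int]:
    # After: selected_filter is always a function; the original,
    # unfiltered case uses a filter that keeps every item
    selected_filter: Callable[[int], bool] = (
        is_even if condition_1 else
        is_positive if condition_2 else
        (lambda x: True)
    )
    return list(filter(selected_filter, data))

data = [-2, -1, 0, 1, 2, 3]
for condition_1, condition_2 in product([False, True], repeat=2):
    # Both versions produce equal lists in every case
    assert filter_with_optional(data, condition_1, condition_2) == filter_always(data, condition_1, condition_2)
```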

Also, the difference in what code is run between cases is extremely small: every single case will call the filter function. Therefore tests written before the filtering feature was added would still call filter. If the tests pass, this is evidence that the new behaviour will work just as expected in all cases.

Further still, calling filter creates a new list. If there is code mutating lists somewhere, it is crucial to make sure the correct list is being mutated. Always creating a new list, rather than only sometimes creating one, helps to avoid subtle bugs relating to this. Tests can't cover every combination of cases, but we can maximise their value by making sure the differences between the cases they do test are minimal.
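A small illustration of that aliasing risk, with hypothetical function names. When no filter is selected, the "sometimes" version hands back the very same list object, so a later mutation also changes the original data.

```python
def filter_or_passthrough(data, selected_filter):
    # Returns the original list object when there is no filter
    if selected_filter is not None:
        return list(filter(selected_filter, data))
    return data

def filter_copy(data, selected_filter):
    # Always returns a new list
    return list(filter(selected_filter, data))

data = [1, 2, 3]
result = filter_or_passthrough(data, None)
result.append(4)  # also mutates `data`, because result is data
print(data)  # [1, 2, 3, 4]

data = [1, 2, 3]
result = filter_copy(data, lambda x: True)
result.append(4)  # only the new list changes
print(data)  # [1, 2, 3]
```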

[There are arguments against using the ternary operator, especially the Python ternary operator due to the order of arguments. However it does lend itself to making sure that variables are always defined, and I find the above layout for the nested case fairly easy to parse to understand the different cases and values.]

Alternative: Multiple simple types with separate code paths

In some cases the different cases of a type behave quite differently, but for historical reasons they are munged together. When processing the type, there could be runtime checks for something already known further up the call stack, or by the client. An alternative would be to split the code at the earlier point, using different types, with two entirely separate code paths for the different types.

For example, a "Send email" button may have been written first, and a "Save draft" button added later. The "Save draft" button POSTs similar, but not identical, data to the original endpoint on the server. This is often done in the name of DRY, and to avoid touching existing code as much as possible. However, sending email and saving drafts are now tightly coupled and hard to reason about, so making changes is likely to be slow or to introduce bugs.
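Here is a minimal sketch of that combined handler, with no web framework. The handler name, the payload keys (is_draft, to, body) and the in-memory lists are all hypothetical. The point is the runtime check on is_draft, which re-discovers something the client already knew when the button was pressed.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-ins for the mail system and draft storage
sent: List[Dict[str, Any]] = []
drafts: List[Dict[str, Any]] = []

def handle_email_post(payload: Dict[str, Any]) -> str:
    # One handler serves both buttons, so the payload is effectively a
    # union of "email to send" and "email to save", and the server has to
    # work out at runtime which button was pressed
    if payload.get("is_draft"):
        # Drafts may be incomplete, so there is no recipient check here
        drafts.append(payload)
        return "draft saved"
    if not payload.get("to"):
        raise ValueError("cannot send an email without a recipient")
    sent.append(payload)
    return "email sent"
```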

Better would be to have a separate endpoint for drafts. Two data types would be in play, an "email to send" type and an "email to save" type, and the server wouldn't need to determine dynamically which one it has: that is known from which endpoint the data was POSTed to. The two cases are separate from the moment a button is pressed: each endpoint would only contain code relevant to its case, without duplicated runtime checks for things already known. Therefore each should be easier to reason about, and easier to change further. There may be some duplication on the server, but this can be minimised by factoring out common code from the two endpoint handlers if desired.
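A rough sketch of the split design, again without a web framework. The dataclasses, handler names, URL paths and the shared normalise_body helper are all hypothetical. Each handler receives exactly one type and never has to check which case it is in. The small amount of common code is factored out into one helper.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

@dataclass
class EmailToSend:
    to: str  # required: an email cannot be sent without a recipient
    subject: str
    body: str

@dataclass
class EmailToSave:
    to: Optional[str]  # a draft may not have a recipient yet
    subject: str
    body: str

# Hypothetical in-memory stand-ins for the mail system and draft storage
sent: List[EmailToSend] = []
drafts: List[EmailToSave] = []

def normalise_body(body: str) -> str:
    # Common code factored out and shared by both handlers
    return body.strip()

def handle_send(email: EmailToSend) -> str:
    # Only reached via the "send" endpoint: no check of which case this is
    sent.append(replace(email, body=normalise_body(email.body)))
    return "email sent"

def handle_save_draft(draft: EmailToSave) -> str:
    # Only reached via the "drafts" endpoint
    drafts.append(replace(draft, body=normalise_body(draft.body)))
    return "draft saved"

# Each (hypothetical) endpoint maps to exactly one handler and one type
ROUTES = {
    "/emails/send": handle_send,
    "/emails/drafts": handle_save_draft,
}
```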

Michal Charemza

Related questions to ask yourself when coding

What would the type of this variable be in a type-safe environment?

How can I make the type of this variable simpler?

What would the code have looked like if I had written the features in another order?