Structured data in environment variables

A straightforward way to store nested dictionaries and lists in environment variables

Saturday July 7th, 2018

I recently wrote a small Python application that had to be configurable so it can run differently in different environments. As is often done, I used environment variables for this.

However, for various reasons beyond the scope of this post, it was helpful to have structured data stored in the environment: lists, dictionaries, and even lists of dictionaries. So when the application runs, it should be able to extract this and get something that looks like:

config = {
    "FOO": [{
        "BAR": "setting-1",
        "BAZ": "setting-2",
    }, {
        "BAR": "setting-3",
        "QUE": "setting-4",
    }],
    "FIZZ": [
      "setting-5",
      "setting-6",
    ],
    "BILL": "setting-7",
}

This could be done by encoding such a structure into a single string, perhaps using JSON, throwing it into a single environment variable, and decoding it in the application. However, this would be fairly painful to edit and difficult to debug in many situations.

So I came up with a simple format where each setting is in its own variable, but its name defines where it is in the structure. Specifically each name is

a double-underscore separated list of path components;
and where each level of components defines a dictionary unless all of them parse as integers, in which case the level defines a list.

For example, the above config could be defined in bash by:

export FOO__1__BAR=setting-1
export FOO__1__BAZ=setting-2
export FOO__2__BAR=setting-3
export FOO__2__QUE=setting-4
export FIZZ__1=setting-5
export FIZZ__2=setting-6
export BILL=setting-7

and to convert this list of environment variables to nested Python dictionaries and lists, you can use normalise_environment.py:

import os
config = normalise_environment(os.environ)

This way of incorporating structure into environment variables has some nice properties.

The double underscore is likely to not cause issues because underscores are used in environment variables frequently.
No (extra) escaping needed. You can set them easily on the command line, any old bash script, Travis, CircleCI, CloudFoundry, ECS task defintions.
You can then read them really easily, which is good for debugging.
If you don't currently use any double underscores in environment variable names, you can put your entire environment through normalise_environment.py, and it will be unchanged. This helps in moving existing config to a more structured form in small steps.
Although the function normalise_environment.py is helpful in Python, it's by no means necessary. Any code that can read environment variables can access their contents, without extra steps or dependencies involved.

What to call this

I'm not sure how best to refer to this technique (it's almost so trivial calling it a "technique" is a bit much...). It's not quite flattened since the structure is preserved. Storing the variables denormalised is the best I have, since it can be argued that the location of each variable in the structure is denormalised.