Long files are not the enemy

Spend your time wisely: moving code about may be counterproductive

I have seen multiple developers leap to the conclusion that long files of source code are automatically bad, leading them to separate the code, virtually as-is, into separate files. This post suggests that this view is too simplistic, and other things should be considered before moving code about.

So why do developers think long files are bad?

My suspicion is that upon seeing a long file, there is often an assumption that it's a moving part. Any working with it means you have to reason about it as a whole, this whole is too big to reason about, and therefore you should split it up to make it possible to reason about each of them separately. So firstly...

A file is not necessarily a moving part

I don't have a strict definition of what a code moving part is, but roughly it's something that you have to reason about as a whole, and that interacts with other code or the outside world via some interface. Examples include

  • pure function,
  • group of functions that call each other or communicate via some state,
  • script that is run in response to a web request,
  • class, especially if its instances maintain an internal state,
  • Angular directive or a React component.

You can have one of these in a file, multiple in a file, or even one spread across multiple files.

If there's a problem, you're just moving it about...

Consider a 1000 line file that has 40 functions in it. These functions might not call each other, read or change state, and they don't have any side effects beyond those they advertise, such as touching a database. Each of these functions I would class as a moving part, and each of these function might have problems. In another file, these problems would be exactly the same.

... and maybe making it even worse

Consider if those 40 functions each change and read state, call each other in a complex web of booleans and optional arguments, none of them having a clearly separate purpose, and the behaviour of any part of the code is very brittle with respect to changes in other parts. This makes the 40 functions into a single moving part, and splitting these functions, as they are, into separate files actually spreads this moving part about. I suspect it is likely that this makes it harder, and not easier, to reason about.

... and makes it harder to see its history

Once you have moved code about, it makes finding the history of it just that little bit harder. If there are problems with the code, the history of the lines may help work out why things are the way they are, and so guide you in the choices you make when making further changes. It's actually with the worst code that the version control history is the most valuable.

So what is the enemy?

Smells/rot in the moving parts themselves. Maybe each moving part has too much responsibility, blurry responsibility, or acheives its goal(s) using too much state or crazy overcomplicated logic. Any one of the tens/hundreds of possible problems code might have that aren't that it happens to be near another moving part in the same file.

What should I do about it?

Refactor as you would, but keep in mind that moving code as-is, especially bad code, has consquences that might not all be positive.

Final semi-stolen quote

To steal/adapt Sean Kelly's point on microservices...

You don’t need to introduce another file as an excuse to write better code.