Learning a Healthy Fear of Legacy Code
Editorial Note: I originally wrote this post for the SmartBear blog. Check out the original here, at their site. While you’re at it, have a look around at some of the other authors posting there as well.
The life of a developer would be pretty much nothing but rainbows and unicorns if all we did was add new code to code bases. Imagine a world without maintenance programming, debugging, and scratching your head while squinting at confusing, existing code. It’d be pretty nice, right?
Sadly, we don’t live in that world. The result is that most of our efforts in software development involve a blend of new and old code. We write some new code, stuff it into some existing code, and then try to figure out how the two things will behave together in production. Consequently, both writing and reviewing code necessarily involve a kind of constant, subconscious risk management. “Hmm… should we really touch this code?”
There’s rarely a set of explicit heuristics that guide this decision; it tends to be a matter of feel. It’d be nice if there were a way to be a bit more deliberate about it.
Understanding Legacy Code
“Legacy code” is a rather nebulous term. Even the wikipedia entry offers multiple, possible meanings.
- Code that relates to a no-longer supported hardware or software dependency.
- Code inherited from someone else.
- Code that’s part of an older version of the software.
- Code that isn’t covered by automated unit tests.
When I think about what pops into my head when someone says, “legacy code,” none of these things would surprise me. I could imagine any or all of them being true. But for me, legacy code really translates to, “code you’re afraid to touch.”
Any of the things mentioned above would make you leery of code. If the code works on outdated dependencies, how do you test to see if your changes work? If it’s inherited from someone else, did they know something that you don’t? If the code is from some outdated version of the software, who will be affected by your changes, and how? And if there are no automated tests, how do you know you’re not breaking things?
All of these questions tend to float, half-formed, through your head as you weed in to existing code and tweak it. And, probably, all of these questions float through your head when you’re reviewing someone else’s code. Perhaps more of them come up when you’re reviewing code, since your primary purpose is checking, rather than modifying. As a result, if you’re going to be reviewing code, you need to understand how to identify legacy code — code that is risky (scary) to change. This is doubly true if you’re evaluating the code of less experienced folks who, perhaps, haven’t yet known the horror of making some tiny change to existing code and introducing a horrific, counter-intuitive bug.
Recognizing The Signs
When should you be nervous? What does code that’s dangerous to change look like? What properties does it have? These are the important questions to answer when evaluating whether you should sign off on a code change or whether you should suggest that your peer find a way to accomplish her task in a way that doesn’t involve touching the code in question. Here are some guidelines.
The Obvious Stuff
Before we delve into nuance and code-specific properties, let’s get some obvious signs of dealing with legacy code out of the way. You can audit the source control history of the code base and ask around with people familiar with its history to learn a number of things. If no one has touched the file(s) in question for years, there’s a heightened element of risk. If the people who wrote the code are no longer with the group, or even the company, there’s danger as well. And, of course, if no one, including the original authors, can easily articulate what the code is supposed to do or why it was written, you’re on thin ice when you change it.
Another obvious consideration is the one addressed by Michael Feathers’ definition of legacy code: code not covered by unit tests. Do you have any way of knowing if the code will continue behaving as expected after the changes? Automated tests are probably the best way to answer this question, but comments, documentation, or manual regression test plans and results can also help. When you look at the code, see if you have any way to verify whether the changes aren’t having unanticipated effects.
Fan-In
Fan-in is a measure of how many elements in a code base depend on a particular element. Think of a method in your code base that was called in 400 different places. This method would have a very high degree of fan-in. High fan-in makes code riskier to change because there are so many different sources using it in so many, potentially different ways. If the code you’re looking at has high fan-in, proceed with caution.
Global State and Hidden Dependencies
Another property of code that should scare you is the interaction with hidden dependencies, and especially global state. Global variables, by their nature, can be accessed and mutated anywhere in the code base, at any time. As a result, any code that deals with such variables is part of a complex interleaving of logic that is non-obvious to someone touching that code. Modifying code like this can have ripple effects throughout the entire code base and should be approached with extreme caution.
Global state tends to be the most dangerous, but there are other, subtler things to beware of as well. The modified code may be collaborating with other code in hidden ways as well. This can happen with runtime binding schemes or side effects not obvious from method and class signatures. Does a method called “CalculateAverage” randomly write something to a database? Code with unpredictable behaviors like that is best approached with a metaphorical hazmat suit.
Complexity
The final heuristic that I’ll offer is to be leery of modifications to methods that are complex. Complexity can come in various forms, but there are two particularly obvious and telling signs to note. The first is the size of the method. If methods are hundreds or thousands of lines long, they’re doing a lot of things — more things than a human can keep straight in his head. The second sign of complexity is the number of control flow statements or paths through the code. This is known as cyclomatic complexity. If there are dozens or hundreds of paths through the code, odds are that you’re not covering all test scenarios after you’ve modified the code.
Proceed with Caution
Learning when to have a healthy fear of old code is a skill that will serve you well, both as a reviewer and while writing code. If you develop a deliberate approach to deciding whether or not to touch existing code, you’ll wind up spending fewer late nights chasing down arcane bugs. And even when you have no choice but to touch it, you’ll better understand the risks and be able to allocate more time for regression testing. Until you find yourself in the land of rainbows and unicorns, writing only new code, recognizing legacy code will at least make your life a little better.