How Do You Know When to Touch Legacy Code?
Editorial Note: I originally wrote this post for the SmartBear blog. Head over there and check out the original if you like it. There’s a lot of good stuff over there worth a look.
Many situations in life seem to create no-win scenarios for you as you innocently go about your business. Here’s one that’s probably familiar to developers for whom pairing or code review is a standard part of the workflow.
You’re tasked with adding a bit of functionality to a long-lived, well-established code base. It’s your hope that new functionality means that you’re going to be writing purely new code, but, as you dig in, you realize you’re not that lucky. You need to open some Death Star of a method and make some edits.
The first thing that occurs to you while you’re in there is that the method is as messy as it is massive. So, you do a bit of cleanup, compacting some code, extracting some methods and generally abiding by the “Boy Scout Rule.” But when you submit for code review, a scandalized and outraged senior developer rushes over to your desk and demands to know if you’re insane. “Do you know how critical that method is!? One wrong move in there and you’ll take down the whole system!”
Yikes! Lesson learned. The next time you find yourself confronted by an ugly juggernaut, you’re careful to be downright surgical, only touching what is absolutely necessary. Of course, this time during code review, the senior developer takes you to task for not going the extra mile. “You know, that’s some pretty nasty code in there. Why didn’t you clean up a little as long as you were already in there?”
Frustrating, huh?
It’s hard to know when to touch existing code. This is especially true when it’s legacy code. Legacy is a term for which you might see any number of definitions. As a TDD practitioner, I personally like Michael Feathers’ definition of legacy code (from his book, Working Effectively with Legacy Code), which is “code without unit tests.”
But for the purposes of this post, let’s go with a more broadly relatable definition: legacy code is code that you’re afraid to touch. I’m sure you can relate to this, even if you’ve never had a senior developer yell at you about how touching it is dangerous (or better yet, leaving an all caps comment in the code threatening anyone who touches it). You size up a giant method with its dozens of locals, loops nested 5 deep, and global variable access galore, and think to yourself, “I have no idea what might happen if I change something here.”
So, what do you do? Should you touch it?
At first, this might seem obvious. If you’ve traced the bug you’re hunting here, or if this is code you have to touch to add a new feature, well, then you have to touch it. Otherwise, leave it alone, right?
When In Doubt, Hands Off (If Possible)
As you might expect, things are rarely that simple. The simplest thought that I can offer is that your first guiding principle should be, “when in doubt, and if I don’t strictly need to touch it, I shouldn’t touch it.” In other words, let your default be not to touch legacy code when there’s any doubt.
Legacy code has been around for a while and found its way into production use, doubtlessly in a variety of scenarios. It may be awkward, sub-optimal, or even flat out wrong, but if it has been production, someone will likely have come to depend upon its current functionality, as ably illustrated by this XKCD comic. If you go changing it, there’s a chance you’ll break a stakeholder you never considered in a way that you couldn’t have imagined.
Is It Avoidable?
Let’s assume that the above advice doesn’t apply. It appears you to make changes to some existing legacy code. But, before you touch it, are you sure?
Is there really no way around touching it? You couldn’t add a facade or perhaps an adapter? You couldn’t define a class that inherits from the one you’re looking to touch? You’d be surprised at how often you don’t actually, truly need to touch the code in order to add or correct some behavior. Now, I’ll go on to explain some situations where you might want to touch the legacy code, but before you go further, you should feel pretty confident about your answer to the question, “do I absolutely need to modify this?”
I said before that your default should be to avoid touching it. But let’s talk now about some situations where you might want to touch it. After all, if you want to improve any code, you’re going to have to touch it.
Is The Change Low Risk?
Not all legacy code is created equal in terms of the risk of touching it. An inscrutable, 5,000 method called in dozens of places throughout your application is a much different beast than a simple, 5 line method that is rarely, if ever, invoked. If you trace your work to a method that you can identify as low risk to change, then think about making it a little nicer in accordance with the aforementioned “Boy Scout Rule.”
Are You In a Major Overhaul?
Making changes to legacy code as part of a minor patch is a far cry from making changes to legacy code during a massive overhaul of your software. If people on your team are hacking away at existing code all over the place and you’re not shipping for a while, bug fixes and new functionality that touches legacy code is no problem. And, in such situations, you should seriously consider doing boy-scouting whenever the opportunity presents itself.
Are Behavior and Expected Behavior Extremely Clear?
One final consideration that I’ll advise you to factor into your decision is the degree to which you understand both how the code you’re touching is expected to behave and how it actually behaves. The main danger in touching legacy code is what has the senior developer from my little story so upset — no one really understands what the system does, so touching it is dangerous. It’s like the scene in an action movie where the protagonist has 30 seconds to disable some complicated bomb that no one really understands.
But if you do understand the code, then there isn’t much danger to modifying it. That’s why automated unit tests are so important for preventing legacy code. In one fell swoop, they document both intent and actual behavior of the code. But unit tests are not the only path to understanding. There could be extensive documentation on the code or it could be simple enough to be clear by inspection (though such simple code often does not need “boy scouting.”) Whatever the mechanism, the key property is clear understanding.
If You Touch It, Characterize
I will leave you with one final thought. If you weigh the considerations I’ve laid out here and decide that it’s worth touching legacy code that you’re looking at, make sure you characterize it first. By writing tests to document the code’s current behavior, you accomplish two things: you elevate your understanding of the code, and you build yourself a warning system for whether your changes are breaking things. Mining through legacy code is a dangerous business, so if you’re going to do it, make sure you bring a canary.
I create
copyClassA{}, copyFuncB()
and go from there
Do those exist in prod at the same time as the originals?
Sure!
Everyone had a different view on what legacy code is. This is my take http://www.depthinfinity.com/2015/11/the-legacy-of-legacy-code/
Interesting take 🙂
I like the “orphaned” concept. That’s something worth expressing that happens a lot, in my experience. It’s easy to regard code as “legacy” if no one that wrote it is around to answer questions or maintain it.
Especially when you consider the working memory of developers. Even when developers are still physically present in the company, their memory and understanding of the inner workings of those code bases can grow fuzzy enough to still consider them “orphaned”. To all intents and purposes, the “creators” of that code have gone and noone understands it anymore.
Or something else we might consider, is the code simply done? It was written, does it’s job and has continued to do so. There’s no need for modifications. It’s so perfect it’s orphaned 🙂
I do like Michael Feathers’ term ‘Seam’. It’s quite easy to remember and again, as a mem, gives you some directions when you need to touch the legacy code.
Definitely a good word and one that I tend to use a lot when discussing code (legacy or not — e.g. “are we creating good seams as we go here?”)
This reflects a typical thought process of most of the development teams. I see the point when it is about real-real legacy code. But nowadays developers started calling any code legacy, once the developer who wrote that leave the team. I have seen it going all wrong in such cases, because developers make this as excuse for duplicating code and creating workarounds.