Stories about Software


How Do You Know When to Touch Legacy Code?

Editorial Note: I originally wrote this post for the SmartBear blog.  Head over there and check out the original if you like it.  There’s a lot of good stuff over there worth a look.

Many situations in life seem to create no-win scenarios for you as you innocently go about your business. Here’s one that’s probably familiar to developers for whom pairing or code review is a standard part of the workflow.

You’re tasked with adding a bit of functionality to a long-lived, well-established code base. It’s your hope that new functionality means that you’re going to be writing purely new code, but, as you dig in, you realize you’re not that lucky. You need to open some Death Star of a method and make some edits.


The first thing that occurs to you while you’re in there is that the method is as messy as it is massive. So, you do a bit of cleanup, compacting some code, extracting some methods and generally abiding by the “Boy Scout Rule.” But when you submit for code review, a scandalized and outraged senior developer rushes over to your desk and demands to know if you’re insane. “Do you know how critical that method is!? One wrong move in there and you’ll take down the whole system!”

Yikes! Lesson learned. The next time you find yourself confronted by an ugly juggernaut, you’re careful to be downright surgical, only touching what is absolutely necessary. Of course, this time during code review, the senior developer takes you to task for not going the extra mile. “You know, that’s some pretty nasty code in there. Why didn’t you clean up a little as long as you were already in there?”

Frustrating, huh?

It’s hard to know when to touch existing code. This is especially true when it’s legacy code. Legacy is a term for which you might see any number of definitions. As a TDD practitioner, I personally like Michael Feathers’ definition of legacy code (from his book, Working Effectively with Legacy Code), which is “code without unit tests.”

But for the purposes of this post, let’s go with a more broadly relatable definition: legacy code is code that you’re afraid to touch. I’m sure you can relate to this, even if you’ve never had a senior developer yell at you about how touching it is dangerous (or better yet, leaving an all caps comment in the code threatening anyone who touches it). You size up a giant method with its dozens of locals, loops nested 5 deep, and global variable access galore, and think to yourself, “I have no idea what might happen if I change something here.”

So, what do you do? Should you touch it?

Read More


Characterization Tests

The Origins of Legacy Code

I’ve been reading the Michael Feathers book, Working Effectively with Legacy Code and enjoying it immensely. It’s pushing ten years old, but it stands the test of time quite well–probably much better than some of the systems it uses as examples. And there is a lot of wisdom to take from it.

When Michael describes “legacy code,” he isn’t using the definition as you’re probably accustomed to seeing. I’d hazard a guess that your definition would be something along the lines of “code written by departed developers” or maybe just “old, bad code.” But Michael defines legacy code as any code that isn’t covered by automated regression tests (read: unit tests). So it’s entirely possible and common for developers to be writing code that’s legacy code as soon as it’s checked in.

HouseOfCardsI like this definition a lot, and not, as some might suspect, out of any purism. I’m not equating “legacy” with “bad,” embracing the definition as a backhanded way to say that people who don’t develop the way that I do write bad code. The reason I like the “test-less” definition of “legacy code” is that it brings to the fore the largest association that I have with legacy code, which is fear of changing it.

Think about what runs through your head when you’re tasked with making changes to some densely-packed, crusty old system. It’s probably a sense of honest to goodness unease or demotivation as you realize the odds of getting things right are low and the odds of headaches and blame are high. The code is rat’s nest of dependencies and weird work-arounds, and you know that touching it will be painful.

Now consider another situation that’s different but with similar results. You have some assignment that you’ve worked on for weeks or months. It’s complicated, the customer isn’t sure what he wants, there have been lots of hiccups and setbacks, and there’s budget and deadline pressure. At the bitter end, after a few all-nighters, a bit of scope reduction, and some concessions, you somehow finally get all of the key features working for the most part. You check in the code for shipping, thinking, “I have no idea how this is working, but thank God it is, and I am never touching that again!” You’ve written code that was legacy code from the get-go.

Legacy code isn’t just the bad code that the team before you wrote, or some crusty old stuff from three language versions ago, or some internal homegrown VBA and Excel written by Steve, who’s actually an accountant. Legacy code is any code that you don’t want to touch because it’s fragile.

Getting Things Under Control

In his book, Michael Feathers lays out a lot of excellent strategies for taming out-of-control legacy code. I highly recommend giving it a read. But he coins a term and technique that I’d like to mention today. It’s something that I think programmers should be aware of because it helps lower the barriers to getting started with unit testing. And that term is “characterization tests.”

Characterization tests are the “I’m Okay, You’re Okay,” Rorschach approach to documenting code. There are no wrong answers–just documenting the way things exist. So if you have a method called AddTwoNumbers(int, int) and it returns 12 when you feed it 1 and 1, you don’t say “that’s wrong.” Instead you write a test that documents that return value and you move on, seeing and documenting what it does with other inputs.

Sound crazy? Well, it’s really not. It’s not crazy because things like this actually happen in real life. When code goes live, people work their processes around it, however much its behavior may be goofy or unintended. Once code is in the wild, “right” and “wrong” cease to matter, and the requirements as they existed some time in the past are forgotten. There only is “what is.” Sounds very zen, and perhaps it is.

One of the most common objections when it comes to unit testing is from developers that work on legacy systems where code is hard to test and no tests exist. They’ll say that they’d do things differently if starting from scratch (which usually turns out not to be true), but that there’s just no tackling it now. And this is a valid objection–it can be very hard to get anything under test. But characterization tests at least remove one barrier to testing, which is having extensive experience writing proper unit tests.

With characterization tests, it’s really easy. Just write a unit test that gets in the vicinity of what you want to document, finagle it until it doesn’t throw runtime exceptions, assert something–anything–and watch the test fail. When it fails, make note of what the expected and actual were, and just change the expected to the actual. The test will now pass, and you can move on. Change some method parameters or variables in other classes or even globals–whatever you have access to and can change without collapsing the system.

Through this poking, prodding, and documenting, you’ll start getting a rudimentary picture of what the system does. You’ll also start getting the hang of the characterization test approach (and perhaps unit testing for real as an added bonus). But most importantly, you’ll finally have the beginnings of an automated safety net. There’s no right and wrong per se, but you will start to be able to see when your changes to the system are making it behave differently in ways you didn’t expect. In legacy, different is dangerous, so it’s invaluable to have this notification system in place.

Characterization tests aren’t going to save the day, and they probably aren’t going to be especially easy to write. At times (global state, external dependencies, etc.) they may even be impossible. But if you can get some in place here and there, you can start taking the fear out of interacting with legacy code.

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.