Stories about Software

Jul
24

Understanding Degrees of Code Flexibility

Category: Language Agnostic Tags: Software Engineering | 9 Comments

In some projects I’ve been managing of late, I’ve noticed a continuous question cropping up: How flexible should we make the different parts of the system? I’m currently working with a bright crew of people, so they’re picking up on this quickly, but I thought I’d do a bit of a write-up to help the process along. And, as long as I’m doing write-ups like this, I might as well post them.

In discussions of software, there is a large issue that gets lost in the shuffle. You frequently hear people argue the merits of different styles of or approaches to programming. Unit testing or not? TDD? IoC or inline new? What’s the appropriate size of a method? ORM or inline SQL? SQL at all or NoSQL? You get the idea. But one thing that I find is often glossed over is the idea of system changes as a function of ease of making those changes. In other words, if a user comes a-hollering and says, “I want, nay, demand the ability to do X,” how hard is it to make that happen and to verify the results?

And by “hard,” I don’t mean, “do you write code for a day or for three weeks?” I mean, “what do the changes look like in terms of risk and deliverables?” In other words, can you make that happen by changing a configuration file or does it require code changes? Will you need to re-deploy or can you somehow patch on the fly through a plugin architecture? And is it testable? Can you verify 99% of the changes by swapping out a configuration setting, or do you have radically different production and test setups?

So I’m going to define some concepts to flesh out an idea. This isn’t exactly a formalized theory or anything. It’s rather just a working lexicon of how I think about my application. This is a scale of system flexibility for a given future change. Or, put another way, here is a way of assessing how much effort on the part of the entire development/operations group doing X for the aforementioned user will be, from least to most significant.

Users can do it themselves.
An IT-level change is required (e.g. changing a config file, swapping out images, etc.)
An architect/dev change is required to configuration (e.g. XML for an IoC container)
A non-compiled source code change is required (e.g. you update the markup for a site but not the underlying code)
An Open/Closed Principle Compliant source code change is required (basically adding new code).
A localized tweak to existing code is required.
A substantial change to existing code is required spanning various modules.

Now when considering this list, it makes sense to assess your change-set in terms of the furthest down thing it requires. So maybe you need to change a logo on your website, which is easy, but you have an unwieldy switch statement somewhere that chooses swaps it out in certain circumstances that you now need to change. This is going to be probably a 6 rather than a 2. A given change is going to be as rigid as the most rigid link in the chain, so to speak.

Here are the kinds of changes described in more detail.

Users do it themselves.

There are some sorts of changes to the system that need not involve anyone from your team/staff/company. These are things that users do, through the application. An obvious example is a banking or commerce website in which users can change their passwords. “Password” has nothing to do with the business logic of commerce, so this is functionally a meta-piece of administrativa that you’re entrusting to users.

A dev-ops or operations person changes meta-data in production.

This is something that you (hopefully) don’t trust a user to do but that doesn’t require any actual knowledge of the code base. A good example of this might be a desktop application that has an XML configuration file (or an INI file, if you’re willing to show your age with me). This file might be modified to have the application point to a different database or log to a different file or something. This is not something the average user could or should do, but it’s a relatively lightweight change in that it requires only minimal training and no re-deployment of any kind.

An architect or developer changes meta-data in production.

The next step down in operational rigidity is a meta-data change that cannot be performed without an understanding of the code base. The best example here is the configuration of an IoC Container that has been extracted to XML. Figuring out which service is used by which ViewModel is not something that anyone without sophisticated knowledge of your source code can do, but, on the plus side, it’s still just a change to a setting in production.

Someone changes source code that does not involve re-compilation

This is what happens when a developer logs on to the web server and stars editing HTML or CSS or even a server side script like PHP in the files. This is really not a good idea for a variety of reasons, but it is possible and may be something you have to do in a pinch, so it’s worth noting.

Someone makes a code change that more or less just involves adding code and very little modification

Now we’re down deep enough into rigid territory that a new deployment/install is required in order to push the changes. From here on, this cannot be done in production, so if you’re doing this you’re going to incur all of the overhead of building/running automated tests (hopefully), quality assurance, creating a deployment and deploying (your process may vary). But on the plus side, this is pretty low risk as far as code changes go. Adding things is generally both easy to verify for correctness of functionality and unlikely to mess up existing code.

Someone makes a code change that involves lightweight changes to existing code.

The most common scenario here is probably a bug fix, though it may be new functionality too, depending on how flexible your architecture is. This is a higher risk proposition than adding new source code because you’re now creating a risk of regressions. You still have all of the same considerations about build and deployment, but the risk of problems is higher. Your testing and verification overhead should also be higher. This is a heavier change.

Significant work on the code base is required.

This is what happens when a code base that models a company and implements Office as a singleton suddenly needs to accommodate the new office you opened up in Texas. You designed your code under the assumption that there could only ever be one office location, and you were right about that — right up until you weren’t. Oops. Now things get ugly because management comes to you and says, “we’re opening a new office, so the application is going to need to handle that” and your response is, “that’s completely out of the question. Why, even the thought is preposterous!” You tell them that substantial rework is going to be required.

Making Sense of your Options

So why did I list all of these out? Well, I did it because I feel it’s important to know what your options are when you’re designing and that it’s important to anticipate, rather than react knee-jerk style, to changes. If you sit down before you start putting together a code base, think about what users might want and then go through the exercise of figuring out which number of change would be required. If you find likely changes that would be 6s or 7s (and there is certainly a sliding scale at the 6-7 level), that’s a problem that you should start addressing now. If you find extremely unlikely changes that are 1s and 2s, that’s not necessarily a problem. But it is a point of design flexibility that you could get away from, and it may be that you have pointless abstractions and complexity (though I’d be a lot more hesitant to introduce rigidity because you think flexibility is unneeded than vice-versa).

Another interesting exercise is to consider categories of these things. For instance, 5-7 are all things that require compiled code changes and 1-4 are all things that do not. This is an interesting way to split up your functionality, and it’s obviously the backbone of this post, but you can divvy these up in other ways as well. For instance, if you’re writing software that for some reason has no field or ops support, then 1, 5, 6 and 7 are your options, and 2-4 are basically non-starters. Or, if you’re considering things in which source control is an issue, then 4-7 are in a category and 1-3 are in a different category (most likely, as I’d think that you’d favor generating meta-data files as part of your deployment rather than source controlling different configurations).

None of this is even remotely comprehensive, but my goal here is really just to encourage people to understand at design time the difficulty of changing something at production time. It seems quite often to be the case that people don’t really think about this, and simply because no one has ever pointed it out to them. Your mileage may vary on the number of categories in the list and your preference for certain options, but at the core of this is a basic and incredibly important idea: you should always play “what if” when it comes to changes that users might request and understand how much of a headache it will be for you if the “what if” comes true. Oh, and also try to minimize the number of headaches. But hopefully that goes without saying.

Jul
19

By Erik Dietrich

Throw Out Your Code

Category: The Life of a Programmer Tags: Software Engineering | 13 Comments

Weird as it is, here’s human nature at work. Let’s say that I have a cheeseburger and you’re hungry. I tell you that I’ll sell you the cheeseburger for $10. You say, “pff, no way — too expensive.” Oh well, I eat the cheeseburger and call it a day. But I’ve learned my lesson. The next day at lunch, to execute my master cheeseburger selling plan, I slide the cheeseburger over in front of you and tell you that you can have it: “you can have this cheeseburger…” Just as you’re about to take a bite, however, I cruelly say “…for ten dollars!” You grumble, get out your wallet and hand me a ten dollar bill.

This is called “The Endowment Effect,” and it’s a human cognitive bias that causes us to value what we have disproportionately. I blogged about it here previously in the context of why we think that our code is so good we should SPAM it all over the place with control-V. But even if you don’t do that (and, really, please don’t do that), you still probably get overly attached to your code. I do. After all, we, as humans, have a hard time defying our own natural instincts.

I’m certainly no anthropologist, but I suspect that our ancestry as nervous, opportunistic scavengers on the African Serengeti has everything to do with this. Going and snatching a morsel that a hungry lion is eyeing is a pretty bad idea. But if you already have the morsel, what the hey, you might as well take it with you as you run away. But, however we’re wired, we’re capable of learning and conditioning our own responses. After all, we don’t go bolting away from the deli counter after the guy there hands us our two pounds of salmon. We’ve learned that this is a consequence-free transaction.

It’s time to teach yourself that lesson as it relates to your code. It’s not so much that deleting functional code is consequence free (it isn’t). But deleting it isn’t nearly as big of a deal as you probably think it is. When it comes to code that you’ve spent two weeks writing, I’m pretty willing to bet that if you trashed it all and started from scratch (no peeking at source control history), you could rewrite it all in about two days. If that sounds crazy, ask yourself whether the majority of the time you spend programming is spent furiously typing as if you were taking a words-per-minute test or if most of it is spent drawing things on scratch-paper, squinting at your screen, pushing code around unit tests, muttering to yourself, and tapping a pen on your desk. I’m betting it’s the latter, and, when you rewrite, it’s activities from the latter that you don’t do nearly as much. You’ve already blazed a trail for yourself and now you’re just breezing through for a second trip.

Write some code and throw it out. Do a code kata with the stipulation that the code is deleted, never to be recovered. Then try it again the next day and the day after that. Or create a copy of your production code at work, engage in some massive, high-risk, high-wire-act refactoring, and then just delete it. With either of these things, I promise you that you’ll learn a lot about efficient coding and your code base, respectively. But you’ll also learn a subtle lesson: the value you’re creating as you code can be found more in the knowledge and experience you’re acquiring as you do it than the bits sitting in source control.

Practice throwing out your code so that you stop neurotically overvaluing it. Practice throwing out your code because it’ll probably happen by accident at some point anyway. Practice throwing out your code because your first crack at things usually kind of sucks. And practice throwing out your code because end users and the world are cruel, and not everything that you write is going to make it gift-wrapped into production. The more you learn to let go, the happier and more productive you’re going to be as a programmer.

Jul
17

By Erik Dietrich

Easy Deployment: the Alpha and the Omega

Category: Language Agnostic Tags: Software Engineering, Software Process |

A bit of housekeeping…you may have noticed that the social media buttons look a bit different if you’re not accessing through RSS. The old plugin that I was using seems not to be supported anymore, and the Facebook button vanished for a bit. I tried out a replacement and liked it, so I kept it. My thanks to Active Bits for the Social Sharing Toolkit.

Wrong But Fast

There’s a pretty good chance that your deployment process is both too painful and not painful enough. But before I return to that cryptic statement, let me talk a bit about something I’ve observed in developers — especially ones that are newer to the industry. Here’s an example of a series of exchanges that has become pretty familiar to me:

User: It would be nice if the profile screen had a way I could change my password.

Young Buck: That’ll take like, literally, two seconds. I’ll be right back!

(Fifteen minutes later)

Young Buck: Okay, I pushed it out to the server, so you can change your password now.

User: Wow! It’s live already? That’s really cool! Thank you! Let me try it out. Let’s see… oh, hmmm. When I try to log in it looks like it crashes.

Young Buck: That doesn’t seem right. I mean, the only thing I changed was…. oh! I know exactly what happened. Give me three minutes, and I’ll be back.

(30 minutes later)

Young Buck: Alright, should be good.

User: Well, I can get in again, and there’s the change password button, but when I click, nothing happens.

Young Buck: That’s not possible. You must have forgotten to clear your cache. Unless…wait, I think I know what’s going on here. Give me 10 minutes.

User: Uh, tell ya what — just don’t worry about it. Maybe I’ll try it again next week.

Young Buck: (Crestfallen) But it’ll only take 10 minutes and I know exactly what the problem is.

User: (With pity) I’m sure you do, but I’ve got a lot of things to get done today.

This is not professional behavior. Imagine if you took your car in for repairs somewhere and things went this way. You’d probably have the Better Business Bureau on the phone shortly or at least be headed to another mechanic. The young buck is sloppy because he’s brash and arrogant. Right?

Or does it just come off that way a little because of how sure he seems when really he’s just eager to please the user and prove himself? Personally, I find that in the overwhelming majority of cases this is really what’s going on. People often get into programming because they like solving problems. And many programmers were some of the smartest kids in their classes growing up — the ones waving their hands frantically, demanding that the teacher give them a chance to show that they know the answer.

As entry level programmers, the school game has been all that they know. It’s a balance between rushing to get the right answer (teacher calling on students, timed exams, cramming in homework, etc.) and getting answers right, with the former often winning. In college, most programming assignments are evaluated by programs that allow you to submit your code as often as you want and get immediate feedback. There are also office hours, so students who visit professors and TAs the most frequently and submit the most work tend to get the best grades and the most feedback. Computer Science students are used to a paradigm where ideas are valued over execution.

Welcome to the Real World, Grasshopper

In the professional world, however, execution reigns supreme. Ideas are cheap. You may be able to rattle off the quick-sort algorithm in pseudo-code faster than anyone around you, but that’s not going to win you any startup capital. With software, even the intellectual property system (USPTO, anyway) is a joke seemingly designed to let Apple, Microsoft and Google endlessly sue each other and occasionally to swat little guys like bugs. Having the idea first and/or quickly is not as important as getting the idea right in the end.

It takes people some time to learn this upon entering the work force, and exchanges like the one I mention above are common. Users more or less say, “Go away and come back to me when you have something that makes my life better and not a second before. I don’t care if you thought of it in five seconds or if it took you five minutes or if it would have taken you five minutes except that the database was actually not normalized to BCNF and blah, blah, blah. Whatever.” Some people figure this out quickly, and some never figure it out. But whatever the speed may be and however much your group may or may not have come to terms with this, there needs to be structure in place to stop the madness.

In other words, a lot of the developers in your group are going to be eager to please. This is especially true if they regularly interact with their users. There is going to be pressure on them to say things like, “sure, I’ll have that for you in 10 minutes.” But they need not to say things like that. If they can say things like that and they can successfully (attempt to) make them happen, your deployment process is not painful enough.

Hurts so Good

Deployment is not to be taken lightly, especially if there is a release and the users are going to be seeing the result of the work. If you can deploy effortlessly in minutes, there’s a very good chance your process is not painful enough. The situation I’ve been describing above suffers from this very problem — it’s too easy for eager crowd-pleasers to deploy and thus it’s too easy for them to depend on users to be their fast feedback mechanism.

Developers are smart and often opportunistic. Getting fast feedback is extremely important to them, and they’ll naturally seek out ways to procure it. If you let them, they’ll use end users as their feedback mechanism (and, in a tone-deaf sense that ignores end-user perception, this is actually optimal), but you can’t permit this. Rather than following that path of least resistance, or at least familiarity from school/hobbyist days, you need to choke off that path and force them to carve a new riverbed. You need to make deployment more painful.

Now there’s “antiseptic on a cut” painful and “shark gnawing on your leg” painful. You want to gun for the former. A lot of deployment processes that enable developers to SPAM end users with non-functional updates are the product of amateur hour: xcopying files to the server, putting an executable on a share drive, zipping things up and emailing them, etc. These things tend to be both easy to do and easy to botch, so simply setting policies in place that prevent developers from doing them is antiseptic on the cut.

Deployments and especially releases to end users need to follow some process, and coding is simply one stop along the way — not the entirety of it. Ideally, there should be the coding and then developer testing, but from there, automated unit and integration tests, code reviews, static analysis, exploratory/manual testing by QA and observation by a UX group can all be part of the mix. These good practices serve to improve quality in and of themselves, but they also serve to prevent spurious, sloppy releases. If you know you can make a change in five minutes and have it in front of a user in six minutes, you’re going to do that. But if you know that you can make the change in five minutes and it’ll be days of going through the entire release process before you hear back from the end user, you’ll start finding other ways to get fast feedback (such as running unit tests, asking other developers to take a look and working closely with QA). By making deployment more painful you ensure that a lot more care goes into it.

But Hurts Too Much

If you’ve been reading along and grinding your teeth in objection to my premise that your deployment process needs to hurt more, I understand. Things shouldn’t be painful, and deliberately hurting yourself is a form of madness. If your deployment process hurts, it’s too painful, even though I just told you it’s not painful enough. But you have to prevent pain in the right way. Xcopy deployment is like being a boxer addicted to morphine — your process is horrible but pain free. Now imagine that you realize that being addicted to drugs is a problem, so you cut back and start feeling the pain each time a professional puncher unloads on you. In one sense, you’re feeling too much pain because it hurts to get punched in the face, but at the same time you’re not feeling enough pain because the amount you’re feeling isn’t causing you to consider another vocation.

The analogy here may not line up exactly, but the idea is similar. A lot of development and deployment processes are problematically painful in that they’re error-prone and difficult, but not painful enough in that they don’t prevent over-eager deploys and bad decisions. The solution? Get off the junk and stop letting yourself get pummeled by human wrecking balls. Or, in software terms, have a process that makes bad deployments hard and good ones very easy.

Now, getting to this release nirvana is not, itself, simple, but life is simple once you get there. The path to it generally involves a lot of automated testing and good planning. It involves a predictable release cadence, such as a sprint, and a commitment to following the process, not making exceptions nor cramming things in at the 11th hour or pushing back release dates. It involves continuous integration rather than periodic, nightmarish feature branch integrations. It involves a resistance to patching frantically when you make mistakes (you’ll learn a valuable lesson for next time). It involves a simple, fully-automated, easily repeatable build process. It involves a single click/button push deployment process. Summed up, it means that every time someone on your team checks in code, a series of automated tests and checks ensure that the code would be a good candidate for production or it is rejected from checkin until it would be. It means that your code could be shipped with a reasonably high degree of confidence on every checkin and that whether or not to actually ship is a business decision — not a technical one.

I encourage you to take a look at your build process and deployment process. Is it easy because Jim in accounting could do it? If so, it’s too easy and it’s definitely causing you problems. Is it hard enough that you do a lot of checking beforehand because you won’t want to do it again if things go wrong? That’s an improvement because it forces your hand for early testing and vetting, but it’s still a time-wasting problem. First, think about putting obstacles in place to guard against careless deployment, then think about refining those obstacles into process-helping practices, and finally, think about smoothing over the obstacles in the form of complete automation, leaving nothing but good, easy process.

Jul
15

By Erik Dietrich

Intro to Unit Testing 5: Invading Legacy Code in the Name of Testability

Category: Language Agnostic Tags: C#, MS Test, Starting To Unit Test, Unit Testing | 11 Comments

If, in the movie Braveheart, the Scots had been battling a nasty legacy code base instead of the English under Edward Longshanks, the conversation after the battle at Stirling between Wallace and minor Scottish noble MacClannough might have gone like this:

Wallace: We have prevented new bugs in the code base by adding new unit tests for all new code, but bugs will still happen.

MacClannough: What will you do?

Wallace: I will invade the legacy code, and defeat the bugs on their own ground.

MacClannough (snorts in disbelief): Invade? That’s impossible.

Wallace: Why? Why is that impossible? You’re so concerned with squabbling over the best process for handling endless defects that you’ve missed your God-given right to something better.

Goofy as the introduction to this chapter of the series may be, there’s a point here: while unit testing brand new classes that you add to the code base is a victory and brings benefit, to reap the real game-changing rewards you have to be a bit of a rabble-rouser. You can’t just leave that festering mass of legacy code as it is, or it will generate defects even without you touching it. Others may scoff or even outright oppose your efforts, but you’ve got to get that legacy code under test at some point or it will dominate your project and give you unending headaches.

So far in this series, I’ve covered the basics of unit testing, when to do it, and when it might be too daunting. Most recently, I talked about how to design new code to make it testable. This time, I’m going to talk about how to wrangle your existing mess to start making it testable.

Easy Does It

A quick word of caution here before going any further: don’t try to do too much all at once. Your first task after reading the rest of this post should be selecting something small in your code base to try it on if you want to target production and get it approved by an architect or lead, if that’s required. Another option is just to create a playpen version of your codebase to throw away and thus earn yourself a bit more latitude, but either way, I’d advise small, manageable stabs before really bearing down. What specifically you try to do is up to you, but I think it’s worth proceeding slowly and steadily. I’m all about incremental improvement in things that I do.

Also, at the end of this post I’ll offer some further reading that I highly recommend. And, in fact, I recommend reading it before or as you get started working your legacy code toward testability. These books will be a great help and will delve much further into the subjects that I’ll cover here.

Test What You Can

Perhaps this goes without saying, but let’s just say it anyway to be thorough. There will be stuff in the legacy code base you can test. You’ll find the odd class with few dependencies or a method dangling off somewhere that, for a refreshing change, doesn’t reference some giant singleton. So your first task there is writing tests for that code.

But there’s a way to do this and a way not to do this. The way to do it is to write what’s known as characterization tests that simply document the behavior of the existing system. The way not to do this is to introduce ‘corrections’ and cleanup as you go. The linked post goes into more detail, but suffice it to say that modifying untested legacy code is like playing Jenga — you never really know ahead of time which brick removal is going to cause an avalanche of problems. That’s why legacy code is so hard to change and so unpleasant to work with. Adding tests is like adding little warnings that say, “dude, not that brick!!!” So while the tower may be faulty and leaning and of shoddy construction, it is standing and you don’t want to go changing things without putting your warning system in place.

So, long story short, don’t modify — just write tests. Even if a method tells you that it adds two integers and what it really does is divide one by the other, just write a passing test for it. Do not ‘fix’ it (that’ll come later when your tests help you understand the system and renaming the method is a more attractive option). Iterate through your code base and do it everywhere you can. If you can instantiate the class to get to the method you want to test and then write asserts about it (bearing in mind the testability problems I’ve covered like GUI, static state, threading, etc), do it. Move on to the next step once you’ve done the easy stuff everywhere. After all, this is easy practice and practice helps.

Go searching for extractable code

Now that you have a pretty good handle on writing testable code as you’re adding it to the code base and getting untested but testable code under test, it’s time to start chipping away at the rest. One of the easiest ways to do this is to hunt down methods in your code base that you can’t test but not because of the contents in them. Here are two examples that come to mind:

public class Untestable1
{
    public Untestable1()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
    }

    public int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

public class Untestable2
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        Console.WriteLine("Total is " + AddTwoNumbers(order.Subtotal, order.Tax));
    }

    private int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

The first class is untestable because you can’t instantiate it without kicking off global state modification and who knows what else. But the AddTwoNumbers method is imminently testable if you could remove that roadblock. In the second example, the AddTwoNumbers method is testable once again, in theory, but with a roadblock: it’s not public.

In both cases, we have a simple solution: move the method somewhere else. Let’s put it into a class called “BasicArithmeticPerformer” as shown below. I do realize that there are other solutions to make these methods testable, but we’ll talk about them later. And I’ll tell you what I consider to be a terrible solution to one of the testability issues that I’ll talk about now: making the private method public or rigging up your test runner with gimmicks to allow testing of private methods. You’re creating an observer effect with testing when you do this — altering the way the code would look so that you can test it. Don’t compromise your encapsulation design to make things testable. If you find yourself wanting to test what’s going on in private methods, that’s a strong, strong indicator that you’re trying to test the wrong thing or that you have a design flaw.

public class BasicArithmeticPerformer
{
    public int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

Now that’s a testable class. So what do the other classes now look like?

public class Untestable1
{
    public Untestable1()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
    }

    private int AddTwoNumbers(int x, int y)
    {
        return new BasicArithmeticPerformer().AddTwoNumbers(x, y);
    }
}

public class Untestable2
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        Console.WriteLine("Total is " + AddTwoNumbers(order.Subtotal, order.Tax));
    }

    public int AddTwoNumbers(int x, int y)
    {
        return new BasicArithmeticPerformer().AddTwoNumbers(x, y);
    }
}

Yep, it’s that simple. In fact, it has to be that simple. Modifying this untestable legacy code is like walking a high-wire without a safety net, so you have to change as little as possible. Extracting a method to another class is very low risk as far as refactorings go since the most likely problem that could possibly occur (particularly if using an automated tool) is non-compiling. There’s always a risk, but getting legacy code under test is lower risk in the long run than allowing it to continue rotting and the risk of this particular approach is minimal.

On the other side of things, is this a significant win? I would say so. Even ignoring the eliminated duplication, you now have gone from 0 test coverage to 50% in these classes. Test coverage is not a goal in and of itself, but you can now rest a little easier knowing that you have a change warning system in place for half of your code. If someone comes along later and says, “oh, I’ll just change that plus to a minus so that I can ‘reuse’ this method for my purposes,” you’ll have something in place that will throw up a bid red X and say, “hey, you’re breaking things!” And besides, Rome wasn’t built in a day — you’re going to be going through your code base building up a test suite one action like this at a time.

Code that refers to no class fields is easy when it comes to extracting functionality to a safe, testable location. But what if there is instance-level state in the mix? For example…

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

That’s a little tougher because we can’t just pull _someField into a new, testable class. But what if we made a quick change that got us onto more familiar ground? Such as…

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return AddTwoNumbers(x, _someField);
    }

    private int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

Aha! This looks familiar, and I think we know how to get a testable method out of this thing now. In general, when you have class fields or local variables, those are going to become arguments to methods and/or constructors of the new, testable class that you’re creating and instantiating. Understand going in that the more local variables and class fields you have to deal with, the more of a testing headache the thing you’re extracting is going to be. As you go, you’ll learn to look for code in legacy classes that refers to comparably few local variables and especially fields in the current class as a refactoring target, but this is an acquired knack.

The reason this is not especially trivial is that we’re nibbling here at an idea in static analysis of object oriented programs called “cohesion.” Cohesion, explained informally, is the idea that units of code that you find together belong together. For example, a Car class with an instance field called Engine and three methods, StartEngine(), StopEngine( )and RestartEngine() is highly cohesive. All of its methods operate on its field. A class called Car that has an Engine field and a Dishwasher field and two methods, StartEngine() and EmptyDiswasher() is not cohesive. When you go sniping for testable code that you can move to other classes, what you’re really looking for is low cohesion additions to existing classes. Perhaps some class has a method that refers to no instance variables, meaning you could really put it anywhere. Or, perhaps you find a class with three methods that refer to a single instance variable that none of the other 40 methods in a class refer to because they all use some other fields on the class. Those three methods and the field they use could definitely go in another class that you could make testable.

When refactoring toward testability, non-cohesive code is the low-hanging fruit that you’re looking for. If it seems strange that poorly designed code (and non-cohesive code is a characteristic of poor design) offers ripe refactoring opportunities, we’re just making lemonade out of lemons. The fact that someone slammed unrelated pieces of code together to create a franken-class just means that you’re going to have that much easier of a time pulling them apart where they belong.

Realize that Giant Methods are Begging to be Classes

It’s getting less and less common these days, but do you ever see object-oriented code which you can tell that the author meandered his way over to from writing C back in the one-pass compiler days? If you don’t know what I mean, it’s code that has this sort of form:

public void PerformSomeBusinessLogic(CustomerOrder order)
{
    int x, y, z;
    double a, b, c;
    int counter;
    CustomerOrder tempOrder;
    int secondLoopCounter;
    string output;
    string firstTimeInput;
            
    //Alright, now let's get started because this is going to be looooonnnng method...
    ...
}

C programmers wrote code like this because in old standards of C it was necessary to declare variables right after the opening brace of a scope before you started doing things like assignment and control flow statements. They’ve carried it forward over the years because, well, old habits die hard. Interestingly, they’re actually doing you a favor. Here’s why.

When looking at a method like this, you know you’re in for doozy. If it has this many local variables, it’s going to be long, convoluted and painful. In the C# world, it probably has regions in it that divide up the different responsibilities of the method. This is also a problem, but a lemons-to-lemonade opportunity for us. The reason is that these C-style programmers are actually telling you how to turn their giant, unwieldy method into a class. All of those variables at the top? Those are your class fields. All of those regions (or comments in languages that don’t support regioning)? Method names.

In one of the resources I’ll recommend, “Uncle” Bob Martin said something along the lines of “large methods are where classes go to hide.” What this means is that when you encounter some gigantic method that spans dozens or hundreds of lines, what you really have is something that should be a class. It’s functionality that has grown too big for a method. So what do you do? Well, you create a new class with its local variables as fields, its region names/comments as method titles, and class fields as dependencies, and you delegate the responsibility.

public class Untestable4
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        var extractedClass = new MaybeTestable();
        extractedClass.Region1Title();
        extractedClass.Region2Title();
        extractedClass.Region3Title();
    }
}

public class MaybeTestable
{
    int x, y, z;
    double a, b, c;
    int counter;
    CustomerOrder tempOrder;
    int secondLoopCounter;
    string output;
    string firstTimeInput;

    public void Region1Title()
    {

...

In this example, there are no fields in the untestable class that the method is using, but if there were, one way to handle this is to pass them into the constructor of the extracted class and have them as fields there as well. So, assuming this extraction goes smoothly (and it might not be that easy if the giant method has a lot of temporal coupling, resulting from, say, recycled variables), what is gained here? Well, first of all, you’ve slain a giant method, which will inevitably be good from a design perspective. But what about testability?

In this case, it’s possible that you still won’t have testable methods, but it’s likely that you will. The original gigantic method wasn’t testable. They never are. There’s really way too much going on in them for meaningful testing to occur — too many control flow statements, loops, global variables, file I/O, etc. Giant methods are giant because they do a lot of things, and if you do enough code things you’re going to start running over the bounds of testability. But the new methods are going to be split up and more focused and there’s a good chance that at least one of them will be testable in a meaningful way. Plus, with the extracted class, you have control over the new constructor that you’re creating whereas you didn’t with the legacy class, so you can ensure that the class can at least be instantiated. At the end of the day, you’re improving the design and introducing a seam that you can get at for testing.

Ask for your dependencies — don’t declare them

Another change you can make that may be relatively straightforward is to move dependencies out of the scope of your class — especially icky dependencies. Take a look at the original version of Untestable3 again.

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

When instantiated, this class goes and rattles some global state cages, doing God-knows-what (icky), and then retrieves something from global state (icky). We want to get a test around the AddToGlobal method, but we can’t instantiate this class. For all we know, to get the value of “someField” the singleton gets the British Prime Minster on the phone and asks him for a random number between 1 and 1000 — and we can’t automate that in a test suite. Now, the earlier option of extracting code is, of course, viable, but we also have the option of punting the offending code out of this class. (This may or may not be practical depending on where and how this class is used, but let’s assume it is). Say there’s only one client of this code:

public class Untestable3Client
{
    public void SomeMethod()
    {
        var untestable = new Untestable3();
        untestable.AddToGlobal(12);
    }
}

All we really want out of the constructor is a value for “_someField”. All of that stuff with the singleton is just noise. Because of the nature of global variables, we can do the stuff Untestable3’s constructor was doing anywhere. So what about this as an alternative?

public class Untestable3Client
{
    public void SomeMethod()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        var someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
        var untestable = new Untestable3(someField);
        untestable.AddToGlobal(12);
    }
}

public class Untestable3
{
    int _someField;

    public Untestable3(int someField)
    {
        _someField = someField;
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

This new code is going to do the same thing as the old code, but with one important difference: Untestable3 is now a liar. It’s a liar because it’s testable. There’s nothing about global state in there at all. It just takes an integer and stores it, which is no problem to test. You’re an old pro by now at unit testing that’s this easy.

When it comes to testability, the new operator and global state are your enemies. If you have code that makes use of these things, you need to punt. Punt those things out of your code by doing what we did here: executing voids before your constructors/methods are called and asking for things returned from global state or new in your constructors/methods. This is another pretty low-impact way of altering a given class to make it testable, particularly when the only problem is that a class is instantiating untestable classes or reaching out into the global state.

Ruthlessly Eliminate Law of Demeter Violations

If you’re not familiar with the idea, the Law of Demeter, or Principle of Least Knowledge, basically demands that methods refer to as few object instances as possible in order to do their work. You can look at the link for more specifics on what exactly this “law” says, and what exactly is and is not a violation, but the most common form you’ll see is strings of dots (or arrows in C++) where you’re walking an object graph: Property.NestedProperty.NestedNestedProperty.You.Get.The.Idea. (It is worth mentioning that the existence of multiple dots is not always a violation of the Law of Demeter — fluent interfaces in general and Linq in the C# world specifically are counterexamples). It’s when you’re given some object instance and you go picking through its innards to find what you’re looking for.

One of the most immediately memorable ways of thinking about why this is problematic is to consider what happens when you’re at the grocery store buying groceries. When the clerk tells you that the total is $86.28, you swipe your Visa. What you don’t do is wordlessly hand him your wallet. What you definitely don’t do is take off your pants and hand those over so that he can find your wallet. Consider the following code, bearing in mind that example:

public class HardToTest
{
    public string PrepareSsnMessage(CustomerOrder order)
    {
        return "Social Security number is " + order.Customer.PersonalInfo.Ssn;
    }
}

The method in this class just prepends an explanatory string to a social security number. So why on earth do I need something called a customer order? That’s crazy — as crazy as handing the store clerk your pants. And from a testing perspective, this is a real headache. In order to test this method, I have to create a customer, then create an order and hand that to the customer, then create a personal info object and hand that to the customer’s order, and then create an SSN and hand that to the customer’s order’s personal info. And that’s if everything goes well. What if one of those classes — say, Customer — invokes a singleton in its constructor. Well, now I can’t test the “PrepareSsnMessage” in HardToTest because the Customer class uses a singleton. That’s absolutely insane.

Let’s try this instead:

public class HardToTest
{
    public string PrepareSsnMessage(string ssn)
    {
        return "Social Security number is " + ssn;
    }
}

Ah, now that’s easy to test. And we can test it even if the Customer class is doing weird, untestable things because those things aren’t our problem. What about clients, though? They’re used to passing customer orders in, not SSNs. Well, tough — we’re making this class testable. They know about customer order and they its SSN, so let them incur the Law of Demeter violation and figure out how to clean it up. You can only make your code testable one class at a time. That class and its Law of Demeter violation is tomorrow’s project.

When it comes to testing, the more stuff your code knows about, the more setup and potential problems you have. If you don’t test your code, it’s easy to write train wrecks like the “before” method in this section without really considering the ramifications of what you’re doing. The unit tests force you to think about it — “man, this method is a huge hassle to test because problems in classes I don’t even care about are preventing me from testing!” Guess what. That’s a design smell. Problems in weird classes you don’t care about aren’t just impacting your tests — they’re also impacting your class under test, in production, when things go wrong and when you’re trying to debug.

Understand the significance of polymorphism for testing

I’ll leave off with a segue into the next chapter in the series, which is going to be about a concept called “test doubles.” I will explain that concept then and address a significant barrier that you’re probably starting to bump into in your testing travels. But that isn’t my purpose here. For now I’ll just say that you should understand the attraction of using polymorphic code for testing.

Consider the following code:

public class Customer
{
    public string FirstName { get { return TestabilityKiller.Instance.GoGetCustomerFirstNameFromTheDatabase(); } }
}

public class CustomerPropertyFormatter
{
    public string PrepareFirstNameMessage(Customer customer)
    {
        return "Customer first name is " + customer.FirstName;
    }
}

Here you have a class, CustomerPropertyFormatter, that should be pretty easy to test. I mean, it just takes a customer and accesses some string property on it for formatting purposes. But when you actually write a test for this, everything goes wrong. You create a customer to give to your method and your test blows up because of singletons and databases and whatnot. You can write a test with a null argument and amend this code to handle null gracefully, but that’s about it.

But, never fear — polymorphism to the rescue. If you make a relatively small modification to the Customer class, you set yourself up nicely. All you have to do is make the FirstName property virtual. Once you’ve done that, here’s a unit test that you can write:

public class DummyCustomer : Customer
{
    private string _firstName;
    public override string FirstName { get { return _firstName; } }

    /// 

    /// Initializes a new instance of the DummyCustomer class.
    ///

public DummyCustomer(string firstName) { _firstName = firstName; } } [TestMethod, Owner(“ebd”), TestCategory(“Proven”), TestCategory(“Unit”)] public void Adds_Text_To_FirstName() { string firstName = “Erik”; var customer = new DummyCustomer(firstName); var formatter = new CustomerPropertyFormatter(); Assert.IsTrue(formatter.PrepareFirstNameMessage(customer).Contains(firstName)); }

Notice that there is a class, DummyCustomer declared inside of the test class that inherits from the Customer class. DummyCustomer is an example of a test double. You’ll notice that I’ve created a scenario here where I define a version of FirstName that I can control — a benign version, if you will. I effectively bypass that database-singleton thing and create a version of the class that exists only in the test project and allows me to substitute a simple, friendly value that I can test against.

As I said, I’ll dive much more into test doubles next time, but for the time being, understand the power of polymorphism for testability. If the legacy code has methods in it that are hard to use, you can create much more testable situations by the use of interface implementation, inheritance, and the virtual keyword. Conversely, you can make testing a nightmare by using keywords like final and sealed (Java and C# respectively). There are valid reasons to use these, but if you want a testable code base, you should favor liberal support of inheritance and interface implementation.

A Note of Caution

In the sections above, I’ve talked about refactorings that you can do on legacy code bases and mentioned that there is some risk associated with doing so. It is up to you to assess the level of risk of touching your legacy code, but know that any changes you make to legacy code without first instrumenting unit tests can be breaking changes, even small ones guided by automated refactoring tools. There are ways to ‘cheat’ and tips and techniques to get a method under test before you refactor it, such as temporarily making private fields public or local variables into public fields. The Michael Feathers book below talks extensively about these techniques to truly minimize the risk.

The techniques that I’m suggesting here would be ones that I’d typically undertake when requirements changes or bugs were forcing me to make a bunch of changes to the legacy code anyway, and the business understood and was willing to undertake the risk of changing it. I tend to refactor opportunistically like that. What you do is really up to your discretion, but I don’t want to be responsible for you doing some rogue refactoring and torpedoing your production code because you thought it was safe. Changing untested legacy code is never safe, and it’s important for you to understand the risks.

More Information

As mentioned earlier, here are some excellent resources for more information on working with and testing legacy code bases:

Working Effectively with Legacy Code by Michael Feathers
Clean Code by Robert (“Uncle Bob”) Martin
Clean Coders video series, by Robert Martin
The Art of Unit Testing by Roy Osherove (I have not personally read this, but I respect his work that I’m familiar with and have seen it recommended)

And, of course, you can check out my book about unit testing: Starting to Unit Test, Not as Hard as You Think.

Jul
7

By Erik Dietrich

Slow Posting Week

Category: Announcements |

This is just a brief announcement that I’m probably not going to be posting this coming week — I forgot to mention it last week. Readers in the USA and perhaps others know that Thursday was US Independence Day, so between the long weekend for me and the fact that I’m currently out of town (pleasure, not business) until late next week, I’m not really planning to write any blog posts. All I have with me are an Android tablet and an Ubuntu Netbook Remix machine, neither of which has a serious IDE, so if I do get time it’ll probably be philosophical more than technical.

Assuming that the blog is quiet this coming week, there is still stuff to look forward to. I’m continuing to write for the unit testing series, and the Expert Beginner E-Book is in its last phase before release — illustrations and format cleanup. So stay tuned for those and other random thoughts.

DaedTech

Understanding Degrees of Code Flexibility

Users do it themselves.

A dev-ops or operations person changes meta-data in production.

An architect or developer changes meta-data in production.

Someone changes source code that does not involve re-compilation

Someone makes a code change that more or less just involves adding code and very little modification

Someone makes a code change that involves lightweight changes to existing code.

Significant work on the code base is required.

Making Sense of your Options

Throw Out Your Code

Easy Deployment: the Alpha and the Omega

Wrong But Fast

Welcome to the Real World, Grasshopper

Hurts so Good

But Hurts Too Much

Intro to Unit Testing 5: Invading Legacy Code in the Name of Testability

Easy Does It

Test What You Can

Go searching for extractable code

Realize that Giant Methods are Begging to be Classes

Ask for your dependencies — don’t declare them

Ruthlessly Eliminate Law of Demeter Violations

Understand the significance of polymorphism for testing

A Note of Caution

More Information

Slow Posting Week

About Me

Developer Hegemony

Search the Site

Post Archives By Month