DaedTech

Stories about Software

Introduction to Unit Testing Part 4: Design New Code For Testability

In the last post in this series, I covered what I think of as an important yet seldom discussed subject: how not to overwhelm yourself and get discouraged when you’re starting to unit test. In the post before that, I showed the basics of writing a unit test. But this leaves something of a gap. You can now write a unit test in a vacuum for an extremely simple class, and, when looking at your legacy code base, you know what to avoid. But you don’t necessarily know how to write a non-trivial application with unit tests.

You might understand now how to write tests, assuming that you have some disconnected new class in your code base without dependencies and barriers to testing, but you wonder how to get to that point in the first place. I mean, your application doesn’t seem to need a prime number finder or a bowling score calculator. It needs you to add lines of code to existing methods or new methods to existing classes. It doesn’t really seem to need new classes, so the initial momentum and resolution you’ve built reading these first three posts to go and be a unit tester sort of fizzles anticlimactically when you stare at your code base.

What’s going on here?

Recognizing Inhospitable Terrain

The first thing to understand is that your code base probably wasn’t written with testability in mind. There’s nothing wrong with you for not being able to see where unit testing fits in, because it doesn’t. Last time I talked about things you’ll see that will torpedo your efforts to test a particular method or piece of code, but let me speak to some entire architectures and patterns that don’t lend themselves to testability. It’s not that code written with these technologies and patterns can’t be tested–it’s just that it won’t be easy for you. As you read through this, you might feel like I’ve read your code base’s mind.

  1. Active Record as an architectural pattern
    This is a pattern in which you create classes that are in-memory representations of database tables, views, or stored procedures. If you see in your code base a class called “Customer” that has methods like “GetById()”, “Update()” and “MoveNext()” you’ve got yourself an Active Record architecture. This architecture tightly couples your database to your domain logic and your domain logic to the rules for navigating through domain objects. You can’t test any of these objects since any operation you perform on them sends them scurrying off to create database connections and parameterized queries and all manner of other untestable stuff. And since decomposition and decoupling is the path toward unit testing, this sort of tight coupling of everything in your code is the path away from it.
  2. Winforms
    Winforms in the .NET world are tried and true when it comes to rapidly cranking out functional little applications, but you have to work really, really hard to make code that uses them testable. Q&A sites are littered with people trying to understand how to make Winforms testable, which should tell you that making them testable is not trivial. If you have Winforms and Active Record both in the same code base, at least the architecture is split into two concerns. But it’s split into two thoroughly untestable ones.
  3. Webforms
    See Winforms. Webforms is very similar in terms of framework testability, and for pretty similar reasons. Webforms is arguably even harder to test, however, because it’s predicated on spewing out reams and reams of HTML, CSS, and JavaScript while allowing you to pretend you’re writing a desktop app. I’ve talked about my opinion of this technology before.
  4. Wizard/markup-reliant code
    Do you use the Webforms grid wizard thing to generate your grids? Do you define object data sources, such as DB connections or files, in the markup? Practices like these are the epitome of quick and dirty, rapid-prototyping implementations that hopelessly cross couple your applications beyond all testability. If this is something that’s done in your group/code base, testing is basically a non-starter until you go in a different architectural direction.
  5. Everything in your application is in a user control/form
    I’ve seen this called “Smart UI” and it basically means that there’s absolutely no separation of concerns in your code. The UI elements create database connections, write to files, implement business rules–they do everything. Code like this is impossible to unit test.

If any of this is sounding familiar, your task might be daunting. I have my own preferences, but I’m trying not to offer a value judgment here as much as I’m letting you know what you’re up against. I’m like a mortgage broker that’s saying to you, “if you want to own a home, that’s a great goal. But if you are eight months behind on your rent and have no personal savings, you’re going to have some work to do first.” If I’ve described your code base in the list above, you face different challenges than a green field developer. And since you’ve presumably been contributing to these code bases, you’re probably very used to implementation techniques that don’t result in testable code. You’re going to need to change your thinking and your coding practices in order to start writing testable code.

Once we’ve discussed how to get you writing testable code, I’ll come back to these macroscopic concerns and give some pointers for how to improve the situation. But, for now, on to a revised approach to coding. The following holds true whether you’re banging away at some legacy Winforms/Active Record application or starting a brand new MVC 4 site.

Add New Methods and Classes First, Ask Questions Later

The first thing to abide by is to favor adding new things to the code over modifying existing things in the code. You may have heard this before in the context of the “Open/Closed Principle”, but that’s a guide for how to write your classes. (Basically, it admonishes you to write classes that others can extend and override rather than change.) I’m talking about how to deal with existing code bases. To put it simply and bluntly, it’s a lot easier to both code and test brand spankin’ new classes than to write and test changes to existing ones. We all know this. It’s at the heart of why we as developers always lean toward rewriting others’ code instead of understanding and working with it.

Now, this might not win you friends. In shops where people tend to write procedural code (you know, the kind you’re trying to get away from writing), they seem to have some weird fear of creating too many classes and masochistic attachment to monolithic structures. You might have to compromise or practice on your own if you run afoul of the project’s architect, but the exercise is invaluable. It’s going to propel you toward decoupling as a default rather than an exception. Doing new things? Time for a new class and some unit tests.

I know what you’re thinking: “But what if the thing that needs to be done has to be done in the middle of some method somewhere?” Well, instantiate your new class at that point and use it. “But what if it needs a bunch of fields from the class it’s in and variables from the method?” Pass them in through the constructor or method call. “But won’t that make my design bad?” It already is bad, but at least now you’re making part of it testable. When your code is testable and under test, everything is easier to fix later.
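As a rough sketch of what that can look like, assume some existing code-behind method needs to start charging a late fee. Every name here (LateFeeCalculator, InvoicePage, the fee rule itself) is invented for illustration and not taken from any real code base.

```csharp
// Hypothetical new class: pure logic, with no dependency on the class it came from.
public class LateFeeCalculator
{
    private readonly decimal _dailyRate;

    // A value that lived in a field on the old class comes in through the constructor.
    public LateFeeCalculator(decimal dailyRate)
    {
        _dailyRate = dailyRate;
    }

    // Values that were local variables in the old method come in as parameters.
    public decimal CalculateFee(decimal balance, int daysLate)
    {
        if (daysLate <= 0)
            return 0m;

        return balance * _dailyRate * daysLate;
    }
}

// Stand-in for the existing, hard-to-test class.
public class InvoicePage
{
    private readonly decimal _dailyRate = 0.01m;

    public decimal ProcessInvoice(decimal balance, int daysLate)
    {
        // ...existing untestable code stays where it is...

        // New behavior: instantiate the new class right here and use it.
        var calculator = new LateFeeCalculator(_dailyRate);
        decimal fee = calculator.CalculateFee(balance, daysLate);

        // ...more existing code...
        return balance + fee;
    }
}
```

InvoicePage is no easier to test than it was before, but LateFeeCalculator can be constructed and exercised in a unit test without touching the page at all.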

You’ll have to use some discretion, obviously, but shift your attitude here. Don’t look at a project that has nothing but .aspx files and their code behind and hide classes in there for fear of breaking with tradition. Boldly add pure .cs files to the code base. Add unit tests for those .cs files you’re creating. (Apologies to Java readers, but this has no real Java equivalent that I can think of having used. Plus, the Java stack in general seems not to have the same level of untestable cruft built into frameworks.) It’s a lot harder for someone, even the project architect, to give you a hard time if your new way of doing things is covered by unit tests. Even if they’re hostile Expert Beginners, they’re hard pressed not to sound silly if they say, “we don’t do that here.” You’ll at least have a better chance of pushing this change through with the unit tests than without them.

Ask Questions in the Right Order: What, How, When

Now that you’re creating a lot of new classes and instantiating them in the old untestable ones, it’s time to start working on what kind of code you write. If you’ve practiced and come back, I suspect that you’re starting to be able to write a few useful tests but are perhaps still struggling. And I bet it’s because the line between where the old class ends and your new one begins is a little hazy. Maybe they share some common fields. Maybe when you instantiate the class you’re testing, you hand it a “this” reference so that it can go picking through the properties on the untestable behemoth from which you’re escaping. This is the next thing we need to tighten up–stop doing that. A clear, concise division of labor between the classes is necessary, and it’s not possible if they share all of the same fields, properties, state, etc. That’s like a break up where you two continue to live together, share a car, and go to the movies on weekends.

The best way to achieve this clean split is with good abstraction, and the best way to do that is to remember “what, how, when.” When it’s time to change the code base, remember that you want to favor creating a new class. But before you do that, ask yourself “what?” but not the other two questions. What should I name this class? What should it do? What should it expose as its public methods? Don’t start thinking about how those methods or classes should work and don’t you dare start thinking about when anything at all should happen–just think about what. Give it a good name that defines a clear purpose, and then give it a good set of methods and properties that draw attention to why it’s a different concept than the class that will be using it. Ask yourself what the boundaries between the two classes will be so that you can minimize the amount of shared information. And now, start stubbing out those methods with no implementation and start stubbing out some test methods with names that say what the methods will do.
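To make the “what” step concrete, here is a minimal sketch of a class and its tests stubbed out before any implementation exists. The class, its methods, and the test names are all hypothetical, and the tests are written as plain methods rather than with any particular test framework’s attributes so that the sketch stays self-contained.

```csharp
using System;

// "What" only: a name, a purpose, and public methods. No "how" yet.
public class ShippingCostEstimator
{
    public decimal EstimateCost(decimal weightInPounds, int distanceInMiles)
    {
        throw new NotImplementedException();
    }

    public bool QualifiesForFreeShipping(decimal orderTotal)
    {
        throw new NotImplementedException();
    }
}

// Stubbed tests whose names say what the methods will do.
public class ShippingCostEstimatorTests
{
    public void EstimateCost_Returns_Zero_For_Zero_Weight()
    {
        throw new NotImplementedException();
    }

    public void QualifiesForFreeShipping_Returns_True_For_Large_Orders()
    {
        throw new NotImplementedException();
    }
}
```

Nothing here says how a cost is estimated or when the estimator gets called. It only pins down what the class is and what it promises to callers.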

At this point, you’re ready for “how.” Start actually implementing the “what” and testing that your implementation works in the unit tests. Believe me, it’s much, much easier to implement methods this way. If you think about “what” and “how” at the same time, you start writing confusing code that not even you have faith in when you’re done. Implementation alone is much easier when you have a clear picture of “what,” and unit testing is a breeze.

Once everything is implemented, you can start thinking about “when.” When should you instantiate the class and when should you call its methods? But don’t spend too much time with “when” in your head because it’s dangerous. Does that sound weird? Let me explain.

“When Code” is Monolithic Code

Picture two methods. One is a 700 line juggernaut with more control flow statements than you can keep track of without a spreadsheet. The other is five lines long, consisting only of an initialize statement, a foreach, and a return statement. With which would you rather work? I imagine the response is unanimous here. Even if you tend to crank out these kinds of large methods, when you step through the debugger, looking for the cause of a bug and finding yourself in some huge method, your heart sinks and you settle in with snacks and caffeine because it’s going to be a long day.

Now with these two methods in mind, imagine if we were pair programming together and I simply asked “when?” With the tiny method, you’d probably say, “What do you mean by ‘when’–I mean, you initialize before the loop and you return when you find the record you’re looking for. What a weird question!” With the other method, you’d probably affect a thousand-yard stare and say, “Man, I don’t even know where to begin.” But the answer to that question would fill pages. Books. Because you set the first loop counter j equal to the third loop counter k about twenty lines before the fourth try-catch and fifteen lines before you set the middle loop counter j equal to four. Unless, of course, you threw that exception up on line 2090, in which case j might never have been initialized. Er, wait, I think that happened somewhere near the fifth while loop in that else condition up there. Oh, there’s so much “when,” but it’s all slammed together in a method where you can’t possibly test any of it. Lots of thinking about “when” breeds huge methods like a Petri dish for bacteria.

“When” code is procedural at its core, and procedural “when” code is anathema to object-oriented unit testing, which is all about “what” and “how.” Remember earlier in the series when I said that multi-threaded code was really, really hard to test? Well, that’s just a subset of an idea called “temporal coupling,” and what we’re talking about here also falls under that umbrella. Temporal coupling is what happens when things have to be executed in a specific order or else they do not work.

Imagine that you’re coding up a model for someone’s day. When you think of how to do this, do you think, “first he gets up, then he brushes his teeth, then he showers, then he puts on his clothes, then… then he comes home, then he eats dinner, then he watches TV, then he goes to bed?” Do you code this up with constructs like:

public void GoAboutMyDay()
{
    bool wokeUp = WakeUp();
    if (wokeUp == true)
    {
        bool showered = Shower();
        if (showered == true)
        {
            bool putOnClothes = PutOnClothes();
            if (putOnClothes)
            {
                // You get the idea
            }
        }
    }
}

This method is all about “when.” It’s entirely procedural, and it’s going to be horrifying when it’s complete. When you get into the fortieth nested if condition, maybe someone will come along and flatten it out with a bunch of inverted early returns. Or maybe not, because maybe some of the if clauses start sprouting else conditions with loops in them. And maybe the methods being called start communicating with one another via boolean flag fields in the class. Who knows–this thing is on the precipice of becoming unstoppable. It might just achieve sentience at some point, so that when you try to start deleting conditionals, it says, “I can’t let you do that, Dave,” and puts them back.

The root problem behind it all is the “when” and the procedural thinking because you’re orienting the implementation around the order of the activities rather than the nature of the activities. Unit testing is all about deconstructing things into their smallest possible chunks and asserting things about those chunks. Temporal coupling and “when” logic is all about chaining and fusing things together.

If you were thinking about “what” first here, you would form a much different mental model of a person’s day. You’d say things to yourself like, “well, during the course of a person’s day, he probably wakes up, gets dressed, eats breakfast–well, actually eats one or more meals–maybe works if it’s a weekday, goes to bed at some point,” etc. Whereas in the procedural “when” modeling you were necessarily building a juggernaut method, here you’re dreaming up the names of methods and/or classes that can be unit tested separately and in isolation. It’s no reach to say, “okay, let’s have a Meal class that will have the following methods…”

Only at the end will you decide “when.” You’ll decide it after you’ve stubbed things out with the “what” and implemented/unit tested them with the “how.” “When” is a detail that you should allow yourself to figure out at any point down the line. If you nail down what and how, you will have testable, modular, and manageable code as you create your classes.
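For contrast with the nested method above, here is one possible “what”-first sketch of a person’s day. The Meal and Day classes, their members, and the calorie bookkeeping are assumptions made up for illustration. The point is only that each piece can be constructed and tested in isolation, with no required call order.

```csharp
using System;
using System.Collections.Generic;

// A small concept that can be unit tested on its own.
public class Meal
{
    public string Name { get; private set; }
    public int Calories { get; private set; }

    public Meal(string name, int calories)
    {
        Name = name;
        Calories = calories;
    }
}

public class Day
{
    private readonly List<Meal> _meals = new List<Meal>();

    public DayOfWeek Weekday { get; private set; }

    public Day(DayOfWeek weekday)
    {
        Weekday = weekday;
    }

    // No flags and no required ordering: callable at any time.
    public bool IsWorkday()
    {
        return Weekday != DayOfWeek.Saturday && Weekday != DayOfWeek.Sunday;
    }

    // "One or more meals" rather than a fixed breakfast-then-dinner sequence.
    public void Eat(Meal meal)
    {
        _meals.Add(meal);
    }

    public int TotalCalories()
    {
        int total = 0;
        foreach (var meal in _meals)
            total += meal.Calories;
        return total;
    }
}
```

Whatever code eventually decides “when” to call Eat() or IsWorkday() is a thin layer on top of this, and it can change without disturbing the tested pieces.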

Other Design Considerations for Your New Classes

I’ll wrap up here with a few additional tips for creating testable designs when adding code to code bases:

  1. Avoid using fields to communicate between your methods by setting flags and tracking state. Favor having methods that can be executed at any time and in any order.
  2. Don’t instantiate things in your constructor. Favor passing them in (we’ll talk about this in detail in a future post in the series).
  3. Similarly, don’t have a lot of code or do a lot of work in your constructor. This will make your class painful to set up for testing.
  4. In your methods, accept parameters that are as decomposed as possible. For instance, don’t accept a Customer object if all you do with it is read its SSN property. In that case, just ask for the SSN.
  5. Avoid writing public static methods. These are easy enough to test (often), but they start introducing testability problems when you write code that uses them. (This might be hard to swallow at first, but mull over the idea of simply not using static methods anymore.)
  6. The earlier you start writing your unit tests, the better. If you find that you’re having a hard time testing your new code, it’s more likely a problem with the code than with unit testing it, and if you write tests early, you’ll discover these problems before you get too far and fix them.
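A short sketch that combines tips 2 through 4, under the assumption of a hypothetical loan approval feature. ICreditBureau, LoanApprover, and FixedScoreBureau are invented names, and the approval rule is a placeholder.

```csharp
using System;

// Hypothetical dependency, passed in rather than instantiated in the constructor (tip 2).
public interface ICreditBureau
{
    int GetScore(string ssn);
}

public class LoanApprover
{
    private readonly ICreditBureau _bureau;

    // The constructor only stores what it is given; it does no real work (tip 3).
    public LoanApprover(ICreditBureau bureau)
    {
        if (bureau == null)
            throw new ArgumentNullException(nameof(bureau));

        _bureau = bureau;
    }

    // Asks for just the SSN instead of an entire Customer object (tip 4).
    public bool IsApproved(string ssn, int minimumScore)
    {
        return _bureau.GetScore(ssn) >= minimumScore;
    }
}

// A trivial stand-in that a test can hand to LoanApprover.
public class FixedScoreBureau : ICreditBureau
{
    private readonly int _score;

    public FixedScoreBureau(int score)
    {
        _score = score;
    }

    public int GetScore(string ssn)
    {
        return _score;
    }
}
```

Because the test supplies the bureau and a bare SSN string, setting up LoanApprover takes one line and never requires building a Customer or reaching a real credit service.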

This post has covered ways to write unit tests “from here forward” and ways to stop adding untested code to code bases. In the next post, I’ll talk about how to start getting the legacy code under test.

Addendum: Mitigating the Hostile Test Environments

Finally, as promised, here are ways to accommodate testing in the less-than-ideal architectures mentioned above, if you’re curious or want to do some more research:

  1. Instead of Active Record, look at some kind of ORM solution like NHibernate or Entity Framework. These are tools that generate all of the code for you to access the database so that you don’t have to worry about testing that code and you can focus on writing only your (testable) domain code. Barring that, try to separate the three concerns of Active Record objects: modeling the database, connecting to the database, and modeling a domain object. The first concern adds no value, and the second two can be broken out into separate objects where the only thing hard to test is the actual database access.
  2. Instead of Winforms, favor WPF when possible. If that isn’t possible, see if you can use the Model-View-Presenter (MVP) pattern to move as much logic out of the untestable code-behind as possible.
  3. To be blunt, from a testing/decoupling perspective, Webforms is a disaster. You can have some limited success by adopting a more passive binding model and moving as much code out of the code-behind as possible, but it’s all pretty awkward. Webforms really seems more about rapid-prototyping and Microsoft-Accessing web development than producing scalable, sophisticated architectures.
  4. If you’re using wizards to generate your application’s architecture, cut it out. If you’re defining implementation details in markup, cut it out. Markup is for layout, not unit-testable business logic or state logic. If you depend on definitions in markup to drive your application’s behavior, you’re relying exorbitantly on a third-party framework, which is always extremely brittle from a testability perspective.
  5. To fix Smart UI, you just have to factor toward a more decoupled architecture. Start pulling different concerns out of the user controls and forms and finding a home for them.

Introduction to Unit Testing Part 3: Unit Testing Sucks

I don’t know about you, but I remember desperately wanting to be able to drive right up until I was fifteen years old and I got my learner’s permit. I thought about it a lot–how fun it would be, how much freedom I would have, how my trusty old bike would probably get rusty from disuse. About a month after getting my permit, I desperately wanted my license and to drive on my own without supervision. But I’m omitting a month there, during which an unexpected thing happened. I realized that driving was stupid and awful and it sucked and I hated it and I’d never do it, so just forget it!

It was in that month that the abstraction of operating a car and having freedom became the reality of hitting the gas when I meant to hit the brake pulling out of my driveway or not knowing when I was supposed to go after stopping at a stop sign. It was a weird mix of frustration, anger, and fear that tends to accompany new activities–even ones that you know will benefit you. And that’s why the title of this post isn’t simple link bait. I did that not to satirize a position, but to empathize. Like many things when you’re new to them, starting to unit test quite frankly sucks. It’s frustrating, foreign, and hard to get right. Accordingly, it’s easy to abandon it when you have deadlines to meet.

This post is about minimizing frustration and barriers to adoption by staying focused and setting reasonable expectations. I would argue that if you’re new to writing tests, writing a few and enjoying localized success without high coverage is a lot more important than suddenly becoming a TDD (Test-Driven Development) expert with 100% test coverage right out of the gate (or at least trying to become one). Incremental progress is good.

Don’t Try TDD Just Yet

I’m a little torn as I write this, but the first thing that I’ll suggest is that you not try TDD if you have no experience unit testing. Some might disagree with this suggestion, but I think that you’re going to be trying to learn too many new things all at once and will be a lot more likely to get frustrated. Unit tests are simply pieces of code that you write, as covered in more detail in the last post in the series. It’s a new kind of code to be writing, but you’re just learning about new methods to call and attributes (or annotations, in Java) to use. You’ll get there.

But TDD is an entirely new way of writing code. It’s a discipline in which you do not write any production code until you have written a unit test that fails. Then you get that test and all other tests to pass and refactor the code as needed. Does that sound crazy (if you discount the fact that a number of developers you respect probably do it)? Exactly. Probably not for you right now. It’s a bridge too far, and you’re more likely to throw up your arms in disgust and quit if you try to learn both things right now. I speak from experience, as, years ago, I was introduced to unit testing and TDD at the same time. I was overwhelmed until I just went back to figuring out the whole unit testing thing alone first. Maybe that wouldn’t happen to you, but I’d caution you to be wary of learning these two things simultaneously.

So let’s stick to learning what unit tests are and how to write them.

Test New Classes Only

In my Pluralsight course, I use the example of a method that identifies numbers as prime or not, and in a series of posts I did last fall on TDD, I use the example of something that calculates a bowling score. I’ve also done other code katas and exercises like these in the past to show people both the mechanics of unit testing and TDD.

When I do this, one of the things people frequently say is something along the lines of “pff…sure, when you’re writing something stupid and easy like a prime number finder, but there’s no way that would work on our code base.” I then surprise these people by agreeing with them. I’m sure it wouldn’t work on your code base. Why? Well, because unit tests don’t just magically spring up like mushrooms after a few days of rain. They’re more like roses–you have to plan for them from the start and carefully cultivate an environment in which they can thrive.

Some years back, I saw an excellent talk on “The Deep Synergy Between Testability and Good Design,” by Michael Feathers. I highly suggest watching this talk if you haven’t seen it, but to summarize, he states (and I agree) that well-designed and factored code goes hand in hand with testability. You’re much more likely to find that code written to be testable is good code and, conversely, code written without unit tests in mind is not the greatest. And so if you’re deeply invested in a code base that has never been covered by unit tests, it doesn’t surprise me to hear that you don’t think unit testing would work on your code. I imagine it wouldn’t.

But don’t throw out unit testing because it looks like it wouldn’t work in your code base. Just resolve to do it on new classes that you create. As you go along and get better at unit testing, you’ll start to understand how to write testable classes. It will thus get easier and easier to test all new additions to the code, and you’ll start to get the hang of it with relatively minimal impact on your existing code, your process, or your time. Starting to unit test doesn’t mean that you’re suddenly responsible for testing every line of code in history, nor does it mean you must test every single new line. Just start out by writing a few that you think will help.

Test Existing Code by Extracting Little Classes

Once you get the feel for adding unit tests for new classes/code that you add to the code base, it’s a good time to start taking baby steps toward getting tests in place for your legacy (non-tested) code. Now, some procedural, monolithic mass of code that wasn’t testable a month ago when you started out isn’t magically testable now because you have some practice. It’s still a problem.

You’re going to have to chip away at it. And you’re going to have to do this by developing a new skill: identifying pieces of functionality that you can pull out into new classes and test. Go look through methods and classes and find things that don’t have a lot of dependencies on class fields or (yuck) global/static variables. Excellent candidates for this are methods with pure in-memory operations and ones that deal largely with primitives. Do you have some gigantic method that has a whole region buried in it that does nothing but cobble together a string to be used later in the method? Pull that out into a new class, and write unit tests that make assertions about the string it returns.
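As an illustration of that last suggestion, suppose the buried region builds a greeting line. A minimal sketch of the extracted class might look like the following, where GreetingBuilder and its formatting rules are hypothetical stand-ins for whatever string the real method cobbles together.

```csharp
// Hypothetical class pulled out of a region in some gigantic method.
// It works only on primitives and in-memory data, so it is easy to test.
public class GreetingBuilder
{
    public string Build(string firstName, string lastName, bool isFormal)
    {
        // Trim handles the case where one of the names is empty.
        string name = (firstName + " " + lastName).Trim();

        return isFormal
            ? "Dear " + name + ","
            : "Hi " + name + "!";
    }
}
```

The original method now calls new GreetingBuilder().Build(...) where the region used to be, and unit tests can make assertions about the returned string without running any of the surrounding code.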

As you practice this, you’ll get a better and better feel for what you can pull out with a minimum of friction. You’ll find yourself not only getting more of your codebase under test, but also that you’re improving its design and modularity.

Know When to Fold ‘Em

This is another one that’s hard to type, but you really have to learn to look at code and just say, “nope, not happening.” There are classes and methods that you simply are not going to be able to test unless you come back with a green belt in unit testing–or pair with someone who has hers. And, even then, the prognosis may be that you need to rewrite the legacy class/method altogether to make it testable. Here is a quick list of things that, early in your unit-testing career, you should consider to be deal-breakers and simply move on from to avoid frustration. As a beginner, avoid testing code (class methods and properties) that:

  1. Calls static methods. At best, a static method is functional and returns something that depends only on its inputs. If this is the case (as with functions like Math.Pow() or Math.Abs()), the code is still testable, but a far more common case, especially if the static methods are ones in your own code base, is that they manipulate some kind of global state. Global state is testability kryptonite. I’ll explain more later, but for now, please take my word for it.
  2. Invokes singletons. The singleton design pattern is used almost universally as a politically correct way to hide your global variables in plain sight. For what this means to testability, see the previous item. If it calls singletons, forget it, move on.
  3. Dispatches background workers or manages threading. When unit tests are run, the unit test runner is responsible for managing threading, and it may run your tests in parallel. If you’re trying to make sure your threads and thread management are in one state for production and another for testing, you are about to ruin your day and probably your week. It’s not worth it–don’t try.
  4. Accesses files, connects to databases, calls web services, etc. I mentioned this in the first post in the series, but that was in the context of saying that these things aren’t considered unit tests. Well, another issue here is that they’re also relatively brittle and long running. If you write tests that do these things, they’re going to fail at weird times and in unpredictable ways. You’ll be used to all of your tests passing and suddenly one fails and then passes again, and it turns out it’s because Bill from accounting bumped into the database server and its NIC is a little “tricky.” If you have unit tests that fail for borderline-inconceivable reasons beyond your control, you will become discouraged.
  5. Code that triggers any of the above anywhere in the call stack. You don’t escape the problems of threading, global state, or externalities by not using them directly. If you trigger them, it’s the same difference.
  6. Classes that require crazy amounts of instantiation. If you want to test a method, but it has forty-five parameters, most of which are classes that are difficult or complex to create, forget it. That code sorely needs reworking, and creating massive, brittle tests for it this early in your career will be a world of pain. Chip away at making the design better before you tackle it.
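A small sketch of why the first two items cause trouble, assuming a hypothetical AppSettings static and a PriceCalculator that reads from it (neither comes from a real code base):

```csharp
// Hypothetical global state hiding behind a static member.
public static class AppSettings
{
    public static decimal TaxRate = 0.05m;
}

public class PriceCalculator
{
    // The result depends on an input that appears nowhere in the signature.
    public decimal Total(decimal subtotal)
    {
        return subtotal + subtotal * AppSettings.TaxRate;
    }
}
```

A test asserting on Total(100m) only passes if nothing else has changed AppSettings.TaxRate first, so the same test can pass or fail depending on which other tests ran before it.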

Don’t worry–I’m not suggesting that you give up on a long timeline, and I’ll continue on with this series and discuss strategy for addressing these things later. But for now, just consider them signals that this code is out of bounds for testing. If you don’t, there’s a high likelihood that you’ll spin your wheels and get angry, frustrated, and irritable, making it more likely that you’ll give up. I can’t eliminate the frustration of being new at something like driving, but I can at least steer you away from six-way traffic lights and three-lane roundabouts.

Don’t Write Code You Don’t Need

I was reviewing some code the other day, and I saw a quick-and-dirty logger implementation that looked something like this (I’m re-creating from memory and modifying a bit for illustrative purposes in this post):

using System.IO;

public class Logger
{
    public string Path { get; set; }

    public Logger()
    {

    }

    public Logger(string path)
    {
        Path = path;
    }

    public void WriteEntry(string entry)
    {
        using (var writer = File.AppendText(Path))
        {
            writer.WriteLine(entry);
        }
    }
}

A few things probably jump out at you, depending on your C#, OOP, and code reviewing chops. I’d imagine that, at the very least, you think, “you could just delete the default constructor.” You might also wonder why the other constructor exists since you could just set Path whenever you wanted. There’s the lack of null/valid check on path before creating a stream writer with it, and the unnecessary, potentially problematic, and definitely not threadsafe creation of the stream writer over and over. These things are all valid to point out, but I’d say that they’re also all symptoms of two larger root causes that I want to talk about.

Overeager to Please

You can offer too much code. By this I mean something subtly but critically different from “you can write too much code,” as when you are needlessly complicated or verbose. Offering too much code means that you’re giving users of the public interface of your classes too many options. Before my TDD days, this was something with which I constantly struggled. To understand what I mean, consider the thinking that probably went into the creation of this class:

Well, let’s see. A file needs a path, so a file logger should probably also take a path as input. From there, it should handle the details of streams and all that other stuff so that all we have to do is give it stuff to put in the file.

So far, so good. This is pretty clever as far as abstractions go, and whoever wrote this class has a good grasp on how to create abstractions that provide value. But here’s what probably came next.

So, for path, let’s make that a public property in case the user of the class wants to change the path later. For convenience, let’s add a constructor that lets the user specify the path, but let’s also let him know that he doesn’t have to.

That thinking is considerate and helpful, but it’s offering too much code. You need to take ownership of your abstractions and be a little forceful and unyielding with them. You provide what you provide, and if users don’t like your class, they can create an inheritor or write their own or something: “this is my class, take it or leave it.”

In this case, you have to make concrete decisions about how and when the path of the logger is set:

In this class, the path is set at instantiation time and only instantiation time. As long as you live with my logger, you will obey my rules!

public class Logger
{
    public string Path { get; private set; }

    public Logger(string path)
    {
        Path = path;
    }

    public void WriteEntry(string entry)
    {
        using (var writer = File.AppendText(Path))
        {
            writer.WriteLine(entry);
        }
    }
}

Notice that we’ve already eliminated the problem of the pointless constructor by making this decision. We’re also well poised to handle the null/valid checking of the path since we can just do it in the constructor instead of doing it in the constructor and in the setter for path and worrying about duplication, when exactly to validate and how, what to do on invalid set when the last path was valid, etc. It’s also going to be very easy to take care of the issue with the needless creation of StreamWriter instances. Now that you can’t change the path once the instance is created, you can simply create the writer in the constructor and reuse it for the lifetime of the object by storing it as a private field. Now making this threadsafe becomes a pretty manageable task. In general, eliminating one conceptual option eliminates a lot of internal implementation complexity.
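To make that concrete, here is a minimal sketch of where those follow-up changes might land. It is one possible shape rather than the code I actually reviewed: the IDisposable implementation, the lock object, and the flush-on-every-write behavior are all assumptions I've added for illustration.

```csharp
using System;
using System.IO;

public sealed class Logger : IDisposable
{
    // Assumption: a private lock object guards the shared writer.
    private readonly object _sync = new object();
    private readonly StreamWriter _writer;

    // Get-only property: the path cannot change after construction.
    public string Path { get; }

    public Logger(string path)
    {
        // Validation happens exactly once, as a precondition for the object existing.
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentException("A log file path is required.", nameof(path));

        Path = path;
        _writer = File.AppendText(path);
    }

    public void WriteEntry(string entry)
    {
        lock (_sync)
        {
            _writer.WriteLine(entry);
            _writer.Flush(); // Assumption: flush eagerly so entries are not lost on a crash.
        }
    }

    public void Dispose()
    {
        lock (_sync)
        {
            _writer.Dispose();
        }
    }
}
```

The trade-off is that the logger now holds the file open for its whole lifetime, so callers are expected to dispose it when they are done.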

But what about the users who want to set the path at some point later in the object instance’s lifetime? Psst…that’s not a real common use case. And you know what? If they really want to do that, they can just instantiate a second logger. Don’t make your life really complicated because you think users of your code might want flexibility. Because you know what they want more than extra flexibility? Code that works. And it’s a lot harder to offer them code that works if you’re bending over backwards to handle every conceivable thing they might want to do.

Pointless or Speculative Mutability

The last section touches on this obliquely, but I’d like to bring this to the fore. Mutable, state-based code is a lot more prone to problems than immutable or functional code that has fixed or no state, respectively. Consider the refactoring path I’ve proposed here. In the last section, I chose to eliminate the public setter for Path instead of the constructor that took Path as an argument. Did that seem like a coin flip and pick one to you? Well, it wasn’t.

With the constructor injection of path, we have to do validity checking and initialization only once, to establish preconditions for the object’s existence. With the public setter for path, we have to maintain object invariants, which tends to be more complex. A good example of this is how we handle the transition from a valid path to the user setting an invalid or null path. Do we throw an exception? Revert to the previous path? You never have to have that discussion when there is no previous path to worry about, as is the case with constructor injection.

This is just an example of a broader idea, which is that mutability creates state transition management tasks and that these tasks are harder to get right. To understand what I mean, imagine that you’re implementing the API for an array. You tell your users, “alright, when you create the array, you specify how many elements it will have, and then you can set and access the elements as you please by index.” The users come back and say, “well, we want to change the size of the array on the fly,” to which you respond, “oh, crap,” as you start to wonder what you’ll do if they want to resize it smaller than what they’ve used or if they try to make it negative or too big or something. You can feel the number of edge cases multiplying.

If you come from a background where you write a lot of procedural, script-based, or hacked-together utility code, this may seem like a weird thing to think about. You’re probably used to doing things like declaring boolean ‘flags’ somewhere in a method and setting them much later in that same method in order to keep track of something further down the line. And if you do this, you’re probably used to a lot of tweaking, guessing, and hoping for the best. But as it turns out, you’re doing things the hard way.

The easy way is to program without any state at all. That means no assignment and no variables. Suppose you were asked to write a method that took x and y and returned 4x + 2y. There’s no need to declare locals or accumulate intermediate results: just have “return 4x + 2y” and be done with it. This way of doing things is called functional programming, and its calling card is that output is always a pure function of the input and is entirely predictable.
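As a rough illustration of the difference, here are two ways of writing that method side by side. The class and method names are made up for this post.

```csharp
public static class Formulas
{
    // Stateful style: a local is assigned and then reassigned along the way.
    public static int ComputeWithState(int x, int y)
    {
        int result = 0;
        result = result + 4 * x;
        result = result + 2 * y;
        return result;
    }

    // Stateless style: the output is a single expression of the inputs.
    public static int Compute(int x, int y)
    {
        return 4 * x + 2 * y;
    }
}
```

Both return the same values, but the second version has no intermediate state to get wrong.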

If a functional paradigm isn’t an option (and it often isn’t in the entirety of an application), the next easiest thing to deal with is immutable objects. These are objects that have state, but that state is not modifiable after creation time. An example of an object that can easily be immutable is something like an address. An address is just a handful of strings bound together with a class definition, so why have it be mutable? If you want a different one, just let the one you have go out of scope and create a new one. Then you don’t have to worry about questions like “what if I set a new address1, but not a new address2–that would never happen, right?” In immutable land, that’s simply not a consideration that you bother with.
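Here is a sketch of what such an address might look like. The property names and the choice of which fields are required are assumptions for the sake of the example, not a recommendation for modeling real addresses.

```csharp
using System;

public sealed class Address
{
    // Get-only properties: set once in the constructor, never again.
    public string Address1 { get; }
    public string Address2 { get; }
    public string City { get; }

    public Address(string address1, string address2, string city)
    {
        // Assumption: Address1 and City are required, Address2 is optional.
        if (address1 == null) throw new ArgumentNullException(nameof(address1));
        if (city == null) throw new ArgumentNullException(nameof(city));

        Address1 = address1;
        Address2 = address2;
        City = city;
    }
}
```

A customer who moves gets a brand new Address instance, so there is never a moment when the first line has been updated and the second has not.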

The hardest way of doing things is with state transition management or mutable state. That’s the nature of the original logger at the top. Anything goes. The object has only transient state, so it either has to constantly manage all permutations of state changes to check for accuracy or it has to risk ending up in an invalid state. This is not a great choice and should be avoided if at all possible.

As you develop, you’ll find it’s possible a lot more often than you think. Objects that perform a service, such as a logger, generally have little reason to store mutable state, particularly in a nicely decoupled application. Even a lot of data objects, which represent the very core of what we’d think of as mutable, can be immutable a lot more often than you’d think (e.g. address). Mutability in your code is often best left in places where it’s unavoidable. If you don’t have the luxury of pass-through interaction with a database, you will generally maintain an in-memory domain model that needs to have mutable state because it’s representing the database which is, almost by definition, a whole gigantic system of files full of mutable state. A lot of objects that help you manage user interface will have mutable state (e.g. some class that stores things like “is checkbox X currently checked”). But unavoidable as it may be, you can certainly minimize and isolate it. Whatever you do, don’t introduce it where you don’t need it because you’re creating needless complexity, which means extra things that can go wrong.

Take-Away

The upshot of this exhaustive code-review that I’ve conducted is just advice to consider carefully where your classes fall on the functional-immutable-mutable spectrum and to keep them as far toward the functional side as possible. I’m not suggesting that you run out and rewrite existing code this minute or even that you vow to change how you do things. I’m just suggesting you be aware that mutable objects come with a heavy cost in terms of complexity and difficulty. This will help you in your programming travels in general, and especially if you’re looking to delve into things like architecture, test-driven development, or concurrent (multi-threaded) programming.

Language Basics from Unit Tests

Let’s say that in a green field code base someone puts together a type that conceptually is a collection of non-integer values. For the sake of discussion, let’s call it a graph. A graph object might store a series of two-element tuples or perhaps a series of some value type like “point.” The graph might then perform operations on this data, such as IncreaseX() or IncreaseY() or Invert() or Divide()–operations that iterate through the points and do things to them. The actual mechanics of this don’t matter a whole lot. It’s the concept that’s important.

Now let’s say that in the graph the internal representation of the points is a floating point data type such as, well, float. I’m going to save the nuance of floating point arithmetic for a future practical math post, but suffice it to say that floats can exhibit some weird-seeming behavior when it comes to comparisons, truncation/rounding, certain kinds of casting and type representations, etc.

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Mind_Equals_Blown()
{
    float x = 0.2f;
    float y = 0.1f;
    float z = x + y;

    Assert.IsTrue(z == x + y);  //What the - why does this fail?!?
}

And let’s also say that the person responsible for authoring this graph class hasn’t read a practical math post about floating point arithmetic and is completely oblivious to these potential pitfalls.

And, finally, let’s say that this graph class becomes a mainstay of the business logic in a particular application. It’s modified, extended, and relied heavily upon without a whole lot of attention paid to its internal workings. At least until stuff mysteriously doesn’t work. But when that happens, the culprit isn’t immediately obvious, so strange work-arounds and cargo-cult, oddball solutions spring up when symptoms occur. Extension methods are written, and sometimes entirely different modules are added to the code base because the existing one is “tricky” or “not to be trusted.”

At the application level, this causes maintenance issues, a lot of heated and fruitless arguments, and voodoo approaches to code. From a user interface perspective, this causes quirky behavior. Occasionally a linear graph is completely displaced out of the graph and rendered on some menu somewhere, or the screen goes blank for a few seconds and then the display is restored. Defects and defect reports are created and developers dispatched to track down the issue, but after a few days of fruitless efforts, some project manager quietly sets the defect’s priority from “critical” to “cosmetic” and the software is shipped. It’s embarrassing, but whatcha gonna do. Ya know, computers have a mind of their own sometimes!

Catching it Early

What if, instead of doing things the old-fashioned but all-too-common way, the authors of this code had been writing unit tests and/or practicing TDD? Well, there’s a very good chance that the issue stemming from the graph library is caught immediately as its API methods are being fleshed out from a functionality perspective. There’s a good chance that someone is writing a test and gets to the point that we were at in the code sample above, where they are utterly dumbfounded as to why 1+1 does not equal 2 in float land.

And then, good things happen. The developer in question takes to Google or Stack Overflow, or perhaps he talks to other, more experienced developers on his team. He then gets an explanation, learns something about the language, and leaves the code in a correct state. Contrast this with the non-tested approach of “code it up, build a bad house on the bad foundation, and then ship the result because it’s too late.”
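One common place for that learning to end up is a tolerance-based comparison instead of exact equality. The helper below is a hypothetical sketch, not part of any library, and the default tolerance is an arbitrary assumption that suits values near 1 rather than very large or very small magnitudes.

```csharp
using System;

public static class FloatComparison
{
    // Treats two values as equal when they differ by no more than the tolerance.
    // Assumption: an absolute tolerance is good enough for the values involved.
    public static bool NearlyEqual(double a, double b, double tolerance = 1e-9)
    {
        return Math.Abs(a - b) <= tolerance;
    }
}
```

With doubles, 0.1 + 0.2 is not exactly equal to 0.3, but FloatComparison.NearlyEqual(0.1 + 0.2, 0.3) comes back true. MSTest also offers Assert.AreEqual overloads that take a delta for the same purpose.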

And what if the TDD/unit tests don’t expose this issue? Well, what they’ll do in either case is decouple the code base. So when the issue eventually does crop up via weird GUI behavior, it will be much easier to isolate. When it’s isolated, it will be much easier for the unit-test-savvy developers to write a test that exposes the defect to learn the lesson and fix the issue. It’s still a win.

The point about unit tests helping catch errors and leading to a more decoupled design is hardly controversial. But the benefits go beyond that. Unit tests provide a fast feedback loop for all points in the code base, which lends itself very well to poking and prodding things and experimenting. And that, in turn, leads to better understanding of not only the code, but also the language. If you can execute and get feedback on code extremely quickly, you’re much more likely to ask questions like, “I wonder what happens if I do x…” and then to do it and see. And that sort of experimentation, much like immersion in natural language, leads much more quickly to fluency.

Characterization Tests

The Origins of Legacy Code

I’ve been reading the Michael Feathers book, Working Effectively with Legacy Code and enjoying it immensely. It’s pushing ten years old, but it stands the test of time quite well–probably much better than some of the systems it uses as examples. And there is a lot of wisdom to take from it.

When Michael describes “legacy code,” he isn’t using the definition that you’re probably accustomed to seeing. I’d hazard a guess that your definition would be something along the lines of “code written by departed developers” or maybe just “old, bad code.” But Michael defines legacy code as any code that isn’t covered by automated regression tests (read: unit tests). So it’s entirely possible and common for developers to be writing code that’s legacy code as soon as it’s checked in.

I like this definition a lot, and not, as some might suspect, out of any purism. I’m not equating “legacy” with “bad,” embracing the definition as a backhanded way to say that people who don’t develop the way that I do write bad code. The reason I like the “test-less” definition of “legacy code” is that it brings to the fore the largest association that I have with legacy code, which is fear of changing it.

Think about what runs through your head when you’re tasked with making changes to some densely-packed, crusty old system. It’s probably a sense of honest to goodness unease or demotivation as you realize the odds of getting things right are low and the odds of headaches and blame are high. The code is a rat’s nest of dependencies and weird work-arounds, and you know that touching it will be painful.

Now consider another situation that’s different but with similar results. You have some assignment that you’ve worked on for weeks or months. It’s complicated, the customer isn’t sure what he wants, there have been lots of hiccups and setbacks, and there’s budget and deadline pressure. At the bitter end, after a few all-nighters, a bit of scope reduction, and some concessions, you somehow finally get all of the key features working for the most part. You check in the code for shipping, thinking, “I have no idea how this is working, but thank God it is, and I am never touching that again!” You’ve written code that was legacy code from the get-go.

Legacy code isn’t just the bad code that the team before you wrote, or some crusty old stuff from three language versions ago, or some internal homegrown VBA and Excel written by Steve, who’s actually an accountant. Legacy code is any code that you don’t want to touch because it’s fragile.

Getting Things Under Control

In his book, Michael Feathers lays out a lot of excellent strategies for taming out-of-control legacy code. I highly recommend giving it a read. But he coins a term and technique that I’d like to mention today. It’s something that I think programmers should be aware of because it helps lower the barriers to getting started with unit testing. And that term is “characterization tests.”

Characterization tests are the “I’m Okay, You’re Okay,” Rorschach approach to documenting code. There are no wrong answers–just documenting the way things exist. So if you have a method called AddTwoNumbers(int, int) and it returns 12 when you feed it 1 and 1, you don’t say “that’s wrong.” Instead you write a test that documents that return value and you move on, seeing and documenting what it does with other inputs.

Sound crazy? Well, it’s really not. It’s not crazy because things like this actually happen in real life. When code goes live, people work their processes around it, however much its behavior may be goofy or unintended. Once code is in the wild, “right” and “wrong” cease to matter, and the requirements as they existed some time in the past are forgotten. There only is “what is.” Sounds very zen, and perhaps it is.

One of the most common objections when it comes to unit testing is from developers that work on legacy systems where code is hard to test and no tests exist. They’ll say that they’d do things differently if starting from scratch (which usually turns out not to be true), but that there’s just no tackling it now. And this is a valid objection–it can be very hard to get anything under test. But characterization tests at least remove one barrier to testing, which is having extensive experience writing proper unit tests.

With characterization tests, it’s really easy. Just write a unit test that gets in the vicinity of what you want to document, finagle it until it doesn’t throw runtime exceptions, assert something–anything–and watch the test fail. When it fails, make note of what the expected and actual were, and just change the expected to the actual. The test will now pass, and you can move on. Change some method parameters or variables in other classes or even globals–whatever you have access to and can change without collapsing the system.
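To illustrate with the AddTwoNumbers example from earlier, a characterization test might end up looking something like this. LegacyMath is a hypothetical stand-in for whatever class you are poking at, and its body is invented purely so that the sample reproduces the odd return value.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical legacy code. Nobody remembers why it behaves this way.
public static class LegacyMath
{
    public static int AddTwoNumbers(int x, int y)
    {
        return x + y + 10;
    }
}

[TestClass]
public class LegacyMathCharacterizationTests
{
    [TestMethod]
    public void AddTwoNumbers_With_1_And_1_Returns_12()
    {
        // The expected value started life as a guess and was then
        // replaced with the actual value reported by the failing test.
        Assert.AreEqual(12, LegacyMath.AddTwoNumbers(1, 1));
    }
}
```

The test name records what the code does, not what anyone thinks it should do. If a later change makes this test fail, you have learned that the behavior changed, which is exactly the notification you want.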

Through this poking, prodding, and documenting, you’ll start getting a rudimentary picture of what the system does. You’ll also start getting the hang of the characterization test approach (and perhaps unit testing for real as an added bonus). But most importantly, you’ll finally have the beginnings of an automated safety net. There’s no right and wrong per se, but you will start to be able to see when your changes to the system are making it behave differently in ways you didn’t expect. In legacy, different is dangerous, so it’s invaluable to have this notification system in place.

Characterization tests aren’t going to save the day, and they probably aren’t going to be especially easy to write. At times (global state, external dependencies, etc.) they may even be impossible. But if you can get some in place here and there, you can start taking the fear out of interacting with legacy code.
