Stories about Software


In Defense of Test Setup Methods

This post started out as a response to a comment on this post, but it started to get rather long for that venue, so I’m making it its own post. The comment concerns my habit of pulling instantiation logic for the class under test into a common setup method executed by the test runner prior to each test. Here are the main points that I’m writing this to address:

  • Why do I dislike code duplication in tests?
  • Extracting logic to a common setup method creates performance problems.
  • The indirection in pulling part of the setup out of a test method hurts readability.

Back Story and Explanations

I wrote this post a little under two years ago. It’s germane here because I explain in it my progression from not using the setup method to doing so. For those not explicitly familiar with the concept and perhaps following along with my Chess TDD series, what I’m referring to is a common pattern (or anti-pattern, I suppose, depending on your opinion) in unit testing across many languages and frameworks. Most test runners allow you to specify a unique method that will be called before each test is run and also after each test is run. In the case of my series, using C# and MSTest, the setup happens on a per class basis. Here’s some code to clarify. First, the class under test:

And, now, the plain, “before,” test class:

And finally, what MS Test allows me to do:

Okay, so what’s going on here? Well, the way that MS Test works is that for each test method, it creates a new instance of the test class and executes the method. So, in the case of our original gravity test class, three instances are created. There is no instance state whatsoever, so this really doesn’t matter, but nonetheless, it’s what happens. In the second version, this actually does matter. Three instances get created, and for each of those instances, the _gravity instance is initialized. The “BeforeEachTest” method, by virtue of its “TestInitialize” attribute, is invoked prior to the test method that will be executed as part of that test.

Conceptually, both pieces of code have the same effect. Three instances are created, three Gravity are instantiated, three invocations of GetVelocityAfter occur, and three asserts are executed. The difference here is that there are three gravity variables of method level scope in the first case, and one instance variable of class scope in the second case.

Okay, so why do I choose the second over the first? Well, as I explained in my old post, I didn’t, initially. For a long, long time, I preferred the first option with no test setup, but I eventually came to prefer the second. I mention this strictly to say that I’ve seen merits of both approaches and that this is actually something to which I’ve given a lot of thought. It’s not to engage the subtle but infuriating logical fallacy in which an opponent in an argument says something like “I used to think like you, but I came to realize I was wrong,” thus in one fell swoop establishing himself as more of an authority and invalidating your position without ever making so much as a single cogent point. And, in the end, it’s really a matter of personal preference.

Why I Don’t Like Duplication

So, before I move on to a bit more rigor with an explanation of how I philosophically approach structuring my unit tests, let me address the first point from the comment about why I don’t like duplication. I think perhaps the most powerful thing I can do is provide an example using this code I already have here. And this example isn’t just a “here’s something annoying that could happen,” but rather the actual reason I eventually got away from the “complete setup and teardown in each test” approach. Let’s say that I like my Gravity class, but what I don’t like is the fact that I’ve hard-coded Earth’s gravity into the GetVelocityAfter method (and, for physics buffs out there, I realize that I should, technically, use the mass of the falling object, gravitational constant, and the distance from center of mass, but that’s the benefit of being a professional programmer and not a physicist — I can fudge it). So, let’s change it to this, instead:

Now, I have some test refactoring to do. With the first approach, I have six things to do. I have to declare a new Planet instance with Earth’s specifications, and then I have to pass that to the Gravity constructor. And then, I have to do that exact same thing twice more. With the second approach, I declare the new Planet instance and add it to the Gravity constructor once and I’m done. For three tests, no big deal. But how about thirty? Ugh.

I suppose you could do some find and replace magic to make it less of a chore, but that can get surprisingly onerous as well. After all, perhaps you’ve named the instance variable different things in different places. Perhaps you’ve placed the instantiation logic in a different order or interleaving in some of your tests. These things and more can happen, and you have have an error prone chore on your hands in your tests the same way that you do in production code when you have duplicate/copy-paste logic. And, I don’t think you’d find too many people that would stump for duplication in production code as a good thing. I guess by my way of thinking, test code shouldn’t be a second class citizen.

But, again, this is a matter of taste and preference. Here’s an interesting stack overflow question that addresses the tradeoff I’m discussing here. In the accepted answer, spiv says:

Duplicated code is a smell in unit test code just as much as in other code. If you have duplicated code in tests, it makes it harder to refactor the implementation code because you have a disproportionate number of tests to update. Tests should help you refactor with confidence, rather than be a large burden that impedes your work on the code being tested.

If the duplication is in fixture set up, consider making more use of the setUp method or providing more (or more flexible) Creation Methods.

He goes on to stress that readability is important, but he takes a stance more like mine, which is a hard-line one against duplication. The highest voted answer, however, says this:

Readability is more important for tests. If a test fails, you want the problem to be obvious. The developer shouldn’t have to wade through a lot of heavily factored test code to determine exactly what failed. You don’t want your test code to become so complex that you need to write unit-test-tests.

However, eliminating duplication is usually a good thing, as long as it doesn’t obscure anything. Just make sure you don’t go past the point of diminishing returns.

In reading this, it seems that the two answers agree on the idea that readability is important and duplication is sub-optimal, but where they differ is that one seems to lean toward, “avoiding duplication is most important even if readability must suffer” and the other leans toward, “readability is most important, so if the only way to get there is duplication, then so be it.” (I’m not trying to put words in either poster’s mouth — just offering my take on their attitudes)

But I think, “why choose?” I’m of the opinion that if redundancy is aiding in readability then some kind of local maximum has been hit and its time to revisit some broader assumptions or at least some conventions. I mean why would redundant information ever be clearer? At best, I think that redundancy is an instructional device used for emphasis. I mean, think of reading a pamphlet on driving safety. It probably tells you to wear your seatbelt 20 or 30 times. This is to beat you over the head with it — not make it a better read.

So, in the end, my approach is one designed to avoid duplication without sacrificing readability. But, of course, readability is somewhat subjective, so it’s really your decision whether or not I succeed as much as it is mine. But, here’s what I do.

My Test Structure

Let’s amend the Gravity class slightly to have it use an IPlanet interface instead of a Planet and also to barf if passed a null planet (more on that shortly):

Let’s then take a look at how I would structure my test class:

Alright, there’s a lot for me to comment on here, so I’ll highlight some things to pay attention to:

  • Instance of class under test is named “Target” to be clear what is being tested at all times.
  • Planet instance is just named “Planet” (rather than “MockPlanet”) because this seems to read better to me.
  • Nested class has name of method being tested (eliminates duplication and brings focus only to what it does).
  • Test initialize method does only the minimum required to create an instance that meets instance preconditions.
  • There’s not much going on in the setup method — just instantiating the class fields that any method can modify.
  • The second two test methods still have some duplication (duplicate setup).

Now that you’ve processed these, let me extrapolate a bit to my general approach to test readability:

  • The class nesting convention helps me keep test names as short as possible while retaining descriptiveness.
  • Only precondition-satisfying logic goes in the initialize method (e.g. instantiating the class under test (CUT) and passing a constructor parameter that won’t crash it)
  • Setting up state in the class under test and arranging the mock objects is left to the test methods in the context of “Given” for those asserts (e.g. in the second test method, the “Given” is “On_Earth” so it’s up to that test method to arrange the mock planet to behave like Earth).
  • I use dependency injection extensively and avoid global state like the plague, so class under test and its depdenencies are all you’ll need and all you’ll see.
  • Once you’re used to the Target/Mocks convention, it’s (in my opinion) as readable, if not more so, than a method variable. As a plus, you can always identify the CUT at a glance in my test code. To a lesser degree, this is true of mocks and constants (in C#, these have Pascal Casing)
  • I suppose (with a sigh) that I’m not quite 100% on the maturity model of eliminating duplication since I don’t currently see a duplication-eliminating alternative to setting up Earth Gravity in two of the test methods. I think it’s not appropriate to pull that into the test initialize since it isn’t universal and adding a method call would just make the duplication more terse and add needless indirection. To be clear, I think I’m failing rather than my methodology is failing — there’s probably a good approach that I simply have not yet figured out. Another important point here is that sometimes the duplication smell is less about how you structure your tests and more about which tests you write. What I mean is… is that “Returns_NineyEight_After_10_Seconds_On_Earth” test even necessary? The duplicate setup and similar outcome is, perhaps, telling me that it’s not… But, I digress.

Addressing the Original Concerns

So, after a meandering tour through my approach to unit testing, I’ll address the second and third original points, having already addressed the question of why I don’t like duplication. Regarding performance, the additional lines of code that are executed with my approach are generally quite minimal and sometimes none (such as a case where there are no constructor-injected dependencies or when every unit test needs a mock). I suspect that if you examined code bases in which I’ve written extensive tests and did a time trial of the test suite my way versus with the instantiation logic inlined into each test, the difference would not be significant. Anecdotally, I have not worked in a code base in a long, long time where test suite execution time was a problem and, even thinking back to code bases where test suite performance was a problem, this was caused by other people writing sketchy unit tests involving file I/O, web service calls, and other no-nos. It’s not to say that you couldn’t shave some time off, but I think the pickings would be pretty lean, at least with my approach of minimal setup overhead.

Regarding readability, as I’ve outlined above, I certainly take steps to address readability. Quite frankly, readability is probably the thing most frequently on my mind when I’m programming. When it comes to test readability, there’s going to be inevitable disagreement according to personal tastes. Is my “Target” and initialization convention more readable than an inline method variable also with the convention of being named “Target”? Wow — eye of the beholder. I think so because I think of the instantiation line as distracting noise. You may not, valuing the clarity of seeing the instantiation right there in the method. Truth is with something like that, what’s readable to you is probably mainly a question of what you’re used to.

But one thing that I think quite strongly about test readability is that it is most heavily tied in with compactness of the test methods. When I pick up a book about TDD or see some kind of “how to” instructional about unit tests, the tests always seem to be about three lines long or so. Arrange, act, assert. Setup, poke, verify (as I said in that old post of mine). Whatever — point is, the pattern is clear. Establish a precondition, run an experiment, measure the outcome. As the setup grows, the test’s readability diminishes very quickly. I try to create designs that are minimal, decoupled, compatible with the Single Responsibility Principle, and intuitive. When I do this, test methods tend to remain compact. Eliminating the instantiation line from every test method, to me, is another way of ruthlessly throttling the non-essential logic inside those test methods, so that what’s actually being tested holds center stage.

So, in the end, I don’t know that I’ve made a persuasive case — that’s also “eye of the beholder.” But I do feel as though I’ve done a relatively thorough job of explaining my rationale. And, my mind is always open to being changed. There was a time when I had no test setup method at all, so it’s not like it would be unfamiliar ground to return there.

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.