DaedTech

Stories about Software

Go On, Live a Little. Denormalize Your Data

I have years of professional experience with relational databases and have completed several academic courses about them. This is a pretty familiar topic for me, and I’ve worked with a bunch of them: SQL Server, SQLite, MySQL, PostgreSQL, Oracle, and even MS Access. I’ve always found them, paradoxically, to be both comfortable and awkward. They’re comfortable because of my familiarity with them and awkward because I’ve used them almost exclusively with OO languages and incurred the impedance mismatch.

While RDBMS are in my comfort zone, NoSQL alternatives are comparatively alien but also intriguing. Ayende has made some pretty convincing arguments as to why one should think outside the RDBMS, including a .NET Rocks episode in which he talks about constraints, such as expensive disk space, that applied decades ago when RDBMS were designed and are no longer of concern. I’m receptive, my interest is piqued, and I’ve played a bit with MongoDB to understand how these things work. But I think it really clicked for me the other day when I was staring at a schema that had some definite complexity issues. I think I finally grokked the use case for document databases.

I was working with a pre-existing schema recently in which (mildly obfuscated) you have a “customer” who can be part of one or more “programs.” The entities “customer” and “program” each have their own properties and, as any diligent normalizer will tell you, that information should not be repeated. So they get their own tables. Since a customer can participate in multiple programs and multiple customers can be in the same program, this is an “M to N” relationship — represented, predictably, by a linking table. What’s more, customers can participate repeatedly in programs so the linking table will sometimes have multiple participation records per customer-program pair, differing by a participation timestamp. I’m sure those of us who have done any significant database development have played out similar scenarios countless times.

As I wrangled Entity Framework to deal with this and other similar relationships that were making it relatively complicated to pull certain information, I started thinking that this seemed harder than it needed to be. There was no use case for handling “what if the program name or another attribute changes in the middle of its offering,” and there really are no good answers to that question. Right now, the implementation is “you can’t do that,” but this is hardly satisfying. To administrative users, it seems arbitrary and like a system shortcoming. But the alternative is unpleasant, too — additional complexity to allow “partial programs” in the schema or the code.

I started thinking about normalization and whether it really mattered here. Allowing duplication is a cardinal RDBMS sin, but it sure would make life easier for a lot of use cases. I mean, imagine a schema where there was just a table of customers, and participation was recorded as you went, with the program information being duplicated in each participation record. The downside is the information duplication, but the upside is that changing programs midstream is trivial and there is less table overhead and complexity to maintain. No M to N relationships and no worrying about whether a program is current or has been replaced by a new one.
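
To make that contrast concrete, here is a minimal sketch of the two shapes as C# entities. Customer, Program, and Participation are illustrative names only, not the actual (obfuscated) schema I was working with.

using System;

// Normalized: each concept gets its own table, and an M-to-N linking
// entity records participation, keyed by a timestamp.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Program
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Participation
{
    public int CustomerId { get; set; }
    public int ProgramId { get; set; }
    public DateTime ParticipatedOn { get; set; }
}

// Denormalized, document-style: one self-contained record per participation
// event, with the program information copied into it. Duplication, yes, but
// no joins and no ambiguity about which "version" of the program applied.
public class ParticipationRecord
{
    public int CustomerId { get; set; }
    public string CustomerName { get; set; }
    public string ProgramName { get; set; }
    public DateTime ParticipatedOn { get; set; }
}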

And that’s when it hit me. RDBMS are great for maintaining, well, relational information. For instance, if I work at a company, I have a boss, and my boss has many reports. Modeling the manager-line employee relationship is a great fit for a relational database because it’s about modeling relationships, like “manager to reports.” Tree structures of composition tend to be a good fit as well, such as assemblies like cars and other pieces of machinery. Parts consist of sub-parts and so on. These are relationships in the sense that operations like “delete” and “replace” make sense.

But what about the one-and-done concept of a customer participating in a program one day? That isn’t a relationship any more than I form a relationship with a waiter that I tip and then never see again. That’s something that happened once, not an ongoing arrangement. And that’s where things really snapped into focus for me. The RDBMS and the normalized model are great for managing relationships but not necessarily the be-all and end-all for managing extremely transactional data in the sense of event recording.

And while I’m probably not in a position on this project to quickly introduce document databases and stir up the works too much, I am considering modeling the storage there in SQL Server after what I might do with a document database. Perhaps it’s time to throw caution to the wind and start allowing some information repetition in there for historical modeling. If the experiment fails, I can always normalize it. But if it succeeds, adding document databases to the mix will be a lot easier, since I’ll just be learning the syntax of the API as I go rather than the fundamental underlying concepts and use cases.

YAGNI: YAGNI

A while back, I wrote a post in which I talked about offering too much code and too many options to other potential users of your code. A discussion emerged in the comments roughly surrounding the merits of the design heuristic affectionately known as “YAGNI,” or “you ain’t gonna need it.”

In a vacuum, this aphorism seems to be the kind of advice you give a chronic hoarder: “come on, just throw out 12 of those combs — it’s not like you’re going to need them.” So what does it mean, exactly, in the context of programming, and is it good advice in general? After all, if you applied the advice that you’d give a hoarder to someone who wasn’t, you might be telling them to throw out their only comb, which wouldn’t make a lot of sense.

The Motivation for YAGNI

As best I understand, the YAGNI principle is one of the tenets of Extreme Programming (XP), an agile development approach that emerged in the 1990s and stood in stark contrast to the so-called “waterfall” or big-design-up-front approach to software projects. One of the core principles of the (generally unsuccessful) waterfall approach is a quixotic attempt to figure out just about every detail of the code to be written before writing it, and then, afterward, to actually write the code.

I personally believe this silliness is the result of misguided attempts to mimic the behavior of an Industrial Revolution-style assembly line in which the engineers (generally software architects) do all the thinking and planning so that the mindless drones putting the pieces together (developers) don’t have to hurt their brains thinking. Obviously, in an industry of knowledge workers, this is silly … but I digress.

YAGNI as a concept seems well suited to address the problematic tendency of waterfall development to generate massive amounts of useless code and other artifacts. Instead, with YAGNI (and other agile principles) you deliver features early, using the simplest possible implementation, and then you add more and refactor as you go.

YAGNI on a Smaller Scale

But YAGNI also has a smaller-scale design component, which is evident in my earlier post. Some developers have a tendency to code up classes and add extra methods that they think might be useful to themselves or others at some later point in time. This is often referred to as “gold plating,” and I think the behavior is based largely on the fact that this is often a good idea in day-to-day life.

“As long as I’m changing the lightbulb in this ceiling lamp, I may as well dust and do a little cleaning while I’m up here.”

Or perhaps:

“As long as I’m cooking tonight, I might as well make extra so that I have leftovers and can save time tomorrow.”

But the devil is in the details with speculative value propositions. In the first example, clear value is being provided with the extra task (cleaning). In the second, the value is speculative, but the odds of realization are high and the marginal cost is low. If you’re going to the effort of making tuna casserole, scaling up the ingredients is negligible in terms of effort and provides a very likely time savings tomorrow.

But doesn’t that apply to code? I mean, adding that GetFoo() method will take only a second and it might be useful later.

Well, consider that the planning lifespan for code is different than it is for casserole. With the casserole, you have a definite timeframe in mind — some time in the next few nights — whereas with code, you do not. The timeframe is “down the line.” In the meantime, that code sits in the fridge of your application, aging poorly as people forget its intended purpose.

You wind up with a code-fridge full of containers with goop in them that doesn’t exactly smell bad, but isn’t labeled with a date and you aren’t sure of its origin. And I don’t know about you, but if I were to encounter a situation like that, the reaction would be “I don’t feel like risking it, I’m ordering Chinese.” And, “nope, I’m out of here,” isn’t a good feeling to have when you open your application’s code.

Does YAGNI Always Apply?

So there is an overriding design principle, YAGNI, and there is a more localized version; they seem to be reactions, respectively, to a tendency to comically over-plan and a tendency to gold-plate locally. But does this advice always hold up, reactions notwithstanding?

I’m not so sure. I mean, let’s say that you’re starting on a new .NET web application and you need to test that it displays “hello world.” The simplest thing that could work is F5 and inspection. But now let’s say that you have to give it to a tester. The simplest thing is to take the compiled output and manually copy it to a server. That’s certainly simpler than setting up some kind of automated publish or continuous deployment scenario. And now you’re in sort of a loop, because at what point is this XCopy deploy ever not going to be the simplest thing that could work for deploying? What’s the tipping point?

Getting Away from Sloganeering

Now I’m sure that someone is primed to comment that it’s just a matter of how you define requirements and how stringent you are about quality. “Well, it’s the simplest thing that could possibly work but keeping the overall goals in mind of the project and assuming a baseline of quality and” — gotta cut you off there, because that’s not YAGNI. That’s WITSTTCPWBKTOGIMOTPAAABOQA, and that’s a mouthful.

It’s a mouthful because there’s suddenly nuance.

The software development world is full of metaphorical hoarders, to go back to the theme of the first paragraph. They’re serial planners and pleasers that want a fridge full of every imaginable code casserole they or their guests might ever need. YAGNI is a great mantra for these people and for snapping them out of it. When you go sit to pair or work with someone who asks you for advice on some method, it’s a good reality check to say, “dude, YAGNI, so stop writing it.”

But once that person gets YAGNI — really groks it by adopting a practice like TDD or by developing a knack for getting things working and refactoring to refinement — there are diminishing returns to this piece of advice. While it might still occasionally serve as a reminder/wake-up call, it starts to become a possibly counterproductive oversimplification. I’d say that this is a great, pithy phrase to tack up on the wall of your shop as a reminder, especially if there’s been an over-planning and gold-plating trend historically. But if you do that, beware of local maxima.

Moving Away from State: State--

In 1968, Edsger Dijkstra issued a letter entitled “Go To Statement Considered Harmful,” and the age of structured programming was born. The letter was a call to stop programmers of the time from creating ad-hoc control flow structures using goto statements and to instead use higher-level constructs like loops and conditionals for manipulating flow through a function (contrary, I think, to the oft-attributed position that “goto is evil”). This gave rise to structured programming because it meant that progress through a method would be more visually trackable.

But I think there’s an interesting underlying concept here that informs a lot of shifts in programming practice. Specifically, I’m referring to the idea that Dijkstra conceived of “goto” as existing at an inappropriate level of abstraction when stacked with concepts like “if” and “while” and “case.” The latter are elements of logical human reasoning, while the former is a matter of procedure for a compiler. Or, to put it another way, the control flow statements tell humans how to read business logic, and the goto tells the compiler how to execute a program.

“State Considered Harmful” seems to be the new trend, ushering in the renaissance of functional programming. Functional programming itself is not a new concept. Lisp has been around for half a century, meaning that it actually predates object-oriented programming. Life always seems to be full of cycles, and this is certainly an example. But there’s more to it than people seeking new solutions in old ideas. The new frontier in faster processing and better computer performance is parallel processing. We can’t fit a whole lot more transistors on a chip, but we can fit more chips in a computer — and design schemes to split the work between them. And in order to do that successfully, it’s necessary to minimize the amount of temporarily stored information, or state.

I’ve found myself headed in that direction almost subconsciously. A lot of the handiest tools push you that way. I’ve always loved the fluent LINQ methods in C#, and those generally serve as a relatively painless introduction to functional programming. You find yourself gravitating away from nested loops and local variables in favor of chained calls that express semantically what you want in only a line or two of code. But gravitating toward a functional programming style goes beyond just using something like LINQ, and it involves favoring chains of methods in which the output is a pure, side-effect-free function of the input.
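
As a contrived, minimal illustration (not code from any particular project), here is the same trivial filtering written first with a loop and a mutable accumulator and then as a fluent LINQ chain:

using System.Collections.Generic;
using System.Linq;

public static class FilteringExamples
{
    // Imperative style: a nested structure with a mutable local accumulator.
    public static List<int> EvensImperative(IEnumerable<int> numbers)
    {
        var evens = new List<int>();
        foreach (var number in numbers)
        {
            if (number % 2 == 0)
            {
                evens.Add(number);
            }
        }
        return evens;
    }

    // Fluent style: the same intent as a chain of side-effect-free calls,
    // with no intermediate state to track.
    public static List<int> EvensFunctional(IEnumerable<int> numbers)
    {
        return numbers.Where(n => n % 2 == 0).ToList();
    }
}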

Here are some of the stops on my own journey away from state.

  1. Elimination of global state. The first step in this journey for me was to realize how odious global state is for the maintainability of an application.
  2. No state variables for communication between methods. Next to go for me were ‘flag’ variables that kept track of things in a class between method calls. As I became more proficient in unit testing, I found this to be a huge headache as it created complicated and brittle test setup, and it was really a pointless crutch anyway.
  3. Immutable > mutable. I’ve blogged about an idea I called “pointless mutability,” but in general I’ve come to favor immutable constructs over mutable ones whenever possible for simplicity (see the sketch after this list).
  4. State isolation — for instance, model objects and domain objects for business logic state and viewmodels/controllers for GUI state. Aside from that, a lot of applications for which I am the architect retain virtually no state information. Services and other application scaffolding types of classes simply have interfaced references to their collaborators, but really keep track of nothing between method calls.
  5. Persistence ignorance — letting the application pretend that its state is storage. For less sophisticated (CRUD-style) applications, I’ve favored scenarios in which most of the code’s state is abstracted into a lower layer of the application and implemented only externally. In other words, if things are simple, let something like a database or file system be your application’s state. Why cache and over-complicate until/unless performance is an issue?
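
As a quick sketch of item 3 (a hypothetical type, not from any real code base), here is a mutable reading refactored into an immutable one:

using System;

// Mutable: any collaborator can change the reading after the fact.
public class MutableReading
{
    public int TemperatureInFahrenheit { get; set; }
    public DateTime TakenAt { get; set; }
}

// Immutable: the values are fixed at construction, so instances can be
// shared freely (including across threads) without defensive copying.
public class Reading
{
    private readonly int _temperatureInFahrenheit;
    private readonly DateTime _takenAt;

    public int TemperatureInFahrenheit { get { return _temperatureInFahrenheit; } }
    public DateTime TakenAt { get { return _takenAt; } }

    public Reading(int temperatureInFahrenheit, DateTime takenAt)
    {
        _temperatureInFahrenheit = temperatureInFahrenheit;
        _takenAt = takenAt;
    }
}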

And that’s where I stand now as I write object-oriented code. I am interested in diving more into functional languages, as I’ve only played with them here and there since my undergrad days. It isn’t so much that they’re the new hotness as it is that I find myself heading that way anyway. And if I’m going to do it, I might as well do it consciously and in directed fashion. If and when I do get to do more playing, you can definitely bet that I’ll post about it.

Introduction to Unit Testing Part 6: Test Doubles

In the last two posts in this series, I talked about how to test new code in your code base and then how to bring your legacy code under test. Toward the end of the last chapter in this series, I talked a bit about the concept of test doubles. The example I showed was one in which I used polymorphism to create a “dummy” class that I used in a test to circumvent otherwise untestable code. Here, I’ll dive into a lot more detail on the subject, starting out with a much simpler example than that and building to a more sophisticated way to handle the management of your test doubles.

First, a Bit of Theory

Before we get into test doubles, however, let’s stop and talk about what we’re actually doing, including some theory about unit tests. So far, I’ve shown a lot of examples of unit tests and talked about what they look like and how they work (for instance, here in post two where I talk about Arrange, Act, Assert). But what I haven’t addressed, specifically, is how the test code should interact with the production code. So let’s talk about that a bit now.

By far the most common case when unit testing is that you instantiate a class under test in the “arrange” part of your unit test, and then you do whatever additional setup is necessary before calling some method on that class. Then you assert something that should have happened as a result of that method call. Let’s return to the example of prime finder from earlier and look at a simple test:

[TestMethod]
public void Returns_False_For_One()
{
    var primeFinder = new PrimeFinder(); //Arrange

    bool result = primeFinder.IsPrime(1); //Act

    Assert.IsFalse(result); //Assert
}

This should be reviewed from the perspective of “arrange, act, assert,” but let’s look specifically at the “act” line. Here is the real crux of the test; we’re writing tests about the IsPrime method and this is where the action happens. In this line of code, we give the method an input and record its output, so it’s the perfect microcosm for what I’m going to discuss about a class under test: its interactions with other objects. You see, unit testing isn’t about executing your code — you can do that with integration tests, console apps, or even just by running the application. Unit testing, at its core, is about isolating your classes and running experiments on them, as if you were a scientist in a lab. And this means controlling all of the inputs to your class — stimulus, if you will — so that you can observe what it puts out.

Controlling the inputs to the PrimeFinder class is simple because, as I can tell you from its implementation, there are no invocations of global/static state (which will become an important theme as we proceed). You can see by looking at the unit test that the only input to the class under test (CUT) is the integer 1. This means that the only input/stimulus that we supply to the class is a simple integer, making it quite easy to make assertions about its behavior. Generally speaking, the simpler the inputs to a class, the easier that class is to test.

There are Inputs and There are Inputs

Omitting certain edge cases I can think of (and probably some that I’m not thinking of), let’s consider a handful of relatively straightforward ways that a class might get ahold of input information. There is what I did above — passing it into a method. Another common way to give information to a class is to use constructor parameters or setter methods/properties. I’ll refer to these as “passive collaboration” from the perspective of the CUT, since it’s simply being given the things that it needs. There is also what I’ll call “semi-passive collaboration,” which is when you pass a dependency to the CUT and the CUT interacts in great detail with that dependency, mutating its state and querying it. An example of this would be “Car theCar = new Car(new Engine())”, in which performing operations on Car related to starting and driving result in rather elaborate modifications to the state of Engine. It’s still passive in the sense that you’re handing the Engine class to the car, but it’s not as passive as simply handing it an integer. In general, passive input is input that the scope instantiating the CUT controls — constructor parameters, method parameters, setters, and even things returned from methods of objects passed to the CUT (such as the Car class calling _engine.GetTemperature() in the example in this paragraph).
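
As a rough sketch of the semi-passive case (the Engine members here are hypothetical, purely for illustration), the CUT is handed its dependency but then mutates and queries it in detail:

// Hypothetical Engine, just to illustrate semi-passive collaboration.
public class Engine
{
    private int _temperatureInFahrenheit;

    public void TurnOver()
    {
        _temperatureInFahrenheit = 180; // warms up once started
    }

    public int GetTemperature()
    {
        return _temperatureInFahrenheit;
    }
}

public class Car
{
    private readonly Engine _engine;

    public Car(Engine engine)
    {
        _engine = engine; // passive: the caller decides which Engine we get
    }

    public bool Start()
    {
        _engine.TurnOver();                    // ...but Car mutates the Engine's state...
        return _engine.GetTemperature() < 230; // ...and queries it back in detail
    }
}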

In contrast, there is also “active collaboration,” which is when the CUT takes responsibility for getting its own input. This is input that you cannot control when instantiating the class. An example of this is a call to some singleton or public static method in the CUT. The only way that you can reassume control is by not calling the method in which it occurs. If static/singleton calls occur in the constructor, you simply cannot test or even instantiate this class without it doing whatever the static code entails. If it retrieves values from static state, you have no control over those values (short of mocking up the application’s global state).

A second form of active collaboration is the “new” operator. This is very similar to static state in that when you create the CUT, you have no control over this kind of input to the CUT. Imagine if Car new-ed up its own Engine and queried it for temperature. There would be absolutely no way that you could have any effect on this operation in the Car class short of not instantiating it. Like static calls, object instantiation renders your CUTs a non-negotiable, “take it or leave it” proposition. You can have them with all of their instantiated objects and global state or you can write your own, buddy.
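
Here is a hedged sketch of both flavors of active collaboration (the TemperatureSettings singleton and these particular members are hypothetical). Nothing outside of Car can influence either input short of not running this code at all:

// Hypothetical singleton, for illustration only.
public class TemperatureSettings
{
    public static readonly TemperatureSettings Instance = new TemperatureSettings();
    public string Units = "Fahrenheit";
}

public class Engine
{
    public int TemperatureInFahrenheit { get { return 200; } }
}

public class Car
{
    public string Units { get; private set; }
    public int EngineTemperature { get; private set; }

    public Car()
    {
        // Active input #1: static/singleton state. A test has no way to
        // control this value short of mutating global state.
        Units = TemperatureSettings.Instance.Units;

        // Active input #2: the "new" operator. Car chooses its own Engine,
        // so a test cannot substitute a tame stand-in.
        var engine = new Engine();
        EngineTemperature = engine.TemperatureInFahrenheit;
    }
}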

Not all inputs to a class are created equal. There are a CUT’s passive inputs, in which the CUT cedes control to you. And then there are the CUT’s active inputs that it controls and on which it does not allow you to interpose in any way. As it turns out, it is substantially easier to test CUTs with exclusively passive collaboration/input and difficult or even impossible to test CUTs with active collaboration. This is simply because you cannot isolate actively collaborating CUTs.

Literals: Too Simple to Need Test Doubles

There’s still a little bit of work to do before we discuss test doubles in earnest. First, we have to talk about inputs that are too simple to require stand-ins: literals. The PrimeFinder test above is the perfect example of this. It’s performing a mathematical operation using an integer input, so what we’re interested in testing is known input-output pairs in a functional sense. As such, we just need to know what to pass in, to pass that value in, and then to assert that we get the expected return value.

In a strict sense, we could refer to this as a form of test double. After all, we’re doing a non-production exercise with the API, so the value we’re passing in is fake, in a sense. But that’s a little formal for my taste. It’s easier just to think in terms of literals almost always being too simple to require any sort of substitution of behavior.

An interesting exception to this is the null literal (of null type) or the default value of a non-nullable type. In many cases, you may actually want to be testing this as an input, since null and 0 tend to be particularly interesting inputs and the source of corner cases. However, in some cases, you may be supplying what is considered the simplest form of test double: the dummy value. A dummy value is something you pass into a function to say, “I don’t care what this is and I’m just passing in something to make the compiler happy.” An example of where you might do this is passing null to a constructor of an object instance when you just want to make assertions as to what some of its property values initialize to.
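
For instance, here is a minimal sketch of a dummy in action (CustomerProcessor and ILogger are hypothetical types, invented only for this example):

public interface ILogger
{
    void Log(string message);
}

public class CustomerProcessor
{
    private readonly ILogger _logger;

    public int QueuedCustomerCount { get; private set; }

    public CustomerProcessor(ILogger logger)
    {
        _logger = logger;
    }

    public void Process(string customerName)
    {
        QueuedCustomerCount++;
        _logger.Log("Queued " + customerName);
    }
}

[TestMethod]
public void QueuedCustomerCount_Initializes_To_Zero()
{
    // The logger is irrelevant to this assertion, so a null dummy keeps
    // the compiler happy without any setup ceremony.
    var processor = new CustomerProcessor(null);

    Assert.AreEqual(0, processor.QueuedCustomerCount);
}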

Simple/Value Objects and Passing in Friendlies

Next up for consideration is the concept of a “test stub,” or what I’ll refer to in the general sense as a “friendly.”

Friendlies

Take a look at this code:

public class Car
{
    public int EngineTemperature { get; private set; }

    public Car(Engine engine)
    {
        EngineTemperature = engine.TemperatureInFahrenheit;
    }
}
public class Engine
{
    public int TemperatureInFahrenheit { get; set; }
}

Here is an incredibly simple implementation of the Car-Engine pair I described earlier. Car is passed an Engine, and it queries that Engine for a local value that it exposes. Let’s say that I now want to test that behavior. I want to test that Car’s EngineTemperature property is equal to the Engine’s temperature in Fahrenheit. What do you think is a good test to write? Something like this, maybe:

[TestMethod]
public void EngineTemperature_Initializes_Value_Returned_By_Engine()
{
    const int engineTemperatureFromEngine = 200;
    Engine engine = new Engine() { TemperatureInFahrenheit = engineTemperatureFromEngine };
    var car = new Car(engine);

    Assert.AreEqual(engineTemperatureFromEngine, car.EngineTemperature);
}

Here, we’re setting up the Engine instance in such a way that we control what it provides to Car when Car uses it. We know by inspecting the code for Car that Car is going to ask Engine for its TemperatureInFahrenheit value, so we set that value to a known commodity, allowing us to compare in the Assert. To put it another way, we’re supplying input indirectly to Car by setting up Engine and telling Engine what to give to Car. It’s important to note that this is only possible because Car accepts Engine as an argument. If Car instantiated Engine in its constructor, it would not be possible to isolate Car, because any test of Car’s initial value would necessarily also be a test of Engine, making the test an integration test rather than a unit test.

Creating Bonafide Mocks

That’s all well and good, but what if the Engine class were more complicated or just written differently? What if the way to get the temperature was to call a method and that method went and talked to a file or a database or something? Think of how badly the testing for this is going to go:

public class Car
{
    public int EngineTemperature { get; private set; }

    public Car(Engine engine)
    {
        EngineTemperature = engine.TemperatureInFahrenheit;
    }
}
public class Engine
{
    public int TemperatureInFahrenheit
    {
        get
        {
            var stream = new StreamReader(@"C:\whatever.txt");
            return int.Parse(stream.ReadLine());
        }
    }
}

Now, when we instantiate a Car and query its engine temperature property, suddenly file contents are being read into memory and, as I’ve already covered in this series, File I/O is a definite no-no in a unit test. So I suppose we’re hosed. As soon as Car tries to read Engine’s temperature, we’re going to explode — or we’re going to succeed, which is even worse because now you’ll have a unit test suite that depends on the machine it’s running on having the file C:\whatever.txt on it and containing an integer as its first line.

But what if we got creative the way we did at the end of the last episode of this series? Let’s make the TemperatureInFahrenheit property virtual and then declare the following class:

public class FakeEngine : Engine
{
    private int _temperature;

    public override int TemperatureInFahrenheit
    {
        get { return _temperature; }
    }

    public FakeEngine(int temperature)
    {
        _temperature = temperature;
    }
}

This class is test-friendly because it doesn’t contain any file I/O at all and it inherits from Engine, overriding the offending property. Now we can write the following unit test:

[TestMethod]
public void EngineTemperature_Initializes_Value_Returned_By_Engine()
{
    const int engineTemperatureFromEngine = 200;
    Engine engine = new FakeEngine(engineTemperatureFromEngine);
    var car = new Car(engine);

    Assert.AreEqual(engineTemperatureFromEngine, car.EngineTemperature);
}

If this seems a little weird to you, remember that our goal here is to test the Car class and not the engine class. All that the Car class knows about Engine is that it wants its TemperatureInFahrenheit property. It doesn’t (and shouldn’t) care how or where this comes from internally to Engine — file I/O, constructor parameter, secret ink, whatever. And when testing the Car class, you certainly don’t care. Another way to think of this is that you’re saying, “assuming that Engine tells Car that the engine temperature is 200, we want to assert that Car’s EngineTemperature property is 200.” In this fashion, we have isolated the Car class and are testing only its functionality.

This kind of test double and testing technique is known as a Fake. We’re creating a fake engine to stand-in for the real one. It’s not simple enough to be a dummy or a stub, since it’s a real, bona-fide different class instead of a doctored version of an existing one. I realize that the terminology for the different kinds of test doubles can be a little confusing, so here’s a helpful taxonomy of them.

Mocking Frameworks

The last step in the world of test doubles is to get to actual mock objects. If you stop and ponder the fake approach from the last section a bit, a problem might occur to you. The problem has to do with long-term maintenance of code. I remember, many moons ago when I discovered the power of polymorphism for creating fake objects, that I thought it was the greatest thing under the sun. Obviously there was at least one fake per test class with a dependency, and sometimes there were multiple dependencies. And I didn’t always stop there — I might define three or four different variants of the fake, each having a method that behaved differently for the test in question. In one fake, TemperatureInFahrenheit would return a passed-in value, but in another, it would throw an exception. Oh, there were so many fakes — I was swimming in fakes for classes and fakes for interfaces.

And they were awesome… until I added a method to the interface they implemented or changed behavior in the class they inherited. And then, oh, the pain. I would have to go and change dozens of classes. And then there was also the fact that all of this faking took up a whole lot of space. My test classes were littered with nested classes of fakes. It was fun at first, but the maintenance became a drudgery. But don’t worry, because my gift to you is to spare you that pain.

What if I told you that you could implement interfaces and inherit from classes anonymously, without actually creating source code that did this? I’d be oversimplifying a bit, but once you got past that, you’d probably be pretty excited. I say this because, as you start to grasp the concept of mocking frameworks, this kind of “dynamic interface implementation/inheritance” is the easiest way to reason about what it’s doing, from a practical perspective, without getting bogged down in more complicated concepts like reflection and direct work with byte-code and other bits of black magic.

As an example of this in action, take a look at how I go about testing the Car and Engine with the difficult dependency. The first thing that I do is delete the Fake class because there’s no need for it. The next thing I do is write a unit test, using a framework called JustMock by Telerik (this is currently my preferred mocking framework for C#).

[TestMethod]
public void EngineTemperature_Initializes_Value_Returned_By_Engine()
{
    const int engineTemperatureFromEngine = 200;
            
    var engine = Mock.Create<Engine>();
    engine.Arrange(e => e.TemperatureInFahrenheit).Returns(engineTemperatureFromEngine);

    var car = new Car(engine);

    Assert.AreEqual(engineTemperatureFromEngine, car.EngineTemperature);
}

Notice that instead of instantiating an engine, I now invoke a static method on a class called Mock that takes care of creating my dynamic inheritor for me. Mock.Create<Engine>() is what creates the equivalent of FakeEngine. On the next line, I invoke an (extension) method called Arrange that creates an implementation of the property for me as well. What I’m saying, in plain English, is “take this mock engine and arrange it such that the TemperatureInFahrenheit property returns 200.” I’ve done all of this in one line of code instead of adding an entire nested class. And, best of all, I don’t need to change this mock if I decide to change some behavior in the base class or add a new method.

Truly, once you get used to the concept of mocking, you’ll never go back. It will become your best friend for the purposes of mocking out dependencies of any real complexity. But temper your enthusiasm just a bit. It isn’t a good idea to use mocking frameworks for simple dependencies like the PrimeFinder example. The lite version of JustMock that I’ve used, along with many others, won’t even allow it, and even if they did, that’s way too much ceremony — just pass in real objects and literals if you reasonably can.

The idea of injecting dependencies into classes (what I’ve called “passive” and “semi-passive” collaboration) is critical to mocking and unit testing. All basic mocking frameworks operate on the premise that you’re using this style of collaboration and that your classes are candidates for polymorphism (either interfaces or overridable classes). You can’t mock things like primitives and you can’t mock sealed/final classes.

There are products out there called isolation frameworks that will grant you the ability to mock pretty much everything — primitives, sealed/final classes, statics/singletons, and even the new operator. These are powerful (and often long-running, resource-intensive) tools that have their place, but that place is, in my opinion, at the edges of your code base. You can use this to mock File.Open() or new SqlConnection() or some GUI component to get the code at the edge of your application under test.

But using it to test your own application logic is a path that’s fraught with danger. It’s sort of like fixing a broken leg with morphine. Passively collaborating CUTs have seams in them that allow easy configuration of behavior changes and a clear delineation of responsibilities. Actively collaborating CUTs lack these things and are thus much more brittle and difficult to separate and modify. The fact that you can come up with a scheme allowing you to test the latter doesn’t eliminate these problems — it just potentially masks them. I will say that isolating your coupled, actively collaborating code and testing it is better than not testing it, but neither one is nearly as good as factoring toward passive collaboration.

Understanding Degrees of Code Flexibility

In some projects I’ve been managing of late, I’ve noticed a recurring question cropping up: How flexible should we make the different parts of the system? I’m currently working with a bright crew of people, so they’re picking up on this quickly, but I thought I’d do a bit of a write-up to help the process along. And, as long as I’m doing write-ups like this, I might as well post them.

In discussions of software, there is a large issue that gets lost in the shuffle. You frequently hear people argue the merits of different styles of or approaches to programming. Unit testing or not? TDD? IoC or inline new? What’s the appropriate size of a method? ORM or inline SQL? SQL at all or NoSQL? You get the idea. But one thing that I find is often glossed over is categorizing prospective system changes by how easy they are to make. In other words, if a user comes a-hollering and says, “I want, nay, demand the ability to do X,” how hard is it to make that happen and to verify the results?

And by “hard,” I don’t mean, “do you write code for a day or for three weeks?” I mean, “what do the changes look like in terms of risk and deliverables?” In other words, can you make that happen by changing a configuration file or does it require code changes? Will you need to re-deploy or can you somehow patch on the fly through a plugin architecture? And is it testable? Can you verify 99% of the changes by swapping out a configuration setting, or do you have radically different production and test setups?

So I’m going to define some concepts to flesh out an idea. This isn’t exactly a formalized theory or anything. It’s rather just a working lexicon of how I think about my application. This is a scale of system flexibility for a given future change. Or, put another way, here is a way of assessing how much effort doing X for the aforementioned user will require from the entire development/operations group, from least to most significant.

  1. Users can do it themselves.
  2. An IT-level change is required (e.g. changing a config file, swapping out images, etc.).
  3. An architect/dev-level configuration change is required (e.g. XML for an IoC container).
  4. A non-compiled source code change is required (e.g. you update the markup for a site but not the underlying code).
  5. An Open/Closed Principle-compliant source code change is required (basically adding new code).
  6. A localized tweak to existing code is required.
  7. A substantial change to existing code, spanning various modules, is required.

Now when considering this list, it makes sense to assess your change-set in terms of the furthest-down item it requires. So maybe you need to change a logo on your website, which is easy, but you have an unwieldy switch statement somewhere that swaps the logo out in certain circumstances and that you now need to change. This is probably going to be a 6 rather than a 2. A given change is going to be as rigid as the most rigid link in the chain, so to speak.
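
As a contrived sketch of that situation (hypothetical names and paths), here is the kind of switch that drags a “swap an image” request down the list:

public static class LogoSelector
{
    // Changing a logo should be an IT-level change (a 2), but because these
    // paths live in a compiled switch, adding or changing one means a code
    // change, a build, and a redeploy (a 6).
    public static string GetLogoPath(string season)
    {
        switch (season)
        {
            case "Halloween":
                return "/images/logo-halloween.png";
            case "Holidays":
                return "/images/logo-holidays.png";
            default:
                return "/images/logo.png";
        }
    }
}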

Here are the kinds of changes described in more detail.

Users do it themselves.

There are some sorts of changes to the system that need not involve anyone from your team/staff/company. These are things that users do, through the application. An obvious example is a banking or commerce website in which users can change their passwords. “Password” has nothing to do with the business logic of commerce, so this is functionally a meta-piece of administrivia that you’re entrusting to users.

A dev-ops or operations person changes meta-data in production.

This is something that you (hopefully) don’t trust a user to do but that doesn’t require any actual knowledge of the code base. A good example of this might be a desktop application that has an XML configuration file (or an INI file, if you’re willing to show your age with me). This file might be modified to have the application point to a different database or log to a different file or something. This is not something the average user could or should do, but it’s a relatively lightweight change in that it requires only minimal training and no re-deployment of any kind.

An architect or developer changes meta-data in production.

The next step down in operational rigidity is a meta-data change that cannot be performed without an understanding of the code base. The best example here is the configuration of an IoC Container that has been extracted to XML. Figuring out which service is used by which ViewModel is not something that anyone without sophisticated knowledge of your source code can do, but, on the plus side, it’s still just a change to a setting in production.

Someone changes source code in a way that does not involve re-compilation.

This is what happens when a developer logs on to the web server and starts editing HTML or CSS or even a server-side script like PHP in the deployed files. This is really not a good idea for a variety of reasons, but it is possible and may be something you have to do in a pinch, so it’s worth noting.

Someone makes a code change that more or less just involves adding code and very little modification.

Now we’re down deep enough into rigid territory that a new deployment/install is required in order to push the changes. From here on, this cannot be done in production, so if you’re doing this you’re going to incur all of the overhead of building/running automated tests (hopefully), quality assurance, creating a deployment and deploying (your process may vary). But on the plus side, this is pretty low risk as far as code changes go. Adding things is generally both easy to verify for correctness of functionality and unlikely to mess up existing code.
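
As a rough sketch of what this kind of change can look like (hypothetical types, not from any particular code base), the new requirement is met by adding an implementation rather than editing existing ones:

public interface INotifier
{
    void Notify(string recipient, string message);
}

// Existing, already-tested code stays untouched.
public class EmailNotifier : INotifier
{
    public void Notify(string recipient, string message)
    {
        // send an email...
    }
}

// The new feature arrives as new code, verified by new tests, which keeps
// the risk of regressions low.
public class SmsNotifier : INotifier
{
    public void Notify(string recipient, string message)
    {
        // send a text message...
    }
}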

Someone makes a code change that involves lightweight changes to existing code.

The most common scenario here is probably a bug fix, though it may be new functionality too, depending on how flexible your architecture is. This is a higher risk proposition than adding new source code because you’re now creating a risk of regressions. You still have all of the same considerations about build and deployment, but the risk of problems is higher. Your testing and verification overhead should also be higher. This is a heavier change.

Significant work on the code base is required.

This is what happens when a code base that models a company and implements Office as a singleton suddenly needs to accommodate the new office you opened up in Texas. You designed your code under the assumption that there could only ever be one office location, and you were right about that — right up until you weren’t. Oops. Now things get ugly because management comes to you and says, “we’re opening a new office, so the application is going to need to handle that” and your response is, “that’s completely out of the question. Why, even the thought is preposterous!” You tell them that substantial rework is going to be required.

Making Sense of your Options

So why did I list all of these out? Well, I did it because I feel it’s important to know what your options are when you’re designing and that it’s important to anticipate changes rather than react to them knee-jerk style. If you sit down before you start putting together a code base, think about what users might want and then go through the exercise of figuring out which number of change would be required. If you find likely changes that would be 6s or 7s (and there is certainly a sliding scale at the 6-7 level), that’s a problem that you should start addressing now. If you find extremely unlikely changes that are 1s and 2s, that’s not necessarily a problem. But it is a point of design flexibility that you probably could have done without, and it may be that you have pointless abstractions and complexity (though I’d be a lot more hesitant to introduce rigidity because you think flexibility is unneeded than vice-versa).

Another interesting exercise is to consider categories of these things. For instance, 5-7 are all things that require compiled code changes and 1-4 are all things that do not. This is an interesting way to split up your functionality, and it’s obviously the backbone of this post, but you can divvy these up in other ways as well. For instance, if you’re writing software that for some reason has no field or ops support, then 1, 5, 6 and 7 are your options, and 2-4 are basically non-starters. Or, if you’re considering things in which source control is an issue, then 4-7 are in a category and 1-3 are in a different category (most likely, as I’d think that you’d favor generating meta-data files as part of your deployment rather than source controlling different configurations).

None of this is even remotely comprehensive, but my goal here is really just to encourage people to understand at design time the difficulty of changing something at production time. It seems quite often to be the case that people don’t really think about this, often simply because no one has ever pointed it out to them. Your mileage may vary on the number of categories in the list and your preference for certain options, but at the core of this is a basic and incredibly important idea: you should always play “what if” when it comes to changes that users might request and understand how much of a headache it will be for you if the “what if” comes true. Oh, and also try to minimize the number of headaches. But hopefully that goes without saying.