
Nov
7

My Initial Postsharp Setup — Logging at Assembly Granularity

Category: .NET Tags: C#, Log4Net, PostSharp

PostSharp and Log4Net

I tweeted a bit about this, and now I’m going to post about my initial experience with it. This comes from the perspective of someone new at this particular setup but familiar with the concepts in general. So, what are PostSharp and log4net?

First up, log4net. This is extremely easy to understand, as it’s a logging framework for .NET. It was ported to .NET from Java and the product log4j a long time ago. It’s a tried and true way to instrument your applications for logging. I’m not going to go into a lot of detail on it (here’s a Pluralsight course by Jim Christopher), but you can install it for your project with NuGet and get started pretty quickly and easily.

PostSharp is a little more complex to explain (there’s also a Pluralsight course on this, by Donald Belcham). It’s a tool for a technique called “aspect-oriented programming” (AOP) which addresses what are known as cross cutting concerns. These are things that are intrinsically non-localized in an application. What I mean is, you might have a module that processes EDI feeds and another one that stores data to a local file, and these modules may be completely isolated from one another in your system. These concerns are localized in a nicely modular architecture. Something like, oh, I dunno, logging, is not. You do that everywhere. Logging is said to be an aspect of your system. Security is another stock example of an aspect.

PostSharp employs a technique called “IL Weaving” to address AOP in a clean and remarkably decoupled way. If you’re a .NET programmer, whether you code in VB, C#, F#, etc., all of your code gets compiled down to what’s known as intermediate language (IL). Then, when the code is actually being executed, this IL is translated on the fly into machine/executable code. So there are two stages of compiling, in essence. In theory, you can write IL code directly. PostSharp takes advantage of this fact, and when you’re building your C# code into IL code, it interposes and injects a bit of its own stuff into the resultant IL. The upshot of all this is that you can have logging in every method in your code base without writing a single call to Logger.Log(something) in any method, anywhere. Let me be clear: you can get all of the benefits of comprehensive logging with none of the boilerplate, clutter, and intensely high coupling that typically comes with implementing an aspect.
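To make the boilerplate and coupling concrete, here is a rough sketch of what the hand-written alternative tends to look like. All of the names (SimpleLogger, EdiFeedProcessor, LocalFileStore) are hypothetical and invented for illustration; SimpleLogger is only a stand-in for a real framework like log4net. The thing to notice is the identical try/catch in two modules that otherwise have nothing to do with each other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a real logging framework such as log4net.
public static class SimpleLogger
{
    public static readonly List<string> Messages = new List<string>();

    public static void Error(string message, Exception exception)
    {
        Messages.Add(message + exception.Message);
    }
}

// Two otherwise unrelated modules that both end up coupled to the logger.
public class EdiFeedProcessor
{
    public void Process(string feed)
    {
        try
        {
            if (feed == null)
                throw new ArgumentNullException("feed");
            // Real feed processing would go here.
        }
        catch (Exception ex)
        {
            SimpleLogger.Error("An exception occurred: ", ex);
            throw;
        }
    }
}

public class LocalFileStore
{
    public void Save(string contents)
    {
        try
        {
            if (contents == null)
                throw new ArgumentNullException("contents");
            // Real file storage would go here.
        }
        catch (Exception ex)
        {
            SimpleLogger.Error("An exception occurred: ", ex);
            throw;
        }
    }
}
```

Multiply that try/catch by every method in the code base and you have the clutter that an aspect is meant to remove.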

Great, But How?

Due to a lack of time in general, I’ve sort of gotten away from detailed how-to posts, for the most part, with screenshots and steps. It’s really time consuming to make posts like that. What I’ll do instead is describe the process and, if anyone has questions, perhaps clarify with an addendum or links or something. Trying to get more agile everywhere and avoid gold-plating 🙂

And really, getting these things into your project is quite simple. In both cases, I just added a NuGet package to a project. For log4net, this is trivial to do. For PostSharp, this actually triggers an install of PostSharp as a Visual Studio plugin. PostSharp offers a few different license types. When you install it in VS, it will prompt you to enter a license key or do a 45-day trial. You can sign up for an express version on their site, and you’ll get a license key that you can plug in. From there, it gets installed, and it’s actually really polished. It even gives you a window in Studio that keeps track of progress in some tutorials they offer for getting started.

With that in place, you’re ready to write your first aspect. These are generally implemented as attributes that you can use to decorate methods, types, and assemblies so that you can be as granular with the aspects as you like. If you implement an attribute that inherits from OnMethodBoundaryAspect, you get a hook in to having code executed on events in the application like “Method Enter,” “Method Leave,” and “Exception.” So you can write C# code that will get executed upon entry to every method.

Here’s a look at an example with some method details elided:

[Serializable]
[AttributeUsage(AttributeTargets.Assembly)]
public sealed class LogAttribute : OnMethodBoundaryAspect
{
    private static readonly ILog _logger;

    static LogAttribute()
    {
        SetupLogger();
        _logger = LogManager.GetLogger(typeof(LogAttribute));
    }

    public override void OnException(MethodExecutionArgs args)
    {
        if(_logger != null)
            _logger.Error("An exception occurred: ", args.Exception); 
    }
...

Leaving aside the logging implementation details, what I’ve done here is define an attribute. Any method covered by this attribute will automatically log any exception that occurs, without the code of that method being altered in the slightest. The “MethodExecutionArgs” parameter gives you information that lets you inspect various relevant details about the method in question: its name, its parameters, its return value, etc.
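For a feel of the other hooks, here is a minimal sketch of an aspect that overrides the entry, exit, and exception advices and reads a few details off of MethodExecutionArgs. TraceAttribute is a hypothetical name and is not the aspect from this post; writing to the console is just an assumption to keep the sketch short, and it needs the PostSharp package referenced to compile.

```csharp
using System;
using PostSharp.Aspects;

// Hypothetical aspect for illustration; not the LogAttribute from this post.
[Serializable]
public sealed class TraceAttribute : OnMethodBoundaryAspect
{
    // Runs before the body of each method the aspect is applied to.
    public override void OnEntry(MethodExecutionArgs args)
    {
        Console.WriteLine("Entering {0} with {1} argument(s)",
            args.Method.Name, args.Arguments.Count);
    }

    // Runs when the method finishes, whether it returned or threw.
    public override void OnExit(MethodExecutionArgs args)
    {
        Console.WriteLine("Leaving {0}", args.Method.Name);
    }

    // Runs only when the method throws.
    public override void OnException(MethodExecutionArgs args)
    {
        Console.WriteLine("{0} threw {1}",
            args.Method.Name, args.Exception.GetType().Name);
    }
}
```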

Getting Modular

Okay, so great. We can apply this at various levels. I decided that I wanted to apply it per assembly. I’m currently working at times in a legacy code base where a series of Winforms and Webforms applications make use of a common assembly called “Library.” This code had previously been duplicated, but I made it common and unified it as a step toward architecture improvement. This is where I put my aspect attribute for reference, and I decided to apply this at the assembly level. Initially, I want some assemblies logging exceptions, but not others. To achieve this, I put the following in the AssemblyInfo.cs in the assemblies for which I wanted logging.

[assembly: Log()]

This is awesome because even though PostSharp and the Aspect are heavily coupled to the assemblies on the whole (every assembly uses Library, and Library depends on Postsharp, so every assembly depends on PostSharp) it isn’t coupled in the actual code. In fact, I could just remove that line of code and the library dependency, and not touch a single other thing (except, of course, the references to library utilities).

But now another interesting problem arises, which is naming the log files generated. I want them to go in AppData, but I want them named after the respective deliverable in this code base.

And then, in the library project, I have this method inside of the LogAttribute class:

private static string GetLogFileFullPath()
{
    string friendlyName = AppDomain.CurrentDomain.FriendlyName;
    string executableName = friendlyName.Replace(".vshost", string.Empty);
    string appdataPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
    string logPath = Path.Combine(appdataPath, "{CompanyNameHere}");
    return Path.Combine(logPath, String.Format("{0}.log", executableName));
}

I’ve made use of the Monostate Pattern to ensure that a single logger instance is configured and initialized and then used by the attribute instances. This is an implementation that I’ll probably refine over time, but it’s alright for a skunkworks. So, what happens is that when the application fires up, I figure out the name of the entry executable and use it to name the log file that’s created/appended in AppData under the company name folder.
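If the Monostate Pattern is unfamiliar, here is a minimal sketch of the idea, stripped of anything to do with logging. SharedLogTarget and the "example.log" value are hypothetical placeholders; the point is only that every instance reads the same static state, and that state is set up exactly once.

```csharp
using System;

// Hypothetical monostate: every instance shares the same static state,
// which the static constructor initializes exactly once.
public class SharedLogTarget
{
    private static readonly string _logPath;
    private static int _initializationCount;

    static SharedLogTarget()
    {
        _initializationCount++;
        _logPath = "example.log"; // placeholder value for illustration
    }

    // Instance members simply expose the shared static state.
    public string LogPath { get { return _logPath; } }
    public int InitializationCount { get { return _initializationCount; } }
}
```

You can create as many instances as you like, and they all behave like one object, which is more or less what happens with the attribute instances and their shared logger.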

This was great until I noticed weird files getting created in that folder. Turns out that NCrunch and other plugins are triggering the code to be invoked in this way, meaning that unit test runners, both realtime and on-demand, are generating logging. Duh. Oops. And… yikes!

My first thought was that I’d see if I was being run from a unit test and no-op out of logging if that were the case. I found this stack overflow post where Jon Skeet suggested an approach and mentioned that he “[held his] nose” while doing it because it was a pragmatic solution to his problem. Well, since I wasn’t in a pinch, I decided against that.

Maybe it would make sense, instead of figuring out whether I was in a unit test assembly and what other sorts of things I didn’t want to have the logging turned on for, to take a whitelist approach. That way, I have to turn logging on explicitly if I want it to happen. I liked that, but it seemed a little clunky. I thought about what I’d do to enable it on another one of the projects in the solution, and that would be to go into the assembly file and add the attribute for the assembly, and then go into the logger to add the assembly to the whitelist. But why do two steps when I could do one?

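private static bool IsAspectLoggingEnabled()
{
    try
    {
        // Look for the attribute on the entry assembly itself (there may be no entry assembly under a test runner).
        // Assumes the assembly-level attribute is still present in the compiled assembly's metadata.
        var entryAssembly = Assembly.GetEntryAssembly();
        return entryAssembly != null && entryAssembly.GetCustomAttributes(typeof(LogAttribute), false).Any();
    }
    catch
    { return false; }
}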

I added this method that actually figures out whether the attribute has been declared for the assembly, and I only enable the logger if it has. I’ve tested this out and it works pretty well, though I’ve only been living with it for a couple of days, so it’s likely to continue evolving. But the spurious log files are gone, and MS Test runner no longer randomly bombs out because the “friendly name” sometimes has a colon in it. This is almost certainly not the most elegant approach to my situation, but it’s iteratively more elegant, and that’s really all I’m ever going for.
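As a belt-and-suspenders measure for the colon problem, one could also scrub the friendly name before using it as a file name. This is only a sketch and not something in my aspect: LogFileNames and ToSafeFileName are hypothetical names, and replacing bad characters with an underscore is an arbitrary choice.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper; not part of the aspect shown in this post.
public static class LogFileNames
{
    public static string ToSafeFileName(string friendlyName)
    {
        var invalid = new HashSet<char>(Path.GetInvalidFileNameChars());
        invalid.Add(':'); // added explicitly, since the invalid set varies by platform

        var builder = new StringBuilder(friendlyName.Length);
        foreach (char c in friendlyName)
            builder.Append(invalid.Contains(c) ? '_' : c);
        return builder.ToString();
    }
}
```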

Ideas/suggestions/shared experience are welcome. And here’s the code for the aspect in its entirety right now:

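using System;
using System.IO;
using System.Linq;
using System.Reflection;
using log4net;
using log4net.Appender;
using log4net.Config;
using log4net.Layout;
using log4net.Repository.Hierarchy;
using PostSharp.Aspects;

[Serializable]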
[AttributeUsage(AttributeTargets.Assembly)]
public sealed class LogAttribute : OnMethodBoundaryAspect
{
    private static readonly ILog _logger;

    static LogAttribute()
    {
        if (IsAspectLoggingEnabled())
        {
            SetupLogger();
            _logger = LogManager.GetLogger(typeof(LogAttribute));
        }
    }

    public override void OnException(MethodExecutionArgs args)
    {
        if(_logger != null)
            _logger.Error("An exception occurred: ", args.Exception); 
    }


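    private static bool IsAspectLoggingEnabled()
    {
        try
        {
            // Look for the attribute on the entry assembly itself (there may be no entry assembly under a test runner).
            // Assumes the assembly-level attribute is still present in the compiled assembly's metadata.
            var entryAssembly = Assembly.GetEntryAssembly();
            return entryAssembly != null && entryAssembly.GetCustomAttributes(typeof(LogAttribute), false).Any();
        }
        catch
        { return false; }
    }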

    private static void SetupLogger()
    {
        var appender = BuildAppender();

        var hierarchy = (Hierarchy)LogManager.GetRepository();
        hierarchy.Root.AddAppender(appender);

        hierarchy.Configured = true;
        BasicConfigurator.Configure(appender);
    }

    private static FileAppender BuildAppender()
    {
        var appender = new RollingFileAppender()
        {
            File = GetLogFileFullPath(),
            AppendToFile = true,
            Layout = new PatternLayout() { ConversionPattern = "%m%n" }
        };
        appender.ActivateOptions();
        return appender;
    }

    private static string GetLogFileFullPath()
    {
        string friendlyName = AppDomain.CurrentDomain.FriendlyName;
        string executableName = friendlyName.Replace(".vshost", string.Empty);
        string appdataPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
        string logPath = Path.Combine(appdataPath, "{CompanyNameHere}");
        return Path.Combine(logPath, String.Format("{0}.log", executableName));
    }
}

Oct
21

By Erik Dietrich

Module Boundaries and Demeter

Category: Abstractions Tags: Abstraction, C#

I was doing a code review recently, and I saw something like this:

public class SomeService
{
    public void Update(Customer customer)
    {
        //Do update stuff
    }

    public void Delete(int customerId)
    {
        //Do delete stuff
    }
}

What would you say if you saw code like this? Do you see any problem in the vein of consistent abstraction or API writing? It’s subtle, but it’s there (at least as far as I’m concerned).

The problem that I had with this was the mixed abstraction. Why do you pass a Customer object to Update and an integer to Delete? That’s fairly confusing until you look at the names of the variables. The method bodies are elided because they shouldn’t matter, but to understand the reason for the mixed abstraction you’d need to examine them. You’d need to see that the Update method uses all of the fields of the customer object to construct a SQL query and that the corresponding Delete method needs only an ID for its SQL query. But if you need to examine the methods of a class to understand the API, that’s not a good abstraction.

A better abstraction would be one that had a series of methods that all had the same level of specificity. That is, you’d have some kind of “Get” method that would return a Customer or a collection of Customers and then a series of mutator methods that would take a Customer or Customers as arguments. In other words, the methods of this class would all be of the form “get me a customer” or “do something to this customer.”

The only problem with this code review was that I had just explained the Law of Demeter to the person whose code I was reviewing. So this code:

public void DeleteCustomer(int customerId)
{
    string theSqlQuery = "DELETE FROM Customer WHERE CustomerId = " + customerId;
    //Do some sql stuff...
}

was preferable to this:

public void DeleteCustomer(Customer customer)
{
    string theSqlQuery = "DELETE FROM Customer WHERE CustomerId = " + customer.Id;
    //Do some sql stuff...
}

The reason is that you don’t want to accept an object as a method parameter if all you do with it is use one of its properties. You’re better off just asking for that property directly rather than taking a needless dependency on the containing object. So was I a hypocrite (or perhaps just indecisive)?

Well, the short answer is “yes.” I gave a general piece of advice one week and then gave another piece of advice that contradicted it the next. I didn’t do this, however, because of caprice. I did it because pithy phrases and rules fail to capture the nuance of architectural decisions. In this case the Law of Demeter is at odds with providing a consistent abstraction. And I value the consistent abstraction more highly, particularly across a public seam between modules.

What I mean is, if SomeService were an implementation of a public interface called ICustomerService, what you’d have is a description of some methods that manipulate Customer. How do they do it? Who knows… not your problem. Is the customer in a database? Memory? A file? A web service? Again, as consumers of the API we don’t know and don’t care. So because we don’t know where and how the customers are stored, what sense would it make if the API demanded an integer ID? I mean, what if some implementations use a long? What if Customers are identified elsewhere by SSN for deletion purposes? The only way to be consistent across module boundaries (and thus generalities) is to deal exclusively in domain object concepts.
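Here is a rough sketch of what that could look like. The interface name comes from the paragraph above, but the members of Customer and both implementations are hypothetical, in-memory stand-ins that I made up for illustration. One keys customers by integer ID and the other by SSN, and a consumer of ICustomerService can’t tell the difference.

```csharp
using System.Collections.Generic;

// Hypothetical domain object; the properties are assumptions for illustration.
public class Customer
{
    public int Id { get; set; }
    public string Ssn { get; set; }
    public string Name { get; set; }
}

// Every method deals in the Customer domain object.
public interface ICustomerService
{
    void Update(Customer customer);
    void Delete(Customer customer);
}

// One implementation identifies customers by integer ID...
public class IdKeyedCustomerService : ICustomerService
{
    private readonly Dictionary<int, Customer> _customers = new Dictionary<int, Customer>();

    public int Count { get { return _customers.Count; } }
    public void Update(Customer customer) { _customers[customer.Id] = customer; }
    public void Delete(Customer customer) { _customers.Remove(customer.Id); }
}

// ...while another identifies them by SSN. Callers cannot tell the difference.
public class SsnKeyedCustomerService : ICustomerService
{
    private readonly Dictionary<string, Customer> _customers = new Dictionary<string, Customer>();

    public int Count { get { return _customers.Count; } }
    public void Update(Customer customer) { _customers[customer.Ssn] = customer; }
    public void Delete(Customer customer) { _customers.Remove(customer.Ssn); }
}
```

Had the interface declared Delete(int customerId), the SSN-keyed implementation would have no sensible way to honor it.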

The Law of Demeter is also called the Principle of Least Knowledge. At its (over) simplest, it is a dot counting exercise to see if you’re taking more dependencies than is strictly necessary. This can usually be enforced by asking yourself if your methods are using any objects that they could get by without using. However, in the case of public facing APIs and module boundaries, we have to relax the standard. Sure, the SQL Server version of this method may not need to know about the Customer, but what about any scheme for deleting customers? A narrow application of the Law of Demeter would have you throw Customer away, but you’d be missing out by doing this. The real question to ask in this situation is not “what is the minimum that I need to know” but rather “what is the minimum that a general implementation of what I’m doing might need to know.”

Jul
31

By Erik Dietrich

Introduction to Unit Testing Part 6: Test Doubles

Category: Language Agnostic Tags: C#, JustMock, MS Test, Starting To Unit Test, Unit Testing

In the last two posts in this series, I talked about how to test new code in your code base and then how to bring your legacy code under test. Toward the end of the last chapter in this series, I talked a bit about the concept of test doubles. The example I showed was one in which I used polymorphism to create a “dummy” class that I used in a test to circumvent otherwise untestable code. Here, I’ll dive into a lot more detail on the subject, starting out with a much simpler example than that and building to a more sophisticated way to handle the management of your test doubles.

First, a Bit of Theory

Before we get into test doubles, however, let’s stop and talk about what we’re actually doing, including theory about unit tests. So far, I’ve shown a lot of examples of unit tests and talked about what they look like and how they work (for instance, here in post two where I talk about Arrange, Act, Assert). But what I haven’t addressed, specifically, is how the test code should interact with the production code. So let’s talk about that a bit now.

By far the most common case when unit testing is that you instantiate a class under test in the “arrange” part of your unit test, and then you do whatever additional setup is necessary before calling some method on that class. Then you assert something that should have happened as a result of that method call. Let’s return to the example of prime finder from earlier and look at a simple test:

[TestMethod]
public void Returns_False_For_One()
{
    var primeFinder = new PrimeFinder(); //Arrange

    bool result = primeFinder.IsPrime(1); //Act

    Assert.IsFalse(result); //Assert
}

This should be familiar from the perspective of “arrange, act, assert,” but let’s look specifically at the “act” line. Here is the real crux of the test; we’re writing tests about the IsPrime method and this is where the action happens. In this line of code, we give the method an input and record its output, so it’s the perfect microcosm for what I’m going to discuss about a class under test: its interactions with other objects. You see, unit testing isn’t about executing your code; you can do that with integration tests, console apps, or even just by running the application. Unit testing, at its core, is about isolating your classes and running experiments on them, as if you were a scientist in a lab. And this means controlling all of the inputs to your class (stimulus, if you will) so that you can observe what it puts out.

Controlling the inputs in the PrimeFinder class is simple. Because I’m telling you that there are no invocations of global/static state (which will become an important theme as we proceed), you can see by looking at the unit test that the only input to the class under test (CUT) is the integer 1. This means that the only input/stimulus that we supply to the class is a simple integer, making it quite easy to make assertions about its behavior. Generally speaking, the simpler the inputs to a class, the easier that class is to test.

There are Inputs and There are Inputs

Omitting certain edge cases I can think of (and probably some that I’m not thinking of), let’s consider a handful of relatively straightforward ways that a class might get ahold of input information. There is what I did above — passing it into a method. Another common way to give information to a class is to use constructor parameters or setter methods/properties. I’ll refer to these as “passive collaboration” from the perspective of the CUT, since it’s simply being given the things that it needs. There is also what I’ll call “semi-passive collaboration,” which is when you pass a dependency to the CUT and the CUT interacts in great detail with that dependency, mutating its state and querying it. An example of this would be “Car theCar = new Car(new Engine())”, in which performing operations on Car related to starting and driving result in rather elaborate modifications to the state of Engine. It’s still passive in the sense that you’re handing the Engine class to the car, but it’s not as passive as simply handing it an integer. In general, passive input is input that the scope instantiating the CUT controls — constructor parameters, method parameters, setters, and even things returned from methods of objects passed to the CUT (such as the Car class calling _engine.GetTemperature() in the example in this paragraph).

In contrast, there is also “active collaboration,” which is when the CUT takes responsibility for getting its own input. This is input that you cannot control when instantiating the class. An example of this is a call to some singleton or public static method in the CUT. The only way that you can reassume control is by not calling the method in which it occurs. If static/singleton calls occur in the constructor, you simply cannot test or even instantiate this class without it doing whatever the static code entails. If it retrieves values from static state, you have no control over those values (short of mocking up the application’s global state).

A second form of active collaboration is the “new” operator. This is very similar to static state in that when you create the CUT, you have no control over this kind of input to the CUT. Imagine if Car new-ed up its own Engine and queried it for temperature. There would be absolutely no way that you could have any effect on this operation in the Car class short of not instantiating it. Like static calls, object instantiation renders your CUTs a non-negotiable, “take it or leave it” proposition. You can have them with all of their instantiated objects and global state or you can write your own, buddy.

Not all inputs to a class are created equal. There are a CUT’s passive inputs, in which the CUT cedes control to you. And then there are the CUT’s active inputs that it controls and on which it does not allow you to interpose in any way. As it turns out, it is substantially easier to test CUTs with exclusively passive collaboration/input and difficult or even impossible to test CUTs with active collaboration. This is simply because you cannot isolate actively collaborating CUTs.
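A small sketch of the difference, using the Car and Engine idea from above. PassiveCar, ActiveCar, IsOverheating, and the temperature numbers are all hypothetical, made up for illustration. The two cars contain identical logic, but only one of them lets a test decide what the engine reports.

```csharp
public class Engine
{
    public Engine() { Temperature = 180; } // pretend this came from real hardware

    public int Temperature { get; set; }
    public int GetTemperature() { return Temperature; }
}

// Passive collaboration: the caller hands the Engine in, so a test controls it.
public class PassiveCar
{
    private readonly Engine _engine;

    public PassiveCar(Engine engine) { _engine = engine; }
    public bool IsOverheating() { return _engine.GetTemperature() > 250; }
}

// Active collaboration: the car news up its own Engine, so a test cannot interpose.
public class ActiveCar
{
    private readonly Engine _engine = new Engine();

    public bool IsOverheating() { return _engine.GetTemperature() > 250; }
}
```

With PassiveCar, a test can pass in an Engine whose temperature is 300 and assert that the car reports overheating. With ActiveCar, there is no way to get at the overheating branch without changing the class.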

Literals: Too Simple to Need Test Doubles

There’s still a little bit of work to do before we discuss test doubles in earnest. First, we have to talk about inputs that are too simple to require stand-ins: literals. The PrimeFinder test above is the perfect example of this. It’s performing a mathematical operation using an integer input, so what we’re interested in testing is known input-output pairs in a functional sense. As such, we just need to know what to pass in, to pass that value in, and then to assert that we get the expected return value.

In a strict sense, we could refer to this as a form of test double. After all, we’re doing a non-production exercise with the API, so the value we’re passing in is fake, in a sense. But that’s a little formal for my taste. It’s easier just to think in terms of literals almost always being too simple to require any sort of substitution of behavior.

An interesting exception to this is the null literal (of null type) or the default value of a non-nullable type. In many cases, you may actually want to be testing this as an input since null and 0 tend to be particularly interesting inputs and the source of corner cases. However, in some cases, you may be supplying what is considered the simplest form of test double: the dummy value. A dummy value is something you pass into a function to say, “I don’t care what this is and I’m just passing in something to make the compiler happy.” An example of where you might do this is passing null to a constructor of an object instance when you just want to make assertions as to what some of its property values initialize to.
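A minimal sketch of a dummy value in action follows. The Speed property and the DummyValueExample class are hypothetical, and I’ve written the check as a plain method rather than an MS Test method so that it stands alone. It only works because this particular constructor never touches the engine it is given.

```csharp
public class Engine { }

public class Car
{
    private readonly Engine _engine;

    public int Speed { get; private set; }

    public Car(Engine engine)
    {
        _engine = engine; // stored, but never used during construction
        Speed = 0;
    }
}

public static class DummyValueExample
{
    public static int InitialSpeedWithDummyEngine()
    {
        // null is a dummy here: it satisfies the compiler and nothing else.
        var car = new Car(null);
        return car.Speed;
    }
}
```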

Simple/Value Objects and Passing in Friendlies

Next up for consideration is the concept of a “test stub,” or what I’ll refer to in the general sense as a “friendly.”

Take a look at this code:

public class Car
{
    public int EngineTemperature { get; private set; }

    public Car(Engine engine)
    {
        EngineTemperature = engine.TemperatureInFahrenheit;
    }
}
public class Engine
{
    public int TemperatureInFahrenheit { get; set; }
}

Here is an incredibly simple implementation of the Car-Engine pair I described earlier. Car is passed an Engine and it queries that Engine for a local value that it exposes. Let’s say that I now want to test that behavior. I want to test that Car’s EngineTemperature property is equal to the Engine’s temperature in Fahrenheit. What do you think is a good test to write? Something like this, maybe:

[TestMethod]
public void EngineTemperature_Initializes_Value_Returned_By_Engine()
{
    const int engineTemperatureFromEngine = 200;
    Engine engine = new Engine() { TemperatureInFahrenheit = engineTemperatureFromEngine };
    var car = new Car(engine);

    Assert.AreEqual(engineTemperatureFromEngine, car.EngineTemperature);
}

Here, we’re setting up the Engine instance in such a way that we control what it provides to Car when Car uses it. We know by inspecting the code for Car that Car is going to ask Engine for its TemperatureInFahrenheit value, so we set that value to a known commodity, allowing us to compare in the Assert. To put it another way, we’re supplying input indirectly to Car by setting up Engine and telling Engine what to give to Car. It’s important to note that this is only possible because Car accepts Engine as an argument. If Car instantiated Engine in its constructor, it would not be possible to isolate Car because any test of Car’s initial value would necessarily also be a test of Engine, making the test an integration test rather than a unit test.

Creating Bona Fide Mocks

That’s all well and good, but what if the Engine class were more complicated or just written differently? What if the way to get the temperature was to call a method and that method went and talked to a file or a database or something? Think of how badly the testing for this is going to go:

public class Car
{
    public int EngineTemperature { get; private set; }

    public Car(Engine engine)
    {
        EngineTemperature = engine.TemperatureInFahrenheit;
    }
}
public class Engine
{
    public int TemperatureInFahrenheit
    {
        get
        {
            var stream = new StreamReader(@"C:\whatever.txt");
            return int.Parse(stream.ReadLine());
        }
    }
}

Now, when we instantiate a Car and query its engine temperature property, suddenly file contents are being read into memory and, as I’ve already covered in this series, File I/O is a definite no-no in a unit test. So I suppose we’re hosed. As soon as Car tries to read Engine’s temperature, we’re going to explode — or we’re going to succeed, which is even worse because now you’ll have a unit test suite that depends on the machine it’s running on having the file C:\whatever.txt on it and containing an integer as its first line.

But what if we got creative the way we did at the end of the last episode of this series? Let’s make the TemperatureInFahrenheit property virtual and then declare the following class:

public class FakeEngine : Engine
{
    private int _temperature;

    public override int TemperatureInFahrenheit
    {
        get { return _temperature; }
    }

    public FakeEngine(int temperature)
    {
        _temperature = temperature;
    }
}

This class is test-friendly because it doesn’t contain any file I/O at all and it inherits from Engine, overriding the offending methods. Now we can write the following unit test:

[TestMethod]
public void EngineTemperature_Initializes_Value_Returned_By_Engine()
{
    const int engineTemperatureFromEngine = 200;
    Engine engine = new FakeEngine(engineTemperatureFromEngine);
    var car = new Car(engine);

    Assert.AreEqual(engineTemperatureFromEngine, car.EngineTemperature);
}

If this seems a little weird to you, remember that our goal here is to test the Car class and not the engine class. All that the Car class knows about Engine is that it wants its TemperatureInFahrenheit property. It doesn’t (and shouldn’t) care how or where this comes from internally to Engine — file I/O, constructor parameter, secret ink, whatever. And when testing the Car class, you certainly don’t care. Another way to think of this is that you’re saying, “assuming that Engine tells Car that the engine temperature is 200, we want to assert that Car’s EngineTemperature property is 200.” In this fashion, we have isolated the Car class and are testing only its functionality.

This kind of test double and testing technique is known as a Fake. We’re creating a fake engine to stand in for the real one. It’s not simple enough to be a dummy or a stub, since it’s a real, bona-fide different class instead of a doctored version of an existing one. I realize that the terminology for the different kinds of test doubles can be a little confusing, so here’s a helpful taxonomy of them.

Mocking Frameworks

The last step in the world of test doubles is to get to actual mock objects. If you stop and ponder the fake approach from the last section a bit, a problem might occur to you. The problem has to do with long-term maintenance of code. I remember, many moons ago when I discovered the power of polymorphism for creating fake objects, that I thought it was the greatest thing under the sun. Obviously there was at least one fake per test class with dependency, and sometimes there were multiple dependencies. And I didn’t always stop there. I might define three or four different variants of the fake, each having a method that behaved differently for the test in question. In one fake, TemperatureInFahrenheit would return a passed in value, but in another, it would throw an exception. Oh, there were so many fakes; I was swimming in fakes for classes and fakes for interfaces.

And they were awesome… until I added a method to the interface they implemented or changed behavior in the class they inherited. And then, oh, the pain. I would have to go and change dozens of classes. And then there was also the fact that all of this faking took up a whole lot of space. My test classes were littered with nested classes of fakes. It was fun at first, but the maintenance became a drudgery. But don’t worry, because my gift to you is to spare you that pain.

What if I told you that you could implement interfaces and inherit from classes anonymously, without actually creating source code that did this? I’d be oversimplifying a bit, but once you got past that, you’d probably be pretty excited. I say this because, as you start to grasp the concept of mocking frameworks, this kind of “dynamic interface implementation/inheritance” is the easiest way to reason about what it’s doing, from a practical perspective, without getting bogged down in more complicated concepts like reflection and direct work with byte-code and other bits of black magic.

As an example of this in action, take a look at how I go about testing the Car and Engine with the difficult dependency. The first thing that I do is delete the Fake class because there’s no need for it. The next thing I do is write a unit test, using a framework called JustMock by Telerik (this is currently my preferred mocking framework for C#).

[TestMethod]
public void EngineTemperature_Initializes_Value_Returned_By_Engine()
{
    const int engineTemperatureFromEngine = 200;
            
    var engine = Mock.Create<Engine>();
    engine.Arrange(e => e.TemperatureInFahrenheit).Returns(engineTemperatureFromEngine);

    var car = new Car(engine);

    Assert.AreEqual(engineTemperatureFromEngine, car.EngineTemperature);
}

Notice that instead of instantiating an engine, I now invoke a static method on a class called Mock that takes care of creating my dynamic inheritor for me. Mock.Create<Engine>() is what creates the equivalent of FakeEngine. On the next line, I invoke an (extension) method called Arrange that creates an implementation of the property for me as well. What I’m saying, in plain English, is “take this mock engine and arrange it such that the TemperatureInFahrenheit property returns 200.” I’ve done all of this in one line of code instead of adding an entire nested class. And, best of all, I don’t need to change this mock if I decide to change some behavior in the base class or add a new method.

Truly, once you get used to the concept of mocking, you’ll never go back. It will become your best friend for the purposes of mocking out dependencies of any real complexity. But temper your enthusiasm just a bit. It isn’t a good idea to use mocking frameworks for simple dependencies like the PrimeFinder example. The lite version of JustMock that I’ve used and many others won’t even allow it, and even if they did, that’s way too much ceremony — just pass in real objects and literals, if you can reasonably.

The idea of injecting dependencies into classes (what I’ve called “passive” and “semi-passive” collaboration) is critical to mocking and unit testing. All basic mocking frameworks operate on the premise that you’re using this style of collaboration and that your classes are candidates for polymorphism (either interfaces or overridable classes). You can’t mock things like primitives and you can’t mock sealed/final classes.

There are products out there called isolation frameworks that will grant you the ability to mock pretty much everything: primitives, sealed/final classes, statics/singletons, and even the new operator. These are powerful (and often long-running, resource-intensive) tools that have their place, but that place is, in my opinion, at the edges of your code base. You can use them to mock File.Open() or new SqlConnection() or some GUI component to get the code at the edge of your application under test.

But using it to test your own application logic is a path that’s fraught with danger. It’s sort of like fixing a broken leg with morphine. Passively collaborating CUTs have seams in them that allow easy configuration of behavior changes and a clear delineation of responsibilities. Actively collaborating CUTs lack these things and are thus much more brittle and difficult to separate and modify. The fact that you can come up with a scheme allowing you to test the latter doesn’t eliminate these problems — it just potentially masks them. I will say that isolating your coupled, actively collaborating code and testing it is better than not testing it, but neither one is nearly as good as factoring toward passive collaboration.

Jul
15

By Erik Dietrich

Intro to Unit Testing 5: Invading Legacy Code in the Name of Testability

Category: Language Agnostic Tags: C#, MS Test, Starting To Unit Test, Unit Testing | 11 Comments

If, in the movie Braveheart, the Scots had been battling a nasty legacy code base instead of the English under Edward Longshanks, the conversation after the battle at Stirling between Wallace and minor Scottish noble MacClannough might have gone like this:

Wallace: We have prevented new bugs in the code base by adding new unit tests for all new code, but bugs will still happen.

MacClannough: What will you do?

Wallace: I will invade the legacy code, and defeat the bugs on their own ground.

MacClannough (snorts in disbelief): Invade? That’s impossible.

Wallace: Why? Why is that impossible? You’re so concerned with squabbling over the best process for handling endless defects that you’ve missed your God-given right to something better.

Goofy as the introduction to this chapter of the series may be, there’s a point here: while unit testing brand new classes that you add to the code base is a victory and brings benefit, to reap the real game-changing rewards you have to be a bit of a rabble-rouser. You can’t just leave that festering mass of legacy code as it is, or it will generate defects even without you touching it. Others may scoff or even outright oppose your efforts, but you’ve got to get that legacy code under test at some point or it will dominate your project and give you unending headaches.

So far in this series, I’ve covered the basics of unit testing, when to do it, and when it might be too daunting. Most recently, I talked about how to design new code to make it testable. This time, I’m going to talk about how to wrangle your existing mess to start making it testable.

Easy Does It

A quick word of caution before going any further: don’t try to do too much all at once. Your first task after reading the rest of this post should be to select something small in your code base to try it on. If you want to target production code, get the change approved by an architect or lead, if that’s required. Another option is to create a throwaway playpen copy of your code base and thus earn yourself a bit more latitude. Either way, I’d advise small, manageable stabs before really bearing down. What specifically you try to do is up to you, but I think it’s worth proceeding slowly and steadily. I’m all about incremental improvement in things that I do.

Also, at the end of this post I’ll offer some further reading that I highly recommend. And, in fact, I recommend reading it before or as you get started working your legacy code toward testability. These books will be a great help and will delve much further into the subjects that I’ll cover here.

Test What You Can

Perhaps this goes without saying, but let’s just say it anyway to be thorough. There will be stuff in the legacy code base you can test. You’ll find the odd class with few dependencies or a method dangling off somewhere that, for a refreshing change, doesn’t reference some giant singleton. So your first task there is writing tests for that code.

But there’s a way to do this and a way not to do this. The way to do it is to write what’s known as characterization tests that simply document the behavior of the existing system. The way not to do this is to introduce ‘corrections’ and cleanup as you go. The linked post goes into more detail, but suffice it to say that modifying untested legacy code is like playing Jenga — you never really know ahead of time which brick removal is going to cause an avalanche of problems. That’s why legacy code is so hard to change and so unpleasant to work with. Adding tests is like adding little warnings that say, “dude, not that brick!!!” So while the tower may be faulty and leaning and of shoddy construction, it is standing and you don’t want to go changing things without putting your warning system in place.

So, long story short, don’t modify — just write tests. Even if a method tells you that it adds two integers and what it really does is divide one by the other, just write a passing test for it. Do not ‘fix’ it (that’ll come later when your tests help you understand the system and renaming the method is a more attractive option). Iterate through your code base and do it everywhere you can. If you can instantiate the class to get to the method you want to test and then write asserts about it (bearing in mind the testability problems I’ve covered like GUI, static state, threading, etc), do it. Move on to the next step once you’ve done the easy stuff everywhere. After all, this is easy practice and practice helps.
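As a minimal sketch of what such a characterization test might look like, assume a hypothetical legacy class, LegacyMath, whose AddTwoIntegers method actually divides. All of the names here are invented for illustration, and plain boolean checks stand in for test-framework asserts so that the sketch runs on its own. The point is that the checks record what the code does today, not what its name promises.

```csharp
using System;

// Hypothetical legacy class: the name says "add", the body divides.
public class LegacyMath
{
    public int AddTwoIntegers(int x, int y)
    {
        return x / y;
    }
}

public static class CharacterizationChecks
{
    // Pins the current behavior: 10 and 2 yield 5, not 12.
    public static bool AddTwoIntegers_Returns_Quotient()
    {
        return new LegacyMath().AddTwoIntegers(10, 2) == 5;
    }

    // Also current behavior: a zero second argument throws.
    public static bool AddTwoIntegers_Throws_On_Zero_Second_Argument()
    {
        try
        {
            new LegacyMath().AddTwoIntegers(1, 0);
            return false;
        }
        catch (DivideByZeroException)
        {
            return true;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CharacterizationChecks.AddTwoIntegers_Returns_Quotient());
        Console.WriteLine(CharacterizationChecks.AddTwoIntegers_Throws_On_Zero_Second_Argument());
    }
}
```

In a real test project each check would be a [TestMethod] with an Assert, but the shape is the same: the expected values come from observing the code, not from the requirements.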

Go searching for extractable code

Now that you have a pretty good handle on writing testable code as you’re adding it to the code base and getting untested but testable code under test, it’s time to start chipping away at the rest. One of the easiest ways to do this is to hunt down methods in your code base that you can’t test but not because of the contents in them. Here are two examples that come to mind:

public class Untestable1
{
    public Untestable1()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
    }

    public int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

public class Untestable2
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        Console.WriteLine("Total is " + AddTwoNumbers(order.Subtotal, order.Tax));
    }

    private int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

The first class is untestable because you can’t instantiate it without kicking off global state modification and who knows what else. But the AddTwoNumbers method would be eminently testable if you could remove that roadblock. In the second example, the AddTwoNumbers method is testable once again, in theory, but with a roadblock: it’s not public.

In both cases, we have a simple solution: move the method somewhere else. Let’s put it into a class called “BasicArithmeticPerformer” as shown below. I do realize that there are other solutions to make these methods testable, but we’ll talk about them later. And I’ll tell you what I consider to be a terrible solution to one of the testability issues that I’ll talk about now: making the private method public or rigging up your test runner with gimmicks to allow testing of private methods. You’re creating an observer effect with testing when you do this — altering the way the code would look so that you can test it. Don’t compromise your encapsulation design to make things testable. If you find yourself wanting to test what’s going on in private methods, that’s a strong, strong indicator that you’re trying to test the wrong thing or that you have a design flaw.

public class BasicArithmeticPerformer
{
    public int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

Now that’s a testable class. So what do the other classes now look like?

public class Untestable1
{
    public Untestable1()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
    }

    public int AddTwoNumbers(int x, int y)
    {
        return new BasicArithmeticPerformer().AddTwoNumbers(x, y);
    }
}

public class Untestable2
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        Console.WriteLine("Total is " + AddTwoNumbers(order.Subtotal, order.Tax));
    }

    private int AddTwoNumbers(int x, int y)
    {
        return new BasicArithmeticPerformer().AddTwoNumbers(x, y);
    }
}

Yep, it’s that simple. In fact, it has to be that simple. Modifying this untestable legacy code is like walking a high-wire without a safety net, so you have to change as little as possible. Extracting a method to another class is very low risk as far as refactorings go since the most likely problem that could possibly occur (particularly if using an automated tool) is non-compiling. There’s always a risk, but getting legacy code under test is lower risk in the long run than allowing it to continue rotting and the risk of this particular approach is minimal.

On the other side of things, is this a significant win? I would say so. Even ignoring the eliminated duplication, you now have gone from 0 test coverage to 50% in these classes. Test coverage is not a goal in and of itself, but you can now rest a little easier knowing that you have a change warning system in place for half of your code. If someone comes along later and says, “oh, I’ll just change that plus to a minus so that I can ‘reuse’ this method for my purposes,” you’ll have something in place that will throw up a big red X and say, “hey, you’re breaking things!” And besides, Rome wasn’t built in a day: you’re going to be going through your code base building up a test suite one action like this at a time.
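To make the "change warning system" concrete, here is a small sketch of checks against BasicArithmeticPerformer. The Check helper is a hypothetical stand-in for a test framework's assert, used only so the example runs as a plain console program.

```csharp
using System;

public class BasicArithmeticPerformer
{
    public int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

public static class Program
{
    // Hypothetical stand-in for a unit test assertion; throws if the expectation is not met.
    private static void Check(bool condition, string name)
    {
        if (!condition) throw new Exception("Failed: " + name);
        Console.WriteLine("Passed: " + name);
    }

    public static void Main()
    {
        var performer = new BasicArithmeticPerformer();

        Check(performer.AddTwoNumbers(2, 3) == 5, "adds two positives");
        Check(performer.AddTwoNumbers(-4, 4) == 0, "adds a negative and a positive");

        // If someone later swaps the plus for a minus, the first check fails,
        // because 2 - 3 is -1 rather than 5.
    }
}
```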

Code that refers to no class fields is easy when it comes to extracting functionality to a safe, testable location. But what if there is instance-level state in the mix? For example…

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

That’s a little tougher because we can’t just pull _someField into a new, testable class. But what if we made a quick change that got us onto more familiar ground? Such as…

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return AddTwoNumbers(x, _someField);
    }

    private int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

Aha! This looks familiar, and I think we know how to get a testable method out of this thing now. In general, when you have class fields or local variables, those are going to become arguments to methods and/or constructors of the new, testable class that you’re creating and instantiating. Understand going in that the more local variables and class fields you have to deal with, the more of a testing headache the thing you’re extracting is going to be. As you go, you’ll learn to look for code in legacy classes that refers to comparably few local variables and especially fields in the current class as a refactoring target, but this is an acquired knack.
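One way the finished extraction might look is sketched below. The TestabilityKiller here is a hypothetical stub, included only so that the example compiles and runs (its return value of 42 is made up); the real singleton is assumed to be far less friendly.

```csharp
using System;

// Hypothetical stand-in for the singleton so that this sketch compiles.
public class TestabilityKiller
{
    public static readonly TestabilityKiller Instance = new TestabilityKiller();

    public void DoSomethingHorribleWithGlobalVariables() { }

    public int GetSomeGlobalVariableValue() { return 42; }
}

public class BasicArithmeticPerformer
{
    public int AddTwoNumbers(int x, int y)
    {
        return x + y;
    }
}

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        // The field is passed as an argument to the extracted, testable method.
        return new BasicArithmeticPerformer().AddTwoNumbers(x, _someField);
    }
}

public static class Program
{
    public static void Main()
    {
        // The extracted class can be exercised without touching the singleton.
        Console.WriteLine(new BasicArithmeticPerformer().AddTwoNumbers(1, 42));

        // The legacy class still works, delegating the arithmetic.
        Console.WriteLine(new Untestable3().AddToGlobal(1));
    }
}
```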

The reason this is not especially trivial is that we’re nibbling here at an idea in static analysis of object oriented programs called “cohesion.” Cohesion, explained informally, is the idea that units of code that you find together belong together. For example, a Car class with an instance field called Engine and three methods, StartEngine(), StopEngine(), and RestartEngine(), is highly cohesive. All of its methods operate on its field. A class called Car that has an Engine field and a Dishwasher field and two methods, StartEngine() and EmptyDishwasher(), is not cohesive. When you go sniping for testable code that you can move to other classes, what you’re really looking for is low cohesion additions to existing classes. Perhaps some class has a method that refers to no instance variables, meaning you could really put it anywhere. Or, perhaps you find a class with three methods that refer to a single instance variable that none of the other 40 methods in a class refer to because they all use some other fields on the class. Those three methods and the field they use could definitely go in another class that you could make testable.

When refactoring toward testability, non-cohesive code is the low-hanging fruit that you’re looking for. If it seems strange that poorly designed code (and non-cohesive code is a characteristic of poor design) offers ripe refactoring opportunities, we’re just making lemonade out of lemons. The fact that someone slammed unrelated pieces of code together to create a franken-class just means that you’re going to have that much easier of a time pulling them apart where they belong.
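As a rough illustration of pulling a low-cohesion cluster out of a franken-class, consider the sketch below. Every name in it (OrderScreen, DiscountPolicy, the 10 percent rate) is hypothetical; it simply assumes one field that only three methods ever touched, moved together into a class that can be instantiated freely in a test.

```csharp
using System;

// Hypothetical extracted class: one field plus the only three methods that used it.
public class DiscountPolicy
{
    private readonly int _discountPercent;

    public DiscountPolicy(int discountPercent)
    {
        _discountPercent = discountPercent;
    }

    public bool HasDiscount()
    {
        return _discountPercent > 0;
    }

    public int DiscountFor(int amount)
    {
        return amount * _discountPercent / 100;
    }

    public int ApplyTo(int amount)
    {
        return amount - DiscountFor(amount);
    }
}

// Hypothetical franken-class after the extraction: it delegates instead of owning the logic.
// Its other fields and unrelated methods would stay where they are.
public class OrderScreen
{
    private readonly DiscountPolicy _discountPolicy = new DiscountPolicy(10);

    public int DiscountedTotal(int amount)
    {
        return _discountPolicy.ApplyTo(amount);
    }
}

public static class Program
{
    public static void Main()
    {
        var policy = new DiscountPolicy(10);
        Console.WriteLine(policy.HasDiscount());
        Console.WriteLine(policy.DiscountFor(200));
        Console.WriteLine(policy.ApplyTo(200));
    }
}
```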

Realize that Giant Methods are Begging to be Classes

It’s getting less and less common these days, but do you ever see object-oriented code whose author clearly meandered over from writing C back in the one-pass compiler days? If you don’t know what I mean, it’s code that has this sort of form:

public void PerformSomeBusinessLogic(CustomerOrder order)
{
    int x, y, z;
    double a, b, c;
    int counter;
    CustomerOrder tempOrder;
    int secondLoopCounter;
    string output;
    string firstTimeInput;
            
    //Alright, now let's get started because this is going to be a looooonnnng method...
    ...
}

C programmers wrote code like this because in old standards of C it was necessary to declare variables right after the opening brace of a scope before you started doing things like assignment and control flow statements. They’ve carried it forward over the years because, well, old habits die hard. Interestingly, they’re actually doing you a favor. Here’s why.

When looking at a method like this, you know you’re in for a doozy. If it has this many local variables, it’s going to be long, convoluted and painful. In the C# world, it probably has regions in it that divide up the different responsibilities of the method. This is also a problem, but a lemons-to-lemonade opportunity for us. The reason is that these C-style programmers are actually telling you how to turn their giant, unwieldy method into a class. All of those variables at the top? Those are your class fields. All of those regions (or comments in languages that don’t support regioning)? Method names.

In one of the resources I’ll recommend, “Uncle” Bob Martin said something along the lines of “large methods are where classes go to hide.” What this means is that when you encounter some gigantic method that spans dozens or hundreds of lines, what you really have is something that should be a class. It’s functionality that has grown too big for a method. So what do you do? Well, you create a new class with its local variables as fields, its region names/comments as method titles, and class fields as dependencies, and you delegate the responsibility.

public class Untestable4
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        var extractedClass = new MaybeTestable();
        extractedClass.Region1Title();
        extractedClass.Region2Title();
        extractedClass.Region3Title();
    }
}

public class MaybeTestable
{
    int x, y, z;
    double a, b, c;
    int counter;
    CustomerOrder tempOrder;
    int secondLoopCounter;
    string output;
    string firstTimeInput;

    public void Region1Title()
    {

...

In this example, there are no fields in the untestable class that the method is using, but if there were, one way to handle this is to pass them into the constructor of the extracted class and have them as fields there as well. So, assuming this extraction goes smoothly (and it might not be that easy if the giant method has a lot of temporal coupling, resulting from, say, recycled variables), what is gained here? Well, first of all, you’ve slain a giant method, which will inevitably be good from a design perspective. But what about testability?

In this case, it’s possible that you still won’t have testable methods, but it’s likely that you will. The original gigantic method wasn’t testable. They never are. There’s really way too much going on in them for meaningful testing to occur — too many control flow statements, loops, global variables, file I/O, etc. Giant methods are giant because they do a lot of things, and if you do enough code things you’re going to start running over the bounds of testability. But the new methods are going to be split up and more focused and there’s a good chance that at least one of them will be testable in a meaningful way. Plus, with the extracted class, you have control over the new constructor that you’re creating whereas you didn’t with the legacy class, so you can ensure that the class can at least be instantiated. At the end of the day, you’re improving the design and introducing a seam that you can get at for testing.
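Here is a small, concrete sketch of that extraction. The OrderSummaryBuilder class, its region names, and the shape of CustomerOrder (integer Subtotal and Tax) are assumptions made up for illustration. The former locals become fields, the former regions become methods, and the extracted class gets a constructor you control.

```csharp
using System;

// Assumed shape of the order type, for illustration only.
public class CustomerOrder
{
    public int Subtotal { get; set; }
    public int Tax { get; set; }
}

// Hypothetical class extracted from a giant method.
public class OrderSummaryBuilder
{
    // Former local variables of the giant method, now fields.
    private readonly CustomerOrder _order;
    private int _total;
    private string _output;

    public OrderSummaryBuilder(CustomerOrder order)
    {
        _order = order;
    }

    public string Output { get { return _output; } }

    // Formerly a region titled "Calculate total".
    public void CalculateTotal()
    {
        _total = _order.Subtotal + _order.Tax;
    }

    // Formerly a region titled "Format output". Note the temporal coupling:
    // this only makes sense after CalculateTotal has run.
    public void FormatOutput()
    {
        _output = "Total is " + _total;
    }
}

public class Untestable4
{
    public void PerformSomeBusinessLogic(CustomerOrder order)
    {
        var builder = new OrderSummaryBuilder(order);
        builder.CalculateTotal();
        builder.FormatOutput();
        Console.WriteLine(builder.Output);
    }
}

public static class Program
{
    public static void Main()
    {
        var order = new CustomerOrder { Subtotal = 100, Tax = 7 };
        new Untestable4().PerformSomeBusinessLogic(order);
    }
}
```

The console write stays behind in the legacy method, while the calculation and formatting can now be checked by reading the Output property in a test.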

Ask for your dependencies — don’t declare them

Another change you can make that may be relatively straightforward is to move dependencies out of the scope of your class — especially icky dependencies. Take a look at the original version of Untestable3 again.

public class Untestable3
{
    int _someField;

    public Untestable3()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        _someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

When instantiated, this class goes and rattles some global state cages, doing God-knows-what (icky), and then retrieves something from global state (icky). We want to get a test around the AddToGlobal method, but we can’t instantiate this class. For all we know, to get the value of “_someField” the singleton gets the British Prime Minister on the phone and asks him for a random number between 1 and 1000, and we can’t automate that in a test suite. Now, the earlier option of extracting code is, of course, viable, but we also have the option of punting the offending code out of this class. (This may or may not be practical depending on where and how this class is used, but let’s assume it is). Say there’s only one client of this code:

public class Untestable3Client
{
    public void SomeMethod()
    {
        var untestable = new Untestable3();
        untestable.AddToGlobal(12);
    }
}

All we really want out of the constructor is a value for “_someField”. All of that stuff with the singleton is just noise. Because of the nature of global variables, we can do the stuff Untestable3’s constructor was doing anywhere. So what about this as an alternative?

public class Untestable3Client
{
    public void SomeMethod()
    {
        TestabilityKiller.Instance.DoSomethingHorribleWithGlobalVariables();
        var someField = TestabilityKiller.Instance.GetSomeGlobalVariableValue();
        var untestable = new Untestable3(someField);
        untestable.AddToGlobal(12);
    }
}

public class Untestable3
{
    int _someField;

    public Untestable3(int someField)
    {
        _someField = someField;
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

This new code is going to do the same thing as the old code, but with one important difference: Untestable3 is now a liar. It’s a liar because it’s testable. There’s nothing about global state in there at all. It just takes an integer and stores it, which is no problem to test. You’re an old pro by now at unit testing that’s this easy.

When it comes to testability, the new operator and global state are your enemies. If you have code that makes use of these things, you need to punt. Punt those things out of your code by doing what we did here: executing voids before your constructors/methods are called and asking for things returned from global state or new in your constructors/methods. This is another pretty low-impact way of altering a given class to make it testable, particularly when the only problem is that a class is instantiating untestable classes or reaching out into the global state.
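With the dependency handed in, a check of AddToGlobal needs no global state at all. The sketch below repeats the reworked Untestable3 so that it runs on its own; the values 5 and 12 are arbitrary, and the console program is a stand-in for what would normally be a [TestMethod].

```csharp
using System;

public class Untestable3
{
    int _someField;

    public Untestable3(int someField)
    {
        _someField = someField;
    }

    public int AddToGlobal(int x)
    {
        return x + _someField;
    }
}

public static class Program
{
    public static void Main()
    {
        // No singleton involved: the test decides what the "global" value is.
        var target = new Untestable3(5);
        int result = target.AddToGlobal(12);

        if (result != 17) throw new Exception("Expected 17 but got " + result);
        Console.WriteLine("AddToGlobal returned " + result);
    }
}
```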

Ruthlessly Eliminate Law of Demeter Violations

If you’re not familiar with the idea, the Law of Demeter, or Principle of Least Knowledge, basically demands that methods refer to as few object instances as possible in order to do their work. You can look at the link for more specifics on what exactly this “law” says, and what exactly is and is not a violation, but the most common form you’ll see is strings of dots (or arrows in C++) where you’re walking an object graph: Property.NestedProperty.NestedNestedProperty.You.Get.The.Idea. (It is worth mentioning that the existence of multiple dots is not always a violation of the Law of Demeter — fluent interfaces in general and Linq in the C# world specifically are counterexamples). It’s when you’re given some object instance and you go picking through its innards to find what you’re looking for.

One of the most immediately memorable ways of thinking about why this is problematic is to consider what happens when you’re at the grocery store buying groceries. When the clerk tells you that the total is $86.28, you swipe your Visa. What you don’t do is wordlessly hand him your wallet. What you definitely don’t do is take off your pants and hand those over so that he can find your wallet. Consider the following code, bearing in mind that example:

public class HardToTest
{
    public string PrepareSsnMessage(CustomerOrder order)
    {
        return "Social Security number is " + order.Customer.PersonalInfo.Ssn;
    }
}

The method in this class just prepends an explanatory string to a social security number. So why on earth do I need something called a customer order? That’s crazy, as crazy as handing the store clerk your pants. And from a testing perspective, this is a real headache. In order to test this method, I have to create an order, then create a customer and hand that to the order, then create a personal info object and hand that to the customer, and then create an SSN and hand that to the personal info. And that’s if everything goes well. What if one of those classes (say, Customer) invokes a singleton in its constructor? Well, now I can’t test the “PrepareSsnMessage” in HardToTest because the Customer class uses a singleton. That’s absolutely insane.

Let’s try this instead:

public class HardToTest
{
    public string PrepareSsnMessage(string ssn)
    {
        return "Social Security number is " + ssn;
    }
}

Ah, now that’s easy to test. And we can test it even if the Customer class is doing weird, untestable things because those things aren’t our problem. What about clients, though? They’re used to passing customer orders in, not SSNs. Well, tough: we’re making this class testable. They know about the customer order and they want its SSN, so let them incur the Law of Demeter violation and figure out how to clean it up. You can only make your code testable one class at a time. That class and its Law of Demeter violation is tomorrow’s project.
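A sketch of how this plays out on both sides follows. The shapes of CustomerOrder, Customer, and PersonalInfo are assumed from the property chain in the earlier snippet, SomeClient is a hypothetical caller, and the SSN is an obviously fake placeholder.

```csharp
using System;

// Assumed shapes, inferred from order.Customer.PersonalInfo.Ssn.
public class PersonalInfo
{
    public string Ssn { get; set; }
}

public class Customer
{
    public PersonalInfo PersonalInfo { get; set; }
}

public class CustomerOrder
{
    public Customer Customer { get; set; }
}

public class HardToTest
{
    public string PrepareSsnMessage(string ssn)
    {
        return "Social Security number is " + ssn;
    }
}

// Hypothetical client: the object graph walk now lives here, to be cleaned up another day.
public class SomeClient
{
    public string DescribeOrder(CustomerOrder order)
    {
        return new HardToTest().PrepareSsnMessage(order.Customer.PersonalInfo.Ssn);
    }
}

public static class Program
{
    public static void Main()
    {
        // Testing the class needs nothing but a string.
        Console.WriteLine(new HardToTest().PrepareSsnMessage("000-00-0000"));

        // The client still has to build the whole graph.
        var order = new CustomerOrder
        {
            Customer = new Customer
            {
                PersonalInfo = new PersonalInfo { Ssn = "000-00-0000" }
            }
        };
        Console.WriteLine(new SomeClient().DescribeOrder(order));
    }
}
```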

When it comes to testing, the more stuff your code knows about, the more setup and potential problems you have. If you don’t test your code, it’s easy to write train wrecks like the “before” method in this section without really considering the ramifications of what you’re doing. The unit tests force you to think about it — “man, this method is a huge hassle to test because problems in classes I don’t even care about are preventing me from testing!” Guess what. That’s a design smell. Problems in weird classes you don’t care about aren’t just impacting your tests — they’re also impacting your class under test, in production, when things go wrong and when you’re trying to debug.

Understand the significance of polymorphism for testing

I’ll leave off with a segue into the next chapter in the series, which is going to be about a concept called “test doubles.” I will explain that concept then and address a significant barrier that you’re probably starting to bump into in your testing travels. But that isn’t my purpose here. For now I’ll just say that you should understand the attraction of using polymorphic code for testing.

Consider the following code:

public class Customer
{
    public string FirstName { get { return TestabilityKiller.Instance.GoGetCustomerFirstNameFromTheDatabase(); } }
}

public class CustomerPropertyFormatter
{
    public string PrepareFirstNameMessage(Customer customer)
    {
        return "Customer first name is " + customer.FirstName;
    }
}

Here you have a class, CustomerPropertyFormatter, that should be pretty easy to test. I mean, it just takes a customer and accesses some string property on it for formatting purposes. But when you actually write a test for this, everything goes wrong. You create a customer to give to your method and your test blows up because of singletons and databases and whatnot. You can write a test with a null argument and amend this code to handle null gracefully, but that’s about it.

But, never fear — polymorphism to the rescue. If you make a relatively small modification to the Customer class, you set yourself up nicely. All you have to do is make the FirstName property virtual. Once you’ve done that, here’s a unit test that you can write:

public class DummyCustomer : Customer
{
    private string _firstName;
    public override string FirstName { get { return _firstName; } }

    /// <summary>
    /// Initializes a new instance of the DummyCustomer class.
    /// </summary>
    public DummyCustomer(string firstName)
    {
        _firstName = firstName;
    }
}

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Adds_Text_To_FirstName()
{
    string firstName = "Erik";
    var customer = new DummyCustomer(firstName);
    var formatter = new CustomerPropertyFormatter();
    Assert.IsTrue(formatter.PrepareFirstNameMessage(customer).Contains(firstName));
}

Notice that there is a class, DummyCustomer, declared inside of the test class that inherits from the Customer class. DummyCustomer is an example of a test double. You’ll notice that I’ve created a scenario here where I define a version of FirstName that I can control (a benign version, if you will). I effectively bypass that database-singleton thing and create a version of the class that exists only in the test project and allows me to substitute a simple, friendly value that I can test against.

As I said, I’ll dive much more into test doubles next time, but for the time being, understand the power of polymorphism for testability. If the legacy code has methods in it that are hard to use, you can create much more testable situations by the use of interface implementation, inheritance, and the virtual keyword. Conversely, you can make testing a nightmare by using keywords like final and sealed (Java and C# respectively). There are valid reasons to use these, but if you want a testable code base, you should favor liberal support of inheritance and interface implementation.
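The same idea works through interface implementation instead of inheritance. The sketch below assumes a hypothetical ICustomer interface and a formatter changed to accept it; neither appears in the code above, so treat it as one possible variation rather than the way the legacy code must be changed.

```csharp
using System;

// Hypothetical interface extracted from Customer.
public interface ICustomer
{
    string FirstName { get; }
}

// Variation on the formatter that depends on the interface rather than the concrete class.
public class CustomerPropertyFormatter
{
    public string PrepareFirstNameMessage(ICustomer customer)
    {
        return "Customer first name is " + customer.FirstName;
    }
}

// Test double: implements the interface with a value the test controls.
public class StubCustomer : ICustomer
{
    private readonly string _firstName;

    public StubCustomer(string firstName)
    {
        _firstName = firstName;
    }

    public string FirstName { get { return _firstName; } }
}

public static class Program
{
    public static void Main()
    {
        var formatter = new CustomerPropertyFormatter();
        string message = formatter.PrepareFirstNameMessage(new StubCustomer("Erik"));

        if (!message.Contains("Erik")) throw new Exception("First name missing from message");
        Console.WriteLine(message);
    }
}
```

The production Customer class would also implement ICustomer, keeping its database-backed property, while tests never have to construct it.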

A Note of Caution

In the sections above, I’ve talked about refactorings that you can do on legacy code bases and mentioned that there is some risk associated with doing so. It is up to you to assess the level of risk of touching your legacy code, but know that any changes you make to legacy code without first instrumenting unit tests can be breaking changes, even small ones guided by automated refactoring tools. There are ways to ‘cheat’ and tips and techniques to get a method under test before you refactor it, such as temporarily making private fields public or local variables into public fields. The Michael Feathers book below talks extensively about these techniques to truly minimize the risk.

The techniques that I’m suggesting here would be ones that I’d typically undertake when requirements changes or bugs were forcing me to make a bunch of changes to the legacy code anyway, and the business understood and was willing to undertake the risk of changing it. I tend to refactor opportunistically like that. What you do is really up to your discretion, but I don’t want to be responsible for you doing some rogue refactoring and torpedoing your production code because you thought it was safe. Changing untested legacy code is never safe, and it’s important for you to understand the risks.

More Information

As mentioned earlier, here are some excellent resources for more information on working with and testing legacy code bases:

Working Effectively with Legacy Code by Michael Feathers
Clean Code by Robert (“Uncle Bob”) Martin
Clean Coders video series, by Robert Martin
The Art of Unit Testing by Roy Osherove (I have not personally read this, but I respect his work that I’m familiar with and have seen it recommended)

And, of course, you can check out my book about unit testing: Starting to Unit Test, Not as Hard as You Think.

Jul
3

By Erik Dietrich

Proposal: A Law of Performance Citation

Category: Language Agnostic Tags: C#, Performance | 16 Comments

I anticipate this post being fairly controversial, though that’s not my intention. I imagine that if it wanders its way onto r/programming it will receive a lot of votes and zero points as supporters and detractors engage in a furious, evenly-matched arm-wrestling standoff with upvotes and downvotes. Or maybe three people will read this and none of them will care. It turns out that I’m actually terrible at predicting which posts will be popular and/or high-traffic. And I’ll try to avoid couching this as flame-bait because I think I actually have a fairly non-controversial point on a potentially controversial subject.

To get right down to it, the Law of Performance Citation that I propose is this:

If you’re going to cite performance considerations as the reason your code looks the way it does, you need to justify it by describing how a stakeholder will be affected.

By way of an example, consider a situation I encountered some years back. I was pitching in to help out with a bit of programming for someone when I was light on work, and the task I was given amounted to “copy-paste-and-adjust-to-taste.” This was the first red flag, but hey, not my project or decision, so I took the “template code” I was given and made the best of it. The author gave me code containing, among other things, a method that looked more or less like this (obfuscated and simplified for example purposes):

public void SomeMethod()
{
    bool isSomethingAboutAFooTrue = false;
    bool isSomethingElseAboutAFooTrue = false;
    IEnumerable<Foo> foos = ReadFoosFromAFile();
    for (int i = 0; i < foos.Count(); i++)
    {
        var foo = foos.ElementAt(i);
        if (IsSomethingAboutAFooTrue(foo))
        {
            isSomethingAboutAFooTrue = true;
        }
        if (IsSomethingElseAboutAFooTrue(foo))
        {
            isSomethingElseAboutAFooTrue = true;
        }
        if (isSomethingAboutAFooTrue && isSomethingElseAboutAFooTrue)
        {
            break;
        }
    }

    WriteToADatabase(isSomethingAboutAFooTrue, isSomethingElseAboutAFooTrue);
}

I promptly changed it to one that looked like this for my version of the implementation:

public void SomeMethodRefactored()
{
    var foos = ReadFoosFromAFile();

    bool isSomethingAboutOneOfTheFoosTrue = foos.Any(foo => IsSomethingAboutAFooTrue(foo));
    bool isSomethingElseAboutOneOfTheFoosTrue = foos.Any(foo => IsSomethingElseAboutAFooTrue(foo));

    WriteToADatabase(isSomethingAboutOneOfTheFoosTrue, isSomethingElseAboutOneOfTheFoosTrue);
}

I checked this in as my new code (I wasn't changing his existing code) and thought, "he'll probably see this and retrofit it to his old stuff once he sees how cool the functional/Linq approach is." I had flattened a bunch of clunky looping logic into a compact, highly-readable method, and I found this code to be much easier to reason about and understand. But I turned out to be wrong about his reaction.

When I checked on the code the next day, I saw that my version had been replaced by a version that mirrored the original one and didn't take advantage of even the keyword foreach, to say nothing of Linq. Bemused, I asked my colleague what had prompted this change and he told me that it was important not to process the foos in the collection a second time if it wasn't necessary and that my code was inefficient. He also told me, for good measure, that I shouldn't use var because "strong typing is better."

I stifled a chuckle and ignored the var comment and went back to look at the code in more detail, fearful that I'd missed something. But no, not really. The file-reading method read the entire foo collection into memory (this method was in another assembly and not mine to modify anyway), and the average number of foos was in the single digits. The foos were pretty lightweight objects once read in, and the methods evaluating them were minimal and straightforward.

Was this guy seriously suggesting that possibly walking an extra eight or nine foos in memory, worst case, sandwiched between a file read over the network and a database write over the network was a problem? Was he suggesting that it was worth a bunch of extra lines of confusing flag-based code? The answer, apparently, was "yes" and "yes."

But actually, I don't think there was an answer to either of those questions in reality because I strongly suspect that these questions never crossed his mind. I suspect that what happened instead was that he looked at the code, didn't like that I had changed it, and looked quickly and superficially for a reason to revert it. I don't think that during this 'performance analysis' any thought was given to how external I/O over a network was many orders of magnitude more expensive than the savings, much less any thought of a time trial or O-notation analysis of the code. It seemed more like hand-waving.
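For what it's worth, even a rough count of predicate evaluations is enough to test the "processing the foos a second time" claim. The sketch below is not the original code: the Foo record, its A and B properties, and both counting methods are hypothetical stand-ins, and it assumes C# 9 or later for the record syntax. It counts how many predicate calls each shape of the method makes for one particular input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateCallCount
{
    // Hypothetical stand-in for the post's foo objects.
    public record Foo(bool A, bool B);

    // Same shape as the flag-based loop: both predicates run on every foo visited.
    public static int CountForFlagLoop(IReadOnlyList<Foo> foos)
    {
        int calls = 0;
        bool a = false, b = false;
        for (int i = 0; i < foos.Count; i++)
        {
            calls++;
            if (foos[i].A) a = true;
            calls++;
            if (foos[i].B) b = true;
            if (a && b) break;
        }
        return calls;
    }

    // Same shape as the two Any() calls: each one stops at its first match.
    public static int CountForTwoAnyCalls(IReadOnlyList<Foo> foos)
    {
        int calls = 0;
        _ = foos.Any(f => { calls++; return f.A; });
        _ = foos.Any(f => { calls++; return f.B; });
        return calls;
    }

    public static void Main()
    {
        // Nine foos: the first satisfies A, only the last satisfies B.
        var foos = Enumerable.Range(0, 9)
            .Select(i => new Foo(A: i == 0, B: i == 8))
            .ToList();

        Console.WriteLine(CountForFlagLoop(foos));    // 18
        Console.WriteLine(CountForTwoAnyCalls(foos)); // 10
    }
}
```

Both shapes are bounded by two predicate calls per foo, so which one does less work depends entirely on the data. For this input the "inefficient" version makes fewer calls, because the flag loop keeps re-checking a condition it has already satisfied.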

It's an easy thing to do. I've seen it time and again throughout my career and in discussing code with others. People make vague, passing references to "performance considerations" and use these as justifications for code-related decisions. Performance and resource consumption are considerations that are very hard to reason about before run-time. If they weren't, there wouldn't be college-level discrete math courses dedicated to algorithm runtime analysis. And because it's hard to reason about, it becomes so nuanced and subjective in these sorts of discussions that right and wrong are matters of opinion and it's all really relative. Arguing about runtime performance is like arguing about which stocks are going to be profitable, who is going to win the next Super Bowl, or whether this is going to be a hot summer. Everyone is an expert and everyone has an opinion, but those opinions amount to guesses until actual events play out for observation.

Don't get me wrong -- I'm not saying that it isn't possible to know by compile-time inspection whether a loop will terminate early or not, depending on the input. What I'm talking about is how code will run in complex environments with countless unpredictable factors and whether any of these considerations have an adverse impact on system stakeholders. For instance, in the example here, the (more compact, maintainable) code that I wrote looks as though it will perform ever-so-slightly worse than the code it replaced. But no user will notice losing a few hundred nanoseconds between operations that each take seconds. And what's going on under the hood? What optimizations and magic does the compiler perform on each of the pieces of code we write? What does the .NET framework do in terms of caching or optimization at runtime? How about the database or the file read/write API?

Can you honestly say that you know without a lot of research or without running the code and doing actual time trials? If you do, your knowledge is far more encyclopedic than mine and that of the overwhelming majority of programmers. But even if you say you do, I'd like to see some time trials just the same. No offense. And even time trials aren't really sufficient because they might only demonstrate that your version of the code shaves a few microseconds off of a non-critical process running headlessly once a week somewhere. It's for this reason that I feel like this 'law' that I'm proposing should be a thing.
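A time trial of the kind I'm asking for doesn't have to be elaborate. Here is a minimal sketch using System.Diagnostics.Stopwatch. The MeanNanoseconds helper, the nine ints standing in for foos, and the two predicates are all assumptions made up for illustration. It only times the in-memory part, which is the point: whatever number comes out still has to be weighed against the file read and database write on either side of it.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class TimeTrial
{
    // Runs the action repeatedly and returns the mean cost per run in nanoseconds.
    public static double MeanNanoseconds(Action action, int iterations)
    {
        action(); // warm-up run so JIT compilation is not part of the measurement
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    public static void Main()
    {
        // Hypothetical data and predicates standing in for the foos.
        List<int> foos = Enumerable.Range(1, 9).ToList();
        Func<int, bool> isSomethingTrue = foo => foo == 9;
        Func<int, bool> isSomethingElseTrue = foo => foo == 1;

        bool sink = false; // consumed below so the measured work is not optimized away

        double flagLoop = MeanNanoseconds(() =>
        {
            bool a = false, b = false;
            foreach (int foo in foos)
            {
                if (isSomethingTrue(foo)) a = true;
                if (isSomethingElseTrue(foo)) b = true;
                if (a && b) break;
            }
            sink ^= a && b;
        }, 1_000_000);

        double twoAnyCalls = MeanNanoseconds(() =>
        {
            bool a = foos.Any(isSomethingTrue);
            bool b = foos.Any(isSomethingElseTrue);
            sink ^= a && b;
        }, 1_000_000);

        Console.WriteLine($"Flag loop:     {flagLoop:F1} ns per run");
        Console.WriteLine($"Two Any calls: {twoAnyCalls:F1} ns per run");
        Console.WriteLine($"(sink: {sink})");
    }
}
```

A hand-rolled loop like this is crude, and a dedicated benchmarking library would give more trustworthy numbers. Either way, the result only answers half of the question. The other half is whether any stakeholder would ever notice the difference.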

Caveats

First off, I'm not saying that one shouldn't bear efficiency in mind when coding or that one should deliberately write slow or inefficient code. What I'm really getting at here is that we should be writing clear, maintainable, communicative and, above all, correct code as a top priority. When those traits are established, we can worry about how the code runs -- and only then if we can demonstrate that a user's or stakeholder's experience would be improved by worrying about it.

Secondly, I'm aware of the aphorism that "premature optimization is the root of all evil." My proposal is a little broader and less strident about avoiding optimization. (I'm not actually sure that I agree about premature optimization, and I'd probably opt for knowledge duplication in a system as the root of all evil, if I were picking one.) I'm talking about how one justifies code more than how one goes about writing it. I think it's time for us to call people out (politely) when they wave off criticism about some gigantic, dense, flag-ridden method with assurances that it "performs better in production." Prove it, and show me who benefits from it. Talk is cheap, and I can easily show you who loses when you write code like that (hint: any maintenance programmer, including you).

Finally, if you are citing performance reasons and you're right, then please just take the time to explain the issue to those to whom you're talking. This might include someone writing clean-looking but inefficient code or someone writing ugly, inefficient code. You can make a stakeholder-interest case, so please spend a few minutes doing it. People will learn something from you. And here's a bit of subtlety: that case can include saying something like, "it won't actually affect the users in this particular method, but this inefficient approach seems to be a pattern of yours and it may well affect stakeholders the next time you do it." In my mind, correcting/pointing out an ipso facto inefficient programming practice of a colleague, like hand-writing bubble sorts everywhere, definitely has a business case.
