DaedTech

Stories about Software

By

I Don’t Know What X is on Line 48 And I Don’t Care

I was at a users’ group recently here in Chicago, and there were two excellent presenters. There were two very well done presentations: one about Xamarin and one about the C# language entitled “Underestimated C# Language Features” by John Michael Hauck. This was a very polished and appealing talk in which he wove together some important differentiators for the C# language, including closures, anonymous methods, and deferred execution. Given that he’s presented this at some conferences as well, I wasn’t surprised by the high caliber of the presentation.

The format was one in which he presented a series of increasingly difficult problems and asked what different variables were at different points of the program’s execution. For example, here’s something I just made up that would have looked at home in one of his many examples:

public void LetsSeeWhatHappens()
{
    int x = 100;
    Console.WriteLine(x);
    foreach (var f in GetSomeFuncs())
    {
        x++;
        Console.WriteLine(f(++x));
    }
    Console.WriteLine(x);
}

public IEnumerable> GetSomeFuncs()
{
    for (int index = 0; index < 3; index++)
        yield return i => index * i;
}

With each example like this, he’d ask the audience what they thought would be the output. And then the fun would begin. There’d always be several different answers and heated disagreement. After all, bragging rights and programmer cred was at stake. This was public gamification at its finest, and it reminded me vaguely of being in a sports bar and listening to people debating heatedly what the next play call would be in the Monday night football game:

Person 1: Third and 7. They have to pass!
Person 2: I think they’re going to call a draw.
Person 3: You’re nuts! That’s stupid! It’ll be a screen pass!
Person 1: Come on, you’ve been wrong every time!!!

FootballGame

I was entertained by this. John was an engaging presenter and the material interested me, but I discovered I had no interest in actually guessing, even to myself, to say nothing of out loud. At one point, a friend of mine that I was sitting with said, “who knows — write a unit test to figure it out,” and I certainly agreed with that point. But that wasn’t it. I think it was just that I was more interested in what the answer would teach me about the language and its nuance than I was in somehow testing myself. Frankly, this seemed like trivia to me and getting the right answer didn’t seem as important as taking the opportunity to hone my critical thinking skills.

Figuring this out, I realized it explained why I was impatient with all of the guessing that people were doing. John would ask people what they thought, and they would guess but then also start explaining their reasoning and arguing with one another. This struck me as boring and inefficient. “Just shut up and let him hit F10, and we’ll have the answer,” I thought. In that moment, I also realized that a very mild pet peeve of mine has always been code reviews or other places where people argue over runtime behavior. I think to myself, “You know what’s better than all of us at being the .NET runtime? The .NET runtime! So let’s ask it.”

After the presentation, I went out for a bite to eat with some friends. There, one of them made an excellent point. When we were discussing the idea of sitting around and figuring out what code did, he said (paraphrased), “if I show code to a handful of competent developers and they can’t agree on what it will do at runtime, then that code needs to be re-written.” I thought this was perfect and couldn’t agree more. The combination of letting the runtime stick to figuring out what things will be at runtime and the idea that well-written code shouldn’t make this a mystery really sort of drove home why this whole exercised seemed to me (and really is) purely an academic thought-drill.

By all means, exercise your brain and solve riddles involving programming. But I don’t think that this type of activity should be the centerpiece of work-place conversations, evaluations of code, or especially interviews. If I’m interviewing people and asking them what X is on line 48, before even the guy who gets it right, I’m going to hire the guy that says, “let’s write a unit test and find out.”

By

Please Don’t Recycle Local Variables

I think there’s a lot of value to the conservation angle of the green movement. In general, it’s a matter of efficiency–if you can heat/light/whatever your house with the same quality of life, using less energy and fewer resources, that’s a win for everyone. This applies to a whole lot of things beyond just eco-concerns, however. Conserving heat when you’re cold, conserving energy when you’re running a marathon, conserving your dollars when making a budget–all good ideas. Cut down, conserve, reuse when you can.

Recycle

Except please don’t do it with your local variables. For example:

public void DoSomeStuff()
{
    int count = 0;

    foreach (var customer in Customers)
    {
        DoSomeStuffToCustomer(customer);
        if (customer.DidTheRightStuffHappen)
            count++;
    }

    Console.WriteLine(count);
    count = 0;

    foreach (var machine in Machines)
    {
        DoSomeStuffToMachine(machine);
        if (machine.DidTheRightStuffHappen)
            count++;
    }

    Console.WriteLine(count);
}

Here, we initialize a local variable, count, and use it to keep track of the results of some processing of customers. When we’re done, we reset count and use it to keep track of the apparently unrelated concept of machines. What I’m saying is that there shouldn’t be just one count, but rather customerCount and machineCount.

Does this seem like nitpicking? You could certainly make that argument, but this code is not going to age well. First of all, this method should clearly be two methods, so we’re starting right off the bat with a bit of technical debt. It would be cleaner if each loop had its own method.

But an interesting thing happens if we use the refactoring tools to try to do that–the refactoring tool wants return values or input parameters. Yikes, that was unexpected, so we just move on. Later, when the time comes to iterate over movies, we see that there’s a ‘design pattern’ in place, so we modify the code to look like this:

public void DoSomeStuff()
{
    int count = 0;

    foreach (var customer in Customers)
    {
        DoSomeStuffToCustomer(customer);
        if (customer.DidTheRightStuffHappen)
            count++;
    }

    Console.WriteLine(count);
    count = 0;

    foreach (var machine in Machines)
    {
        DoSomeStuffToMachine(machine);
        if (machine.DidTheRightStuffHappen)
            count++;
    }

    Console.WriteLine(count);
    count = 0;

    foreach (var movie in Movies)
    {
        DoSomeStuffToMovie(movie);
        if (movie.DidTheRightStuffHappen)
            count++;
    }

    Console.WriteLine(count);
}

Now this thing should really be split up, so we start selecting parts of it to see what we can refactor. Ew, now we’re getting ref parameters to boot. This thing is getting even more painful to try to refactor, and we’re in a hurry, so no time for that. And to make matters worse, if you add in a few other aggregator variables this way, you’ll start to have all kinds of barriers in place when you want to pull this thing apart, such as crazy sets of out parameters. I’ve posted before about how I feel about ref and out.

All of this mounting technical debt could easily be avoided by giving each loop its own count variable. Having them recycle the same one creates a compile-time dependency of what’s going on in each loop with what happened in the loop before, even though there are no other similar dependencies in evidence. In other words, recycling this local variable is the only thing that’s creating a coupling in your code–there’s no logical reason to do it.

This is the height of procedural programming and baking in temporal dependencies that I cautioned you to avoid here. It’s a completely useless dependency that will inhibit refactoring and dirty up your code in a hurry. It may not seem like much yet, but this will be a huge pain point later as the lines of code in this method balloon from the dozens to the hundreds, and you rely heavily on automated tools to help with cleanup. Flag variables used over and over in sequence throughout a method are like pebbles in your shoe when you’re trying to refactor.

So my advice is to avoid this practice completely. There’s really no advantage to coding this way and the potential downside is enormous.

By

Introduction to Unit Testing Part 4: Design New Code For Testability

In the last post in this series, I covered what I think of as an important yet seldom discussed subject: how not to overwhelm yourself and get discouraged when you’re starting to unit test. In the post before that, I showed the basics of writing a unit test. But this leaves something of a gap. You can now write a unit test in a vacuum for an extremely simple class, and, when looking at your legacy code base, you know what to avoid. But you don’t necessarily how to write a non-trivial application with unit tests.

You might understand now how to write tests, assuming that you have some disconnected new class in your code base without dependencies and barriers to testing, but you wonder how to get to that point in the first place. I mean, your application doesn’t seem to need a prime number finder or a bowling score calculator. It needs you to add lines of code to existing methods or new methods to existing classes. It doesn’t really seem to need new classes, so the initial momentum and resolution you’ve built reading these first three posts to go and be a unit tester sort of fizzles anticlimactically when you stare at your code base.

What’s going on here?

Recognizing Inhospitable Terrain

The first thing to understand is that your code base probably wasn’t written with testability in mind. There’s nothing wrong with you for not being able to see where unit testing fits in, because it doesn’t. Last time I talked about things you’ll see that will torpedo your efforts to test a particular method or piece of code, but let me speak to some entire architectures and patterns that don’t lend themselves to testability. It’s not that code written with these technologies and patterns can’t be tested–it’s just that it won’t be easy for you. As you read through this, you might feel like I’ve read your code base’s mind.

  1. Active Record as an architectural pattern
    This is a pattern in which you create classes that are in-memory representations of database tables, views, or stored procedures. If you see in your code base a class called “Customer” that has methods like “GetById()”, “Update()” and “MoveNext()” you’ve got yourself an Active Record architecture. This architecture tightly couples your database to your domain logic and your domain logic to the rules for navigating through domain objects. You can’t test any of these objects since any operation you perform on them sends them scurrying off to create database connections and parameterized queries and all manner of other untestable stuff. And since decomposition and decoupling is the path toward unit testing, this sort of tight coupling of everything in your code is the path away from it.
  2. Winforms
    Winforms in the .NET world are tried and true when it comes to rapidly cranking out functional little applications, but you have to work really, really hard to make code that uses them testable. Q&A sites are littered with people trying to understand how to make Winforms testable, which should tell you that making them testable is not trivial. If you have Winforms and Active Record both in the same code base, at least the architecture is split into two concerns. But it’s split into two thoroughly untestable ones.
  3. Webforms
    See Winforms. Webforms is very similar in terms of framework testability, and for pretty similar reasons. Webforms is arguably even harder to test, however, because it’s predicated on spewing out reams and reams of HTML, CSS, and Javascript while allowing you to pretend you’re writing a desktop app. I’ve talked about my opinion of this technology before.
  4. Wizard/markup-reliant code
    Do you use the Webforms grid wizard thing to generate your grids? Do you define object data sources, such as DB connections or files, in the markup? Practices like these are the epitome of quick and dirty, rapid-prototyping implementations that hopelessly cross couple your applications beyond all testability. If this is something that’s done in your group/code base, testing is basically a non-starter until you go in a different architectural direction.
  5. Everything in your application is in a user control/form
    I’ve seen this called “Smart UI” and it basically means that there’s absolutely no separation of concerns in your code. The UI elements create database connections, write to files, implement business rules–they do everything. Code like this is impossible to unit test.

If any of this is sounding familiar, your task might be daunting. I have my own preferences, but I’m trying not to offer a value judgment here as much as I’m letting you know what you’re up against. I’m like a mortgage broker that’s saying to you, “if you want to own a home, that’s a great goal. But if you are eight months behind on your rent and have no personal savings, you’re going to have some work to do first.” If I’ve described your code base in the list above, you face different challenges than a green field developer. And since you’ve presumably been contributing to these code bases, you’re probably very used to implementation techniques that don’t result in testable code. You’re going to need to change your thinking and your coding practices in order to start writing testable code.

Once we’ve discussed how to get you writing testable code, I’ll come back to these macroscopic concerns and give some pointers for how to improve the situation. But, for now, on to a revised approach to coding. The following holds true whether you’re banging away at some legacy Winforms/Active Record application or starting a brand new MVC 4 site.

Add New Methods and Classes First, Ask Questions Later

First thing to abide by is to favor adding new things to the code over modifying existing things in the code. You may have heard this before in the context of the “Open/Closed Principle”, but that’s a guide for how to write your classes. (Basically, it admonishes you to write classes that others can extend and override rather than change.) I’m talking about how to deal with existing code bases. To put it simply and bluntly, it’s a lot easier to both code and test brand spankin’ new classes than to write and test changes to existing ones. We all know this. It’s at the heart of why we as developers always lean toward rewriting others’ code instead of understanding and working with it.

Now, this might not win you friends. In shops where people tend to write procedural code (you know, the kind you’re trying to get away from writing), they seem to have some weird fear of creating too many classes and masochistic attachment to monolithic structures. You might have to compromise or practice on your own if you run afoul of the project’s architect, but the exercise is invaluable. It’s going to propel you toward decoupling as a default rather than an exception. Doing new things? Time for a new class and some unit tests.

I know what you’re thinking: “But what if the thing that needs to be done has to be done in the middle of some method somewhere?” Well, instantiate your new class at that point and use it. “But what if it needs a bunch of fields from the class it’s in and variables from the method?” Pass them in through the constructor or method call. “But won’t that make my design bad?” It already is bad, but at least now you’re making part of it testable. When your code is testable and under test, everything is easier to fix later.

You’ll have to use some discretion, obviously, but shift your attitude here. Don’t look at a project that has nothing but .aspx files and their code behind and hide classes in there for fear of breaking with tradition. Boldly add pure .cs files to the code base. Add unit tests for those .cs files you’re creating. (Apologies to Java readers, but this has no real Java equivalent that I can think of having used. Plus, the Java stack in general seems not to have the same level of untestable cruft built into frameworks.) It’s a lot harder for someone, even the project architect, to give you a hard time if your new way of doing things is covered by unit tests. Even if they’re hostile Expert Beginners, they’re hard pressed not to sound silly if they say, “we don’t do that here.” You’ll at least have a better chance of pushing this change through with the unit tests than without them.

Ask Questions in the Right Order: What, How, When

Now that you’re creating a lot of new classes and instantiating them in the old untestable ones, it’s time to start working on what kind of code you write. If you’ve practiced and come back, I suspect that you’re starting to be able to write a few useful tests but are perhaps still struggling. And I bet it’s because the line between where the old class ends and your new one begins is a little hazy. Maybe they share some common fields. Maybe when you instantiate the class you’re testing, you hand it a “this” reference so that it can go picking through the properties on the untestable behemoth from which you’re escaping. This is the next thing we need to tighten up–stop doing that. A clear, concise division of labor between the classes is necessary, and it’s not possible if they share all of the same fields, properties, state, etc. That’s like a break up where you two continue to live together, share a car, and go to the movies on weekends.

The best way to achieve this clean split is with good abstraction, and the best way to do that is to remember “what, how, when.” When it’s time to change the code base, remember that you want to favor creating a new class. But before you do that, ask yourself “what?” but not the other two questions. What should I name this class? What should it do? What should it expose as its public methods? Don’t start thinking about how those methods or classes should work and don’t you dare start thinking about when anything at all should happen–just think about what. Give it a good name that defines a clear purpose, and then give it a good set of methods and properties that draw attention to why it’s a different concept than the class that will be using it. Ask yourself what the boundaries between the two classes will be so that you can minimize the amount of shared information. And now, start stubbing out those methods with no implementation and start stubbing out some test methods with names that say what the methods will do.

At this point, you’re ready for “how.” Start actually implementing the “what” and testing that your implementation works in the unit tests. Believe me, it’s much, much easier to implement methods this way. If you think about “what” and “how” at the same time, you start writing confusing code that not even you have faith in when you’re done. Implementation alone is much easier when you have a clear picture of “what,” and unit testing is a breeze.

Once everything is implemented, you can start thinking about “when.” When should you instantiate the class and when should you call its methods? But don’t spend too much time with “when” in your head because it’s dangerous. Does that sound weird? Let me explain.

“When Code” is Monolithic Code

Picture two methods. One is a 700 line juggernaut with more control flow statements than you can keep track of without a spreadsheet. The other is five lines long, consisting only of an initialize statement, a foreach, and a return statement. With which would you rather work? I imagine the response is unanimous here. Even if you tend to crank out these kinds of large methods, when you step through the debugger, looking for the cause of a bug and finding yourself in some huge method, your heart sinks and you settle in with snacks and caffeine because it’s going to be a long day.

Now with these two methods in mind, imagine if we were pair programming together and I simply asked “when?” With the tiny method, you’d probably say, “What do you mean by ‘when’–I mean, you initialize before the loop and you return when you find the record you’re looking for. What a weird question!” With the other method, you’d probably affect a thousand-yard stare and say, “Man, I don’t even know where to begin.” But the answer to that question would fill pages. Books. Because you set the first loop counter j equal to the third loop counter k about twenty lines before the fourth try-catch and fifteen lines before you set the middle loop counter j equal to four. Unless, of course, you threw that exception up on line 2090, in which case j might never have been initialized. Er, wait, I think that happened somewhere near the fifth while loop in that else condition up there. Oh, there’s so much “when,” but it’s all slammed together in a method where you can’t possibly test any of it. Lots of thinking about “when” breeds huge methods like a Petri dish for bacteria.

“When” code is procedural at its core, and procedural, “when” code is an anathema to object-oriented unit testing, which is all about “what” and “how.” Remember earlier in the series when I said that multi-threaded code was really, really hard to test? Well, that’s just a subset of an idea called “temporal coupling,” and what we’re talking about here also falls under that umbrella. Temporal coupling is what happens when things have to be executed in a specific order or else they do not work.

Imagine that you’re coding up a model for someone’s day. When you think of how to do this, do you think, “first he gets up, then he brushes his teeth, then he showers, then he puts on his clothes, then… then he comes home, then he eats dinner, then he watches TV, then he goes to bed?” Do you code this up with constructs like:

public void GoAboutMyDay()
{
    bool wokeUp = WakeUp();
    if (wokeUp == true)
    {
        bool showered = Shower();
        if (showered == true)
        {
            bool putOnClothes = PutOnClothes();
            if(putOnClothes)
                //You get the idea
        }
    }
}

This method is all about “when.” It’s entirely procedural, and it’s going to be horrifying when it’s complete. When you get into the fortieth nested if condition, maybe someone will come along and flatten it out with a bunch of inverted early returns. Or maybe not, because maybe some of if clauses start sprouting else conditions with loops in them. And maybe the methods being called start communicating with one another via boolean flag fields in the class. Who knows–this thing is on the precipice of becoming unstoppable. It might just achieve sentience at some point, so that when you try to start deleting conditionals, it says, “I can’t let you do that, Dave,” and puts them back.

The root problem behind it all is the “when” and the procedural thinking because you’re orienting the implementation around the order of the activities rather than the nature of the activities. Unit testing is all about deconstructing things into their smallest possible chunks and asserting things about those chunks. Temporal coupling and “when” logic is all about chaining and fusing things together.

If you were thinking about “what” first here, you would form a much different mental model of a person’s day. You’d say things to yourself like, “well, during the course of a person’s day, he probably wakes up, gets dressed, eats breakfast–well, actually eats one or more meals–maybe works if it’s a weekday, goes to bed at some point,” etc. Whereas in the procedural “when” modeling you were necessarily building a juggernaut method, here you’re dreaming up the names of methods and/or classes that can be unit tested separately and in isolation. It’s no reach to say, “okay, let’s have a Meal class that will have the following methods…”

Only at the end will you decide “when.” You’ll decide it after you’ve stubbed things out with the “what” and implemented/unit tested them with the “how.” “When” is a detail that you should allow yourself to figure out at any point down the line. If you nail down what and how, you will have testable, modular, and manageable code as you create your classes.

Other Design Considerations for Your New Classes

I’ll wrap up here with a few additional tips for creating testable designs when adding code-to-code bases:

  1. Avoid using fields to communicate between your methods by setting flags and tracking state. Favor having methods that can be executed at any time and in any order.
  2. Don’t instantiate things in your constructor. Favor passing them in (we’ll talk about this in detail in a future post in the series).
  3. Similarly, don’t have a lot of code or do a lot of work in your constructor. This will make your class painful to setup for test.
  4. In your methods, accept parameters that are as decomposed as possible. For instance, don’t accept a Customer object if all you do with it is read its SSN property. In that case, just ask for the SSN.
  5. Avoid writing public static methods. These are easy enough to test (often), but they start introducing testability problems when you write code that uses them. (This might be hard to swallow at first, but mull over the idea of simply not using static methods anymore.)
  6. The earlier you start writing your unit tests, the better. If you find that you’re having a hard time testing your new code, it’s more likely a problem with the code than with unit testing it, and if you write tests early, you’ll discover these problems before you get too far and fix them.

This post has covered ways to write unit tests “from here forward” and ways to stop adding untested code to code bases. In the next post, I’ll talk about how to start getting the legacy code under test.

Addendum: Mitigating the Hostile Test Environments

Finally, as promised, here are ways to accommodate testing in the less-than-ideal architectures mentioned above, if you’re curious or want to do some more research:

  1. Instead of Active Record, look at some kind of ORM solution like NHibernate or Entity Framework. These are tools that generate all of the code for you to access the database so that you don’t have to worry about testing that code and you can focus on writing only your (testable) domain code. Barring that, try to separate the three concerns of Active Record objects: modeling the database, connecting to the database, and modeling a domain object. The first concern adds no value, and the second two can be broken out into separate objects where the only thing hard to test is the actual database access.
  2. Instead of Winforms, favor WPF when possible. If that isn’t possible, see if you can use the Model-View-Presenter (MVP) pattern to move as much logic out of the untestable code-behind as possible.
  3. To be blunt, from a testing/decoupling perspective, Webforms is a disaster. You can have some limited success by adopting a more passive binding model and moving as much code out of the code-behind as possible, but it’s all pretty awkward. Webforms really seems more about rapid-prototyping and Microsoft-Accessing web development than producing scalable, sophisticated architectures.
  4. If you’re using wizards to generate your application’s architecture, cut it out. If you’re defining implementation details in markup, cut it out. Markup is for layout, not unit-testable business logic or state logic. If you depend on definitions in markup to drive your application’s behavior, you’re relying exorbitantly on a third-party framework, which is always extremely brittle from a testability perspective.
  5. To fix Smart UI, you just have to factor toward a more decoupled architecture. Start pulling different concerns out of the user controls and forms and finding a home for them.

By

Don’t Write Code You Don’t Need

I was reviewing some code the other day, and I saw a quick-and-dirty logger implementation that looked something like this (I’m re-creating from memory and modifying a bit for illustrative purposes in this post):

public class Logger
{
    public string Path { get; set; }

    public Logger()
    {

    }

    public Logger(string path)
    {
        Path = path;
    }

    public void WriteEntry(string entry)
    {
        using (var writer = File.AppendText(Path))
        {
            writer.WriteLine(entry);
        }
    }
}

A few things probably jump out at you, depending on your C#, OOP, and code reviewing chops. I’d imagine that, at the very least, you think, “you could just delete the default constructor.” You might also wonder why the other constructor exists since you could just set Path whenever you wanted. There’s the lack of null/valid check on path before creating a stream writer with it, and the unnecessary, potentially problematic, and definitely not threadsafe creation of the stream writer over and over. These things are all valid to point out, but I’d say that they’re also all symptoms of two larger root causes that I want to talk about.

Overeager to Please

TooMuchCode

You can offer too much code. By this I mean something subtly but critically different from “you can write too much code,” as when you are needlessly complicated or verbose. Offering too much code means that you’re giving users of the public interface of your classes too many options. Before my TDD days, this was something with which I constantly struggled. To understand what I mean, consider the thinking that probably went into the creation of this class:

Well, let’s see. A file needs a path, so a file logger should probably also take a path as input. From there, it should handle the details of streams and all that other stuff so that all we have to do is give it stuff to put in the file.

So far, so good. This is pretty clever as far as abstractions goes, and whoever wrote this class has a good grasp on how to create abstractions that provide value. But here’s what probably came next.

So, for path, let’s make that a public property in case the user of the class wants to change the path later. For convenience, let’s add a constructor that lets the user specify the path, but let’s also let him know that he doesn’t have to.

That thinking is considerate and helpful, but it’s offering too much code. You need to take ownership of your abstractions and be a little forceful and unyielding with them. You provide what you provide, and if users don’t like your class, they can create an inheritor or write their own or something: “this is my class, take it or leave it.”

In this case, you have to make concrete decisions about how and when the path of the logger is set:

In this class, the path is set at instantiation time and only instantiation time. As long as you live with my logger, you will obey my rules!

public class Logger
{
    public string Path { get; private set; }

    public Logger(string path)
    {
        Path = path;
    }

    public void WriteEntry(string entry)
    {
        using (var writer = File.AppendText(Path))
        {
            writer.WriteLine(entry);
        }
    }
}

Notice that we’ve already eliminated the problem of the pointless constructor by making this decision. We’re also well poised to handle the null/valid checking of the path since we can just do it in the constructor instead of doing it in the constructor and in the setter for path and worrying about duplication, when exactly to validate and how, what to do on invalid set when the last path was valid, etc. It’s also going to be very easy to take care of the issue with the needless creation of StreamWriter instances. Now that you can’t change the path once the instance is created, you can simply create the writer in the constructor and reuse it for the lifetime of the object by storing it as a private field. Now making this threadsafe becomes a pretty manageable task. In general, eliminating one conceptual option eliminates a lot of internal implementation complexity.

But what about the users who want to set the path at some point later in the object instance’s lifetime? Psst…that’s not a real common use case. And you know what? If they really want to do that, they can just instantiate a second logger. Don’t make your life really complicated because you think users of your code might want flexibility. Because you know what they want more than extra flexibility? Code that works. And it’s a lot harder to offer them code that works if you’re bending over backwards to handle every conceivable thing they might want to do.

Pointless or Speculative Mutability

The last section touches on this obliquely, but I’d like to bring this to the fore. Mutable, state-based code is a lot more prone to problems than immutable or functional code that has fixed or no state, respectively. Consider the refactoring path I’ve proposed here. In the last section, I chose to eliminate the public setter for Path instead of the constructor that took Path as an argument. Did that seem like a coin flip and pick one to you? Well, it wasn’t.

With the constructor injection of path, we have to do validity checking and initialization only once, to establish preconditions for the object’s existence. With the public setter for path, we have to maintain object invariants, which tends to be more complex. A good example of this is how we handle the transition from a valid path to the user setting an invalid or null path. Do we throw an exception? Revert to the previous path? You don’t have a discussion when there was no valid previous path as is the case with constructor injection.

This is just an example of a broader idea, which is that mutability creates state transition management tasks and that these tasks are harder to get right. To understand what I mean, imagine that you’re implementing the API for an array. You tell your users, “alright, when you create the array, you specify how many elements it will have, and then you can set and access the elements as you please by index.” The users come back and say, “well, we want to change the size of the array on the fly,” to which you respond, “oh, crap,” as you start to wonder what you’ll do if they want to resize it smaller than what they’ve used or if they try to make it negative or too big or something. You can feel the number of edge cases multiplying.

If you come from a background where you write a lot of procedural, script-based, or hacked-together utility code, this may seem like a weird thing to think about. You’re probably used to doing things like declaring boolean ‘flags’ somewhere in a method and setting them much later in that same method in order to keep track of something further down the line. And if you do this, you’re probably used to a lot of tweaking, guessing, and hoping for the best. But as it turns out, you’re doing things the hard way.

The easy way is to program without any state at all. That means no assignment and no variables, the way you might do if asked to write a method that took x and y and returned 4x + 2y. There’s no need to do any of that: just have “return 4x + 2y” and be done with it. This way of doing things is called functional programming, and its calling card is that output is always a pure function of the input and is entirely predictable.

If functional paradigm isn’t an option (and it often isn’t in the entirety of an application), the next easiest thing to deal with is immutable objects. These are objects that have state, but that state is not modifiable after creation time. An example of an object that can easily be immutable is something like an address. An address is just a handful of strings bound together with a class definition, so why have it be mutable? If you want a different one, just let the one you have go out of scope and create a new one. Then you don’t have to worry about questions like “what if I set a new address1, but not a new address2–that would never happen, right?” In immutable land, that’s simply not a consideration that you bother with.

The hardest way of doing things is with state transition management or mutable state. That’s the nature of the original logger at the top. Anything goes. The object has only transitive state, so it has either to constantly manage all permutations of state changes to check for accuracy or it has to risk being wrong in an invalid state. This is not a great choice and should be avoided if at all possible.

As you develop, you’ll find it’s possible a lot more often than you think. Objects that perform a service, such as a logger, generally have little reason to store mutable state, particularly in a nicely decoupled application. Even a lot of data objects, which represent the very core of what we’d think of as mutable, can be immutable a lot more often than you’d think (e.g. address). Mutability in your code is often best left in places where it’s unavoidable. If you don’t have the luxury of pass-through interaction with a database, you will generally maintain an in-memory domain model that needs to have mutable state because it’s representing the database which is, almost by definition, a whole gigantic system of files full of mutable state. A lot of objects that help you manage user interface will have mutable state (e.g. some class that stores things like “is checkbox X currently checked”). But unavoidable as it may be, you can certainly minimize and isolate it. Whatever you do, don’t introduce it where you don’t need it because you’re creating needless complexity, which means extra things that can go wrong.

Take-Away

The upshot of this exhaustive code-review that I’ve conducted is just advice to consider carefully where your classes fall on the functional-immutable-mutable spectrum and to keep them as far toward the functional side as possible. I’m not suggesting that you run out and rewrite existing code this minute or even that you vow to change how you do things. I’m just suggesting you be aware that mutable objects come with a heavy cost in terms of complexity and difficulty. This will help you in your programming travels in general, and especially if you’re looking to delve into things like architecture, test-driven development, or concurrent (multi-threaded) programming.

By

Why Are Public Fields No Good, Again?

Today, a developer on my team, relatively new to C# and some of the finer points of OO, asked me to take a look at his code. I saw this:

public class Address
{
    public string Street1;
    public string Street2;
    public string City;
    public string State;
    public string Zip;
}

How would you react to this in my shoes? I bet like I did–with a quick “ooh, no good…try this:”

public class Address
{
    public string Street1 { get; set; }
    public string Street2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

Busy and pleased with myself for imparting this bit of wisdom, I started to move on. But then I realized that I was failing at mentoring. Nothing annoyed me more as a junior developer than hearing something like, “That’s just the way you do it.” And there I was saying it.

The creatures outside looked from pig to man, and from man to pig, and from pig to man again; but already it was impossible to say which was which.

AnimalFarm_1stEd

I stopped at this point and forced myself to talk about this–to come up with a reason. Obviously, I hadn’t been thinking about the subject of why public fields are icky, so slick terms like “encapsulation” and “breaking change” weren’t on the tip of my tongue. Instead, I found myself talking sort of obliquely and clumsily explaining encapsulation in terms of an example. Strangely, the first thing that popped into my head was threading and locking the field to make it thread-safe. So I said something along the lines of this:

With a public field, you simply have a variable that can be retrieved and set. With a property, you’re really hiding that variable behind a little method, and you have the ability to centralize operations on that variable as needed rather than forcing them on everyone who uses your code. So imagine that you want to make these fields thread-safe. In your version, you force all client code that uses your class to deal with this problem. In the property version, you can handle that for your clients so that they needn’t bother.

Yeah, things are always more awkward off the cuff. So I ruminated on this a bit and thought I’d offer it up in blog post format. Why not have public fields (aside from random thoughts about threading)?

The Basic Consideration: Encapsulation

As amused by my off-the-cuff threading example as I am, it does drive at a fundamental tenet of OOP, which is class-level encapsulation. If I’m writing a class, I want to expose a set of public functionality (methods and properties) that describe what an instance of the class will do while hiding how it will do it. If the address class is in some other assembly and I don’t decompile or anything, as a client, I have no idea if the City property is auto-implemented, if it wraps a simple property, or if all kinds of magic happens behind the scenes.

I know (hopefully) that if I set the City property to some value and then read it back, I get that value. But beyond that, I dunno. Maybe it implements lazy loading from some persistence store. Maybe it tracks all of the things you’ve set City to throughout its entire lifetime. I don’t know, and I don’t want to know if it isn’t part of the public API.

If you use public fields, you can’t offer consumers of your code that blissful ignorance. They know, ipso facto. And, worse, if they want any of the cool stuff I mentioned, like lazy loading or tracking previous values, they have to implement it themselves. When you don’t encapsulate, you fail to provide any kind of useful abstraction to me. If all you’re giving me is a field-bag, what use do I have for an instance of your class in mine? I can just have my own string variables, thank you very much, and I will hide them from my clients.

The Intermediate Consideration: Breaking Changes

There’s a more subtle problem at play here as well. Specifically, public properties and public fields look and seem pretty similar, if not identical, but they’re really not. Under the hood, a property is a method. A field is, well, a field. If you have a property and you decide you want to change its internal representation (going from something complex to simple property or automatic property), no harm done. If you want to switch between a field and a property though, even the trivial switch I showed above, there be dragons–especially going from field to property.

It seems like a pretty obvious case of YAGNI to start out with a field and move to a property if you need it, but things start to go wrong. Weird and subtle things. And they go wrong on you when you least expect them. Maybe you’re using the public members of the Address class as part of some kind of reflection. Well, now things are broken because properties and fields are completely different here. Perhaps you have a field that you were using as an out parameter somewhere. Oops. And, perhaps most insidiously of all, just changing a field to a property will cause runtime exceptions in dependent assemblies unless recompiled.

Let me put that another, scarier way. Let’s say that I write something called Erik’s Awesome DLL and I distribute it to to the world, where it gains wide adoption. Let’s also say that I created a bunch of public fields that the world uses as part of my DLL’s API. And finally, let’s say that in vNext, I decide that I want some encapsulation after all. Maybe I want to implement thread-safety. I implement and publish vNext to Nuget, and you update accordingly. The next time you hit F5, your application will blow up, and it will continue to blow up until you recompile. It may not sound like a big deal, but you’re definitely going to annoy users of your code with stuff like this.

So beware that you’re casting the die here more quickly and strongly than you might think. In most cases, YAGNI applies, but in this case, I’d say that you should assume you are going to need it. (And anything that stops people from using out parameters is good in my book)

The Advanced Consideration: Tell, Don’t Ask

I’ll leave off with a more controversial and definitely the most philosophical point. Here’s some interesting reading from the Pragmatic Bookshelf, and the quote I’m most interested in here describes the concept of “Tell, Don’t Ask.”

[You] should endeavor to tell objects what you want them to do; do not ask them questions about their state, make a decision, and then tell them what to do.

I’m calling this controversial because the context in which I’m mentioning it here is a bit strained. Address, above, is essentially a data transfer object whose only purpose is to store data. It will probably be bubbled up from somewhere like a database and make its way onto something like a form, with perhaps no actual plumbing code that ever does ask it anything. But then again, maybe not, so I’m mentioning it here.

Public fields are the epitome of “ask.” There’s no telling but not asking–they’re all “ask and tell.” Auto-properties are hardly different, but they’re a slight and important step in the right direction. Because with a property, awkward as it may be, we could get rid of the getter and leave only the setter. This would be a step toward factoring to a (less awkward) implementation in which we had a method that was setting something on the object, and suddenly we’re telling only and asking nothing.

“Tell, don’t ask” is really not about setting data properties and never reading them (which is inherently useless), but about a shift in thinking. So in this example, we’d probably need one more step to get from telling Address what its various fields are only to telling some other object to do something with that information. This is getting a little indirect, so I won’t belabor the point. But suffice it to say that code riddled with public fields tends to be about as far from “tell, don’t ask” as you can get. It’s a system design in which one piece of code sets things so that another can read it rather than a system design in which components communicate via instructions and commands.

So I feel clean and refreshed now. I didn’t shove off with “I’m in charge and I say so” or “because that’s just a best practice.” I took time to construct a carefully reasoned argument, which served double duty of keeping me in the practice of justifying my designs and decisions and staying sharp with my understanding of language and design principles.