DaedTech

Stories about Software

By

Beware the Bloated Constructor

What’s a Bloated Constructor?

Yesterday, I was going through the version history of a file in some code base (from earliest to most recent) and I saw the following:

public ActiveProduct(Product product, StringVersion driver, Side es)
{
    Side = es;

    try
    {
        if (product == null)
            throw new Exception("Can't create an active product from a null product");

        Logger.LogTimeStudyStart("ActiveProduct create");

        if (driver.IsEmpty())
            CurrentDriver = product.Driver;
        else
            CurrentDriver = driver;

        _Device = Session.Instance.DeviceManager.CreateHI(es, CurrentDriver.ToString());
        _Device.ConnectionStatus += Device_ConnectionStatus;
        _Device.BatteryStatus += Device_BatteryStatus;
    }
    catch (Exception ex)
    {
        ExceptionHandler.HandleException(ex);
        throw ex;
    }
    finally
    {
        Logger.LogTimeStudyEnd("ActiveProduct create", "ActiveProduct: Creating " + driver);
    }

    State = ActiveProductState.Simulated;
    Environments = new SortedList();
    this._Product = product;
    _Options = product.AvailableOptions;
    AvailablePrograms = new ProgramSlots(this) { HIStartIndex = 0 };
    AccessoryPrograms = new ProgramSlots(this) { CanRemoveOverride = true, StartIndex = 4, IsAutoShifting = false };
    VirtualPrograms = new ProgramSlots(this) { IsVirtual = true, CanRemoveOverride = true, SlotConfig = Limit.None };
    SlotCalculator = new ProgramCalculator(this, false);
    VirtualCalculator = new ProgramCalculator(this, true);
    DFS = new DFSCalibration(this);
    Accessories = new Accessories();
    PowerBands = PowerBands.Unknown;
}

This made me sad and tired but no worries, I figured, there were a lot more revisions I hadn’t yet seen. Surely somebody did something about this later. And then, happiness:

public ActiveProduct(Product product, StringVersion driver, Side es)
{
    Initialize(product, driver, es);
}

Sweet! That’s much easier on the eyes. Let’s see just see what Initialize does and call it a day:

private void Initialize(Product product, StringVersion driver, Side es)
{
    Side = es;

    try
    {
        if (product == null)
            throw new Exception("Can't create an active product from a null product");

        Logger.LogTimeStudyStart("ActiveProduct create");

        if (driver.IsEmpty())
            CurrentDriver = product.Driver;
        else
            CurrentDriver = driver;

        _Device = Session.Instance.DeviceManager.CreateDevice(es, CurrentDriver.ToString());
        _Device.ConnectionStatus += Device_ConnectionStatus;
        _Device.BatteryStatus += Device_BatteryStatus;
    }
    catch (Exception ex)
    {
        ExceptionHandler.HandleException(ex);
        throw ex;
    }
    finally
    {
        Logger.LogTimeStudyEnd("ActiveProduct create", "ActiveProduct: Creating " + driver);
    }

    State = ActiveProductState.Simulated;
    Environments = new SortedList();
    this._Product = product;
    _Options = product.AvailableOptions;
    AvailablePrograms = new ProgramSlots(this) { DeviceStartIndex = 0 };
    AccessoryPrograms = new ProgramSlots(this) { CanRemoveOverride = true, HIStartIndex = 4, IsAutoShifting = false };
    VirtualPrograms = new ProgramSlots(this) { IsVirtual = true, CanRemoveOverride = true, SlotConfig = Limit.None };
    SlotCalculator = new ProgramCalculator(this, false);
    VirtualCalculator = new ProgramCalculator(this, true);
    DFS = new DFSCalibration(this);
    Accessories = new Accessories();
    PowerBands = PowerBands.Unknown;
}

Oh, the humanity… looks like someone sprayed a bit of cologne here and nothing more.

So, what’s the problem?

I contend that this is a fundamental design failure. Forget throwing an exception from within a try block as some kind of goto. Forget all of the static stuff like the logging and exception handling and general global state. Forget the singletons. Forget all of the other ancillary problems. The design failure is how complicated (convoluted) this constructor is. I see this enough that I think some people would call it a development pattern (I don’t think any would claim it rises to the level of “Design Pattern”), but I firmly believe that it is an anti-pattern.

This style of code is the product of procedural, command and control thinking that I blogged about some time back. It makes no distinction between creation/initialization of an object and operations of the object. These distinct activities are blurred as the bloated constructor generates logger messages, talks to singletons, instantiates other objects, etc.

From the perspective of the procedural programmer, the constructor should take the role of making sure the object wants for nothing, having all of its creature comforts and having its every desire and whim satisfied. In the object oriented style, where dependencies tend to be inverted, the constructor has a different and more Spartan role. Its only job is to make sure that object initializes into a state where it satisfies its basic invariants (in other words, it makes sure that the object instance starts off in a valid state and nothing more).

Let’s consider why the latter, object-oriented approach tends to be a better fit in object oriented languages (and, I would argue, in any state-based paradigm, but I’ll focus on OOP here).

1. Violating Developer Expectations

Imagine that you’re newly assigned to a code base and you want to start understanding how it works. An excellent way to do this is to write a few unit tests (perhaps to keep or perhaps just for throw-away experimentation). You pick some class that you’d like to test and figure you’ll experiment with its API before digging into its code. So, you create a new instance of that class and…

Oops. Your continuous testing tool crashes. Huh. Guess you’ll have to look in the code now (this is never a promising sign — if you have to inspect a class’s code to understand why it’s crashing, it’s probably not a good abstraction, especially if this is true of the constructor.) So, you look in the code and see that there’s a reference to a lazy-loading singleton that lazy-loads a few other singletons and one of those singletons is trying to read files off of a network somewhere and….

Alright, forget it. Let’s move onto another class. You instantiate that class and your testing tool doesn’t crash, but it does turn red. Huh. Back into the code. Ah, this time the class under test doesn’t touch any static stuff or singletons, but it instantiates another class that does. The exception handling is a little better here, so you just turn the testing tool red rather than completely crashing it, but this class is hosed for testing as well.

As a new developer, you’re instantiating classes and expecting that they’ll be instantiated. What’s happening instead is weird, bad, unpredictable things that force you to open the source code to figure out what on Earth is going on. You’re badly betraying a trust other developers put in you to do what you advertise (i.e. putting together the minimum invariants of the class).

2. Testability Nightmare

And, looking back at the narrative from the previous section, another problem makes itself pretty obvious as well. Constructors that do real work instantiating other things (especially global state) make classes impossible to test in isolation. Even if you don’t use global state and instead have classes all instantiate their collaborators, it’s still tough to test things in isolation.

Imagine that your code works like this: main tells a car to build itself, the car tells its engine and chassis to build themselves, the engine tells the alternator and transmission to build themselves, etc. Now, let’s say that you want to write a test of the car to see what happens when its engine starts out invalid. Just how will that be accomplished? You have no control over its engine — it controls that in its constructor. Even if it exposes engine via an accessor, too little, too late.

3. Casts Dependencies And Assumptions in Stone

If the constructor is busy, it’s likely that it’s talking to global state/and or instantiating things. If it takes arguments, it may be mutating those or paring them or coupling itself to them. But, whatever it’s doing, you have no control over it as a client. So, if you want this object at all, you’re getting it with whatever the constructor decides to foist on you.

Contrast this with any non-constructor method. Any non-constructor method you have the option to call or not, so you have the control. If you want a Car that isn’t started, you can simply not call Car.Start(). However, if the author of Car decides to start it in the constructor, you’re hosed. Not having a started car is not an option for you, buddy. Likewise, if Car instantiates a new SixCylinder() engine and you want a more fuel economical four cylinder, you’re SOL.

4. Performance Problems

Let’s say that I have some class, Critical, that I can’t live without in the application. I need to instantiate a Critical and access some property on it in response to a user request. Let’s say that Critical has a busy constructor that chugs and hums and does all manner of things, taking 5 or 10 seconds at times. Not all of this stuff has anything to do with the property that I want to access in response to the user request.

What are my choices here? Well, I don’t have any. I can like it or lump it. I need new Critical() and the user therefore needs to wait 5 or 10 seconds even though his request should take that number of microseconds. Sure, I can cache my Critical for next time or I can implement a flyweight, but that first time, I’m still needlessly forcing a wait on the user.

So, What Should a Constructor Look Like?

Given all of these arguments against bloated constructors, it should be fairly obvious what I think a constructor ought to look like. It ought to contain very little code. If the class has no dependencies, then it ought to contain no code (or not exist). After all, what are you going to do in the constructor that you can’t do in a field initializer? You’re not reaching into global state somewhere, are you?! Tsk tsk.

If a constructor has parameters, it ought to do nothing but set local fields according to those parameters, and perhaps perform some sort of validation on the parameters to ensure that its invariants can be satisfied with these values. That’s really it. The constructor’s only responsibility is to put the object into a valid state as far as its invariants are concerned, while doing the minimum amount of configuration possible. After all, you want objects where clients can explicitly tell them what to do. Even simple things like hooking to events should be triggered by a client method and not the constructor, ideally. It may seem a bit more cumbersome, but it’s also expressive and clear. Constructors hide side effects where well-named methods promote them from side effects to deliberate effects.

So What About that Constructor We Looked At?

How about this:

public ActiveProduct(Product product, StringVersion driver, Side es)
{
    if (product == null)
        throw new Exception("Can't create an active product from a null product");

    Side = es;
    CurrentDriver = driver;
    this._Product = product;
}

Wow, isn’t that like a ray of sunshine on an otherwise cloudy day? No Initialize() method hiding untold horrors. No nested, contradictory exception handling logic. No logging, no singletons, no static state — simple, minimal and easy to reason about. So, what to be done with all of these things that formerly resided in this constructor?

Well, a number of them were simply initializing properties or backing stores. This can be done with class initializers for sensible defaults and should otherwise be left to clients. After all, we’re here to serve them — not force settings on them. The logging makes no sense — do that somewhere else. If you’re writing a class and you want to know how long the constructor takes (which is a huge warning flag, by the way), write that code where you instantiate it — don’t force this on every conceivable client of your class.

The event wireup can be moved into some kind of “Hookup()” method to allow clients specifically to request this. You could also make it more granular if you like, allowing them to hook up ala carte instead of all at once. As for the singeltons and static state, leave that up to the instantiating client if it wants that done. We might as well leverage the thing that makes static state a design nightmare — the fact that anyone can do anything from anywhere — to move the stink away from us with some “Not in my backyard” attitude. If everyone does this, eventually the global state will be bubbled to some common location where singletons can be slain and static classes converted to instances.

Are You a Lone Crackpot, Erik?

As it turns out, no. Here is what others have to say on the subject:

Microsoft agrees with this take, citing performance:

Do minimal work in the constructor. Constructors should not do much work other than to capture the constructor parameters. The cost of any other processing should be delayed until required.

Misko Hevery agrees, citing problems finding seams for testing:

When your constructor has to instantiate and initialize its collaborators, the result tends to be an inflexible and prematurely coupled design. Such constructors shut off the ability to inject test collaborators when testing.

John Wigger cautions against putting too much code in a constructor:

In dealing with a more complicated OO design, it can be a mistake to put too much initialization logic into constructors. This is especially important for an OO design that uses inheritance significantly.

Gilad Bracha even goes so far as to hypothesize that constructors are harmful:

Constructors come with a host of special rules and regulations: they cannot be overridden like instance methods; they need to call another constructor, or a superclass constructor etc. Try defining mixins in the presence of constructors – it’s tricky because the interface for creating instances gets bundled with the interface of the instances themselves.

By

Abstractions are Important 3 – “What?” Before “How?”

So, I’m back from my two weeks overseas, refreshed, enriched, and generally wiser, I suppose. We traveled to Spain and Portugal, visiting a ton of historic sites, eating good food and having fun. For my first post back, I’d like to make a third post in my series on abstractions.

Methods as Recipes

I was looking at some code yesterday. It was some long method, probably 60 or 70 lines long, and I sighed as scrolled disinterestedly through it. At the moment, I couldn’t muster the energy to try to figure out what the author thought it did or probably wanted it to do, so I started ruminating about what leads to methods like this. And, I think I understand it. It’s the idea of a method as a recipe.

By way of homage to Spain, let’s write a “CookPaella()” method. When writing methods, do you ever start by doing the following?

public void CookPaella()
{
    //1.  Get a medium bowl
    //2.  Mix together 2 tablespoons olive oil, tsp paprika, , tsp oregano, salt and pepper to taste
    //3.  Stir in 2 pounds chicken breasts cut into 2 inch pieces, to coat
    //4.  Refrigerate chicken.
    //5.  Heat 2 tablespoons olive oil in paella pan over medium heat
    //6.  Stir in 3 cloves garlic, 1 tsp red pepper flakes, and 2 cups rice
    //7.  Cook to coat rice with oil -- about 3 minutes
    //8.  Stir in a pinch of saffron, 1 bay leaf, 1/2 bunch parsley, 1 quart chicken stock and optional zest of two lemons
    //9.  Bring to a boil, then cover, reduce heat to medium low and simmer 20 minutes
    //10. Meanwhile, heat 2 tbsps olive oil in a separate skillet over medium heat and stir in marinated chicken with onions and cook for 5 minutes.
    //11. Stir in bell pepper and sausage and cook for 5 minutes.
    //12. Stir in shrimp, turning until both sides are pink
    //13. Spread rice mixture onto a serving tray, topping with meat and seafood mixture
}

(recipe compliments of All Recipes).

This is a sane approach. Much like making an outline for an essay in English class, you list out the basic procedure that you want to follow, and you fill in the details:

public void CookPaella()
{
    //1.  Get a medium bowl
    var myBowl = new Bowl("Medium");
    //2.  Mix together 2 tablespoons olive oil, tsp paprika, , tsp oregano, salt and pepper to taste
    myBowl.AddTablespoons(2, "olive oil");
    myBowl.AddTeaspoons(1, "paprika");
    myBowl.AddTeaspoons(1, "oregano");
    myBowl.AddPinch("salt");
    myBowl.AddPinch("pepper");

    //3.  Stir in 2 pounds chicken breasts cut into 2 inch pieces, to coat
    //4.  Refrigerate chicken.
    //5.  Heat 2 tablespoons olive oil in paella pan over medium heat
    //6.  Stir in 3 cloves garlic, 1 tsp red pepper flakes, and 2 cups rice
    //7.  Cook to coat rice with oil -- about 3 minutes
    //8.  Stir in a pinch of saffron, 1 bay leaf, 1/2 bunch parsley, 1 quart chicken stock and optional zest of two lemons
    //9.  Bring to a boil, then cover, reduce heat to medium low and simmer 20 minutes
    //10. Meanwhile, heat 2 tbsps olive oil in a separate skillet over medium heat and stir in marinated chicken with onions and cook for 5 minutes.
    //11. Stir in bell pepper and sausage and cook for 5 minutes.
    //12. Stir in shrimp, turning until both sides are pink
    //13. Spread rice mixture onto a serving try, topping with meat and seafood mixture
}

This is great because rather than starting without any kind of gameplan, we’ve stubbed everything out that needs to happen, and now we’re in the process of filling in the template. Whether you’re cooking or assembling a piece of furniture or anything else, there is a tendency to read through (or skip) to the end so that the actual following of the instructions reveals no mysteries. We take the same approach here.

Once this is complete, you’re going to have a large method, so some refactoring is probably in order. At the very least, our numbered bullets provide some logical methods to create:

public void CookPaella()
{
    //1.  Get a medium bowl
    var myBowl = new Bowl("Medium");
    //2.  Mix together 2 tablespoons olive oil, tsp paprika, , tsp oregano, salt and pepper to taste
    MixTogether(myBowl);

    //3.  Stir in 2 pounds chicken breasts cut into 2 inch pieces, to coat
    //4.  Refrigerate chicken.
    //5.  Heat 2 tablespoons olive oil in paella pan over medium heat
    //6.  Stir in 3 cloves garlic, 1 tsp red pepper flakes, and 2 cups rice
    //7.  Cook to coat rice with oil -- about 3 minutes
    //8.  Stir in a pinch of saffron, 1 bay leaf, 1/2 bunch parsley, 1 quart chicken stock and optional zest of two lemons
    //9.  Bring to a boil, then cover, reduce heat to medium low and simmer 20 minutes
    //10. Meanwhile, heat 2 tbsps olive oil in a separate skillet over medium heat and stir in marinated chicken with onions and cook for 5 minutes.
    //11. Stir in bell pepper and sausage and cook for 5 minutes.
    //12. Stir in shrimp, turning until both sides are pink
    //13. Spread rice mixture onto a serving try, topping with meat and seafood mixture
}

private static void MixTogether(Bowl myBowl)
{
    myBowl.AddTablespoons(2, "olive oil");
    myBowl.AddTeaspoons(1, "paprika");
    myBowl.AddTeaspoons(1, "oregano");
    myBowl.AddPinch("salt");
    myBowl.AddPinch("pepper");
}

There, look at that. We’re going to have this reduced to a nice 13 line method and, we could probably even group the calls further from there, resulting in a CookPaella() method that had three instructions: Prep(), CookRice(), CookMeat(). Those methods would consist of three or four lines themselves, and things would spread on out from there like a tree. This is a series of well factored methods that are probably clean and reasonable (discounting the fact that we’re instantiating what we need with Bowl, rather than having variables passed in — that’s for example purposes and not a weakness of the approach).

So where do these long methods come from? Well, I would argue that they come from people thinking in terms of recipe, but not creating or factoring to the outline. That is, they sit down to write the method and simply start banging out code line by line until they’re done. They think in terms of a recipe — a procedure — and methods are simply containers for procedures. So, when they start out, they don’t know what all a method is going to do; rather, the method twists and winds and meanders its way toward some eventual end in ad-hoc fashion.

In a less contrived case than this one, a method will probably start out as just a jump point for a series of instructions. The instructions are coded sequentially until there are no more instructions and then the method is at an end. The “how” is defined, and then the author looks at the “how” and decides what to name the method. He describes “how” and then, based on “how”, decides “what”. Ah, I see that I’ve assembled a series of instructions that seems to cook a paella, so “CookPaella” is probably a good thing to call this.

Methods as Abstractions

So, is there another way to do this? Absolutely. You can flip the script and decide “what” without worrying about “how”. With this approach, we completely discard the procedural/sequential concept and focus instead on creating meaningful abstractions. Procedural/sequential programming is good for, say, batch scripts, but object oriented programmers need to think in abstractions. I don’t want a specific, blow-by-blow recipe for cooking paella to become the ‘architecture’ of my code. I want to write code that a cook can use to get things done. That’s an important distinction.

Let’s think of our paella cooking a little differently. Let’s just think of it as cooking. In this fashion, we can define implements like pots and pans, ingredients like paprika and meat, and actions, like sear or boil. From there, we can start thinking about an object model. What is a “bowl” and what should it do? A bowl does things other than mix ingredients for paella — it has properties and, depending on the object model, may have some actions. Let’s decide what those properties and actions are without worrying about how they work.

For example, let’s define a bowl, along with concepts called “ingredient” and “quantity”:

public class Ingredient
{
    public Ingredient()
    {
            
    }
}

public class Quantity
{
    public Quantity()
    {
            
    }
}

public interface Bowl
{
    int Diameter { get; set; }

    int Height { get; set; }

    void Add(Ingredient ingredient, Quantity quantity);

    void MixIngredients();
}

Notice that we have a Bowl interface, rather than any kind of implementation. How do these methods work? Who cares. We don’t need to know that to make our paella. From here, we might define a paella pan interface as well, and perhaps various inheritors of ingredient and quantity, depending on their implementations. The point is, we’ll reason about each individual object and how it should behave. After we do enough of this, we can start to create larger constructs, such as some MeatCooker class that takes an arbitrary number of meats and a pan and cooks them. A few classes like this, and before you know it, your CookPaella() method will write itself.

Notice that the “start at the beginning and sequentially do everything” scripting style approach is gone, but so too is the structured, outline approach. With this abstraction-based approach, we’re defining objects with properties and behaviors that make sense in our domain model. This is what allows easy assembly. But, the important thing to understand is we’re defining abstractions rather than procedures. These are going to be easier to reason about, and they’re also going to be more flexible. We can use our meat cooker and pan and bowl to cook lasagnas as well as paellas.

So, the overriding message here is to think sequentially and abstractly when considering how to model a problem in an object oriented language. Don’t think about “how” until the very end. Instead, think about “what”. What objects should you have? What properties should they have? What actions? What interactions do you foresee for them. As you answer these questions, the specifics will become much easier. The longer you defer those specifics, the more flexible the design. You’re using an object oriented language, so leverage that language. Don’t code like you’re batch scripting. Don’t model your application by writing recipes even when you’re modeling, well, cooking a recipe.

By

I Love Debugger

Learning to Love Bugs

My apologies for the titular pun in reference to “I Love Big Brother” of iconic, Orwellian fame, but I couldn’t resist. The other day, I was chatting with some people about the idea of factoring large methods into smaller, more focused ones and one of the people chimed in with an objection that was genuinely new to me.

Specifically, the objection was that giant methods tended to be preferable because it kept the stack trace flat and made it easier to have everything “all in one place” when you were (inevitably) going through the code in the debugger. My first, fleeting thought was to wonder if people really found it that difficult to ctrl-tab between classes, but I quickly realized that this was hardly the important problem here (and really, to each his or her own). The bigger problem, as I explained a moment later, but have thought through in a bit more detail for a blog post now, is that you’re writing code more likely to generate defects so that when you’re tasked with fixing those defects, you feel more comfortable.

This is like a general housing contractor saying, “I prefer to use sand as a building material over wood or brick for houses I build because it’s much easier to work with in the morning after the tide destroys the house each night.”

Winston realized that two equals four and that the only way to prevent bugs is to cause them. Wilson happily declared, “I love Debugger!”.

More Bugs? Prove It!

So, if you’re a connoisseur of strict logic in debating, you’ll notice that I’ve begged the question here with my objection. That is, I ‘proved’ the reasoning fallacious by assuming that larger methods means more bugs and then used that ‘proof’ as evidence that larger methods should be avoided. Well, fear not. A group of researches from Standford did an empirical analysis of OS bugs, and found:

Figure 5 shows that as functions grow bigger, error rates increase for most checkers. For the Null checker, the largest quartile of functions had an average error rate almost twice as high as the smallest quartile, and for the Block checker the error rate wEis about six times higher for larger functions. Function size is often used as a measure of code complexity, so these results confirm our intuition that more complex code is more error-prone.

Some of our most memorable experiences examining error reports were in large, highly complex functions with contorted control flow. The higher error rate for large functions makes a case for decomposition into smaller, more understandable functions.

This finding is not unique, though it nicely captures the issue. During my time in graduate school in a class on advanced topics in software engineering, we did a unit on the relationship between various coding practices and likelihood of bugs. A consistent theme is that as function size grows, number of defects per line of code grows (in other words, the number of defects per function grows faster than the number of lines per function).

So, What Now?

In the end, my response is quite simply this: get used to a more factored and distributed paradigm. Don’t worry about being lost in files and stack traces in the debugger. Why not? Well, because if you follow Uncle Bob Martin’s advice about factoring methods to be 4 or 5 lines, you wind up with methods that descriptively tell you what they’re going to do and do it perfectly. In other words, you don’t need to step into them because they’re too simple and concise for things to go wrong.

In this fashion, your debugging becomes different. You don’t have a pen and paper, a spreadsheet, a stack trace window, and a row after row of “immediates” all to keep track of what on Earth is going on. You set a breakpoint somewhere, and any method calls are innocent until proven guilty. You step over everything until something fishy happens (or until you become a client of some lumbering beast of a method that someone else wrote, which is virtually assured of having defects). This approach is almost universally rejected at first but infectious with time. I know that, as a “no bigger than the screen” guy originally, my initial reaction to the idea of all methods being 4 or 5 lines was “that’s stupid”. But try it sometime and you won’t go back.

Bye, Bye Debugger!

If you combine small factored methods and unit tests (which tend to have a natural synergy), you will find that your debugger skills begin to atrophy. Rather than reasoning about the code at runtime, you reason about it at compile time. And, that’s a powerful and important concept.

Reasoning about code at run time is programming by coincidence, as made famous by one of my favorite programming books. I mean, think about it — if you need the debugger to understand what the state of the code is and what’s going on, what you’re really saying when you build and run is, “I have no idea what this code is going to do by inspecting it, so I need to run the entire application to understand it.” You don’t understand your own code while you’re writing it. That’s a problem!

When you write small, factored methods and generally tested and decoupled code, you don’t have this problem. Take this to its logical conclusion and imagine a method that takes two int parameters and returns an int representing their sum. Do you need to set breakpoints and watches, tag immediate variables and look at a stack trace to know what this method will do? Of course not! You can reason about this method at compile time and take for granted that it will do the right thing at run time. When you write code like this, you’re telling the application how to behave. When you find yourself immersed in the debugger for three quarters of your day, you’re not dictating how the application will behave. Instead, you’re begging it to work as a kind of prayer since it’s pretty much out of your hands what’s going to happen. Don’t buy it? How many times have you been at your desk with a deadline looming saying “please, just work — why won’t you just work!?!”

This isn’t to say that I never use the debugger. But, with a combination of TDD, a continuous testing tool, and small, factored methods, it’s fairly rare. Generally, I just use it when my stuff is integrated with code not done this way. For my own stuff, if I ever do use it, it’s from the entry point of a unit test and not the application.

The cleaner the code that I write, the more my debugger skills atrophy. I watch in amazement at peers that are incredible with the debugger — and I say that with no irony. Some of them can get it to do things I didn’t realize were possible and that I freely admit are very cool. I don’t know how to do these things because I’m out of practice. But, I consider that good. If you’re getting a lot of practice de-bug-ing your code, it means you’re getting a lot of practice writing code with bugs in it.

So, let’s keep those methods small and get out of the practice of generating bugs.

(By the way, I’m going to be traveling overseas for the next couple of weeks, so this may be my last post for a while).

By

Preserve Developer Mindshare – Don’t Nitpick

Mindshare


I’m not particularly interested in marketing principles in the commercial sense of the word (though I find the psychology of argumentation and persuasion to be fascinating), so please excuse any failed parallelism in advance. Today, I want to talk about the concept of mind share, but to apply it to the life of a work-a-day developer.

For those not familiar with the concept, mind share is the awareness that a consumer has about a particular product. For instance, if I say “smart phone”, the first things that pop into your head are probably “iPhone”, “Android”, “Blackberry”, perhaps in exactly that order. If that’s the case, iPhone has a larger mindshare from your perspective as a consumer than Blackberry or Android.

Another concept that comes into play is referred to in the linked wikipedia article as “evoked set”. This refers to the set of items that you’ll think of at all without some kind of researching or prompting. In our example above, you didn’t think of Windows Mobile, and now that you read the name, you probably think, “oh yeah, them.” If that’s the case, your evoked set is the first three, and Windows Mobile is out in the cold.

But let’s come back to this later.

A Modest Proposal

The other day, I happened to overhear the substance of a code review. The code was some relatively minor set of changes, and so the suggested fixes and changes were also relatively minor and unremarkable, but with an interesting exception because of its newness to me. The reviewer requested that the developer use the Visual Studio utilities “Sort Usings” and “Organize Usings”. For those not familiar with .NET, this is the Java equivalent of sorting your package imports or the C++ equivalent of sorting/organizing #includes. The only difference is that in C#/.NET, this is functionally useless from the compiled code perspective. That is, C# took a lesson from its counterparts and had its compiler take care of this housekeeping. From a developer’s point of view, this only potentially has ramifications in terms of additional intellisense overhead.

Still, this struck me on the surface as a good practice, albeit one I had never really considered. I suppose that unused using statements are a form of dead code, having intellisense perform better is a mild plus, and sorting them is probably… nice, I guess… for those who ever inspect the using statements. I am not one of those people — I never write them because of Ctrl-Period, and I never look at them. I used to remove them because of the CodeRush issues list for a file, but I tend to turn that off since it tends to unceremoniously remove the Linq extension methods and leave me with non-compiling code (fingers crossed for a fix in a future version).

Back to the story, the reviewer then went on to state that this would be required to ‘pass’ any future code reviews that he did. In spite of the apparent tiny benefit conferred by this practice, something about this proclamation seemed a little off and problematic to me. But, it slipped out of my mind in favor of more pressing matters until I was going through the process of promoting some code in a different scenario the other day, and suddenly, the unfocused nagging issue leaped into full view for me.

Anatomy of a Code Promotion

Generally speaking, a developer’s task is a simple one: implement features, fix any defects. So, if you’re given a task to implement, you implement it, take a moment to pat yourself on the back and move on. And, that’s what I did during my epiphany. Except, er no, wait.

I was using Rational Clear Case, so what I actually had to do was finish the change, and then check in my code. From there, to promote it, I had to open up Clear Case explorer, find my view, right click, and say “Deliver from Stream to Default”. From there, I had to launch Clear Case Project Explorer, find the integration stream, and click “Make a Baseline”. The policy is to name the baseline the default appended with an underscore and my login name. After that, I had to recommend the baseline. Ugh (and double Ugh for Clear Case as source control). Suddenly my life as a developer is not so simple. That’s no longer a one step process, but some number greater than one depending on our standards for granularity.

But wait, crap. I didn’t run all the unit tests to make sure the build wasn’t broken (actually, I tend to be fanatical about that, but I’m making a rhetorical point). I also didn’t run style cop to make sure I was conforming to the set of coding standard we have, nor did I run my other static analysis tools to check for simple mistakes. Alright, so time to do all that, and re-deliver.

But wait. Clear Case forces a rebase operation prior to code delivery (the equivalent of SVN update). And, it’s generally good practice to run all of your tests and analysis tools prior to and after a rebase to make sure that you know whether you are responsible for any broken tests, standards violations, etc or whether you inherited them. Man, this is getting intense.

So alright, promotion process is check your code for correctness, run all tests, run all static analysis. Then, rebase and do all that again. Then, follow that whole rigmarole about delivery and making baselines. My goodness — I haven’t even considered that I might have forgotten to add a file, so I should probably grab a clean copy of everything from source control and rebuild and, if anything breaks, re-deliver. And I haven’t even mentioned the possibility of handling merge conflicts.

Oh, and I now need to sort and organize my using statements. That seemed like a decent idea a few paragraphs ago, but now…

(I realize that there are optimizations that could be made to this particular process — different source control, continuous integration, etc. Point is, just about every process has some warts and, even if it doesn’t, managing concurrent changes and standards in a group environment requires more thought than we realize as we get accustomed to the process.)

Mindshare Revisited

In the face of all of this stuff, the mindshare metaphor begs consideration. If fixing our defects and implementing our features isn’t the iPhone, we’re in big trouble. From there, running unit tests and static analysis tools probably ought to be Android and Blackberry, but they may get pushed out a bit in favor of the particulars of wrangling the source control system and resolving merge conflicts, depending on the source control system and merge tool.

As we add more things, we have two options. We can either reduce the mind share of existing things in our evoked set, or we can spend time and energy expanding our evoked set. So, if we want to hold our efficiency of feature implementation constant, we’re going to have to leave some things out of our mindshare (and then perhaps be reminded of them at code reviews or with exasperated emails from team members with different evoked sets than ours, which we trade for exasperated emails of our own at things missing from their evoked sets). Alternatively, if we want to expand our mindshare, it’s going to come at the cost of a steep learning curve for all newer members and decreased efficiency across the board as we go through our rote checklist prior to each delivery.

Getting It Right

I don’t care for either of these options. So, I have two suggestions for people as the number of sticky notes and strings around our fingers grows in order to promote code.

  1. Don’t sweat the small stuff.
  2. Automate as much as possible.

In the case of “organize and sort usings”, I’d offer item (1). Something that provides no benefit to the end-product and questionable benefit to the development environment is something that ought not to occupy our mindshare as developers. But, in case I am just flat out wrong in my assessment of the benefit/detriment analysis, I’d offer option (2). Given that this is already implemented in Visual Studio, a small plugin running on the build machine could ensure that the using statements in all checked in code are always optimized, without adding to the maze of things developers have to remember.

And, to expand on this, I’d suggest that we in general move as many things into the (2) camp as possible, if we value them. Things like coding standards, static analysis, best practices, etc do matter, so why not force them with automatic, gated checkins or code transforms on the build machine. That ensures they’re always right, and without forcing up front memorization and, more importantly, without distraction from the most important problem — “implement features and fix defects”. The closer to 100% of our mindshare that iPhone occupies, the better for all project stakeholders.

By

Abstractions are Important Part 2 – Good Abstractions

Last time, I talked about the importance of abstractions in relation to a piece of service code that I had seen. This time, I’d like to expand on that concept a bit. I showed some examples of good versus bad abstractions and talked about why they were good or bad, and this time, I’d like to explore the idea of defining in general good versus bad abstractions.

What are Abstractions, Anyway?

If we’re going to talk about abstractions in general rather than simply by example, it probably makes sense to define the term a bit more formally. Wikipedia has a fairly good definition for it:

In computer science, abstraction is the process by which data and programs are defined with a representation similar in form to its meaning (semantics), while hiding away the implementation details.

In other words, an abstraction is a way to say to your clients, “give me the gist of what you want to do and let me worry about the details of how.” If you’re a client of the abstraction, you trust the provider to handle the details correctly.

An Example Abstraction

Abstraction, as a concept, is not limited to the problem domain of programming. Let’s consider an abstraction that has nothing to do with programming, on its face: the pizza shop. The pizza shop abstracts the process of making a pizza from its customers, allowing customers to specify basic properties of their desired pizza while the shop handles more granular ones.

If you call a pizza shop, you tell the shop that you want a large pizza with pepperoni on it. What you typically don’t do is specify how much pepperoni (aside from perhaps in vague terms like “extra” or “light”) nor do you specify the exact dimensions of “large”. These are some of the details that are simplified for you. Others are hidden entirely such as how much basil goes in the marinara sauce or the temperature at which the ovens are set for pizza cooking.

The procedure in general is a simple one. You specify a few rudimentary details about the pizza and whether you want delivery or not, and the shop responds with a time estimate and then, later, a pizza. However, this is an excellent abstraction (as one might surmise by the popularity and ubiquity of its implementation).

So, what makes an abstraction ‘good’? How do we make sure that the ones we’re creating are good?

Exposes Details That Make It Useful

Any abstraction has to expose some level of detail to clients or it would be useless. When calling the pizza place, you are aware, obviously, that you want a pizza. This is unavoidable. On top of that, the pizza place also allows you to specify size and a lot of the ingredients of the pizza. This ensures that you will get as much or as little food as you want and that dietary restrictions and considerations are met. In addition, the pizza place (usually) allows you to specify whether you want to eat there, carry the pizza home or have it delivered. This is another angle for the abstraction as it allows the pizza shop to accommodate your location preference for where you want to eat.

Making the abstraction useful is vital or else nobody would actually use it. In the world of pizza parlors, if one opened up that served only small, mushroom pizzas for carry out, it probably wouldn’t last very long. Even assuming there were no such thing as competition, people would probably opt to make their own pizzas most of the time rather than agree to such specific restrictions. No one would make much use of this abstraction.

Hides details that it needs to control

The flip side of exposing enough detail to make it useful is hiding details that need to be controlled. Imagine the opposite of our “you only get small, mushroom pizzas for carry out” situation, where a pizza parlor allowed specification of any detail, however minute. Customers could call and say that they wanted a pizza cooked in natural cave at 193 degrees Celcius, infused with rare spices from a remote island, and delivered at 4 AM.

The impact on a shop of catering (or attempting to cater) to this level of detail would be disastrous. Imagine the logistics of having to dispatch employees to whatever location customers demanded. Or, imagine the expense incurred in obtaining certain ingredients. And, imagine the absurdity of a pizza place staying open 24/7/365. These things would be the result of too much permissiveness with the abstraction. This abstraction hides no details from its users and, by relinquishing all control over operational details, it allows its users to put it into unprofitable, preposterous modes of operation.

Is Understandable and Intuitive to Clients

If usefulness and guarding against damaging levels of control by clients are table stakes for having an abstraction that can hope to survive, understand-ability and intuitiveness are necessary to thrive. One of the reasons the pizza place is so successful is that it’s a relatively universal, common-sense and simple abstraction. Whatever slight variations may exist, you will generally have no issues ordering pizza from a place even if you’ve never ordered from there before.

“Ask for food”, “add a few ingredients”, “specify where you want the food”, and “pay for food” are all very simple concepts. This simplicity is a big part of the reason that when you’ve checked into a hotel and are tired, you fall back to ordering a pizza instead of ordering, say, tapas or hibachi or going out and buying a bunch of groceries. “How do I get there?”, “Where can I cook this?”, “Do you have a way for me to take this home?”, etc are questions you don’t need to ask. This simplicity and universality makes the abstraction a wildly successful one.

Prevents (or limits) client mistakes

It’s crucial to limit mistakes that clients can force you to make, and it’s almost as crucial to prevent clients from making their own mistakes. The former might blow up your abstraction before it gets going, where the latter, like intuitiveness, is important for gaining adoption. One of the attractions of ordering a pizza is that it’s unlikely to end in disaster. Oh, it might not be cooked to perfection or it might generally be mediocre, but it won’t set your oven on fire or come bubble over into a gigantic mess during cooking.

The reason for this is that the pizza restaurant abstraction removes a lot of the potential problems of cooking a pizza by hiding the process from the customer and leaving it safely in the hands of a specialist. Nothing about specifying the size or the toppings of the pizza gives me the ability to make a decision that somehow causes the pizza to be overcooked or the meat toppings to be dangerously undercooked.

Back to the Code (and are you even making abstractions)?

So, what does all of this mean for coding? I would argue that since a pizza place is really just a process abstraction, we can translate these lessons directly to code. Exposing things to make the abstraction useful while hiding things that would cripple it is fairly straightforward to do, provided that you think in abstractions. I might have a database access abstraction that allows users to specify connection credentials but internally prevents things like multiple connections, dropped connections, etc. In this fashion, I can allow users to connect with different levels of privilege, but I can prevent them from inadvertently getting my class into some invalid state.

Likewise, I can create intuitive operations such as “create new record” or “delete record” that hide ugly details like SQL statements and transactions. This presents an intuitive and inviting interface that’s pleasant to use. And, in addition to providing a minimum guarantee of my abstraction’s own functionality, I can at least assist in saving them from their own lack of familiarity with the abstraction. Fail early goes a long way toward this — I can throw descriptive exceptions if they try to delete nonexistent records, rather than leaving them to decipher what SQL Error 9024B means. This is the equivalent of the pizza place operator saying, “I’m sorry sir, but ordering a negative six inch pizza makes no sense — we don’t offer that.” In real life, this “fail early” approach is much better than a delivery guy showing up empty handed and leaving it to you to figure out why no pizza arrived.

To pull back a bit, I think it’s important to consider the pizza shop or a similar metaphor when writing methods/classes/modules. Don’t simply write code that is technically functional, strewing it willy-nilly about various classes and locations. Don’t write code by coincidence while the debugger is running, setting flags and whatnot to get things working for your exact scenario. Don’t go with the philosophy “ship it if it works.”

Instead, when writing code, imagine that you’re creating a metaphorical pizza place. Who are your ‘customers’? Answer that question, and it becomes easy to answer “what do they want from me?” Answer this question well before “how do I get them what they want?” The “what” is your public API — the useful thing you’re going to provide. The “how” is the detail(s) that you hide from them, for their own good and the operational good of your code. The intuitiveness of your public API is going to be determined by answering questions like “am I logically book-ending operations — if I allow them to open something, do I allow them to close it?” or “if I read this method name and its parameters aloud, is it clear what this does” or “do all parts of my public API do what they say they’re going to do?” If you’re answering yes to these questions, your pizza shop is looking good. If not — if you’re sending sandwiches or sushi when the customers order a medium pepperoni pizza — your abstraction (code) is probably doomed.

And, I hate to bring unit testing into everything, but there’s really no avoiding this — if you write unit tests and especially if you practice TDD, you’re a lot more likely to have better abstractions. Why? Well, you’re your own first customer. This is like going “undercover boss” and ordering pizza from your own shop to see how the experience goes. When you write tests, you’re using your public API, and if you’re muttering things like “what are all these parameters” or “why do I have to pass that in?!?” you’re getting early feedback on your own abstraction. I’ve rarely seen an abstraction that inspired me to react with a “wat?!?” and gone on to find a nice set of unit tests covering it. Tests seem to function as insurance against boneheaded abstractions.

A Checklist For Abstractions

I’ll close out with a set of suggestions — a checklist for evaluating whether you’re creating good, usable abstractions or not.

  1. Does your API operate at a consistent level of abstraction — do you avoid having some methods that require users to pass you SQL statements and others that encapsulate this detail, for example?
  2. Do your methods generally have two or fewer parameters (more parameters making it increasingly hard on users to intuitively understand the method)?
  3. Do your methods have succinct but communicative names like “CreateEntry(Entry entryToCreate)” as opposed to needlessly verbose (“CreateRecordThatIsGoingToGoInTheDatabase(Entry entry)”) names that are hard to type and remember or weirdly succinct names (“CE(Entry e)”)?
  4. Do your methods lie? Does “CreateEntry(Entry entryToCreate)” actually delete an entry, or perhaps less egregiously, create an entry sometimes, unless the entry has a certain flag set true, in which case it quietly fails?
  5. Do you avoid forcing weird details on your clients, such as asking them to store boolean flags?
  6. Do you avoid multiple return values (i.e. out/ref parameters?)
  7. Do you somehow communicate what exceptions your methods might throw?
  8. Do you limit the number of methods per class so that reading through the documentation or IDE assistance is not painful?
  9. Do you avoid forcing your clients to violate the Law of Demeter? That is, in order to get a “D”, do you force your clients to call getA().getB().getC().getD(); ?
  10. Do you practice command query separation in your public API?
  11. Do you limit or eliminate exposing public state, and especially flags?
  12. Do you limit temporal couplings that force your clients to call your methods in a specific order?
  13. Do you avoid deep inheritance hierarchies that make it unclear where the members of the public API actually come from?
  14. Do your publicly exposed classes have a single, obvious responsibility — do you avoid exposing swiss-army-knife classes with a mish-mash of different functionalities?