DaedTech

Stories about Software

By

The Code that Cried Wolf: Proving that Comments are Unneeded

I spent a lot of years diligently commenting my code. I wrote what the code did next to the statements themselves and I wrote method headers and sometimes I even kept a journal of edits in that header as if there were no such thing as source control. As I mentioned in an earlier post, I’ve since drifted further and further from this approach to the point of almost never commenting my code if left to my own devices. The reason for this gradual but near-complete reversal is that I’ve come to view comments as a crutch that prevents code from standing on its own from a readability perspective.

For instance, consider the following class (and before we go any further, I would just like to mention that the relationship between the “Prepper” and “Employee” is not actually how I would design this — this is just an example for demonstrating a point):

public class EmployeePrepper
{
    private readonly IEnumerable _employees;

    public EmployeePrepper(IEnumerable employees)
    {
        _employees = employees;
    }

    public void SetupEmployees()
    {
        foreach (var employee in _employees)
        {
            if (string.IsNullOrEmpty(employee.FirstName))
                throw new Exception();
            if (string.IsNullOrEmpty(employee.LastName))
                throw new Exception();

            if (employee.LastName.Length < 3)
                employee.LoginName = employee.FirstName + employee.LastName;
            else
                employee.LoginName = employee.FirstName[0] + employee.LastName;
        }
    }
}

Imagine that you've asked for a code review and someone says "well, it's a little hard to read and follow -- you've got some magic numbers in there and it's not all that readable." What suggestion is coming next? Or, if none is forthcoming, what do you do to address this? Something like the next code snippet?

/// 
/// This class takes a bunch of employees that the user entered and sets up derived properties on them
/// 
public class EmployeePrepper
{
    /// 
    /// This is a collection of employees that the constructor will pass in
    /// 
    private readonly IEnumerable _employees;

    /// 
    /// This constructor takes the class's dependency and initializes it
    /// 
    /// The employees this class will work on
    public EmployeePrepper(IEnumerable employees)
    {
        _employees = employees;
    }

    /// 
    /// Sets up the employees by setting derived properties
    /// 
    public void SetupEmployees()
    {
        //Cycle through the employees
        foreach (var employee in _employees)
        {
            //Make sure a valid first name and last name are entered
            if (string.IsNullOrEmpty(employee.FirstName))
                throw new Exception();
            if (string.IsNullOrEmpty(employee.LastName))
                throw new Exception();

            //If the employee has a very short last name, then use first concatenated to last
            if (employee.LastName.Length < 3)
                employee.LoginName = employee.FirstName + employee.LastName;
            else //Otherwise, use first initial concatenated to last name
                employee.LoginName = employee.FirstName[0] + employee.LastName;
        }
    }
}

Ah, much better, right? From the outside looking in, with IntelliSense, I learn that (1) the class takes a collection of employees and sets their derived properties, (2) the constructor takes a dependency and initializes itself, and (3) the setup employee method sets up the employees with their derived properties. Of course, I could learn that by (1) inspecting the constructor, (2) common sense, and (3) see item (1).

Okay, so maybe I don't learn all that much of interest from IntelliSense, but there's a lot of clarity in the method itself. I learn that (1) I'm going to cycle through records, (2) I'm going to check properties for validity and (3) a descriptive business rule for setting up login names. Of course, I could learn (1) by knowing what foreach() is and (2) by basic inspection. (3) is interesting in that it's the first comment that isn't just pointless filler. It actually explains something that I need to know.

So 1 out of 6 comments is helpful, but isn't that still a net gain? Well, it is if we assume that there is no downside to having commented code. But there is a downside. Specifically, the addition of comments to the file means extra artifact that must be maintained or else the comments drift from noise to lies. For example, what if we add a constructor parameter and don't pay attention to the doc comments (which pretty much no one does when editing classes)? Suddenly that comment about taking a single employee dependency is nonsensical. So you have 5 liabilities and 1 asset in terms of comments here, which seems like a poor tradeoff (although I'm not going to attempt to place an actual measurable value on the concept of misleading comments versus helpful, explanatory comments -- even if I could, it wouldn't matter much since it's pretty unlikely I'll read your comments anyway :) ).

Okay, I've sold you on my premise, but what do you say to the architects/leads/coworkers/project managers/etc who aren't buying what I'm selling? How do you convince them that this activity is somewhere between pointless and detrimental? Well, prove to them that the comments are pointless. Take away the crutch and make your class names, method names, variable names, and logic flow so clean and so self documenting that the comments look preposterous. Take the original code and turn it into this:

public class EmployeeDerivedPropertySetter
{
    private const int LastNameMinimumLength = 3;

    public class FirstNameEmptyException : Exception { }

    public class LastNameEmptyException : Exception { }

    private readonly IEnumerable _employeesPassedInFromConstructor;

    public EmployeeDerivedPropertySetter(IEnumerable employees)
    {
        _employeesPassedInFromConstructor = employees;
    }

    public void SetupDerivedProperties()
    {
        foreach (var employee in _employeesPassedInFromConstructor)
            SetupDerivedProperties(employee);
    }

    private static void SetupDerivedProperties(Employee employee)
    {
        ValidateFirstAndLastName(employee);
        SetLoginName(employee);
    }

    private static void ValidateFirstAndLastName(Employee employee)
    {
        if (string.IsNullOrEmpty(employee.FirstName))
            throw new FirstNameEmptyException();
        if (string.IsNullOrEmpty(employee.LastName))
            throw new LastNameEmptyException();
    }

    private static void SetLoginName(Employee employee)
    {
        if (IsEmployeeLastNameVeryShort(employee))
            employee.LoginName = ConcatenateFirstAndLastNames(employee);
        else
            employee.LoginName = ConcatenateFirstInitialAndLast(employee);
    }

    private static bool IsEmployeeLastNameVeryShort(Employee employee)
    {
        return employee.LastName.Length < LastNameMinimumLength;
    }

    private static string ConcatenateFirstAndLastNames(Employee employee)
    {
        return employee.FirstName + employee.LastName;
    }

    private static string ConcatenateFirstInitialAndLast(Employee employee)
    {
        return employee.FirstName.ElementAt(0) + employee.LastName;
    }
}

Now, small factored methods might not be your thing, particularly if you love debugger (or just love big methods and classes for some reason). But I think you'd be hard pressed to argue that this isn't quite readable and self explanatory. There's no method here that you'd look at, then squint at, then rub your head and say, "what the...?" You have things like "If the last name is very short, set login name to first concatenated to last." This class reads like an instruction manual or... kinda like the version with a bunch of comments. Except unlike that version, your code is actually the comments, so it's impossible for it to become obsolete and misleading.

But I'm not trying to tell you how to code -- I'm trying to tell you how to sell this. I'm trying to tell you how to prove that comments are unnecessary. "How," you ask? Well, put the comments back in and see how completely obtuse they now look:

public class EmployeeDerivedPropertySetter
{
    /// 
    /// The minimum length of the last name
    /// 
    private const int LastNameMinimumLength = 3;

    /// 
    /// Thrown when the first name is empty
    /// 
    public class FirstNameEmptyException : Exception { }

    /// 
    /// Thrown when the last name is empty
    /// 
    public class LastNameEmptyException : Exception { }
    /// 
    /// This is a collection of employees that the constructor will pass in
    /// 
    private readonly IEnumerable _employeesPassedInFromConstructor;

    /// 
    /// This constructor takes the class's dependency and initializes it
    /// 
    /// The employees this class will work on
    public EmployeeDerivedPropertySetter(IEnumerable employees)
    {
        //Set the employee variable to the passed in value
        _employeesPassedInFromConstructor = employees;
    }

    /// 
    /// Sets up the employees by setting derived properties
    /// 
    public void SetupDerivedProperties()
    {
        //Cycle through the employees
        foreach (var employee in _employeesPassedInFromConstructor)
            SetupDerivedProperties(employee);
    }

    /// 
    /// Setup the derived properties
    /// 
    /// Employee to set them up for
    private static void SetupDerivedProperties(Employee employee)
    {
        //Make sure a valid first name and last name are entered
        ValidateFirstAndLastName(employee);
        //Set the employee's login name
        SetLoginName(employee);
    }

    /// 
    /// Validate the first and last name of the employee
    /// 
    /// The employee to validate
    private static void ValidateFirstAndLastName(Employee employee)
    {
        //If first name is null or empty throw an exception
        if (string.IsNullOrEmpty(employee.FirstName))
            throw new FirstNameEmptyException();
        //If last name is null or empty, throw an exception
        if (string.IsNullOrEmpty(employee.LastName))
            throw new LastNameEmptyException();
    }

    /// 
    /// Set the employee's login name
    /// 
    /// Employee whose login name to set
    private static void SetLoginName(Employee employee)
    {
        //If the employee has a very short last name, then user first concatenated to last
        if (IsEmployeeLastNameVeryShort(employee))
            employee.LoginName = ConcatenateFirstAndLastNames(employee);
        else //Otherwise, use first initial concatenated to last name
            employee.LoginName = ConcatenateFirstInitialAndLast(employee);
    }

    /// 
    /// Returns true if the employee last name is very short
    /// 
    /// Employee to check for a short last name
    /// True if the last name is very short, false if not
    private static bool IsEmployeeLastNameVeryShort(Employee employee)
    {
        //Returns whether or not the last name's length is shorter than the minimum length of the last name
        return employee.LastName.Length < LastNameMinimumLength;
    }

    /// 
    /// Concatenate an employee's first and last name
    /// 
    /// Employee whose first and last name to calculate
    /// The concatenated first and last name
    private static string ConcatenateFirstAndLastNames(Employee employee)
    {
        //Concatenate the first and last names of the employee
        return employee.FirstName + employee.LastName;
    }

    /// 
    /// Concatenates the first initial and last name of the employee
    /// 
    /// Employee whose first initial and last name to concatenate
    /// The concatenated first initial and last name
    private static string ConcatenateFirstInitialAndLast(Employee employee)
    {
        //Concatenate the first initial and last name of the employee
        return employee.FirstName.ElementAt(0) + employee.LastName;
    }
}

I think you'd be hard pressed to find someone who thought that this level of commenting on this self-describing code made any sense. That's the beauty of this demonstration -- more even than just the self documenting code, adding comments next to it demonstrate that there's really nothing left to say. And hey, if this gambit backfires and you find someone who thinks this is a good idea, you've learned a valuable piece of information about someone to bear in mind when considering which projects to work on and where to work in the future. In all seriousness, though, I think this will do a fairly good job of demonstrating that you can express the vast majority of what you would express in comments with the code itself.

There is another very practical benefit to proving this and the resulting Spartan approach to comments beyond the fact that the comments and code tend not to get out of sync; when comments are a rarity, you're more likely to read them. Think of heavily commented code as "The Code that Cried Wolf." If you see a lot of code that looks like the code in this snippet immediately above, don't you start to just kind of tune out the comments? They appear in some other color than the main code, and rather than information, they become comforting white noise, separating your methods and fields from one another the way a little bit of margin/padding separates a bunch of text boxes on the form from one another. You don't study the buffering space -- you just take it for granted. This is how it works with over-commented code bases. If you find yourself in code bases like this, do an exercise and ask yourself when the last time you paid attention to a comment was. Yesterday? Last week? Last month? Now ask yourself how many times you've looked at the code since then? A hundred? A thousand? Ten thousand? That's a pretty low percentage of time that you pay attention to comments.

But imagine a code base with almost no comments. You're flipping between classes and looking at methods, and there's no extras -- just code. Now, you open some class and there's a big, fat, 10 line comment. Betcha notice it immediately and, what's more, betcha read it. And, I'm also willing to wager it's worth reading. If a code base has an absence of noise comments, the developer that wrote this one probably has something important to say: why he chose to do it this way even though it seems obtuse or where she found a template/inspiration for this code via google and why she chose to use it.

Method, parameter, class, and variable names can tell you a lot about what the code elements are. Source control checkin comments can tell you a lot about why various changes were made. Unit tests/BDD Tests/Integration tests can tell you a lot about business rules and requirements. None of these things require adding redundancy to the code itself. Comments in your code should be the exception rather than the rule. And you can prove it.

By

Because I Said So Costs You Respect

Do you remember being a kid, old enough to think somewhat deductively and logically, but not old enough really to understand how the world works? Do you remember putting together some kind of eloquent argument about why you should be able to sleep at a friend’s house or stay out later than normal, perfecting your reasoning, presentation and polish only to be rebuffed? Do you remember then having a long back and forth, asking why to everything, only to be smacked in the face with the ultimate in trump cards: “because I’m your parent and I say so?” Yeah… me too. Sucked, didn’t it?

There are few things in life more frustrating than pouring a good amount of effort into something only to have it shot down without any kind of satisfying explanation of the rationale. For children this tends to be unfortunate for self interested reasons: “I want to do X in the immediate future and I can’t.”  But as you get older and are motivated by more complex and nuanced concerns, these rejections get more befuddling and harder to understand.  A child making a case for why he should own a Red Rider BB Gun will understand on some level the parental objection “it’s dangerous” and that the parental objection arises out of concern for his welfare. So when the “enough — I’m your parent and I say so” comes up it has the backing of some kind of implied reasoning and concern.  An adult making a case that adopting accounting software instead of doing things by hand would be a money saving investment will have a harder time understanding  an answer of “no, we aren’t going to do that because I’m your boss and I say so.”

This is hard for adults to understand because we are sophisticated and interpersonally developed enough to understand when our goals align with those of others or of organizations. In other words, we reasonably expect an answer of “no” to the question “can I go on vacation for 8 months and get paid?” because this personal goal is completely out of whack with others and with an organization’s. So even if the explanation or reasoning aren’t satisfying, we get it. The “because I said so” is actually unnecessary since the answer of “that makes no sense for anyone but you” is perfectly reasonable. But when goals align and it is the means, rather than the ends, that differ, we start to have more difficulty accepting when people pull rank.

I remember some time back being asked to participate in generating extra documentation for an already documentation-heavy process. The developers on the project were taking requirements documents as drawn up by analysts and creating “requirements analysis” documents, and the proposal was to add more flavors of requirements documents to this. So instead of the analysts and developers each creating their own single word document filled with paragraph narratives of requirements, they were now being asked to collaborate on several each with each subsequent version adding more detail to the document. So as a developer, I might write my first requirements document in very vague prose, to be followed by a series of additional documents, each more detailed than the last.

I objected to this proposal (and really even to the original proposal). What I wanted to do was capture the requirements as data items in a database rather than a series of documents filled with prose. And I didn’t like the idea of having several documents with each one being a “more fleshed out” version of the last document — there’s a solution for that and it’s called “version control”. But when I raised these objections to the decision makers on the project, I was rebuffed. If you’re curious as to what the rationale for favoring the approach I’ve described over the one I suggest, I am as well, even to this day. You see, I never received any explanation other than a vague “well, it might not be the greatest, but we’re just going to go with it.” This explanation neither explained the benefit of the proposed approach nor any downside to my approach. Instead, I was a kid again, hearing “because I’m your parent and I say so.” But I wasn’t a little kid asking to stay out late — I was an adult with a different idea for how to achieve the same productivity and effectiveness goals as everyone else.

In the end, I generated the documents I was required to generate. It wasn’t the end of the world, though I never did see it as anything other than a waste of time. As people collaborating toward a larger goal, it will often be the case that we have to do things we don’t like, agree with or approve of, and it might even be the case that we’re offered no explanation at all for these things. That is to be expected in adult life, but I would argue that it should be minimized because it chips away at morale and satisfaction with one’s situation.

Once, twice, or every now and then, most people will let a “just do it because I say so” roll off their back. Some people might let many more than that slip by, while others might get upset earlier in the process. But pretty much everyone is going to have a limit with this kind of thing. Pretty much everyone will have a threshold of “because I say so” responses beyond which they will check out, tune out, leave, blow up, or generally lose it with that in some form. So I strongly recommend avoiding the tempting practice to “pull rank” and justify decisions with “because it’s my project” or “because I’ve decided that.” It’s sometimes nice not to have to justify your decisions — especially to someone with less experience than you and when you’re in a hurry — but the practice of defending your rationale keeps you sharp and on your toes, and it earns the respect of others. “Because I say so” is a well that you can only go to so many times before it dries up and leaves a desert of respect and loyalty. Don’t treat your coworkers like children when they want to know “why” and when they have legitimate questions — they deserve better than “because I’m in charge and I say so.”

By

TDD For Breaking Problems Apart

If I have to write some code that’s going to do something rather complex, it’s easy for my mind to start swimming with possibilities, as mentioned in the magic boxes post. In that post, I talked about thinking in abstractions and breaking problems apart and alluded to TDD being a natural fit for this paradigm without going into much detail. Today and in my next post about this I’m going to go into that detail.

Let’s say I’m tasked with coding some class or classes that keep track of bowling scores. I don’t think “okay, what is my score at the start of a bowling game?” I think things like “I’ll need to have some iteration logic that remembers back at least two frames, since that’s how far back we can go, and I’ll probably need something to look ahead to handle the 10th frame, and maybe there’s a design pattern for that, and…” Like I said, swimming. These are the insane ramblings of an over-caffeinated maniac more than they’re a calm and methodical solution to a somewhat, but not terribly complex problem. In a way, this is like premature optimization. I’m solving problems that I anticipate having rather than problems that I actually have right at this moment.

TDD is thus like a deep breath, a relaxing cup of some hot beverage, and perhaps a bit of meditation or zoning out. All of this noise fades out of my head and I can actually start figuring out how things work. The reason for this is that the essence of TDD is breaking the task into small, manageable problems and solving those while simultaneously ensuring that they stay solved. It’s like a check-list. Forget all of this crap about iterators and design patterns — let’s establish a base case: “Score at the beginning of a game is zero.” That’s pretty easy to solve and pretty hard to get frantic over.

With TDD, the first that I’m going to do is write a test and I’m going to write only enough of a test to fail (which includes writing something that doesn’t compile. So, let’s do that:

[TestClass]
public class Constructor
{
    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void Initializes_Score_To_Zero()
    {
        var scoreCalculator = new BowlingScoreCalculator();
    }
}

Alright, since there is no such type as “BowlingScoreCalculator”, this test fails by virtue of non-compiling. I then define the type and the test goes green (I’m using NCrunch, so I’m never actually building or running anything — I just see dots change colors). Next, I want to assert that the score property is equal to zero after the constructor executes:

[TestClass]
public class BowlingTest
{
    [TestClass]
    public class Constructor
    {
        [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
        public void Initializes_Score_To_Zero()
        {
            var scoreCalculator = new BowlingScoreCalculator();

            Assert.AreEqual(0, scoreCalculator.Score);
        }
    }
}

public class BowlingScoreCalculator
{
    public BowlingScoreCalculator()
    {

    }
}

This also fails, since that property doesn’t exist, and I solve this problem by declaring an auto-implemented property, which actually makes the whole test pass. And just like that, I’ve notched my first passing test and step toward a functional bowling calculator. The game starts with a score of zero.

What’s next? Well, I suppose the simplest thing that we should say is that if I bowl a frame with a score of zero for the first throw and one on the second throw, my score is now 1. Okay, so what is a frame, and what kind of data structure should I use to represent it, and what would be the ideal API for that, and is score really best represented as a property, and — BZZT!! Stop it. We’re solving small, simple problems one at a time. And the only problem I have at the moment is “lack of failing test”. I’m going to quickly decide that the mechanism for bowling a frame is going to be BowlFrame(int, int) so that I know the name of my next sub-test class for naming purposes. Writing code until something fails, I get:

[TestClass]
public class BowlingTest
{
    [TestClass]
    public class Constructor
    {
        [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
        public void Initializes_Score_To_Zero()
        {
            var scoreCalculator = new BowlingScoreCalculator();

            Assert.AreEqual(0, scoreCalculator.Score);
        }
    }

    [TestClass]
    public class BowlFrame
    {
        [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
        public void With_Throws_0_And_1_Results_In_Score_1()
        {
            var scoreCalculator = new BowlingScoreCalculator();
            scoreCalculator.BowlFrame(0, 1);
        }
    }
}

public class BowlingScoreCalculator
{
    public int Score { get; set; }

    public BowlingScoreCalculator()
    {    

    }
}

This fails because there is no “bowl frame” method, so I add it. CodeRush’s “declare method” refactoring does the trick nicely, populating the new method with a thrown NotImplementedException, which gives me a red test instead of a non-compile. I have to make the test pass before moving on, so I delete it by deleting the throw, and then I add an assert that score is equal to 1. This fails, and I make it pass:

[TestClass]
public class Constructor
{
    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void Initializes_Score_To_Zero()
    {
        var scoreCalculator = new BowlingScoreCalculator();

        Assert.AreEqual(0, scoreCalculator.Score);
    }
}

[TestClass]
public class BowlFrame
{
    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void With_Throws_0_And_1_Results_In_Score_1()
    {
        var scoreCalculator = new BowlingScoreCalculator();
        scoreCalculator.BowlFrame(0, 1);

        Assert.AreEqual(1, scoreCalculator.Score);
    }
}

public class BowlingScoreCalculator
{
    public int Score { get; set; }

    public BowlingScoreCalculator()
    {

    }

    public void BowlFrame(int throw1, int throw2)
    {
        Score = 1;
    }
}

Now, with a green test, I have the option to solve problems in existing code (i.e. “refactor”). For instance, I don’t like that I have a useless, empty constructor, so I delete it and note that my tests still pass. This is another easy problem that I can solve with confidence.

From here, I think I’d like a non-obtuse implementation of scoring that first frame. I think I’ll test that if I bowl a 2 and then a 3, the score is equal to five. For the first time, I’m not going to have any non-compiling failings, so I write some code and see red, obviously, which I get rid of by making the whole thing look like this:

[TestClass]
public class BowlingTest
{
    [TestClass]
    public class Constructor
    {
        [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
        public void Initializes_Score_To_Zero()
        {
            var scoreCalculator = new BowlingScoreCalculator();

            Assert.AreEqual(0, scoreCalculator.Score);
        }
    }

    [TestClass]
    public class BowlFrame
    {
        [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
        public void With_Throws_0_And_1_Results_In_Score_1()
        {
            var scoreCalculator = new BowlingScoreCalculator();
            scoreCalculator.BowlFrame(0, 1);

            Assert.AreEqual(1, scoreCalculator.Score);
        }

        [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
        public void With_Throws_2_And_3_Results_In_Score_5()
        {
            var scoreCalculator = new BowlingScoreCalculator();
            scoreCalculator.BowlFrame(2, 3);

            Assert.AreEqual(5, scoreCalculator.Score);
        }
    }
}

public class BowlingScoreCalculator
{
    public int Score { get; set; }

    public void BowlFrame(int throw1, int throw2)
    {
        Score = throw1 + throw2;
    }
}

All right, the calculator is now doing a good job of handling a low scoring first frame. So, let’s correct a little thing or two with the refactor cycle, making sure that whatever we do is a problem that’s easily solvable, isolated, and small in scope. I’m thinking I don’t like the magic numbers 2, 3 and 5 in that last test. Let’s try making it look like this:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void With_Throws_2_And_3_Results_In_Score_5()
{
    var scoreCalculator = new BowlingScoreCalculator();
    const int firstThrow = 2;
    const int secondThrow = 3;
    scoreCalculator.BowlFrame(firstThrow, secondThrow);

    Assert.AreEqual(firstThrow + secondThrow, scoreCalculator.Score);
}

Yes, it’s perfectly acceptable to refactor your unit tests during the “refactor” phase. Treat these guys as first class code and keep them clean or nobody will want them around. I have another bone to pick with this code, which is the duplication of the constructor logic in each test. Let’s fix that too:

[TestClass]
public class BowlFrame
{
    private static BowlingScoreCalculator Target { get; set; }

    [TestInitialize()]
    public static void BeforeEachTest()
    {
        Target = new BowlingScoreCalculator();
    }

    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void With_Throws_0_And_1_Results_In_Score_1()
    {
        Target.BowlFrame(0, 1);

        Assert.AreEqual(1, Target.Score);
    }

    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void With_Throws_2_And_3_Results_In_Score_5()
    {
        const int firstThrow = 2;
        const int secondThrow = 3;
        Target.BowlFrame(firstThrow, secondThrow);

        Assert.AreEqual(firstThrow + secondThrow, Target.Score);
    }
}

There, that looks better to me now. Now, what’s wrong with the production code? What problem can we solve? I can think of a few things: we can bowl more than 10 per frame, we can bowl negative numbers, and this thing will only keep track of the most recent frame. There are many other issues as well, but we’re already trending toward overload here, so let’s get back on Easy Street and figure out what to do if the user gives us goofy input.

I write a test that fails:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Throws_Exception_On_Negative_Argument()
{
    ExtendedAssert.Throws(() => Target.BowlFrame(-1, 0));
}

And then make it pass (first by declaring the nested exception and then by throwing it when the first frame is negative). I also take a shortcut in terms of obtuseness of my TDD here and make it throw the exception if either parameter is negative. I consider this acceptable, personally, since I believe that within the red-green-refactor discipline it’s a matter of some discretion how simple “the simplest thing” is. I’m not going to be accepting negative parameters for one and not the other and I know it. At any rate, after getting this to pass, I now refactor my method parameter names to be more descriptive and set the Score setter to private (which I should have done at first anyway) and the production class looks like this:

public class BowlingScoreCalculator
{
    public class FrameUnderflowException : Exception { }

    public int Score { get; private set; }

    public void BowlFrame(int firstThrow, int secondThrow)
    {
        if (firstThrow < 0 || secondThrow < 0)
            throw new FrameUnderflowException();
        Score = firstThrow + secondThrow;
    }
}

Now I’m going to tackle the similar problem of allowing too much scoring for a frame. I want to solve the problem that scores of greater than 10 are currently allowed and I’m going to do this with the test case for 11. TDD is not a smoke testing approach nor is it an exhaustive approach, so I’m not going to write tests for 11, 12, 13… 200 or anything like that. As such, I try to hit edge cases whenever possible and that’s what I’ll do here:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Throws_Exception_On_Score_Of_11()
{
    ExtendedAssert.Throws(() => Target.BowlFrame(10, 1));
}

I make this pass, and then I go back and then decide I’m not a huge fan of that “10” in both my test and in the production class. Zero I can let slide because of it’s sort of universal value in scoring of competitions, but 10 deserves a descriptive name. I’m going to call it “Mark” since that’s what a frame of 10 is known as in bowling.

Now that we’ve sufficiently guarded against bad inputs, I’d say it’s time to start thinking about aggregation. The easiest way to do that is to have a frame of 1 and then another frame of one, and make sure that we have a score of 2:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Sets_Score_To_2_After_2_Frames_With_Score_Of_1_Each()
{
    Target.BowlFrame(1, 0);
    Target.BowlFrame(1, 0);

    Assert.AreEqual(2, Target.Score);
}

How cool is it that I fix this by changing “Score = firstThrow + secondThrow;” to “Score += firstThrow + secondThrow;” Talk about the simplest solution possible!

Now, I don’t like the way that test is written with 5 int literals in there. This is telling me that it might be time to rethink the way I’m handling frame. I don’t really want to think too much about it now, but I know that there’s going to be a hard-fail when I get to the 10th frame, which can have three throws so I’m already not married to this implementation. And now it’s already starting to feel awkward.

So what if I created a simple frame object? How would that look? Well, I’ll cover that next time as I flesh out this exercise. I’m hoping I’ve at least sold you somewhat on the notion that TDD forces you to chunk problems into manageable pieces and solve them that way without getting carried away. Next time I’ll circle back a bit to the idea of “magic boxes” as we decide how to divide up the concept of “frame” and “game” while adhering to the practice of TDD.

By

You Aren’t God, So Don’t Talk Like Him

Queuing Up Some Rage

Imagine that you’re talking to an acquaintance, and you mention that blue is your favorite color. The response comes back:

Acquaintance: You’re completely wrong. Red is the best color.

How does the rest of this conversation go? If you’re sane…

Acquaintance: That’s wrong. Red is the best color.
You: Uh, okay…

This, in fact, is really the only response that avoids a terminally stupid and probably increasingly testy exchange. The only problem with this is that the sane approach is also perceived as something between admission of defeat and appeasement. You not fighting back might be perceived as weakness. So, what do you do? Do you launch back with this?

Acquaintance: That’s wrong. Red is the best color.
You: No, it is you who is wrong. You’d have to be an idiot to like red!

If so, how do you think this is going to go?

Acquaintance: That’s wrong. Red is the best color.
You: No, it is you who is wrong. You’d have to be an idiot to like red!
Acquaintance: Ah, well played. I’ve changed my mind.

Yeah, I don’t think that’s how it will go either. Probably, it will turn out more like this:

Acquaintance: That’s wrong. Red is the best color.
You: No, it is you who is wrong. You’d have to be an idiot to like red!
Acquaintance: You’re the idiot, you stupid blue fanboy!
You: Well, at least my favorite color isn’t the favorite color of serial killers and Satan!
Acquaintance: Go shove an ice-pick up your nose. I hope you die!

Well, okay, maybe it will take a little longer to get to that point, perhaps with some pseudo-intellectual comparisons to Hitler and subtle ad hominems greasing the skids of escalation. If you really want to see this progression in the wild, check the comments section of any tech article about an Apple product. But the point is that it won’t end well.

Looking back, what is the actual root cause of the contention? The fact that you like blue and your acquaintance likes red? That doesn’t seem like the sort of thing that normally gets the adrenaline pumping. Is it the fact that he told you that you were wrong? I think this cuts closer to the heart of the matter, but this ultimately isn’t really the problem, either. So what is?

Presenting Opinions as Facts

The heart of the issue here, I believe, is the invention of some arbitrary but apparently universal truth. In other words, the subtext of what your acquaintance is saying is, “There is a right answer to the favorite color question, and that right answer is my answer because it’s mine.” The place where the conversation goes off the rails is the place at which one of the participants declares himself to be the ultimate Clearinghouse of color quality. So, while the “you’re wrong” part may be obnoxious, and it may even be what grinds the listener’s teeth in the moment, it’s just a symptom of the actual problem: an assumption of objective authority over a purely subjective matter.

To drive the point home, consider a conversation with a friend or family member instead of a mere acquaintance. Consider that in this scenario the “you’re wrong” would probably be good-natured and said in jest. “Dude, you’re totally wrong–everyone knows red is the best color!” That would roll off your back, I imagine. The first time, anyway. And probably the second time. And the third through 20th times. But, sooner or later, I’m pretty sure that would start to wear on you. You’d think to yourself, “Is there any matter of opinion about which I’m not ‘wrong,’ as he puts it?”

In the example of favorite color and other things friends might discuss, this seems pretty obvious. Who would seriously think that there was an actual right answer to “What’s your favorite color?” But what about the aforementioned Apple products versus, say, Microsoft or Google products? What about the broader spectrum of consumer products, including deep dish versus thin crust pizza or American vs Japanese cars? Budweiser or Miller? Maybe an import or a microbrew? What about sports teams? Designated hitter or not? Soccer or football?

And what about technologies and programming languages and frameworks? Java versus .NET? Linux versus Windows? Webforms vs ASP MVC? What about finer granularity concerns? Are singletons a good idea or not? Do curly braces belong on the same line as a function definition or the next line? Layered or onion architecture? Butter side up or butter side down? (Okay, one of those might have been something from Dr Seuss.)

It’s All in the Phrasing

With all of these things I’ve listed, particularly the ones about programming and others like them, do you find yourself lapsing into declarations of objective truth when what you’re really doing is expressing an opinion? I bet you do. I know I do, from time to time. I think it’s human nature, or at the very least it’s an easy way to try to add additional validity to your take on things. But it’s also a logical fallacy (appeal to authority, with you as the authority, or, as I’ve seen it called, confusing fact with opinion.) It’s a fallacy wherein the speaker holds himself up as the arbiter of objective truth and his opinions up as facts. Whatever your religious beliefs may be, that is a role typically reserved for a deity. I’m pretty sure you’re not a deity, and I know that I’m not one, so perhaps we should all, together, make an effort to be clear if we’re stating facts (“two plus one is four”) or if we’re expressing beliefs or opinions (“Three is the absolute maximum number of parameters I like to see for a method”).

Think of how you you would react to the following phrases:

  • I like giant methods.
  • I believe there’s no need to limit the number of control flow statements you use.
  • I would have used a service locator there where you used direct dependency injection.
  • I prefer to use static methods and especially static state.
  • I wish there were more coupling between these modules.
  • I am of the opinion that unit testing isn’t that important.

You’re probably thinking “I disagree with those opinions.” But your hackles likely aren’t raised. Your face isn’t flushed, and your adrenaline isn’t pumping in anticipation of an argument against someone who just indicted your opinions and your way of doing things. You aren’t on the defensive. Instead, you’re probably ready to argue the merits of your case in an attempt to come to some mutual understanding, or, barring that, to “agree to disagree.”

Now think of how you’d react to these statements.

  • Reducing the size of your methods is a waste of time.
  • Case statements are better than polymorphism.
  • If you use dependency injection, you’re just wrong.
  • Code without static methods is bad.
  • The lack of coupling between these modules was a terrible decision.
  • Unit testing is a dumb fad.

How do you feel now? Are your hackles raised a little bit, even though you know I don’t believe these things? Where the language in the first section opened the door for discussion with provocative statements, the language in this section attempts to slam that door shut, not caring if your fingers are in the way. The first section states the speaker’s opinions, where the language in the second indicts the listener’s. Anyone wanting to foster a cooperative and pleasant environment would be well served to favor things stated in the fashion of the first set of statements. It may be tempting to make your opinions seem more powerful by passing them off as facts, but it really just puts people off.

Caveats

I want to mention two things as a matter of perspective here. The first is that it would be fairly easy to point out that I write a lot of blog posts and give them titles like, “Testable Code Is Better Code,” and, “You’re Doin’ It Wrong,” to say nothing of what I might say inside the posts. And while that’s true, I would offer the rationale that pretty much everything I might post on a blog that isn’t a simple documentation of process is going to be a matter of my opinion, so the “I think” kind of goes without saying here. I can assure you that I do my best in actual discussions with people to qualify and make it clear when I’m offering opinions. (Though, as previously mentioned, I’m sure I can improve in this department, as just about anyone can.)

The second caveat is that what I’m saying is intended to apply to matters of complexity that are naturally opinions by their nature. For instance, “It’s better to write unit tests” is necessarily a statement of opinion since qualifying words like “better” invite ambiguity. But if you were to study 100 projects and discover that the ones with unit tests averaged 20% fewer defects, this would simply be a matter of fact. I am not advocating downgrading facts to qualified, wishy-washy opinions. What I am advocating is that we all stop ‘upgrading’ our opinions to the level of fact.

By

How to Create Good Abstractions: Oracles and Magic Boxes

Oracles In Math

When I was a freshman in college, I took a class called “Great Theoretical Ideas In Computer Science”, taught by Professor Steven Rudich. It was in this class that I began to understand how fundamentally interwoven math and computer science are and to think of programming as a logical extension of math. We learned about concepts related to Game Theory, Cryptography, Logic Gates and P, NP, NP-Hard and NP-Complete problems. It is this last set that inspires today’s post.

This is not one of my “Practical Math” posts, so I will leave a more detailed description of these problems for another time, but I was introduced to a fascinating concept here: that you can make powerful statements about solutions to problems without actually having a solution to the problem. The mechanism for this was an incredibly cool-sounding concept called “The Problem Solving Oracle”. In the world of P and NP, we could make statements like “If I had a problem solving oracle that could solve problem X, then I know I could solve problem Y in polynomial (order n squared, n cubed, etc) time.” Don’t worry if you don’t understand the timing and particulars of the oracle — that doesn’t matter here. The important concept is “I don’t know how to solve X, but if I had a machine that could solve it, then I know some things about the solutions to these other problems.”

It was a powerful concept that intrigued me at the time, but more with grand visions of fast factoring and solving other famous math problems and making some kind of name for myself in the field of computer science related mathematics. Obviously, you’re not reading the blog of Fields Medal winning mathematician, Erik Dietrich, so my reach might have exceeded my grasp there. However, concepts like this don’t really fall out of my head so much as kind of marinate and lie dormant, to be re-appropriated for some other use.

Magic Boxes: Oracles For Programmers

One of the most important things that we do as developers and especially as designers/architects is to manage and isolate complexity. This is done by a variety of means, many of which have impressive-sounding terms like “loosely coupled”, “inverted control”, and “layered architecture”. But at their core, all of these approaches and architectural concepts have a basic underlying purpose: breaking problems apart into isolated and tackle-able smaller problems. To think of this another way, good architecture is about saying “assume that everything else in the world does its job perfectly — what’s necessary for your little corner of the world to do the same?”

That is why the Single Responsibility Principle is one of the most oft-cited principles in software design. This principle, in a way, says “divide up your problems until they’re as small as they can be without being non-trivial”. But it also implies “and just assume that everyone else is handling their business.”

Consider this recent post by John Sonmez, where he discusses deconstructing code into algorithms and coordinators (things responsible for getting the algorithms their inputs, more or less). As an example, he takes a “Calculator” class where the algorithms of calculation are mixed up with the particulars of storage and separates these concerns. This separates the very independent calculation algorithm from the coordinating storage details (which are of no consequence to calculations anyway) in support of his point, but it also provides the more general service of dividing up the problem at hand into smaller segments that take one another granted.

Another way to think of this is that his “Calculator_Mockless” class has two (easily testable) dependencies that it can trust to do their jobs. Going back to my undergrad days, Calculator_Mockless has two Oracles: one that performs calculations and the other that stores stuff. How do these things do their work? Calculator_Mockless doesn’t know or care about that; it just provides useful progress and feedback under the assumption that they do. This is certainly reminiscent of the criteria for “Oracle”, an assumption of functionality that allows further reasoning. However, “Oracle” has sort of a theoretical connotation in the math sense that I don’t intend to convey, so I’m going to adopt a new term for this concept in programming: “Magic Box”. John’s Calculator_Mockless says “I have two magic boxes — one for performing calculations and one for storing histories — and given these boxes that do those things, here’s how I’m going to proceed.”

How Spaghetti Code Is Born

It’s one thing to recognize the construction of Magic Boxes in the code, but how do you go about building and using them? Or, better yet, go about thinking in terms of them from the get-go? It’s a fairly sophisticated and not always intuitive method for deconstructing programming problems.

nullTo see what I mean, think of being assigned a hypothetical task to read an XML file full of names (right), remove any entries missing information, alphabetize the list by last name, and print out “First Last” with “President” pre-pended onto the string. So, for the picture here, the first line of the output should be “President Grover Cleveland”. You’ve got your assignment, now, quick, go – start picturing the code in your head!

What did you picture? What did you say to yourself? Was it something like “Well, I’d read the file in using the XDoc API and I’d probably use an IList<> implementer instead of IEnumerable<> to store these things since that makes sorting easier, and I’d probably do a foreach loop for the person in people in the document and while I was doing that write to the list I’ve created, and then it’d probably be better to check for the name attributes in advance than taking exceptions because that’d be more efficient and speaking of efficiency, it’d probably be best to append the president as I was reading them in rather than iterating through the whole loop again afterward, but then again we have to iterate through again to write them to the console since we don’t know where in the list it will fall in the first pass, but that’s fine since it’s still linear time in the file size, and…”

And slow down there, Sparky! You’re making my head hurt. Let’s back up a minute and count how many magic boxes we’ve built so far. I’m thinking zero — what do you think? Zero too? Good. So, what did we do there instead? Well, we embarked on kind of a willy-nilly, scattershot, and most importantly, procedural approach to this problem. We thought only in terms of the basic sequence of runtime operations and thus the concepts kind of all got jumbled together. We were thinking about exception handling while thinking about console output. We were thinking about file reading while thinking about sorting strings. We were thinking about runtime optimization while we were thinking about the XDocument API. We were thinking about the problem as a monolithic mass of all of its smaller components and thus about to get started laying the bedrock for some seriously weirdly coupled spaghetti code.

Cut that Spaghetti and Hide It in a Magic Box

Instead of doing that, let’s take a deep breath and consider what mini-problems exist in this little assignment. There’s the issue of reading files to find people. There’s the issue of sorting people. There’s the issue of pre-pending text onto the names of people. There’s the issue of writing people to console output. There’s the issue of modeling the people (a series of string tuples, a series of dynamic types, a series of anonymous types, a series of structs, etc?). Notice that we’re not solving anything — just stating problems. Also notice that we’re not talking at all about exception handling, O-notation or runtime optimization. We already have some problems to solve that are actually in the direct path of solving our main problem without inventing solutions to problems that no one yet has.

So what to tackle first? Well, since every problem we’ve mentioned has the word “people” in it and the “people” problem makes no mention of anything else, we probably ought to consider defining that concept first (reasoning this way will tell you what the most abstract concepts in your code base are — the ones that other things depend on while depending on nothing themselves). Let’s do that (TDD process and artifacts that I would normally use elided):

public struct Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public override string ToString()
    {
        return string.Format("{0} {1}", FirstName, LastName);
    }
}

Well, that was pretty easy. So, what’s next? Well, the context of our mini-application involves getting people objects, doing things to them, and then outputting them. Chronologically, it would make the most sense to figure out how to do the file reads at this point. But “chronologically” is another word for “procedurally” and we want to build abstractions that we assemble like building blocks rather than steps in a recipe. Another, perhaps more advantageous way would be to tackle the next simplest task or to let the business decide which functionality should come next (please note, I’m not saying that there would be anything wrong with implementing the file I/O at this point — only that the rationale “it’s what happens first sequentially in the running code” is not a good rationale).

Let’s go with the more “agile” approach and assume that the users want a stripped down, minimal set of functionality as quickly as possible. This means that you’re going to create a full skeleton of the application and do your best to avoid throw-away code. The thing of paramount importance is having something to deliver to your users, so you’re going to want to focus mainly on displaying something to the user’s medium of choice: the console. So what to do? Well, imagine that you have a magic box that generates people for you and another one that somehow gets you those people. What would you do with them? How ’bout this:

public class PersonConsoleWriter
{
    public void WritePeople(IEnumerable people)
    {
        foreach (var person in people)
            Console.Write(person);
    }
}

Where does that enumeration come from? Well, that’s a problem to deal with once we’ve handled the console writing problem, which is our top priority. If we had a demo in 30 seconds, we could simply write a main that instantiated a “PersonConsoleWriter” and fed it some dummy data. Here’s a dead simple application that prints, well, nothing, but it’s functional from an output perspective:

public class PersonProcessor
{
    public static void Main(string[] args)
    {
        var writer = new PersonConsoleWriter();
        writer.WritePeople(Enumerable.Empty());
    }
}

What to do next? Well, we probably want to give this thing some actual people to shove through the pipeline or our customers won’t be very impressed. We could new up some people inline, but that’s not very impressive or helpful. Instead, let’s assume that we have a magic box that will fetch people out of the ether for us. Where do they come from? Who knows, and who cares — not main’s problem. Take a look:

public class PersonStorageReader
{
    public IList GetPeople()
    {
        throw new NotImplementedException();
    }
}

Alright — that’s pretty magic box-ish. The only way it could be more so is if we just defined an interface, but that’s overkill for the moment and I’m following YAGNI. We can add an interface if thing later needs to pull customers out of a web service or something. At this point, if we had to do a demo, we could simply return the empty enumeration instead of throwing an exception or we could dummy up some people for the demo here. And the important thing to note is that now the thing that’s supposed to be generating people is the thing that’s generating the people — we just have to sort out the correct internal implementation later. Let’s take a look at main:

public static void Main(string[] args)
{
    var reader = new PersonStorageReader();
    var people = reader.GetPeople();

    var writer = new PersonConsoleWriter();
    writer.WritePeople(people);
}

Well, that’s starting to look pretty good. We get people from the people reader and feed them to the people console writer. At this point, it becomes pretty trivial to add sorting:

public static void Main(string[] args)
{
    var reader = new PersonStorageReader();
    var people = reader.GetPeople().OrderBy(p => p.LastName);

    var writer = new PersonConsoleWriter();
    writer.WritePeople(people);
}

But, if we were so inclined, we could easily look at main and say “I want a magic box that I hand a person collection to and it gives me back a sorted person collection, and that could be stubbed out as follows:

public static void Main(string[] args)
{
    var reader = new PersonStorageReader();
    var people = reader.GetPeople();

    var sorter = new PersonSorter();
    var sortedPeople = sorter.SortList(people);

    var writer = new PersonConsoleWriter();
    writer.WritePeople(people);
}

The same kind of logic could also be applied for the “pre-pend the word ‘President'” requirement. That could pretty trivially go into the console writer or you could abstract it out. So, what about the file logic? I’m not going to bother with it in this post, and do you know why? You’ve created enough magic boxes here — decoupled the program enough — that it practically writes itself. You use an XDocument to pop all Person nodes and read their attributes into First and Last name, skipping any nodes that don’t have both. With Linq to XML, how many lines of code do you think you need? Even without shortcuts, the jump from our stubbed implementation to the real one is small. And that’s the power of this approach — deconstruct problems using magic boxes until they’re all pretty simple.

Also notice another interesting benefit which is that the problems of runtime optimization and exception handling now become easier to sort out. The exception handling and expensive file read operations can be isolated to a single class and console writes, sorting, and other business processing need not be affected. You’re able to isolate difficult problems to have a smaller surface area in the code and to be handled as requirements and situation dictate rather than polluting the entire codebase with them.

Pulling Back to the General Case

I have studiously avoided discussion of how tests or TDD would factor into all of this, but if you’re at all familiar with testable code, it should be quite apparent that this methodology will result in testable components (or endpoints of the system, like file readers and console writers). There is a deep parallel between building magic boxes and writing testsable code — so much so that “build magic boxes” is great advice for how to make your code testable. The only leap from simply building boxes to writing testable classes is to remember to demand your dependencies in constructors, methods and setters, rather than instantiating them or going out and finding them. So if PersonStorageReader uses an XDocument to do its thing, pass that XDocument into its constructor.

But this concept of magic boxes runs deeper than maintainable code, TDD, or any of the other specific clean coding/design principles. It’s really about chunking problems into manageable bites. If you’re even writing code in a method and you find yourself thinking “ok, first I’ll do X, and then-” STOP! Don’t do anything yet! Instead, first ask yourself “if I had a magic box that did X so I didn’t have to worry about it here and now, what would that look like?” You don’t necessarily need to abstract every possible detail out of every possible place, but the exercise of simply considering it is beneficial. I promise. It will help you start viewing code elements as solvers of different problems and collaborators in a broader solution, instead of methodical, plodding steps in a gigantic recipe. It will also help you practice writing tighter, more discoverable and usable APIs since your first conception of them would be “what would I most like to be a client of right here?”

So do the lazy and idealistic thing – imagine that you have a magic box that will solve all of your problems for you. The leap from “magic box” to “collaborating interface” to “implemented functionality” is much simpler and faster than it seems when you’re isolating your problems. The alternative is to have a system that is one gigantic, procedural “magic box” of spaghetti and the “magic” part is that it works at all.