DaedTech

Stories about Software

TDD Even with DateTime.Now

Recently, I posted my incredulity at the idea that someone would go to the effort of writing unit tests and not source control them as a matter of course. I find that claim as absurd today as I did then, but I did happen on an interesting edge case where I purposely discarded a unit test I wrote during TDD that I would otherwise have kept.

I was writing a method that would take a year in integer form and populate a drop down list with all of the years starting with that one up through the current year. In this project, I don’t have access to an isolation framework like Moles or Typemock, so I have no way of making DateTime.Now return some canned value (check out this SO post for an involved discussion of the sensitivity of unit tests involving DateTime.Now).

So, as I thought about what I wanted this method to do, and how to get there, I did something interesting. I wrote the following test:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Adds_Item_When_Passed_2012()
{
    var myFiller = new CalendarDropdownFiller(new DateTimeFormatInfo());
    var myList = new DropDownList();
    myFiller.SeedWithYearsSince(myList, 2012);

    Assert.AreEqual(1, myList.Items.Count);
}

To get this to pass, I changed the method SeedWithYearsSince() to add a random item to the list. Next test I wrote was:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Adds_Item_With_Text_2012_When_Passed_2012()
{
    var myFiller = new CalendarDropdownFiller(new DateTimeFormatInfo());
    var myList = new DropDownList();
    myFiller.SeedWithYearsSince(myList, 2012);

    Assert.AreEqual("2012", myList.Items[0].Value);
}

Now, I had to actually add “2012” in the method, but it was still pretty obtuse. To get serious, I wrote the following test:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Adds_Two_Items_When_Passed_2011()
{
    var myFiller = new CalendarDropdownFiller(new DateTimeFormatInfo());
    var myList = new DropDownList();
    myFiller.SeedWithYearsSince(myList, 2011);

    Assert.AreEqual(2, myList.Items.Count);
}

Now the method had to do something smart, so I wrote:

public virtual void SeedWithYearsSince(DropDownList list, int year)
{
    for (int index = year; index <= DateTime.Now.Year; index++)
        list.Items.Add(new ListItem(index.ToString()));
}

And, via TDD, I arrived at the gist of my method. (I would later write tests that passed in a null list and a negative year to verify that descriptive exceptions were thrown, but this is more or less the finished product.) But now, let's think about these unit tests vis-à-vis source control.

Of the three tests I've written, the first two should always pass unless I get around to finishing the time machine that I started building a few years back. We might consolidate those into a single test that's a little more meaningful, perhaps by dropping the first one. We might also tease out a few more cases here to guard against regressions, say proving that calling it with 2010 adds 2010, 2011 and 2012 or something. While I don't generally feel good about checking in tests that exercise code dependent on external state (like "Now"), we can feel pretty good about these given the nature of "Now".
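
Here's a sketch of what one of those extra regression guards might look like (hypothetical, but still safe with respect to "Now" because it only asserts on the earliest items in the list, which never move in later years):

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Adds_2010_Through_2012_When_Passed_2010()
{
    var myFiller = new CalendarDropdownFiller(new DateTimeFormatInfo());
    var myList = new DropDownList();
    myFiller.SeedWithYearsSince(myList, 2010);

    Assert.AreEqual("2010", myList.Items[0].Value);
    Assert.AreEqual("2011", myList.Items[1].Value);
    Assert.AreEqual("2012", myList.Items[2].Value);
}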

But that last test about 2 items when passed 2011 is only good for the remainder of 2012. When you wake up bright and early on New Year's morning and run to the office and kick off a test run, this test will fail. Clearly we don't want to check that test in, so all things being equal, we'll discard it. That's a bummer, but it's okay. The point of the unit tests written here was a design strategy -- test driven development. If we can't keep the artifacts of that because, say, we don't have access to an isolation framework or permission to use one, it's unfortunate, but c'est la vie. We'll check in the tests that we can and call it a day.

This same reasoning applies within the context of whatever restrictions are placed on you. Say you are assigned to a legacy codebase (using the Michael Feathers definition of "legacy" as code without unit tests) and do not have rights to add a test project, for whatever reason. Well, then write them to help you work, keep them around as best you can to help for as long as you can, and discard them when you have to. If you have a test project but not Moles or Typemock, you do what we did here. If you have code that you have to use that lacks seams, contains things like singletons/static methods or otherwise presents testability problems, take the same approach. Better to test during TDD and discard than not to test at all, since you can at least guard against regressions and get fast feedback during initial development.

I've often heard people emphasize that TDD is a development methodology first and the unit tests for source control are a nice ancillary benefit. But I think the example of DateTime.Now really drives home that point. The fact that DateTime.Now (or legacy code, GUI code, threaded code, etc.) is fickle and hard to test need not be a blocker to doing TDD. Clearly I think we should strive to write only meaningful tests and to keep them all around, but this isn't an all-or-nothing proposition. Make sure you're verifying your code first and foremost, preserve what you can, and seek to improve through increasingly decoupled code, better tooling, and more practice writing good tests.
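
For what it's worth, if you do have the latitude to change the design, one common way to get that "more decoupled code" is to inject the time source instead of calling DateTime.Now directly. A minimal sketch, with the Func<DateTime> parameter being my own addition rather than anything from the actual project:

public class CalendarDropdownFiller
{
    private readonly Func<DateTime> _now;

    // Production code passes nothing and gets DateTime.Now; tests pass a canned value.
    public CalendarDropdownFiller(DateTimeFormatInfo formatInfo, Func<DateTime> now = null)
    {
        _now = now ?? new Func<DateTime>(() => DateTime.Now);
        // formatInfo handling elided for brevity
    }

    public virtual void SeedWithYearsSince(DropDownList list, int year)
    {
        for (int index = year; index <= _now().Year; index++)
            list.Items.Add(new ListItem(index.ToString()));
    }
}

With something like that in place, the "2 items when passed 2011" test could pin "Now" to a known date and stay green forever.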

SQL Queries and String.Join()

Over the last few weeks, I've found myself working within a framework where I'm writing a lot of SQL. More specifically, I'm writing a lot of WHERE clauses related to optional user search parameters. As a simple example, consider a search over "customer" where a user could filter by part of a customer name, filter by a selectable customer type, or simply list all customers. This creates a situation where I can have a WHERE clause with 0, 1, or 2 entries in it, depending on what the user wants to do.

The consequence of this is that my WHERE clause may be blank, it may have one clause, or it may have two clauses with an AND joining them. The most basic (naive) way to handle this is to check, for each control/clause, whether the user has entered something and, if so, to append "{Clause} AND " to a string builder. Then you snip off the last 5 characters to take care of the spurious "AND" that got appended. I think we've all seen this sort of thing before in some form or another.
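
In code, that naive approach looks something like this (a sketch, with the filter variables invented for illustration, and with the caveat that real code would use parameterized queries rather than concatenating user input):

var builder = new StringBuilder();

if (!String.IsNullOrEmpty(nameFilter))
    builder.Append("Name LIKE '%" + nameFilter + "%' AND ");
if (customerTypeId.HasValue)
    builder.Append("CustomerTypeId = " + customerTypeId.Value + " AND ");

// Snip off the trailing " AND " if anything was appended at all.
string whereClause = builder.Length > 0
    ? builder.ToString(0, builder.Length - 5)
    : String.Empty;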

But then, I got to thinking a bit, and realized that the problem I was facing here was really that I would have n clauses and would want n – 1 ANDs (except the special case of zero, where I would want zero ANDs). A clause is just a string and the ” AND ” is essentially a delimiter, so this is really a problem of having a collection of strings and a delimiter and wanting to jam them together with it. What I want is the opposite of String.Split().

And, as it turns out, the opposite of Split() is a static method on the string class called String.Join(), which takes an array of strings and a delimiter and does exactly what I need. In this fashion I can add clauses to an object as strings and then query the object for a well-formed WHERE clause. In its simplest incarnation, it would look like this:

public class WhereBuilder
{
    private readonly List<string> _clauses = new List<string>();

    public void Add(string clause)
    {
        _clauses.Add(clause);
    }

    public string GetFullWhereText()
    {
        return String.Join(" AND ", _clauses.ToArray());
    }
}

You keep track of the various sub-clauses of your WHERE clause in a list and then join them together on the fly when requested by consumer code. If you wanted to allow OR instead of AND, that's pretty simple to support simultaneously:

public class WhereBuilder
{
    private readonly List<string> _clauses = new List<string>();

    public void Add(string clause)
    {
        _clauses.Add(clause);
    }

    public string GetConjunctionClause()
    {
        return Join(" AND ");
    }

    public string GetDisjunctionClause()
    {
        return Join(" OR ");
    }

    private string Join(string separator)
    {
        return String.Join(separator, _clauses.ToArray());
    }
}
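
Usage then looks something like this (with the clause text and filter variables invented for the example):

var builder = new WhereBuilder();

if (!String.IsNullOrEmpty(nameFilter))
    builder.Add("Name LIKE @NameFilter");
if (customerTypeId.HasValue)
    builder.Add("CustomerTypeId = @CustomerTypeId");

// Yields an empty string, a single clause, or "clause1 AND clause2",
// with no trailing-AND cleanup required.
string whereText = builder.GetConjunctionClause();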

Of course, this is only handy for constructing clauses that all use the same operator, and it doesn't do anything about the annoyance of monotonously specifying comparison operators inside the various clauses, but my purpose here was to highlight the handiness of String.Join() for those who hadn't seen it before.

Stay tuned if you’re interested in a more sophisticated where clause builder — I’ve been playing with that a bit in my spare time and will post back here with it if it gets interesting.

Building Weird Light Switches – Out and Ref

Gut Reaction

I was talking with someone about possible approaches to an API the other day. He asked me if I'd favor a method that took a parameter and had a return value or if I'd prefer a method that was void and took two parameters, but with one as a ref parameter. My immediate, knee-jerk response was "I don't like ref parameters one bit, so I'd prefer the former." He looked at me with a bit of surprise and then kind of shrugged as if to say, "whatever, weirdo," and went with the former. To him it was six of one, half a dozen of the other.

This made me wonder whether I’m being dogmatic and rigid in the way I approach software, so I spent some time thinking and reading about ref parameters and their cousin, the out parameter. For those of you not acquainted with C#, both of these are ways of passing parameters into methods with the main difference being whether the called method must change the value passed in (out) or whether it can optionally do so (ref). Another way of thinking of this is that you would use out when the initial value of the parameter does not matter and ref when it does.

Here is what the syntax looks like:

int myValue;
//This wouldn't compile, so don't do it: AddOneToValue(ref myValue);

SetValueToTwelve(out myValue);
Console.WriteLine(myValue);

AddOneToValue(ref myValue);
Console.WriteLine(myValue);

In this case, you will see 12 printed and then 13 on the next line, assuming the methods do what their names say they will. (This gets even screwier if you're a Java programmer, in which case you need to create a wrapper class or use int[] or something, since Java has no ref/out equivalent: parameters are always passed by value, and a called method can't reassign an int for its caller, even though objects on the heap are accessed through references.)
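
For completeness, here's what those two throwaway methods might look like (my own sketch; they're just stand-ins for the discussion):

private static void SetValueToTwelve(out int value)
{
    value = 12; // an out parameter must be assigned before the method returns
}

private static void AddOneToValue(ref int value)
{
    value += 1; // a ref parameter arrives initialized and can be both read and modified
}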

What Does the Internet Think?

Usually when I react strongly and then want to justify my reaction, I go see what others have to say, preferably those I respect. The estimable Jon Skeet has this to say:

Basically out parameters are usually a sign that you want to effectively return two results from a method. That’s usually a code smell – but there are some cases (most notably with the TryXXX pattern) where you genuinely want to return two pieces of information for good reasons and it doesn’t make much sense to encapsulate them together.

In other words, avoid out/ref where you can do so easily, but don’t go massively out of your way to avoid them.

In the answer below his, Ryan Lanciaux raises an interesting point when he says that “[out/ref parameters] basically add side-effects to your code and could be a nightmare when it comes to debugging.”

So, two take-aways here are that having a method return two distinct results is a code smell and that method side effects tend to be a problem. On the flip side, the main counterargument seems to be a pragmatic, duct-tape-programming one: sometimes the purist approach just isn't worth the effort and potential awkwardness. The iconic example of using out parameters is the T.TryParse(string, out t) family of methods in C# (which I really don't like, but I'm trying to suspend my bias for the sake of investigation).
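
For reference, here's what that TryXXX pattern looks like in client code (userInput being a stand-in variable): the bool return value reports success, and the parsed result comes back through the out parameter.

int parsed;
if (int.TryParse(userInput, out parsed))
    Console.WriteLine("You entered " + parsed);
else
    Console.WriteLine("That wasn't a number.");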

Next up, here’s what, well, MSDN has to say in explanation of a static analysis design warning that they raise entitled “Avoid out parameters”:

Passing types by reference (using out or ref) requires experience with pointers, understanding how value types and reference types differ, and handling methods with multiple return values. Also, the difference between out and ref parameters is not widely understood.

Although return values are commonplace and heavily used, the correct application of out and ref parameters requires intermediate design and coding skills. Library architects who design for a general audience should not expect users to master working with out or ref parameters.

There’s a certain irony to this, but I definitely understand the point. The irony is that the same outfit that put these features into the language raises a warning telling you not to use them. Don’t get me wrong — I understand that this is akin to making computers that can be taken apart while warning users not to do so, since many of them aren’t really qualified — but the irony amuses me nonetheless. It’s also interesting that MSDN seems to think that pointers and reference vs value are “intermediate” language features. Perhaps the fact that I cut my teeth on C and C++ as a wee programmer is showing, but… seriously? Intermediate?

At any rate, the consensus on the subject that I’ve seen at these and a variety of other blogs and stack overflow posts seems to be that out/ref parameters are generally to be avoided… except when they’re sort of unavoidable, either because of interop concerns or because you really want (need?) a function that returns two or more things.

Do One Thing

But isn’t a function that does two things a violation of the Single Responsibility Principle of SOLID fame, applied at the method level? And aren’t out/ref parameters, pretty much by definition, side effects that constitute violations of Command-Query Separation, a paradigm in which methods that mutate state (commands) are separated from methods that retrieve information (queries) and ne’er the twain shall meet? I mean, any method that ‘returns’ two values is, well, doing at least two things and any method that mutates object state and kicks back a ref/out parameter is serving as command and query.

But what about methods like the obtuse ones above, SetValueToTwelve() and AddOneToValue()? Those are void methods that mutate only the out/ref parameter. They could be made static and rewritten as int Return12() and int AddOneToValue(int value) without altering their purpose or effect. So, they’re not really violating SRP or CQS, right? They’re just slightly more awkward versions of familiar APIs.

But that really hits home with me. Why do I want something that's either slightly more awkward or a violation of some very helpful clean coding practices? I mean, we're not really shooting for the moon there, are we, if something is at best somewhat awkward and at worst a code smell? Why do it at all? Methods should pick one thing and do it (or return it) and should do it as non-awkwardly as possible.

What If Light Switches Worked This Way?

I like to think of our task as programmers in terms of abstractions (in case you hadn't caught my series on abstractions, feel free to click that tag and see me harp on it in a lot of posts). One easy abstraction for me to relate to the world of programming is turning a light switch on and off. This is a classic case of a class Light that exposes a command, SetOnValue(bool), and a readonly property, bool IsOn. So, as in the real world, I move the switch up or down and then separately observe the results. (Let's ignore for argument's sake that it might be better to model "Switch" and "Light" as separate entities.)

This is a great example of Command-Query Separation. Toggling the light on or off is a command, and looking at the light to see whether or not it’s on is a query. But, let’s blur the lines for a moment and rewrite this so that there is no readonly “IsOn” property. Instead, SetOnValue(value) will return a boolean indicating whether the light is on or not. So now, we have a switch that also acts as the thing that tells us whether or not it’s on — our wall switch also just became a light. Now, when you toggle the switch, the switch itself glows to give you feedback. Weird.
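
A minimal sketch of those two designs (the class and member names are mine, just to make the contrast concrete):

// Command-Query Separation: the command mutates state, the query reports it.
public class Light
{
    public bool IsOn { get; private set; }

    public void SetOnValue(bool isOn)
    {
        IsOn = isOn;
    }
}

// The blurred version: the command also answers the query.
public class GlowingSwitch
{
    private bool _isOn;

    public bool SetOnValue(bool isOn)
    {
        _isOn = isOn;
        return _isOn; // the switch itself now tells you whether the light is on
    }
}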

But, it gets weirder. Instead of having our SetOnValue() function return a bool, let’s feed it a ref parameter. On the way in, we’ll indicate the value we want, and on the way out, it will indicate the value that we’re going to get. In terms of modeling the real world, ref parameters are kind of mind-blowing. You hand some external thing a piece of yourself and it alters that for you. So now, we have a light switch that I flip on with my hand, and indicates the success of that operation by modifying my hand – let’s say burning it. So, I flip the switch on, and if it works, the blazing hot bulb in the switch burns my hand (but gives off no light, since the burn is how I know it’s working). So, there you have it – a strange path from a light switch that turns on lights to one that simply injures me.

I realize that this metaphor is a touch strained, but here's the thing: ref and out parameters are weird and counter-intuitive. It's hard for them not to strain a metaphor that they're involved in. Anything I'm handed in life could be represented conceptually as a thing or a collection of things. Any action I take in life could be represented as a void method with 0 or more parameters. But what is a ref parameter? Where in life do I take something, set it to the way I want it, give it to something, and then have it given back to me differently? Maybe as part of an assembly line or in some weird, Rube Goldberg kind of process, but this is hardly how normal interactions are modeled.

Ref and out are leaky abstractions in terms of code readability. They reek of code. Code involving these things doesn’t read like simple, well written prose or a nice story — it reads like a procedural construct such as the withholding worksheet for your paycheck deductions. So, like dealing with the IRS, why don’t we avoid that if we can? And, I think you’d be pretty hard-pressed to argue that you can’t.

Constructor Overloads: Know When to Say When

Paralysis By Options

Do you ever find yourself in a situation where some API or another requires you to instantiate an object? (If you're reading this blog, the answer is probably "yes.") What do you usually do at this point? Instantiate it, compile, and make sure you're good before poking around to see what your new object has to offer, usually in the form of auto-complete/Intellisense? I think that's what most would do. Word documents describing the API and other such things are all well and good as a backup plan, but let's get serious: you want to play with the object and read the instructions only if you can't figure out what to do. And the last thing you want to do is go reading the code of that class or, worse still, hunt down the guy who wrote it.

But, what about those times that the instantiation gets a little sidetracked? You go to instantiate the object and it’s like wandering into a Baskin Robbins knowing only that you vaguely feel like ice cream. So many flavors to choose from, but which is the right one?

In the case that prompted this post, I decided I wanted an Aquarium object, and Intellisense informed me that there are no less than 11 ways that I can make this happen. That's right, 11. My immediate, gut reaction to this information is to go off and implement the "AdoptADog" method instead and put this nonsense off until later.

But Aren’t More Choices Better?

With constructors, no, not really. I've talked before about the problem with bloated constructors and my opinion that a constructor should do nothing but ensure that the object initializes with class-level invariants established. With that in mind, either some of these overloads are doing more than is necessary or else some of them fail to meet this basic criterion. The former is pointless speculative coding and the latter means that your objects can be instantiated in states that are not valid. Either one of these is a problem.

I believe there is a tendency, especially if you don’t practice TDD or even write unit tests at all, to go off on tangents about how developers may want to instantiate objects. Maybe developer X will want to instantiate an aquarium with all defaults whereas developer Y will want to specify how many gallons it holds and how many fish are in it. Maybe developer Z just wants to initialize with the kind of rocks that go in the bottom or the kind of light that shines on top. Maybe everyone wants to initialize specifying salt or fresh water. Let’s think of every combination of things anyone may want to do to this object and offer them all up as constructor overloads, right?

But you know what? That’s what the public API is for with accessors and mutators. Everyone can do it that way. Save the constructor for things without which the aquarium makes no sense (e.g. capacity) and let everyone call a property setter or a mutator for the rest. C# even has some syntactic sugar for just this occasion.
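
The syntactic sugar I mean is the object initializer. Assuming an Aquarium with a capacity-only constructor and ordinary property setters (the property names here are invented), it lets clients opt in to exactly the extras they care about:

var aquarium = new Aquarium(50) // capacity is the one thing the object can't do without
{
    NumberOfFish = 12,
    HasSaltWater = false
};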

If you add in a bunch of overloads, you may think that you're being helpful, but you're really just muddying the waters and paralyzing your clients with options. I may want to instantiate an aquarium and use it to hold a bunch of dirt from my back yard. So why am I being offered all of these options about fish and water and aquarium plants and plastic divers? I don't care about any of that. But I'll hesitate to omit it, because for all I know I should instantiate the object with those things. I mean, with all of those overloads, some are probably vestigial or at least less frequently used. I don't want to use something that might be deprecated or untested, and nobody wants to maintain a bunch of methods that may never even be used.

In the end, what I’ll wind up doing is digging out the word document that describes this thing or going to the developer who wrote it and asking which one to use. And that sucks. If you offer me only one option — the minimal constructor that establishes the invariants and forces any critical dependencies on the client — I’ll use that option and go on my merry way. There will be nothing to think about and certainly nothing to read word documents or send emails about. And that is the essence of providing usable code and good abstractions.

(And incidentally, since Visual Studio 2010 and C# 4.0, optional/default parameters have really taken away any good excuse for a lot of constructor overloads.)
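
A sketch of what that looks like, again with made-up parameters: one constructor replaces the pile of overloads, and callers name only what they care about.

public Aquarium(int capacity, int numberOfFish = 0, bool hasSaltWater = false)
{
    // establish invariants here; everything else is optional
}

// var tank = new Aquarium(50, hasSaltWater: true);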

Methods Are Little Stories – Abstractions Are Important 6

If Then, If Then, If Then

Yesterday’s post where I included Grady Booch’s comment that clean code “reads like well written prose” made me think of something I’ve been contemplating. The other day I was looking at some code and I saw the following (obfuscated):

public void GrabUmbrellaIfNecessary()
{
    if (IsItRaining())
    {
        if (DoINeedToLeave())
        {
            if (AmIParkedInTheStreet())
            {
                GrabUmbrella();
            }
        }
    }
}

I automatically started refactoring this to the following:

public void GrabUmbrellaIfNecessary()
{
    if (IsItRaining() && DoINeedToLeave() && AmIParkedInTheStreet())
        GrabUmbrella();
}

and then:

public void GrabUmbrellaIfNecessary()
{
    if (DoINeedAnUmbrella())
        GrabUmbrella();
}

private bool DoINeedAnUmbrella()
{
    return IsItRaining() && DoINeedToLeave() && AmIParkedInTheStreet();
}

To me, this keeps in line with Grady’s statement and is easy to reason about at every level. “If I need an umbrella, get it” and “I need an umbrella if it’s raining, I need to leave, and I’m parked in the street” are pieces of code so simple that one need not be a programmer to understand the logic. I think it’s hard to argue that this is less conversational than “If it’s raining then if I need to leave then if I am parked in the street then grab an umbrella.”

But does this matter? Am I just being fussy and shuffling around the code to no real benefit? Are there advantages to the “ifception” approach (thanks to Dan Martin for this term)? Why would someone prefer this style? These were the things that I found myself contemplating.

The Case For Ifception?

In order to understand possible advantages or reasons for this preference, I sought to figure out the motivation. My first thought was that someone would write code this way if they missed the week in discrete math/logic where DeMorgan's laws and the rules of inference in Boolean algebra were covered. However, I don't like to assume incompetence or ignorance when the only evidence present is of a different preference than mine, so let's dismiss that as a motivation.

The second thing that occurred to me was a lack of awareness of, or mistrust of, the short-circuiting of AND (&&) operations. To put it another way, perhaps the author believes that all three conditions will be checked even if the first one fails, making the ifception more efficient. But, again, this requires an assumption of ignorance, so let's assume that the author understands conditional short-circuiting.
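
For what it's worth, here is the equivalence those two points rest on, spelled out with the same hypothetical predicates from the example:

// The nested ifs and the flattened condition both grab the umbrella exactly when:
bool shouldGrab = IsItRaining() && DoINeedToLeave() && AmIParkedInTheStreet();

// The early-return version bails out on the negation of that, which DeMorgan's law
// rewrites as a chain of individual "if not X, return" checks:
bool shouldBailOut = !IsItRaining() || !DoINeedToLeave() || !AmIParkedInTheStreet();

// And && short-circuits just like the nested ifs do: if IsItRaining() returns false,
// the remaining predicates are never evaluated in either form.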

After that, a slightly more valid motivation dawned on me (and one that doesn't assume ignorance or incompetence): the author loves the debugger! That is, perhaps the code author likes it this way because he or she prefers to be able to step through the method and watch the short-circuiting or success in action.

As I poked around a little more, I found code in the same class of this form as well:

public void GrabUmbrellaIfNecessary()
{
    if (!IsItRaining())
        return;

    if (!DoINeedToLeave())
        return;

    if (!AmIParkedInTheStreet())
        return;

    GrabUmbrella();
}

Two data points now seem to point to my conclusion. I believe the motivation here is Debugger Driven Development (DDD) — a term that I’ll use to describe the approach where you write production code specifically designed to be stepped through in the debugger. This is a rather pessimistic approach since it seems to say “when you’re dealing with my code you’re going to be in the debugger… a lot… seriously, I have no idea how or even if my code works.”

I will also allow for the possibility that someone might view these approaches as inherently more readable, but I can only imagine that’s the case as a result of familiarity. In other words, this style is only more readable if it’s what you expect to see — I doubt anyone not versed in programming would gravitate toward nested conditionals and/or return statements as approachable.

If anyone can think of an additional benefit that I’m missing, please let me know. Or, in other words:

public void LetMeKnowIfIAmMissingABenefit()
{
    if (!DoesABenefitExist())
        return;

    if (HasItAlreadyBeenMentioned())
        return;

    if (!AmIMissingIt())
        return;

    PleaseLetMeKnow();
}

Does It Really Matter?

So having tentatively identified a motivation for ifceptions (DDD), is this style preferable? Harmless? To be avoided? I actually wrestled with this for a while before forming my opinion. The style is very different from what I prefer and am used to, but I try very hard not to conflate “I’m not used to that” with “That’s bad”. Doing so is the height of arrogance and will greatly hinder one’s ability to learn.

That said, the conclusion I came to was that this should be avoided if possible. And the reason it should be avoided boils down to method level abstractions. A method should tell a story. The method “GrabUmbrellaIfNecessary” tells a story — it tells you that it’s going to figure out whether an umbrella is needed and grab it if so. As a client of that method, you’re going to take it at face value that it does what it advertises, but if you do decide to drill into the method, you’re expecting to see a concise implementation of what’s advertised.

In the factored example, that’s exactly what you see. What better captures “GrabUmbrellaIfNecessary” than a single if condition for “DoINeedAnUmbrella” followed by a “GrabUmbrella” for a true evaluation? But what about the ifception example? Well, I see that there’s a condition to see whether it’s raining or not and then a scoped block of code with another conditional. Oh, okay, if it’s raining, we’ll get in there and then we’ll see if I need to leave in which case we’ll get… another scoped block of code. Okay, okay, so now, we need to know where I’m parked and, what were we doing again? Oh, right, we’re seeing whether we need to get into another scoped block of code. Ah, okay, if we’re parked in the street, here’s the meat of the method – grab the umbrella!

Notice that in the ifception reading, you see words like “scope” and “block”. I’m having to think of scoping rules, brackets, nested conditionals, control flow and other language constructs. Each of these things has exactly nothing to do with whether I should bring my umbrella or not and yet I’m thinking of them. If you look at the flattened early return method, a similar thing is happening:

If it’s not raining, then return. Okay, assuming we’re still in the method, if it’s not true that I need to leave, then return. Okay, now if I’m still in the method, if I’m not parked in the street then return. Okay, if I’m still in the method, then get the umbrella.

“Return”? “In the method”? What does that have to do with whether I need an umbrella and getting that umbrella? And why do I care about “not is raining” if I’m trying to figure out whether to use an umbrella? If it’s not not raining, do I need an umbrella? I think… ugh.

An easy argument to make at this point is “Erik, you’re a programmer and you should understand things like scope and early returns — don’t be lazy.” While this is true, it’s also true that I’m capable of squinting and making out tiny, bright yellow font against a white background. In neither case is that enjoyable or a good use of my time and effort when it’s possible simply to have more clarity and ease of reading.

Programmers are paid to solve problems by handling and abstracting away complexity, and that holds whether the audience is end users or fellow programmers. It's easy to lose sight of this and believe that knowledge of particular languages, frameworks, etc. is the end goal, but that knowledge is simply a tool used in pursuit of the actual end goal: problem solving. If methods are written in such a way as to make them read like "well written prose," I don't have to focus on language syntax and details. I don't have to be acutely aware that I'm in the middle of a method, figuring out scope and/or when to return. Instead, I can focus on the business logic of my problem domain and meeting user requirements.

Method writing techniques that make language syntax an afterthought are good abstractions. They hide the bare-metal, nitty-gritty details of writing C# (or any language/framework) as much as possible, exposing only enough to facilitate understanding of the problems being solved. And while it’s never going to be possible to avoid all scoping, returning and other such method housekeeping, you can certainly arrange your methods in such a way as to minimize and hide it, rather than to distract readers by calling attention to it.