DaedTech

Stories about Software


Why the Statement “I Don’t Source Control My Unit Tests” Makes Me Happy

An Extraordinary Claim

I was talking to a friend the other day, and he relayed a conversation that he’d had with a co-worker. They were discussing writing unit tests, and the co-worker claimed that he writes unit tests regularly but discards them and never checks them into source control. I was incredulous at this, so I asked what the rationale was, to which my friend cited his co-worker’s claims (paraphrased) that “[unit tests] are fragile, they take too long to maintain, they prohibit adapting to new requirements, errors are caused by user change requests not incorrect code, and they’re hard to understand.” (That last claim is the one I’ll return to later.)

This struck me as so bizarre — so preposterous — that I immediately assumed I was out of my league and had missed an industry sea-change. I remember a blog post by Scott Hanselman entitled “I’m A Phony. Are You?” and I felt exactly what he meant. Clearly, I’d been exposed. I was not only wrong about unit tests, but I was 180 degrees, polar opposite, comically wrong. I had wasted all of this time and effort with TDD, unit test maintenance, unit tests in the build, etc., when what I really should have been doing was writing these things once and then tossing them, since their benefit is outweighed by their cost.

Given that I don’t have direct access to this co-worker, I took to the internet to see if I could locate the landmark post or paper that had turned the tide on this subject, along with the no-doubt whole host of other posts and papers following suit. So, I googled “Don’t Check In Unit Tests”. Your mileage may vary because I was logged into Gmail, but I saw old posts by Steve Sanderson and Jeff Atwood encouraging people to write tests, a few snarky posts about not testing, and other sorts of things you might expect. I had missed it. I tried “Don’t version control unit tests” and saw similar results. A few variants on those searches yielded similar results too. Apparently, most of the internet was still thinking inside of the box that the co-worker had escaped with his bold new vision.

Finally, I googled “Should I source control unit tests?” and found one link from Programmers Stack Exchange and another from Stack Overflow. In a quick and admittedly very rough count of people who had answered and/or voted one way or the other, the vote appeared to be about 200 to 0 in favor of source controlling over leaving the tests out. The answer that most succinctly seemed to summarize the unanimous consensus was Greg Whitfield’s, which started with:

Indeed yes. How could anyone ever think otherwise?

Hmmm…

There’s Really No Argument

With my confidence somewhat restored, I started thinking of all the reasons that source controlling unit tests makes sense:

  1. Provides living ‘documentation’ of the class author’s intentions (see the sketch after this list).
  2. Prevents inadvertent regressions during changes.
  3. Allows fearless refactoring of your code and anyone else’s.
  4. Incorporation into the build for metric and verification purposes.
  5. Testing state at the time of tagged/branched versions can be re-created.
  6. (As with source controlling anything) Hard drive or other hardware failure does not result in lost work.
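
To make the living ‘documentation’ idea in item 1 concrete, here’s a minimal sketch of the kind of test I mean (NUnit-style, with a made-up MortgageCalculator class purely for illustration):

// A hypothetical test that doubles as documentation of intent: anyone
// reading it later knows the author meant a zero-principal mortgage
// to cost nothing.
[Test]
public void CalculatePayment_Returns_Zero_When_Principal_Is_Zero()
{
    var calculator = new MortgageCalculator();

    decimal payment = calculator.CalculatePayment(principal: 0, termInYears: 30);

    Assert.AreEqual(0m, payment);
}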

I could probably go on, but I’m not going to bother. Why? Because I don’t think the stated rationale is actually a rationale for not checking in unit tests. I think it’s more likely that this person who “writes tests but doesn’t actually check them in” might have had a Canadian girlfriend as a teenager. In other words, I sort of suspect that these “disposable” unit tests, whose existence can neither be confirmed nor denied, may not necessarily actually exist, and so the rationale for not checking them in becomes, well, irrelevant.

And, look at the rationale. The first three clauses (fragile, too long to maintain, prohibit new work) seem to be true only for someone who writes really bad unit tests (i.e. someone without much practice who may, I dunno, be discouraged during early attempts). Because while it’s true that unit tests, like any other code, constitute a maintenance liability, they need not be even remotely prohibitive, especially in a nicely decoupled code base. The fourth clause (errors are everyone’s fault but the programmer) is prima facie absurd, and the fifth and most interesting clause seems to be an unwitting admission of the problem – a difficulty understanding how to unit test. This is probably the most important reason for “not checking in” or, more commonly and accurately, not writing unit tests at all — they’re hard when you don’t know how to write them and haven’t practiced it.

So Why the Happy?

You’re probably now wondering about the post title and why any of this would make me happy. It’s simple, really. I’m happy that people feel the need to make excuses (or phantom unit tests) for why they don’t automate verification of their work. This indicates real progress toward what Uncle Bob Martin describes in a talk that he does called “Demanding Software Professionalism” and alludes to in this post. In his talk, Bob suggests that software development is a nascent field compared to others like medicine and accounting and that we’re only just starting to define what it means to be a software professional. He proposes automated verification in the form of TDD as the equivalent of hand-washing or double-entry bookkeeping, respectively, in those fields, and thinks that someday we’ll look back on not practicing TDD and clean coding the way we now look back on surgeons who refused to wash their hands prior to surgery.

But having a good idea and even demonstrating that it works isn’t enough. Just ask Ignaz Semmelweis, who discovered and empirically demonstrated that hand-washing reduced mortality rates, only to be ridiculed and dismissed by his peers in the face of cold, hard evidence. It wasn’t until later, after Semmelweis had been committed to an insane asylum and died, that his observations and hypothesis got more backers (Lister, Pasteur, et al.) and a better marketing campaign (an actual theoretical framework of explanation called “Germ Theory”). In Semmelweis’s time, a surgeon could just call him a crank and refuse to wash his hands before surgery. Decades later, he would have to say, “dude, no, I totally already washed them when you weren’t looking” if he was feeling lazy. You can even ask his Canadian girlfriend.

At the end of the day, I’m happy because the marketing for TDD and clean coding practice must be gaining traction and acceptance if people feel as though they have to make excuses for not doing it. I try to be a glass-half-full kind of guy, and I think that’s the glass-half-full outlook. I mean, one person not wanting to automate tests doesn’t really matter in the scheme of things at the moment, since there is no shortage of people who also don’t want to, but it’s good to see people making excuses/stories for not wanting to rather than just saying “pff, waste of time.”

(And, for what it’s worth, I do acknowledge the remote possibility that someone actually does write and discard unit tests on a regular and rigorous basis. I just don’t think it’s particularly likely.)


Improve Productivity with the Humble ToDo List

Micro-Scrum

A week or two ago, I read Stephen Walther’s blog post “Scrum in 5 Minutes” and reading his description of the backlog reminded me of a practice that I’ve been getting a lot of mileage out of lately. My practice, inspired by Kent Beck in his book Test Driven Development By Example, is to keep a simple To-Do list of small development tasks as I work.

The parallels here are rather striking if you omit the portions of Scrum that have to do with collaboration and those types of logistics. When starting on a task, I think of the first few things that I’ll need to do, and those go on the list. I prioritize them by putting the most important (usually the ones that will block progress on anything else) at the top, but I don’t really spend a lot of time on this, opting to revise or refine it if and when I need to. Any new item on the list is yellow, and when it’s done, I turn it green.

There are no intermediate states and there is no going back. If I have something like “create mortgage calculator class” and I turn it green when I’m happy with the class, I don’t later turn it back to yellow or some other color if the mortgage calculator needs to change. That instead becomes a new task. Generally speaking, I try to limit the number of yellow tasks I have (in kind of a nod to Kanban’s WIP limits), though I don’t have a hard-and-fast rule for this. I just find that my focus gets cluttered when there are too many outstanding tasks.

If I find that a yellow item is taking me a long time, I will delete that item and replace it with several components of it. The aim is always to have my list be a series of tasks that take 5-15 minutes to complete (though they can be less). Items are added both methodically to complete the task and as reminders of things that occur to me when I’m doing something else. For example, if I fire up the application to verify a piece of integration that involves a series of steps and I notice that a button is the wrong color, I won’t drop everything and sidetrack myself by changing the button. I’ll add it to my queue; I don’t want to worry about this now, but I don’t want to forget about it.

I never actually decided on any of these ‘rules’. They all kind of evolved through some evolutionary process where I kept practices that seemed to help me and dropped ones that didn’t. There will probably be more refinement, but this process is really helping me.

So, What Are the Benefits?

Here is a list of benefits that I see, in no particular order:

  1. Forces you to break the problem into manageable pieces (which usually simplifies it for you).
  2. Helps prevent the inadvertent procrastination that happens when a task seems daunting.
  3. Encourages productivity with fast feedback and “wins”.
  4. Prevents you from forgetting things.
  5. Extrapolated estimation is easier since you’re tracking your work at a more granular level.
  6. Helps you explain sources of complexity later if someone needs to know why you were delayed.
  7. Mitigates interruptions (not as much “alright, what on Earth was I doing?”)

Your mileage may vary here, and you might have a better process for all I know (and if you do, please share it!). But I’ve found this to be helpful enough to me that I thought I’d throw it out there in case it helped anyone else too.


Building Weird Light Switches – Out and Ref

Gut Reaction

I was talking with someone about possible approaches to an API the other day. He asked me if I’d favor a method that took a parameter and had a return value or if I’d prefer a method that was void and took two parameters, but with one as a ref parameter. My immediate, knee-jerk response was “I don’t like ref parameters one bit, so I’d prefer the former.” He looked at me with a bit of surprise and then kind of shrugged as if to say, “whatever, weirdo,” and went with the former. To him, it was six of one, half a dozen of the other.

This made me wonder whether I’m being dogmatic and rigid in the way I approach software, so I spent some time thinking and reading about ref parameters and their cousin, the out parameter. For those of you not acquainted with C#, both of these are ways of passing parameters into methods with the main difference being whether the called method must change the value passed in (out) or whether it can optionally do so (ref). Another way of thinking of this is that you would use out when the initial value of the parameter does not matter and ref when it does.

Here is what the syntax looks like:

int myValue;
// This wouldn't compile yet, because myValue is still unassigned: AddOneToValue(ref myValue);

SetValueToTwelve(out myValue);
Console.WriteLine(myValue);

AddOneToValue(ref myValue);
Console.WriteLine(myValue);

In this case, you will see 12 printed and then 13 on the next line, assuming the methods do what their names say. (This gets even screwier if you’re a Java programmer, in which case you need to create a wrapper class or use an int[] or something, since Java has no equivalent of ref or out; parameters are always passed by value, even though an object reference lets you modify the object it points to on the heap.)
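
Here’s a minimal sketch of what those two methods might look like (the bodies are assumptions; anything that honors the out/ref contracts would do), which also illustrates the compiler rules for each keyword:

static void SetValueToTwelve(out int value)
{
    // out: the caller's variable doesn't need to be assigned beforehand,
    // but this method must assign it before returning.
    value = 12;
}

static void AddOneToValue(ref int value)
{
    // ref: the caller's variable must already be assigned; this method
    // can read it and optionally change it.
    value = value + 1;
}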

What Does the Internet Think?

Usually when I react strongly and then want to justify my reaction, I go see what others have to say, preferably those I respect. The estimable Jon Skeet has this to say:

Basically out parameters are usually a sign that you want to effectively return two results from a method. That’s usually a code smell – but there are some cases (most notably with the TryXXX pattern) where you genuinely want to return two pieces of information for good reasons and it doesn’t make much sense to encapsulate them together.

In other words, avoid out/ref where you can do so easily, but don’t go massively out of your way to avoid them.

In the answer below his, Ryan Lanciaux raises an interesting point when he says that “[out/ref parameters] basically add side-effects to your code and could be a nightmare when it comes to debugging.”

So, two takeaways here are that having a method return two distinct results is a code smell and that method side effects tend to be a problem. The flip side of the argument seems mainly to be a pragmatic, duct-tape-programming sort of argument that sometimes the purist approach just isn’t worth the effort and potential awkwardness. The iconic example of using out parameters is the T.TryParse(string, out t) family of methods in C# (which I really don’t like, but I’m trying to suspend my bias for the sake of investigation).
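
For reference, the pattern in client code looks like this, using the framework’s int.TryParse: the success flag comes back as the return value and the parsed result rides along in the out parameter.

string input = "42";
int result;

// Returns true and assigns result if parsing succeeds; returns false
// (and assigns result = 0) if it doesn't.
if (int.TryParse(input, out result))
{
    Console.WriteLine("Parsed: " + result);
}
else
{
    Console.WriteLine("Not a number");
}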

Next up, here’s what, well, MSDN has to say in explanation of a static analysis design warning that they raise entitled “Avoid out parameters”:

Passing types by reference (using out or ref) requires experience with pointers, understanding how value types and reference types differ, and handling methods with multiple return values. Also, the difference between out and ref parameters is not widely understood.

Although return values are commonplace and heavily used, the correct application of out and ref parameters requires intermediate design and coding skills. Library architects who design for a general audience should not expect users to master working with out or ref parameters.

There’s a certain irony to this, but I definitely understand the point. The irony is that the same outfit that put these features into the language raises a warning telling you not to use them. Don’t get me wrong — I understand that this is akin to making computers that can be taken apart while warning users not to do so, since many of them aren’t really qualified — but the irony amuses me nonetheless. It’s also interesting that MSDN seems to think that pointers and reference vs value are “intermediate” language features. Perhaps the fact that I cut my teeth on C and C++ as a wee programmer is showing, but… seriously? Intermediate?

At any rate, the consensus on the subject that I’ve seen at these and a variety of other blogs and stack overflow posts seems to be that out/ref parameters are generally to be avoided… except when they’re sort of unavoidable, either because of interop concerns or because you really want (need?) a function that returns two or more things.

Do One Thing

But isn’t a function that does two things a violation of the Single Responsibility Principle of SOLID fame, applied at the method level? And aren’t out/ref parameters, pretty much by definition, side effects that constitute violations of Command-Query Separation, a paradigm in which methods that mutate state (commands) are separated from methods that retrieve information (queries) and ne’er the twain shall meet? I mean, any method that ‘returns’ two values is, well, doing at least two things and any method that mutates object state and kicks back a ref/out parameter is serving as command and query.

But what about methods like the obtuse ones above, SetValueToTwelve() and AddOneToValue()? Those are void methods that mutate only the out/ref parameter. They could be made static and rewritten as int Return12() and int AddOneToValue(int value) without altering their purpose or effect. So, they’re not really violating SRP or CQS, right? They’re just slightly more awkward versions of familiar APIs.
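
Spelled out (as a sketch matching those names), the rewrites look like this, and the call sites turn into plain assignments:

static int Return12()
{
    return 12;
}

static int AddOneToValue(int value)
{
    return value + 1;
}

// Call sites become ordinary assignments instead of ref/out plumbing:
int myValue = Return12();
myValue = AddOneToValue(myValue);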

But that really hits home with me. Why do I want something that’s either slightly more awkward or a violation of some very helpful clean coding practices? I mean, we’re not really shooting for the moon there, are we, if something is at best somewhat awkward and at worst a code smell? Why do it at all? Methods should pick one thing and do it (or return it) and should do it as non-awkwardly as possible.

What If Light Switches Worked This Way?

I like to think of our task as programmers in terms of abstractions (in case you hadn’t caught my series on abstractions, feel free to click that tag and see me harp on it in a lot of posts). One easy abstraction for me to relate to the world of programming is turning a light switch on and off. This is a classic case of a class Light that exposes a command, SetOnValue(bool), and a readonly property, bool IsOn. So, as in the real world, I move the switch up or down and then separately observe the results. (Let’s ignore, for argument’s sake, that it might be better to model “Switch” and “Light” as separate entities.)
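
In code, a minimal sketch of that Light class (the member names come from the description above; the body is an assumption) keeps the command and the query cleanly apart:

public class Light
{
    // Query: read-only state, no side effects.
    public bool IsOn { get; private set; }

    // Command: mutates state and returns nothing.
    public void SetOnValue(bool isOn)
    {
        IsOn = isOn;
    }
}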

This is a great example of Command-Query Separation. Toggling the light on or off is a command, and looking at the light to see whether or not it’s on is a query. But, let’s blur the lines for a moment and rewrite this so that there is no readonly “IsOn” property. Instead, SetOnValue(value) will return a boolean indicating whether the light is on or not. So now, we have a switch that also acts as the thing that tells us whether or not it’s on — our wall switch also just became a light. Now, when you toggle the switch, the switch itself glows to give you feedback. Weird.

But, it gets weirder. Instead of having our SetOnValue() function return a bool, let’s feed it a ref parameter. On the way in, we’ll indicate the value we want, and on the way out, it will indicate the value that we’re going to get. In terms of modeling the real world, ref parameters are kind of mind-blowing. You hand some external thing a piece of yourself and it alters that for you. So now, we have a light switch that I flip on with my hand, and indicates the success of that operation by modifying my hand – let’s say burning it. So, I flip the switch on, and if it works, the blazing hot bulb in the switch burns my hand (but gives off no light, since the burn is how I know it’s working). So, there you have it – a strange path from a light switch that turns on lights to one that simply injures me.

I realize that this metaphor is a touch strained, but here’s the thing: ref and out parameters are weird and counter-intuitive. It’s hard for them not to strain a metaphor that they’re involved in. Anything I’m handed in life could be represented conceptually as a thing or a collection of things. Any action I take in life could be represented as a void method with 0 or more parameters. But what is a ref parameter? Where in life do I take something, set it to the way I want it, give it to something, and then have it given back to me differently? Maybe as part of an assembly line or in some weird, Rube Goldberg kind of process, but this is hardly how normal interactions are modeled.

Ref and out are leaky abstractions in terms of code readability. They reek of code. Code involving these things doesn’t read like simple, well written prose or a nice story — it reads like a procedural construct such as the withholding worksheet for your paycheck deductions. So, like dealing with the IRS, why don’t we avoid that if we can? And, I think you’d be pretty hard-pressed to argue that you can’t.


A Developer Journal – Genius or Neurosis?

Many moons ago, in my first role as a developer, I had very little real work to do for the first month or so on the job, so I occupied myself with poking around the company intranet, jotting down acronyms, figuring out who was responsible for what, and documenting all of this in spreadsheets and Word documents. After a while, I set up a MediaWiki installation and started making actual wiki pages out of all of these thoughts. Some time (and employers) after that, this practice caught on a bit, and I found myself in a position where others started using the wikis and at least getting some value out of them.

For the last couple of years now, I’ve also been blogging, and before that I was in a grad program where I wrote term papers, research papers, etc. Both of these activities are a bit more focused than knowledge dumps on a wiki, but they are also forms of chronicling my experiences. So, long story short, for the entirety of my career, I’ve been heavily documenting pretty much everything I do.

When I moved into my house, I found a bunch of memorabilia and personal keepsakes stuffed in the attic. In an attempt to figure out who they belonged to, I read through some journals that were there and found that they consisted of incredibly mundane chronicling of days – what the weather was like, time awake and asleep, grocery trips, etc. It is my hope that my own chronicling of my developer life is not quite as banal as this, but even if it is, c’est la vie I suppose. And who knows, perhaps the author of those journals needed this information for some purpose I couldn’t discern (tracking a medical condition, staying organized and focused, etc).

In honor of this mystery person in my attic and my own natural tendency over the course of time toward more and more documentation, I’ve decided to start my own “developer” journal, and I’ve logged my first entries this week. The journal is just a Word document at the moment, so I’m getting back to basics from my previous ascent through Excel, MediaWiki, and WordPress, but I think this is good. All of those recording forms have a tendency toward hierarchical or formal organization that I don’t really want here. This is like me jotting notes during meetings in a notebook, but with less “action item: give Bill the TPS reports” and more “I just spent an hour trying to figure out why my CSS file was triggering an error, and it turned out in reality to be unrelated problem X”.

Here’s what I do so far. I spend a sentence or two describing what I worked on during various time windows throughout the day, or whenever I switch tasks. Given that I do work where clients are billed for my time, it makes a lot of sense to document that for later when I’m filling out more formal accountings of my work (I mainly use Grindstone for this because of its precision and UI, but it’s also kind of nice to have it “backed up” in narrative form for context).

In addition to that bit of context, I make notes any time someone helps me with something, introduces me to something new, etc. After all, there’s nothing worse than when you ask someone how to do X, get distracted for a few minutes, go to do X, and realize you need to ask again. I try to avoid looking like an idiot whenever possible, even if it isn’t always easy. So assists, notes, code review suggestions, etc. go in here too.

And finally, I have two other things that I do. In green italics, I insert “lessons learned”. This is something like “Lesson Learned: if you compile a WPF project in VS 2010 with a XAML file focused in the XAML editor, you’ll sometimes get spurious compiler errors.” So, this is a more crystallized form of notes in that it focuses on things that I’ll probably want to remember later. The other thing is concerns/observations/suggestions, and that gets orange italics. These are things like “I see a lot of duplication here in this code and that’s a code smell, but I don’t yet have enough context to speak authoritatively.” The orange will function as a way for me to keep track of things that I think could be improved (previously, I’ve always kept a spreadsheet somewhere called “suggested refactorings” or something like that). I color-code these things because I feel like at some point later I may want to assemble them into a list.

So here’s my thinking with this. I like to write and document, as should be obvious from my blogging and other documenting activities. But there’s a clear difference between putting together nice, composed presentations/posts/essays and simply recording every thought that makes its way into your brain. The developer journal is a way to get the best of both worlds. I can jot down stuff that I’m not sure about but think might be important, or that I might want to remember later, without boring people in a wiki/blog/etc. if it turns out not to matter. I guess you could say I’m keeping the journal so that I can remember more of what I think while also applying a better filter.

Does anyone else do anything like this? If not (or if so), does this seem like a good idea, or does this just seem neurotic and weird? Would you do something like this? Please feel free to weigh in below in the comments.


Constructor Overloads: Know When to Say When

Paralysis By Options

Do you ever find yourself in a situation where some API or another requires you to instantiate an object? (If you’re reading this blog, the answer is probably “yes”.) What do you usually do at this point? Instantiate it, compile, and make sure you’re good before poking around to see what your new object has to offer, usually in the form of auto-complete/IntelliSense? I think that’s what most would do. Word documents describing the API and other such things are all well and good as a backup plan, but let’s get serious: you want to play with the object and read the instructions only if you can’t figure out what to do. And the last thing you want to do is go reading the code of that class or, worse still, hunt down the guy who wrote it.

But, what about those times that the instantiation gets a little sidetracked? You go to instantiate the object and it’s like wandering into a Baskin Robbins knowing only that you vaguely feel like ice cream. So many flavors to choose from, but which is the right one?

When I go to create the Aquarium object I’ve decided I want, IntelliSense informs me that there are no fewer than 11 ways that I can make this happen. That’s right, 11. My immediate, gut reaction to this information is to go off and implement the “AdoptADog” method instead and put this nonsense off until later.

But Aren’t More Choices Better?

With constructors, no, not really. I’ve talked before about the problem with bloated constructors and my opinion that a constructor should do nothing but ensure that the object initializes with its class-level invariants established. With that in mind, either some of these overloads are doing more than is necessary or else some of them fail to meet this basic criterion. The former is pointless speculative coding and the latter means that your objects can be instantiated in states that are not valid. Either one of these is a problem.

I believe there is a tendency, especially if you don’t practice TDD or even write unit tests at all, to go off on tangents about how developers may want to instantiate objects. Maybe developer X will want to instantiate an aquarium with all defaults whereas developer Y will want to specify how many gallons it holds and how many fish are in it. Maybe developer Z just wants to initialize with the kind of rocks that go in the bottom or the kind of light that shines on top. Maybe everyone wants to initialize specifying salt or fresh water. Let’s think of every combination of things anyone may want to do to this object and offer them all up as constructor overloads, right?

But you know what? That’s what the public API is for with accessors and mutators. Everyone can do it that way. Save the constructor for things without which the aquarium makes no sense (e.g. capacity) and let everyone call a property setter or a mutator for the rest. C# even has some syntactic sugar for just this occasion.
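
As a sketch (the Aquarium members here are made up for illustration), that means a minimal constructor plus object initializer syntax, which is presumably the syntactic sugar in question:

public class Aquarium
{
    // The one thing without which an aquarium makes no sense.
    public int CapacityInGallons { get; private set; }

    // Everything else is optional and settable through the public API.
    public int FishCount { get; set; }
    public string WaterType { get; set; }

    public Aquarium(int capacityInGallons)
    {
        CapacityInGallons = capacityInGallons;
    }
}

// Clients set only the options they actually care about:
var aquarium = new Aquarium(50) { FishCount = 12, WaterType = "salt" };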

If you add in a bunch of overloads, you may think that you’re being helpful, but you’re really just muddying the waters and paralyzing your clients with options. I may want to instantiate an aquarium and use it to hold a bunch of dirt from my back yard — so why am I being offered all of these options about fish and water and aquarium plants and plastic divers? I don’t care about any of that. But I’ll hesitate to omit them, because for all I know I should instantiate the object with those things. I mean, with all of those overloads, some are probably vestigial or at least less frequently used. I don’t want to use something that might be deprecated or untested, and nobody wants to maintain a bunch of methods that may never even be used.

In the end, what I’ll wind up doing is digging out the Word document that describes this thing or going to the developer who wrote it and asking which one to use. And that sucks. If you offer me only one option — the minimal constructor that establishes the invariants and forces any critical dependencies on the client — I’ll use that option and go on my merry way. There will be nothing to think about and certainly nothing to read Word documents or send emails about. And that is the essence of providing usable code and good abstractions.

(And incidentally, since Visual Studio 2010, C# has really taken away any good excuse for a lot of overloads with optional/default parameters).
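
To illustrate, a single constructor with optional parameters (again reusing the made-up Aquarium members from the earlier sketch) can stand in for a whole pile of overloads:

// One constructor covers the common variations; callers name only the
// arguments they want to override.
public Aquarium(int capacityInGallons, int fishCount = 0, string waterType = "fresh")
{
    CapacityInGallons = capacityInGallons;
    FishCount = fishCount;
    WaterType = waterType;
}

// e.g.:
var saltTank = new Aquarium(50, waterType: "salt");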