DaedTech

Stories about Software


SQL Queries and String.Join()

Over the last few weeks, I’ve found myself working within a framework where I’m writing a lot of SQL. More specifically, I’m writing a lot of WHERE clauses related to optional user search parameters. As a simple example, consider a search over “customer” where a user could filter by part of a customer name, filter by a selectable customer type, or simply list all customers. This creates a situation where I can have a WHERE clause with 0, 1, or 2 entries in it, depending on what the user wants to do.

The consequence of this is that my WHERE clause may be blank, it may have one clause, or it may have two clauses with an AND joining them. The most basic (naive) way to handle this is to check, for each control/clause, whether the user has entered something and, if so, to append “{Clause} AND ” to a string builder. Then you snip off the last five characters to take care of the spurious “ AND ” that got appended. I think we’ve all seen this sort of thing before in some form or another.
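For illustration, here is a rough sketch of that naive approach (the filter variables and SQL parameter names are hypothetical, just to make it concrete):

string nameFilter = "Smith";   // hypothetical user input
int? customerTypeId = 3;       // hypothetical user selection

var builder = new StringBuilder();
if (!String.IsNullOrEmpty(nameFilter))
    builder.Append("Name LIKE @Name AND ");
if (customerTypeId.HasValue)
    builder.Append("CustomerTypeId = @CustomerTypeId AND ");

string whereClause = builder.ToString();
if (whereClause.Length > 0)
    whereClause = whereClause.Substring(0, whereClause.Length - 5); // snip the trailing " AND "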

But then, I got to thinking a bit, and realized that the problem I was facing here was really that I would have n clauses and would want n – 1 ANDs (except the special case of zero, where I would want zero ANDs). A clause is just a string and the “ AND ” is essentially a delimiter, so this is really a problem of having a collection of strings and a delimiter and wanting to jam them together with it. What I want is the opposite of String.Split().

And, as it turns out, the opposite of Split() is a static method on the string class called String.Join(), which takes a delimiter and an array of strings and does exactly what I need. In this fashion, I can add clauses to an object as strings and then query the object for a well-formed WHERE clause. In its simplest incarnation, it would look like this:

public class WhereBuilder
{
    private readonly List<string> _clauses = new List<string>();

    public void Add(string clause)
    {
        _clauses.Add(clause);
    }

    public string GetFullWhereText()
    {
        return String.Join(" AND ", _clauses.ToArray());
    }
}

You keep track of your various sub-clauses of the where in a list, and then join them together on the fly, when requested by consumer code. If you wanted to allow OR instead of AND, that’s pretty simple to support simultaneously:

public class WhereBuilder
{
    private readonly List<string> _clauses = new List<string>();

    public void Add(string clause)
    {
        _clauses.Add(clause);
    }

    public string GetConjunctionClause()
    {
        return Join(" AND ");
    }

    public string GetDisjunctionClause()
    {
        return Join(" OR ");
    }

    private string Join(string separator)
    {
        return String.Join(separator, _clauses.ToArray());
    }
}
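Consuming code can then build up a clause on the fly. Here’s a quick usage sketch (with hypothetical SQL parameter names):

var builder = new WhereBuilder();
builder.Add("Name LIKE @Name");
builder.Add("CustomerTypeId = @CustomerTypeId");

// "Name LIKE @Name AND CustomerTypeId = @CustomerTypeId"
string whereClause = builder.GetConjunctionClause();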

Of course, this is handy only for constructing clauses that all have the same operator, and it doesn’t do anything about the annoyance of monotonously specifying operators inside the various clauses, but my purpose here was to highlight the handiness of String.Join() for those who hadn’t seen it before.

Stay tuned if you’re interested in a more sophisticated where clause builder — I’ve been playing with that a bit in my spare time and will post back here with it if it gets interesting.


Multitasking that Actually Works – Suggestions Requested

There are a lot of ways in which you can do two or more things at once and succeed only in doing them very poorly, so I’m always in search of a way to do multiple things at once but to actually get value out of both things. As a programmer with an ever-longer commute, a tendency to work well over 40 hours per week, and a wide range of interests, my time is at a premium.

Two things that have come to be indispensable for me are listening to developer podcasts (.NET Rocks, Deep Fried Bytes, etc) while I drive and watching Pluralsight videos while I run on machines at the gym. Podcasts are made for audio consumption and I think probably invented more or less with commutes in mind, so this is kind of a no-brainer fit that infuses some learning and professional interest into otherwise dead time (although I might also start listening to fiction audio books, potentially including classic literature).

Watching Pluralsight while jogging is made possible by the advent of smartphones and tablets, but it is a bit interesting nonetheless. I find it works best on elliptical machines, where I’m not bouncing much or making a ton of noise and can lay my phone sideways, facing me. I don’t think this is a workable strategy for jogging outdoors or even on a treadmill, so having a gym at my office with ellipticals is kind of essential for this.

These two productive examples of multi-tasking have inspired me to try to think of other ways to maximize my time. There are some important criteria, however. The multi-tasking must not detract significantly from either task so “catching up on sleep at work” and “watching TV while listening to the radio” don’t make the cut. Additionally, at least one task must be non-trivial, so “avoiding bad music while sleeping” also does not make the cut. And, finally, I’m not interested in tasks that depend on something being inefficient, so “catching up on my RSS reader while waiting for code to compile” is no good since what I ought to be doing is figuring out a way not to be blocked by excessive compile time (I realize one could make a philosophical argument about my commute being inefficient, but I’m asking for some non-rigorous leeway for case by case application here).

This actually isn’t trivial. Most tasks that are worth doing require the lion’s share of your attention, and juggling two or more often ensures that you do a haphazard job at all of them. My life already seems sort of hyper-optimized: I work through lunch, I don’t sleep all that much, and I double up wherever I can. So, additional ways to make this happen are real gems.

Anyone have additional suggestions or things that they do to make themselves more efficient? Your feedback is definitely welcome and solicited in comment form!


Why the Statement “I Don’t Source Control My Unit Tests” Makes Me Happy

An Extraordinary Claim

I was talking to a friend the other day, and he relayed a conversation that he’d had with a co-worker. They were discussing writing unit tests and the co-worker claimed that he writes unit tests regularly, but discards them and never checks them into source control. I was incredulous at this, so I asked what the rationale was, to which my friend cited his co-worker’s claims (paraphrased) that “[unit tests] are fragile, they take too long to maintain, they prohibit adapting to new requirements, errors are caused by user change requests not incorrect code, and they’re hard to understand.” (emphasis mine, and I’ll return to this later).

This struck me as so bizarre — so preposterous — that I immediately assumed I was out of my league and had missed an industry sea-change. I remember a blog post by Scott Hanselman entitled “I’m A Phony. Are You?” and I felt exactly what he meant. Clearly, I’d been exposed. I was not only wrong about unit tests, but I was 180 degrees, polar opposite, comically wrong. I had wasted all of this time and effort with TDD, unit test maintenance, unit tests in the build, etc., when what I really should have been doing was writing these things once and then tossing them, since their benefit is outweighed by their cost.

Given that I don’t have direct access to this co-worker, I took to the internet to see if I could locate the landmark post or paper that had turned the tide on this subject and no doubt a whole host of other posts and papers following suit. So, I googled “Don’t Check In Unit Tests”. Your mileage may vary because I was logged into Gmail, but I saw old posts by Steve Sanderson and Jeff Atwood encouraging people to write tests, a few snarky posts about not testing, and other sorts of things you might expect. I had missed it. I tried “Don’t version control unit tests” and saw similar results. A few variants on the same search turned up similar results too. Apparently, most of the internet was still thinking inside of the box that the co-worker had escaped with his bold new vision.

Finally, I googled “Should I source control unit tests?” and found one link from Programmers Stack Exchange and another from Stack Overflow. In a quick and admittedly very rough count of people who had answered and/or voted one way or the other, the vote appeared to be about 200 to 0 in favor of source controlling versus leaving out. The answer that most succinctly summarized the unanimous consensus was Greg Whitfield’s, which started with:

Indeed yes. How could anyone ever think otherwise?

Hmmm…

There’s Really No Argument

With my confidence somewhat restored, I started thinking of all the reasons that source controlling unit tests makes sense:

  1. Provides living ‘documentation’ of intentions of class author.
  2. Prevents inadvertent regressions during changes.
  3. Allows fearless refactoring of your code and anyone else’s.
  4. Allows incorporation into the build for metric and verification purposes.
  5. Testing state at the time of tagged/branched versions can be re-created.
  6. (As with source controlling anything) Hard drive or other hardware failure does not result in lost work.

I could probably go on, but I’m not going to bother. Why? Because I don’t think that the stated rationale is actually the rationale for not checking in unit tests. I think it’s more likely that this person who “writes tests but doesn’t actually check them in” might have had a Canadian girlfriend as a teenager. In other words, I sort of suspect that these “disposable” unit tests, whose existence can neither be confirmed nor denied, may not actually exist, and so the rationale for not checking them in becomes, well, irrelevant.

And, look at the rationale. The first three clauses (fragile, too long to maintain, prohibit new work) seem to be true only for someone who writes really bad unit tests (i.e. someone without much practice who may, I dunno, be discouraged during early attempts). Because while it’s true that unit tests, like any other code, constitute a maintenance liability, they need not be even remotely prohibitive, especially in a nicely decoupled code base. The fourth clause (errors are everyone’s fault but the programmer) is prima facie absurd, and the fifth and most interesting clause seems to be an unwitting admission of the problem – a difficulty understanding how to unit test. This is probably the most important reason for “not checking in” or, more commonly and accurately, not writing unit tests at all — they’re hard when you don’t know how to write them and haven’t practiced it.

So Why the Happy?

You’re probably now wondering about the post title and why any of this would make me happy. It’s simple, really. I’m happy that people feel the need to make excuses (or phantom unit tests) for why they don’t automate verification of their work. This indicates real progress toward what Uncle Bob Martin describes in a talk that he does called “Demanding Software Professionalism” and alludes to in this post. In his talk, Bob suggests that software development is a nascent field compared to others like medicine and accounting and that we’re only just starting to define what it means to be a software professional. He proposes automated verification in the form of TDD as the equivalent of hand-washing in medicine or double-entry bookkeeping in accounting, and he thinks that someday we’ll look back on not practicing TDD and clean coding the way we now look back on surgeons who refused to wash their hands prior to surgery.

But having a good idea and even demonstrating that it works isn’t enough. Just ask Ignaz Semmelweis, who discovered and empirically demonstrated that hand-washing reduced surgical mortality rates, only to be ridiculed and dismissed by his peers in the face of cold, hard evidence. It wasn’t until later, after Semmelweis had been committed to an insane asylum and died, that his observations and hypothesis got more backers (Lister, Pasteur, et al.) and a better marketing campaign (an actual theoretical framework of explanation called “germ theory”). In Semmelweis’s time, a surgeon could just call him a crank and refuse to wash his hands before surgery. Decades later, he would have to say, “dude, no, I totally already washed them when you weren’t looking” if he was feeling lazy. You can even ask his Canadian girlfriend.

At the end of the day, I’m happy because the marketing for TDD and clean coding practice must be gaining traction and acceptance if people feel as though they have to make excuses for not doing it. I try to be a glass-half-full kind of guy, and I think that’s the glass-half-full outlook. I mean, one person not wanting to automate tests doesn’t really matter in the scheme of things at the moment, since there is no shortage of people who also don’t want to, but it’s good to see people making excuses/stories for not wanting to rather than just saying, “pff, waste of time.”

(And, for what it’s worth, I do acknowledge the remote possibility that someone actually does write and discard unit tests on a regular and rigorous basis. I just don’t think it’s particularly likely.)


Improve Productivity with the Humble ToDo List

Micro-Scrum

A week or two ago, I read Stephen Walther’s blog post “Scrum in 5 Minutes” and reading his description of the backlog reminded me of a practice that I’ve been getting a lot of mileage out of lately. My practice, inspired by Kent Beck in his book Test Driven Development By Example, is to keep a simple To-Do list of small development tasks as I work.

The parallels here are rather striking if you omit the portions of Scrum that have to do with collaboration and those types of logistics. When starting on a task, I think of the first few things that I’ll need to do and those go on the list. I prioritize them by putting the most important (usually the ones that will block progress on anything else) at the top, but I don’t really spend a lot of time on this, opting to revise or refine it if and when I need to. Any new item on the list is yellow and when done, I turn it green.

There are no intermediate states and there is no going back. If I have something like “create mortgage calculator class” and I turn it green when I’m happy with the class, I don’t later turn it back to yellow or some other color if the mortgage calculator needs to change. That instead becomes a new task. Generally speaking, I try to limit the number of yellow tasks I have (in kind of a nod to Kanban’s WIP limits), though I don’t have a hard-and-fast rule for this. I just find that my focus gets cluttered when there are too many outstanding tasks.

If I find that a yellow item is taking me a long time, I will delete that item and replace it with several components of it. The aim is always to have my list be a series of tasks that take 5-15 minutes to complete (though they can be less). Items are added both methodically to complete the task and as reminders of things that occur to me when I’m doing something else. For example, if I fire up the application to verify a piece of integration that involves a series of steps and I notice that a button is the wrong color, I won’t drop everything and sidetrack myself by changing the button. I’ll add it to my queue; I don’t want to worry about this now, but I don’t want to forget about it.

I never actually decided on any of these ‘rules’. They all kind of evolved through an evolutionary process where I kept practices that seemed to help me and dropped ones that didn’t. There will probably be more refinement, but this process is really helping me.

So, What Are the Benefits?

Here is a list of benefits that I see, in no particular order:

  1. Forces you to break the problem into manageable pieces (which usually simplifies it for you).
  2. Helps prevent inadvertent procrastination because a task seems daunting.
  3. Encourages productivity with fast feedback and “wins”.
  4. Prevents you from forgetting things.
  5. Extrapolated estimation is easier since you’re tracking your work at a more granular level.
  6. Helps you explain sources of complexity later if someone needs to know why you were delayed.
  7. Mitigates interruptions (not as much “alright, what on Earth was I doing?”).

Your mileage may vary here, and you might have a better process for all I know (and if you do, please share it!). But I’ve found this to be helpful enough to me that I thought I’d throw it out there in case it helped anyone else too.


Building Weird Light Switches – Out and Ref

Gut Reaction

I was talking with someone about possible approaches to an API the other day. He asked me if I’d favor a method that took a parameter and had a return value, or if I’d prefer a method that was void and took two parameters, with one as a ref parameter. My immediate, knee-jerk response was, “I don’t like ref parameters one bit, so I’d prefer the former.” He looked at me with a bit of surprise and then kind of shrugged, as if to say “whatever, weirdo,” and went with the former. To him, it was six of one, half a dozen of the other.

This made me wonder whether I’m being dogmatic and rigid in the way I approach software, so I spent some time thinking and reading about ref parameters and their cousin, the out parameter. For those of you not acquainted with C#, both of these are ways of passing parameters into methods by reference, with the main difference being whether the called method must assign a value to the parameter (out) or may optionally modify it (ref). Another way of thinking of this is that you would use out when the initial value of the parameter does not matter and ref when it does.

Here is what the syntax looks like:

int myValue;
//This wouldn't compile, so don't do it: AddOneToValue(ref myValue);

SetValueToTwelve(out myValue);
Console.WriteLine(myValue);

AddOneToValue(ref myValue);
Console.WriteLine(myValue);

In this case, you will see 12 printed and then 13 on the next line, assuming the methods do what they say they will. (This gets even screwier if you’re a Java programmer, in which case you need to create a wrapper class or use an int[] or something, since Java has no equivalent of ref or out: parameters are always passed by value, even though object references point to objects on the heap.)
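For completeness, here is a minimal sketch of what those two methods might look like (the bodies are my assumption, since only the names appear above):

static void SetValueToTwelve(out int target)
{
    // An out parameter needn't be initialized by the caller,
    // but it must be assigned before this method returns.
    target = 12;
}

static void AddOneToValue(ref int target)
{
    // A ref parameter arrives initialized, and changes made here
    // are visible to the caller afterward.
    target = target + 1;
}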

What Does the Internet Think?

Usually when I react strongly and then want to justify my reaction, I go see what others have to say, preferably those I respect. The estimable Jon Skeet has this to say:

Basically out parameters are usually a sign that you want to effectively return two results from a method. That’s usually a code smell – but there are some cases (most notably with the TryXXX pattern) where you genuinely want to return two pieces of information for good reasons and it doesn’t make much sense to encapsulate them together.

In other words, avoid out/ref where you can do so easily, but don’t go massively out of your way to avoid them.

In the answer below his, Ryan Lanciaux raises an interesting point when he says that “[out/ref parameters] basically add side-effects to your code and could be a nightmare when it comes to debugging.”

So, two take-aways here are that having a method return two distinct results is a code smell and that method side effects tend to be a problem. The flip side of the argument seems mainly to be a pragmatic, duct-tape-programming one: sometimes the purist approach just isn’t worth the effort and potential awkwardness. The iconic example of using out parameters is the T.TryParse(string, out t) family of methods in C# (which I really don’t like, but I’m trying to suspend my bias for the sake of investigation).
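For anyone who hasn’t run across it, here is the canonical TryParse usage (the input string is just an example):

string input = "42"; // example input
int parsed;

// TryParse "returns" two things: success or failure as the return value,
// and the parsed number via the out parameter
if (int.TryParse(input, out parsed))
{
    Console.WriteLine(parsed); // prints 42
}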

Next up, here’s what, well, MSDN has to say in explanation of a static analysis design warning that they raise entitled “Avoid out parameters”:

Passing types by reference (using out or ref) requires experience with pointers, understanding how value types and reference types differ, and handling methods with multiple return values. Also, the difference between out and ref parameters is not widely understood.

Although return values are commonplace and heavily used, the correct application of out and ref parameters requires intermediate design and coding skills. Library architects who design for a general audience should not expect users to master working with out or ref parameters.

There’s a certain irony to this, but I definitely understand the point. The irony is that the same outfit that put these features into the language raises a warning telling you not to use them. Don’t get me wrong — I understand that this is akin to making computers that can be taken apart while warning users not to do so, since many of them aren’t really qualified — but the irony amuses me nonetheless. It’s also interesting that MSDN seems to think that pointers and reference vs value are “intermediate” language features. Perhaps the fact that I cut my teeth on C and C++ as a wee programmer is showing, but… seriously? Intermediate?

At any rate, the consensus on the subject that I’ve seen at these and a variety of other blogs and Stack Overflow posts seems to be that out/ref parameters are generally to be avoided… except when they’re sort of unavoidable, either because of interop concerns or because you really want (need?) a function that returns two or more things.

Do One Thing

But isn’t a function that does two things a violation of the Single Responsibility Principle of SOLID fame, applied at the method level? And aren’t out/ref parameters, pretty much by definition, side effects that constitute violations of Command-Query Separation, a paradigm in which methods that mutate state (commands) are separated from methods that retrieve information (queries) and ne’er the twain shall meet? I mean, any method that ‘returns’ two values is, well, doing at least two things and any method that mutates object state and kicks back a ref/out parameter is serving as command and query.

But what about methods like the obtuse ones above, SetValueToTwelve() and AddOneToValue()? Those are void methods that mutate only the out/ref parameter. They could be made static and rewritten as int Return12() and int AddOneToValue(int value) without altering their purpose or effect. So, they’re not really violating SRP or CQS, right? They’re just slightly more awkward versions of familiar APIs.
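To make that concrete, here is roughly what that rewrite would look like:

// Equivalent, more conventional versions of the earlier methods
static int Return12()
{
    return 12;
}

static int AddOneToValue(int value)
{
    return value + 1;
}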

But that really hits home with me. Why do I want something that’s either slightly more awkward or a violation of some very helpful clean coding practices? I mean, we’re not really shooting for the moon there, are we, if something is at best somewhat awkward and at worst a code smell. Why do it at all? Methods should pick one thing and do it (or return it) and should do it as non-awkwardly as possible.

What If Light Switches Worked This Way?

I like to think of our task as programmers in terms of abstractions (in case you hadn’t caught my series on abstractions, feel free to click that tag and see me harp on it in a lot of posts). One easy abstraction for me to relate to the world of programming is turning a light switch on and off. This is a classic case of a class Light that exposes a command, SetOnValue(bool), and a readonly property, bool IsOn. So, as in the real world, I move the switch up or down and then separately observe the results. (Let’s ignore for argument’s sake that it might be better to model “Switch” and “Light” as separate entities.)
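In code, a minimal sketch of that model might look like this:

public class Light
{
    // Query: observing the light does not change it
    public bool IsOn { get; private set; }

    // Command: flipping the switch mutates state but reports nothing back
    public void SetOnValue(bool isOn)
    {
        IsOn = isOn;
    }
}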

This is a great example of Command-Query Separation. Toggling the light on or off is a command, and looking at the light to see whether or not it’s on is a query. But, let’s blur the lines for a moment and rewrite this so that there is no readonly “IsOn” property. Instead, SetOnValue(value) will return a boolean indicating whether the light is on or not. So now, we have a switch that also acts as the thing that tells us whether or not it’s on — our wall switch also just became a light. Now, when you toggle the switch, the switch itself glows to give you feedback. Weird.

But, it gets weirder. Instead of having our SetOnValue() function return a bool, let’s feed it a ref parameter. On the way in, we’ll indicate the value we want, and on the way out, it will indicate the value that we’re going to get. In terms of modeling the real world, ref parameters are kind of mind-blowing. You hand some external thing a piece of yourself and it alters that for you. So now, we have a light switch that I flip on with my hand, and indicates the success of that operation by modifying my hand – let’s say burning it. So, I flip the switch on, and if it works, the blazing hot bulb in the switch burns my hand (but gives off no light, since the burn is how I know it’s working). So, there you have it – a strange path from a light switch that turns on lights to one that simply injures me.
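For the morbidly curious, here is a sketch of that last, hand-burning version in code (my rendering, purely for illustration):

public class BurningSwitch
{
    private bool _isOn;

    // The caller's own variable carries the request in and the result out
    public void SetOnValue(ref bool value)
    {
        _isOn = value;  // treat the incoming value as the command...
        value = _isOn;  // ...then mutate the caller's variable as the query
    }
}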

I realize that this metaphor is a touch strained, but here’s the thing: ref and out parameters are weird and counter-intuitive. It’s hard for them not to strain a metaphor that they’re involved in. Anything I’m handed in life could be represented conceptually as a thing or a collection of things. Any action I take in life could be represented as a void method with 0 or more parameters. But what is a ref parameter? Where in life do I take something, set it the way I want it, give it to something, and then have it given back to me differently? Maybe as part of an assembly line or in some weird, Rube Goldberg kind of process, but this is hardly how normal interactions are modeled.

Ref and out are leaky abstractions in terms of code readability. They reek of code. Code involving these things doesn’t read like simple, well written prose or a nice story — it reads like a procedural construct such as the withholding worksheet for your paycheck deductions. So, like dealing with the IRS, why don’t we avoid that if we can? And, I think you’d be pretty hard-pressed to argue that you can’t.