DaedTech

Stories about Software

By

FluentSqlGenerator

A while back, I made a post about using string.Join() to construct SQL where clauses from collections of individual clauses. In that post, I alluded to playing with a “more sophisticated” where clause builder. I did just that here and there and decided to post the results to GitHub. You can find it here if you want to check it out.

My implementation makes use of the Composite design pattern and the idea of well-formed formula semantics in formal systems such as propositional and first order logic. The latter probably sounds a little stuffy and exposes my inner math geek, but that’s just a rigorous way of expressing the concept of building a statement from literals and basic operations on those literals. To pull it back yet another level of stuffiness, consider simple arithmetic, in which all of the following are valid expressions: “6”, “12 + 6”, “12 + (9 – 3)”. The first expression is an atomic literal and the second expression a binary operation. The third expression is interesting in that it shows that functions can have arguments that are literals or other expressions (if this still seems strange, think of these examples as “6”, “Add(12, 6)” and “Add(12, Subtract(9, 3))”).

Think of how this applies to the propositional semantics that make up SQL query where clauses. I can have “Column1 = 12” or I can have “Column1 = 12 AND Column2 = 13” or I can have “Column1 = 12 AND (Column2 = 13 OR Column3 = 4)”. When I want to model this concept in an object oriented sense, I need to represent the operators “AND” and “OR” as objects with two properties: left expression and right expression. I also need it to be possible that either of these properties is a “literal” of the form “col = val” or that it could be another expression as with the last example. Composite is thus a natural fit when you consider that these clauses are really expression trees, in a very real sense. So there is a “Component” base that’s abstract and then “Clause” and “Operation” objects that inherit from it and are fungible when constructing expressions.
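To make the Composite idea concrete, here is a minimal sketch of such an expression tree. The class and member names (Component, Clause, Operation, ToSql, ToNestedSql) are assumptions for illustration and are not necessarily what the GitHub project uses. Values are rendered with plain string formatting, so quoting and parameterization are deliberately ignored.

```csharp
using System;

// Hypothetical names throughout; a sketch of the Composite idea, not the real library.
public abstract class Component
{
    // Every node in the expression tree knows how to render itself.
    public abstract string ToSql();

    // How a node renders when it is an operand of another operation.
    // Leaves need no parentheses, so the default is just ToSql().
    public virtual string ToNestedSql() => ToSql();

    public override string ToString() => ToSql();
}

// Leaf: a "literal" of the form "col = val".
public sealed class Clause : Component
{
    private readonly string _column;
    private readonly string _comparison;
    private readonly object _value;

    public Clause(string column, string comparison, object value)
    {
        _column = column;
        _comparison = comparison;
        _value = value;
    }

    public override string ToSql() => $"{_column} {_comparison} {_value}";
}

// Composite: a binary operator whose operands are themselves Components.
public sealed class Operation : Component
{
    private readonly Component _left;
    private readonly string _operator;
    private readonly Component _right;

    public Operation(Component left, string op, Component right)
    {
        _left = left;
        _operator = op;
        _right = right;
    }

    public override string ToSql() =>
        $"{_left.ToNestedSql()} {_operator} {_right.ToNestedSql()}";

    // A nested operation is parenthesized so that grouping stays explicit.
    public override string ToNestedSql() => $"({ToSql()})";
}
```

Building an AND whose right operand is an OR of two clauses renders the third example from above, “Column1 = 12 AND (Column2 = 13 OR Column3 = 4)”, and neither node type needs to know what kind of node its operands are.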

This was the core of the implementation, but I also dressed it up a bit with some optional extension methods that support a discoverable, fluent interface (I’m still very leery of this construct, but this seems like an appropriate and judicious use). Another nice feature, in my opinion, is that it supports generic parameters so you don’t have massive overloads: you can set your columns equal to objects, strings, decimals, ints, etc. It makes heavy use of ToString() with these generic parameters, so use any type you please so long as what you want out of it is well represented by ToString().

A sample API is as follows:

var clause = Column.Named("Column1").IsEqualTo(123);
Console.WriteLine(clause);

clause = Column.Named("Column1").IsEqualTo(123).And(Column.Named("Column2").IsEqualTo("123456"));
Console.WriteLine(clause);

clause = Column.Named("Column1").IsOneOf(1, 2, 3).Or(Column.Named("Column2").IsGreaterThan(12));
Console.WriteLine(clause);

clause = Are.AnyOfTheseTrue(Column.Named("Column1").IsEqualTo(832), Column.Named("Column2").IsLessThan(25.30));
Console.WriteLine(clause);

clause = Are.AreAllOfTheseTrue(Column.Named("Column1").IsEqualTo(832), Column.Named("Column2").IsLessThan(25.30), Column.Named("Column3").IsOneOf("Current", "Valid"));
Console.WriteLine(clause);

Console.ReadLine();

Currently supported SQL operations include various comparisons (equal, not equal, greater, less, etc.) as well as “like” and “in()”. Expression operators “AND”, “OR” and “NOT” are supported. The utility is well covered by unit tests, and a handful of integration tests too, if you want to poke around but preserve functionality.

Feel free to download, use, fork, enhance, make fun of, etc, whatever. I’m not pretending this is a problem never before solved nor that this is the most elegant solution imaginable, but it was fun to write, code-kata style, and if someone can get some use out of it, great. If I wind up making significant modifications to it or extending it, I’ll post updates here as well as checking changes into github.


Casting is a Polymorphism Fail

Have you ever seen code that looked like the snippet here?

public class Menagerie
{
    private List<Animal> _animals = new List<Animal>();

    public void AddAnimal(Animal animal)
    {
        _animals.Add(animal);
    }

    public void MakeNoise()
    {
        foreach (var animal in _animals)
        {
            if (animal is Cat)
                ((Cat)animal).Meow();
            else if (animal is Dog)
                ((Dog)animal).Bark();
        }
    }
}

You probably have seen code like this, and I hope that it makes you sad. I know it makes me sad. It makes me sad because it’s clearly the result of a fundamental failure to understand (or at least implement) polymorphism. Code written like this follows an inheritance structure, but it completely misses the point of that structure, which is the ability to do this instead:

public class Menagerie
{
    private List<Animal> _animals = new List<Animal>();

    public void AddAnimal(Animal animal)
    {
        _animals.Add(animal);
    }

    public void MakeNoise()
    {
        foreach (var animal in _animals)
            animal.MakeNoise();
    }
}

What’s so great about this? Well, consider what happens if I want to add “Bird” or “Bear” to the mix. In the first example with casting, I have to add a class for my new animal, and then I have to crack open the menagerie class and add code to the MakeNoise() method that figures out how to tell my new animal to make noise. In the second example, I simply have to add the class and override the base class’s MakeNoise() method and Menagerie will ‘magically’ work without any source code changes. This is a powerful step toward the open/closed principle and the real spirit of polymorphism — the ability to add functionality to a system with a minimum amount of upheaval.
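For completeness, here is a sketch of the class hierarchy that the second Menagerie relies on. The specific noises and the Bird class are made up for illustration, and MakeNoise() returns the noise as a string (rather than printing it) purely so that the result is easy to inspect.

```csharp
using System;
using System.Collections.Generic;

public abstract class Animal
{
    // Each subclass decides for itself what noise it makes.
    public abstract string MakeNoise();
}

public class Cat : Animal
{
    public override string MakeNoise() => "Meow";
}

public class Dog : Animal
{
    public override string MakeNoise() => "Woof";
}

// Added after the fact: Menagerie below needs no changes to support it.
public class Bird : Animal
{
    public override string MakeNoise() => "Tweet";
}

public class Menagerie
{
    private readonly List<Animal> _animals = new List<Animal>();

    public void AddAnimal(Animal animal)
    {
        _animals.Add(animal);
    }

    // No type checks and no casts: the call dispatches to the right override.
    public List<string> MakeNoise()
    {
        var noises = new List<string>();
        foreach (var animal in _animals)
            noises.Add(animal.MakeNoise());
        return noises;
    }
}
```

Adding a Cat and then a Bird to the menagerie and calling MakeNoise() gives back “Meow” followed by “Tweet”, even though Menagerie was written with no knowledge that birds exist.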

But what about more subtle instances of casting? Take the iconic:

public void HandleButtonClicked(object sender, EventArgs e)
{
    var button = (Button)sender;
    button.Content = "I was clicked!";
}

Is this a polymorphism failure? It can’t be, can it? I mean, this is the pattern for event subscription/handling laid out by Microsoft in the C# programming guide. Surely those guys know what they’re doing.

As a matter of fact, I firmly believe that they do know what they’re doing, but I also believe that this pattern was conceived of and created many moons ago, before the language had some of the constructs that it currently does (like generics and various frameworks) and followed some of the patterns that it currently does. I can’t claim with any authority that the designers of this pattern would ask for a mulligan knowing what they do now, but I can say that patterns like this, especially ones that become near-universal conventions, tend to build up quite a head of steam. That is to say, if we suddenly started writing event handlers with strongly typed senders, a lot of event producing code simply wouldn’t work with what we were doing.

So I contend that it is a polymorphism failure and that casting, in general, should be avoided as much as possible. However, I feel odd going against a Microsoft standard in a language designed by Microsoft. Let’s bring in an expert on the matter. Eric Lippert, principal developer on the C# compiler team, had this to say in a Stack Overflow post:

Both kinds of casts are red flags. The first kind of cast raises the question “why exactly is it that the developer knows something that the compiler doesn’t?” If you are in that situation then the better thing to do is usually to change the program so that the compiler does have a handle on reality. Then you don’t need the cast; the analysis is done at compile time.

The “first kind” of cast he’s referring to is one he defines earlier in his post as one where the developer “[knows] the runtime type of this expression but the compiler does not know it.” That is the kind that I’m discussing here, which is why I chose that specific portion of his post. In our case, the developer knows that “sender” is a button but the compiler does not know that. Eric’s point, and one with which I wholeheartedly agree, is “why doesn’t the compiler know it and why don’t we do our best to make that happen?” It just seems like a bad idea to run a reality deficit between yourself and the compiler as you go. I mean, I know that the sender is a button. You know the sender is a button. The method knows the sender is a button (if we take its name, containing “ButtonClicked” at face value). Maintainers know the sender is a button. Why does everyone know sender is a button except for the compiler, who has to be explicitly and awkwardly informed in spite of being the most knowledgeable and important party in this whole situation?
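To illustrate what letting the compiler in on the secret might look like, here is a sketch of an event whose sender is strongly typed. Everything in it is hypothetical: TypedEventHandler is a delegate declared right here (not the standard EventHandler), and SimpleButton is a stand-in for a real UI control.

```csharp
using System;

// Hypothetical delegate: like EventHandler, except the sender's type is a generic parameter.
public delegate void TypedEventHandler<TSender, TArgs>(TSender sender, TArgs e);

// Hypothetical stand-in for a UI button.
public class SimpleButton
{
    public string Content { get; set; } = "Click me";

    public event TypedEventHandler<SimpleButton, EventArgs> Clicked;

    // Simulates the user clicking the button.
    public void Click()
    {
        Clicked?.Invoke(this, EventArgs.Empty);
    }
}

public static class Handlers
{
    // No cast needed: the compiler already knows the sender is a SimpleButton.
    public static void HandleButtonClicked(SimpleButton sender, EventArgs e)
    {
        sender.Content = "I was clicked!";
    }
}
```

Subscribing with button.Clicked += Handlers.HandleButtonClicked and then calling button.Click() sets the content without a single cast, and wiring the handler to an event raised by some other type fails at compile time rather than at runtime.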

But I roll all of this into a broader point about a polymorphic approach in general. If we think of types as hierarchical (inheritance) or composed (interface implementation), then there’s some exact type that suits my needs. There may be more than one, but there will be a best one. When writing a method and accepting parameters, I should accept as general a type as possible without needing to cast so that I can be of the most service. When returning something, I should be as specific as possible to give clients the most options. But when I talk about “possible” I’m talking about not casting.
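As a small sketch of the "general in, specific out" idea, consider the following. The Numbers class and EvensFrom method are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class Numbers
{
    // Accepts the most general type that still supports what the method needs
    // (enumeration), so arrays, lists, and lazy sequences all work without casting.
    // Returns a concrete List<int> so that callers get Count, indexing, and so on.
    public static List<int> EvensFrom(IEnumerable<int> source)
    {
        var evens = new List<int>();
        foreach (var number in source)
        {
            if (number % 2 == 0)
                evens.Add(number);
        }
        return evens;
    }
}
```

A caller can pass an int[] or a List<int> interchangeably and can index into the result directly; nobody on either side of the call has to cast anything.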

If I start casting, I introduce error possibilities, but I also necessarily introduce a situation where I’m treating an object as two different things in the same scope. This isn’t just jarring from a readability perspective — it’s a maintenance problem. Polymorphism allows me to care only about some public interface specification and not implementation details — as long as the thing I get has the public API I need, I don’t really care about any details. But as soon as I have to understand enough about an object to understand that it’s actually a different object masquerading as the one I want, polymorphism is right out the window and I suddenly depend on knowing the intricate relationship details of the class in question. Now I break not only if my direct collaborators change, but also if some inheritance hierarchy or interface hierarchy I’m not even aware of changes.

The reason I’m posting all of this isn’t to suggest that casting should never happen. Clearly sometimes it’s necessary, particularly if it’s forced on you by some API or framework. My hope though is that you’ll look at it with more suspicion — as a “red flag”, in the words of Eric Lippert. Are you casting because it’s forced on you by external factors, or are you casting to communicate with the compiler? Because if it’s the latter, there are other, better ways to achieve the desired effect that will leave your code more elegant, understandable, and maintainable.

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.


Splitting Strings With Substrings

The String.Split() method in C# is probably something with which any C# developer is familiar.

string x = "Erik.Dietrich";
var tokens = x.Split('.');

Here, “tokens” will be an array of strings containing “Erik” and “Dietrich”. It’s not exactly earth shattering to tokenize a string in this fashion. And some incarnation or another of this predates .NET, C# and probably even my time on this planet.

It’s Actually Harder Than You’d Think to Split Strings Using Sub-Strings

But what about if we want to split over a string instead?

What about if we have “..” as a delimiter instead of ‘.’ and I want to split “Erik..Dietrich” in the same way? Probably an overload of String.Split() that takes a string instead of a char, right? Well, actually no. As it turns out, the API for string.Split() is pretty unintuitive.

First of all, that call to x.Split(‘.’) is not actually invoking Split(char), but rather Split(params char[]). (Notwithstanding the fact that this isn’t advertised in the MSDN page unless you drill into the individual method.)

So, calling x.Split(‘.’) and x.Split(‘.’, ‘&’, ‘%’, ‘^’) are equally valid, syntax-wise, in the case of “Erik.Dietrich” (and in this case, both will give me back my first and last name).

So, what one might expect is that there would be an overload Split(params string[]) to allow the same behavior as splitting over zero or more characters. Nope. Instead you have Split(string[] separator, StringSplitOptions options).

What’s Really Not Great about the Default Way to Split Strings with Sub-Strings

Two things suck about this.

  1. I have to specify some enum that I don’t care about in the first place and that has only two options, one of which is “none”. I mean, really? You can’t just assume “none” and let users specify a different case if they want with another overload?
  2. But what sucks even more about this is that params have to be the last argument in the parameter list, so that option is out the window. You no longer get that snazzy params syntax that the char version has, and now you have to actually awkwardly create a string array. So, here is the new syntax following the old. Note that the new syntax is pretty hideous.

string x = "Erik.Dietrich";
var tokens = x.Split('.');

string y = "Erik..Dietrich";
var newTokens = y.Split(new string[] { ".." }, StringSplitOptions.None);

This Gets a Lot Easier and Prettier using Regex.Split

I was getting ready to write something to hide this mess from myself as a client, when I stumbled across a better alternative than rolling my own extension method or string splitting class: Regex.Split(). Here’s how it works:

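string x = "Erik..Dietrich";
var tokens = Regex.Split(x, Regex.Escape(".."));

No fuss, no muss, and close to what String.Split() should do. One catch: the second argument is a regular expression pattern, not a literal string, and an unescaped "." matches any character. Passing ".." directly would treat every pair of characters as a delimiter and hand back nothing but empty strings. Regex.Escape() (or writing the pattern as @"\.\.") makes the dots literal. Granted, the arguments to Regex.Split() are both single strings (so if you want to specify multiple delimiters, you’ll have to cook up a regex recipe) and it’s a static method, but it has the advantage of already existing in the framework and being a much, much cleaner API than x.Split().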
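If you would rather not involve regular expressions at all, the roll-your-own extension method mentioned earlier is only a few lines. This is a sketch; the class name StringSplitExtensions and the method name SplitOn are made up, and it simply wraps the existing Split(string[], StringSplitOptions) overload.

```csharp
using System;

public static class StringSplitExtensions
{
    // Hypothetical helper: restores the params-style call site for string delimiters.
    // Delimiters are treated as literal text, so no escaping is required.
    public static string[] SplitOn(this string source, params string[] delimiters)
    {
        return source.Split(delimiters, StringSplitOptions.None);
    }
}
```

With that in place, "Erik..Dietrich".SplitOn("..") returns “Erik” and “Dietrich”, and several delimiters can be passed in a single call, such as SplitOn("..", "--").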

Use in good health!



Unit Testing DateTime.Now Without Isolation

My friend Paul pointed something out to me the other day regarding my post about TDD even when you have to discard tests. I believe that this trick was taken from the book Working Effectively With Legacy Code by Michael Feathers (though I haven’t yet read this one, so I can’t be positive).

I was writing some TDD tests surrounding the following production method:

public virtual void SeedWithYearsSince(DropDownList list, int year)
{
    for (int index = year; index <= DateTime.Now.Year; index++)
        list.Items.Add(new ListItem(index.ToString()));
}

and the problem I was having is that any tests that I write and check in will be good through the end of 2012 and essentially have an expiration date of Jan 1st, 2013.

What Paul pointed out is that I could refactor this to the following:

protected virtual int CurrentYear
{
    get
    {
        return DateTime.Now.Year;
    }
}

public virtual void SeedWithYearsSince(DropDownList list, int year)
{
    for (int index = year; index <= CurrentYear; index++)
        list.Items.Add(new ListItem(index.ToString()));
}

And, once I've done that, I can introduce the following class into my test class:

public class CalenderDropDownFillerExtension : CalendarDropdownFiller
{
    private int _currentYear;
    protected override int CurrentYear
    {
        get
        {
            return _currentYear;
        }
    }

    public CalenderDropDownFillerExtension(DateTimeFormatInfo formatInfo, int yearToUse) : base(formatInfo)
    {
        _currentYear = yearToUse;
    }
}

With all that in place, I can write a test that no longer expires:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Adds_Two_Items_When_Passed_2011()
{
    var filler = new CalenderDropDownFillerExtension(new DateTimeFormatInfo(), 2012);
    var list = new DropDownList();
    filler.SeedWithYearsSince(list, 2011);

    Assert.AreEqual(2, list.Items.Count);
}

In this test, I use the new class that requires me to specify the current year. It overrides the base class's CurrentYear property, swapping DateTime.Now for the "current" year I've passed in, which has nothing to do with the non-deterministic quantity "Now". As a result, I can TDD 'til the cows come home and check everything in so that nobody accuses me of having a Canadian girlfriend. In other words, I get to have my cake and eat it too.
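A related variation, which is not what Paul suggested but aims at the same result, is to hand the class its clock through the constructor instead of overriding a property in a test subclass. The sketch below uses made-up names (YearSeeder, YearsSince) and fills a plain list of strings rather than a DropDownList so that it stands on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the drop-down filler, with the clock passed in.
public class YearSeeder
{
    private readonly Func<DateTime> _now;

    // Production code uses the real clock.
    public YearSeeder() : this(() => DateTime.Now)
    {
    }

    // Tests supply a fixed point in time.
    public YearSeeder(Func<DateTime> now)
    {
        _now = now;
    }

    public List<string> YearsSince(int year)
    {
        var years = new List<string>();
        int currentYear = _now().Year;
        for (int index = year; index <= currentYear; index++)
            years.Add(index.ToString());
        return years;
    }
}
```

A test can then construct new YearSeeder(() => new DateTime(2012, 6, 1)) and assert that YearsSince(2011) contains exactly two items, no matter what year the test actually runs in.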


Regions Are a Code Smell

There’s a blogger named Iris Classon who writes a series of posts called “stupid questions”, where she essentially posts a discussion-fueling question once per day. Recently, I noticed one there called “Should I Use Regions in my Code?”, and the discussion that ensued seemed to be one of whether regions were helpful in organizing code or whether they constituted a code smell.

For those of you who are not C# developers, a C# region is a preprocessor directive that the Visual Studio IDE (and probably others) uses to provide entry points for collapsing and hiding code. Eclipse lets you collapse methods, and so does VS, but VS also gives you this way of pre-defining larger segments of code to collapse as well. Here is a before and after look:
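In source form, regions look like the following sketch. The Customer class and the region labels are made up for illustration; the #region and #endregion directives themselves are standard C#.

```csharp
using System;

public class Customer
{
    #region Fields

    private readonly string _name;

    #endregion

    #region Constructors

    public Customer(string name)
    {
        _name = name;
    }

    #endregion

    #region Properties

    public string Name
    {
        get { return _name; }
    }

    #endregion
}
```

The directives have no effect on the compiled code. In the editor, each #region through #endregion span can be collapsed down to a single line showing its label, so this class would shrink to three lines reading Fields, Constructors and Properties.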

(As an aside, if you use CodeRush, it makes the regions look much nicer when not collapsed — see image at the bottom of the post)

I have an odd position on this. I’ve gotten used to the practice because it’s simply been the coding standard on most projects that I’ve worked on in the last few years, and so I find myself doing it habitually even when working on my own stuff. But, I definitely think they’re a code smell. Actually, let me refine that. I think regions are more of a code deodorant or code cologne. When we get up in the morning, we shower and put on deodorant and maybe cologne/perfume before going about our daily business (most do, anyway). And not to be gross or cynical, but the reason that we do this is that we kind of expect that we’re going to naturally gravitate toward stinking throughout the day and we’re engaging in some preventative medicine.

This is how I view regions in C# code, in a sense. Making them a coding standard or best practice of sorts is like teaching your children (developers, in the metaphor) that not bathing is fine, so long as they religiously wear cologne. So, in the coding world, you’re saying to developers, “Put in your regions first so that I don’t have to look at your unwieldy methods, haphazard organization and gigantic classes once you inevitably write them.” You’re absolving them of the responsibility for keeping their code clean by providing and, in fact, mandating a way to make it look clean without being clean.

So how do I justify my hypocrisy on this subject of using them even while thinking that they tend to be problematic? Well, at the heart of it lies my striving for Clean Code, following SRP, small classes, and above all, TDD. When you practice TDD, it’s pretty hard to write bloated classes with lots of overloads, long methods and unnecessary state. TDD puts natural pressure on your code to stay lean, compact and focused in much the same way that regions remove that same pressure. It isn’t unusual for me to write classes and region them and to have the regions with their carriage returns before and after account for 20% of the lines of code in my class. To go back to the hygiene metaphor, I’m like someone who showers quite often and doesn’t sweat, but still wears deodorant and/or cologne. I’m engaging in a preventative measure that’s largely pointless but does no harm.

In the end, I have no interest in railing against regions. I don’t think that people who make messes and use regions to hide them are going to stop making messes if you tell them to stop using regions. I also don’t think using regions is inherently problematic; it can be nice to be able to hide whole conceptual groups of methods that don’t interest you for the moment when looking at a class. But I definitely think it bears mentioning that from a maintainability perspective, regions do not make your 800 or 8000 line classes any less awful and smelly than they would be in a language without regions.