DaedTech

Stories about Software

Merging Done Right: Semantic Merge

There are few things in software development as surprisingly political as merging your code when there are conflicts.

A Tale of Merge Politics

Your first reaction to this is probably to think that I’m crazy, but seriously, think about what happens when your diff tool/source control combo tells you there’s a conflict. You peer at the conflict for a moment and then think, “alright, who did this?”

Was it Jim? Well, Jim’s kind of annoying and pretty new, so it’s probably fine just to blow his changes away and send him an email telling him to get the latest code and re-do his stuff.

Oh, wait, no, it looks like it was Janet. Uh oh.

She’s pretty sharp and a Principal so you’ll probably be both wrong and in trouble if you mess this up — better just revert your changes, get hers and rework your stuff.

Oh, on third look, it appears that it was Steve, and since Steve is your buddy, you’ll just go grab him and work through the conflict together.

Notice how none of this has anything to do with what the code should actually look like?

Standard Diff Tools are Bad at Telling You What Happened

Now, I’ll grant that this isn’t always the case; there are times when you can figure out what should be in the master copy of the source control.  But it’s pretty likely that you’ve sat staring at merge conflicts and thinking about people and not code.

Why is that?

Well, frankly because merge tools aren’t very good at telling you the story of what’s happened, and that’s why you need a human to come tell you the story. But which human, which story, and how interested you are in that interaction are all squarely the stuff of group dynamics and internal politics. Hopefully you get on well with your team and they’re happy to tell you the story.

But what if your tools could tell you that story?

What if, instead of saying, “Jim made text different on lines 100, 124, 135-198, 220, 222-228,” your tooling said, “Jim moved a method, and deleted a few references to a field whereas you edited the method that he moved?”

Holy crap! You wouldn’t need to get Jim at all because you could just say, “oh, okay, I’ll do a merge where we do all of his stuff and then make my changes to that method he moved.”

Introducing Semantic Merge

I’ve been poking around with Roslyn and reading about it lately, and this led me to Semantic Merge. This is a diff tool that uses Roslyn, which means that it’s parsing your code into a syntax tree and actually reasoning about it as code, rather than text (or text with heuristics).

As such, it’s no mirage or trickery that it can say things like “oh, Jim moved the method but left it intact whereas you made some changes to it.” It makes perfect sense that it can do this.

Let’s take a look at this in action. I’m only showing you the tiniest hint of what’s possible, but I’d like to pick out a very simple example of where a traditional merge tool kind of chokes and Semantic Merge shines. It is, after all, a pay to play (although pretty affordable) tool, so a compelling case should be made.

The Old Way

Before you see how cool Semantic Merge is, let’s take a look at a typical diff scenario. I’ll do this using the Visual Studio compare tool that I use on a day to day basis.

And I’m calling this “the old way,” in spite of the fact that I fell in love with this as compared to the way it used to be in VS2010 and earlier. It’s actually pretty nice as far as diff tools go. I’m going to take a class and make a series of changes to it.

Here’s the before:

public class SomeClass
{
    private int _aNumber;
    private string _aWord;

    /// <summary>
    /// Initializes a new instance of the SomeClass class.
    /// </summary>
    public SomeClass() 
    {
      _aNumber = 123; 
      _aWord = "Hello!"; 
    }

    public void PrintNumbers() 
    {
      for (int index = 0; index < _aNumber; index++) 
        Console.WriteLine(index); 
    } 

    public void PrintEvenNumbers() 
    { 
      for (int index = 0; index < _aNumber; index += 2) 
        Console.WriteLine(index); 
    }
   
    public void ChangeNumber(int number) 
    {
      if (number < 0) 
        throw new ArgumentException("number"); 

      _aNumber = number; 
    } 

    public void PrintTheWord()
    {
      Console.WriteLine(_aWord);
    }

    public void ChangeTheWord(string newWord)
    {
      _aWord = newWord;
    }
}

Now, what I’m going to do is swap the positions of PrintNumbers() and ChangeTheWord(), add some error checking to ChangeTheWord() and delete the comments above the constructor. Here’s the after:

public class SomeClass
{
    private int _aNumber;
    private string _aWord;

    public SomeClass()
    {
        _aNumber = 123;
        _aWord = "Hello!";
    }

    public void ChangeTheWord(string newWord)
    {
        if(string.IsNullOrEmpty(newWord))
            throw new ArgumentException("newWord");
        _aWord = newWord;
    }
        
    public void PrintEvenNumbers()
    {
        for (int index = 0; index < _aNumber; index += 2)
            Console.WriteLine(index);
    }

    public void ChangeNumber(int number)
    {
        if (number < 0)
            throw new ArgumentException("number");

        _aNumber = number;
    }

    public void PrintTheWord()
    {
        Console.WriteLine(_aWord);
    }

    public void PrintNumbers()
    {
        for (int index = 0; index < _aNumber; index++)
            Console.WriteLine(index);
    }
}

If I now want to compare these two files using the diff tool, here’s what I’m looking at:

[Screenshot: StandardDiff]

There’s a Better Way to Handle This

This is the point where I groan and mutter to myself because it annoys me that the tool is comparing the methods side by side as if I had renamed one and completely altered its contents.

I’m sure you can empathize. You’re muttering to yourself too and what you’re saying is, “you idiot tool, it’s obviously a completely different method.”

Well, here’s the same thing as summarized by Semantic Merge:

[Screenshot: SemanticMergeDiff]

It shows me that there are two types of differences here: moves and changes. I’ve moved the two methods PrintNumbers() and ChangeTheWord() and I’ve changed the constructor of the class (removing comments) and the ChangeTheWord() method.
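To make the moves-versus-changes distinction concrete, here is a toy sketch in C#. It is not how Semantic Merge works internally (that relies on a real parser). It assumes each file has already been reduced to an ordered list of uniquely named members, and the names Member, SemanticDiff and Describe are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for a parsed class member.
public sealed class Member
{
    public Member(string name, string body) { Name = name; Body = body; }
    public string Name { get; }
    public string Body { get; }
}

public static class SemanticDiff
{
    // Compares members by name instead of by line position, so a member that
    // was only relocated is reported as "moved" rather than as rewritten text.
    // Assumes member names are unique (no overloads).
    public static List<string> Describe(IList<Member> before, IList<Member> after)
    {
        var report = new List<string>();
        var oldByName = before.ToDictionary(m => m.Name);
        var newNames = new HashSet<string>(after.Select(m => m.Name));

        // Members present on both sides, in their old and new relative order.
        var commonOld = before.Where(m => newNames.Contains(m.Name)).Select(m => m.Name).ToList();
        var commonNew = after.Where(m => oldByName.ContainsKey(m.Name)).Select(m => m.Name).ToList();

        foreach (var member in after)
        {
            Member old;
            if (!oldByName.TryGetValue(member.Name, out old))
            {
                report.Add("added: " + member.Name);
                continue;
            }

            // Naive position check: fine for swaps like the one above, but a
            // single relocated member would also shift its neighbours.
            if (commonOld.IndexOf(member.Name) != commonNew.IndexOf(member.Name))
                report.Add("moved: " + member.Name);

            if (old.Body != member.Body)
                report.Add("changed: " + member.Name);
        }

        foreach (var member in before.Where(m => !newNames.Contains(m.Name)))
            report.Add("deleted: " + member.Name);

        return report;
    }
}
```

Fed the before and after versions of SomeClass, a comparison like this reports ChangeTheWord() as both moved and changed and PrintNumbers() as moved, which is the kind of summary a line-based diff cannot give you.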

Pretty awesome, huh? Rather than a bunch of screenshots to show you the rest, however, I’ll show you this quick clip of me playing around with it.

Some very cool stuff in there. First of all, I started where the screenshot left off — with a nice, succinct summary of what’s changed.

From there you can see that it’s easy to flip back and forth between the methods, even when moved, to see how they’re different. You can view each version of the source as well as a quick diff only of the relevant, apples-to-apples, changes.

It’s also nice, in general, that you can observe the changes according to what kind of change they are (modification, move, etc). And finally, at the end, I played a bit with the more traditional diff view that you’re used to — side by side text comparison.

But even with that, helpful UI context shows you that things have moved rather than the screenshot of the VS merge tool above where it looks like you’ve just butchered two different methods.

This is only scratching the surface of Semantic Merge. There are more features I haven’t covered at all, including a killer feature that helps auto-resolve conflicts by taking the base version of the code as well as server and local in order to figure out if there are changes only really made by one person.
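The base/server/local idea can be sketched in a few lines. This is only an illustration of three-way resolution for a single member's text, under the assumption that the three versions have already been lined up; ThreeWayMerge and TryResolve are hypothetical names, not part of any tool's API.

```csharp
using System;

public static class ThreeWayMerge
{
    // Resolves one member given its text in the common ancestor (baseText),
    // the incoming version (theirs) and the local version (mine).
    // Returns true and the merged text when no human decision is needed.
    public static bool TryResolve(string baseText, string theirs, string mine, out string merged)
    {
        if (theirs == mine)     { merged = mine;   return true; } // both sides ended up identical
        if (mine == baseText)   { merged = theirs; return true; } // only they changed it
        if (theirs == baseText) { merged = mine;   return true; } // only I changed it

        merged = null; // both sides changed it differently: a real conflict
        return false;
    }
}
```

Without the base version, the first case is the only one a tool can settle on its own. With it, a change made by just one person stops being a conflict at all.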

You can check more of it out in this extended video about the tool. As I’ve said, it’s a pay tool, but the cost isn’t much and there’s a 30 day trial, so I’d definitely advise taking it for a spin if you work in a group and find yourself doing any merging at all.

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.

TDD Chess Game Part 4: Getting Organized

Alright, welcome back to this series.

A couple of housekeeping things:

  1. I have bitten the bullet and used the Visual Studio White theme along with 14 point font to record, so hopefully the videos going forward should be easier to watch. It’s a little surreal to work with, but c’est la vie.
  2. The source code is now available on github for you to follow along. The coding is usually running ahead of my publication, so if you want to see the code from a given video, you may have to grab a slightly earlier version.

Here’s what I accomplish in this clip:

  • Started using a little todo list to keep track of what I’ve done and what I need to do.
  • Cleaned up code as reported by static analysis tools.
  • Pulled some production classes into their own namespaces and out of the test classes.
  • Defined an abstract Piece class.
  • Defined a second inheritor, “Rook,” for Piece.
  • Defined a bit of dumb functionality for Rook’s “GetMovesFrom” to get it started.
  • Implemented ability for a pawn to move two spaces on its first move.
  • Defined a piece concept of “HasMoved.” (albeit just for Pawn)
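For readers who would rather see the shape of that design than watch for it, here is a rough sketch. It is not the code from the video or the repository: the Square type, the method signature and the move rules are assumptions for illustration, and it ignores board edges, piece color and captures.

```csharp
using System.Collections.Generic;

// Hypothetical coordinate type for this sketch only.
public struct Square
{
    public Square(int x, int y) { X = x; Y = y; }
    public int X { get; }
    public int Y { get; }
}

public abstract class Piece
{
    public bool HasMoved { get; set; }

    // Each inheritor decides which squares it could reach from a start square.
    public abstract IEnumerable<Square> GetMovesFrom(Square start);
}

public class Pawn : Piece
{
    public override IEnumerable<Square> GetMovesFrom(Square start)
    {
        yield return new Square(start.X, start.Y + 1);

        // A pawn may advance two squares, but only on its first move.
        if (!HasMoved)
            yield return new Square(start.X, start.Y + 2);
    }
}

public class Rook : Piece
{
    // Deliberately naive placeholder, in the spirit of "a bit of dumb functionality".
    public override IEnumerable<Square> GetMovesFrom(Square start)
    {
        yield return new Square(start.X, start.Y + 1);
    }
}
```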

And here are the lessons to take away:

  • Keeping a list of smallish things you want to change can help you keep track of what needs to be done without distracting you too much (I picked this technique up from Kent Beck’s “Test Driven Development By Example.”)
  • If you’re using NCrunch, use the green dots being dark or bright as a quick way to tell if the code is compiling.
  • Gamify cosmetic issues. If “Optimize Namespaces” and things like that are important, make violations ugly and distracting in the IDE and you’ll get annoyed and fix them whereas you probably wouldn’t bother, otherwise.
  • It’s okay to write stupid tests if you do so knowing that you’ll fix them. Finding ways to always write a test to change production code is good for practicing the TDD discipline until it starts to become second nature.
  • It’s okay to write a test that causes a non-compile failure and then need to do a good bit of work to get everything back to compiling/passing.
  • I’ve mentioned this previously, but it bears repeating: it’s okay to reuse a test (especially a stupid one) to get a failing test.
  • If you weren’t aware of the C# yield keyword and deferred execution, they’re good things to familiarize yourself with.
  • Force yourself not to copy and paste as much as possible, even when it seems dumb. Feeling the pain of re-typing things will make it painfully obvious when you’re duplicating code and could do something better.
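On the yield point, a tiny example may help if deferred execution is new to you. This is a standalone sketch, unrelated to the chess code, and DeferredDemo, Log and CountTo are made-up names. It shows that the body of an iterator method does not run when the method is called, only when someone starts iterating.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Records when the iterator body actually runs.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> CountTo(int max)
    {
        Log.Add("started");
        for (int i = 1; i <= max; i++)
            yield return i;
    }
}
```

Calling DeferredDemo.CountTo(3) by itself leaves Log empty. The "started" entry only appears once a foreach (or something like ToList()) begins pulling values out of the sequence.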

And, here’s the clip:

Getting Started on the Roslyn Journey

It’s not as though it’s new; Roslyn CTP was announced in the fall of 2011, and people have been able to play with it since then. Roslyn is a quietly ground-breaking concept: a set of compilers that exposes compiling, code modeling, refactoring, and analysis APIs. Oh, and it was recently announced that the tool would be open source, meaning that all of you Monday morning quarterback language authors out there can take a crack at implementing multiple inheritance or whatever other language horrors you have in mind.

I have to say that I, personally, have little interest in modifying any of the language compilers (unless I went to work on a language team, which would actually be a blast, I think), but I’m very interested in the project itself. This strikes me as such an incredible, ground-breaking concept and I think a lot of people are just kind of looking at this as a curiosity for real language nerds and Microsoft fanboys. The essential value in this offering, to me, is the standardizing of code as data. I’ve written about this once before, and I think that gets lost in the shuffle when there’s talk about emitting IL at runtime and infinite loops of code generation and whatnot. Forget the idea of dispatching a service call to turn blobs of text into executables at runtime, and let’s agree to talk later about the transformative notion of regarding source code as entity collections rather than instruction sheets, scripts, or recipes.

But first, let’s get going with Roslyn. I’m going to assume you’ve never heard of this before and I’m going to take you from that state of affairs to doing something interesting with it in this post. In subsequent posts, we’ll dive back into what I’m driving at philosophically in the intro to this post about code as data.

Getting Started

(Note — I have VS2013 on all my machines and that is what I’ve used. I don’t know whether any/all of this would work in Studio 2012 or earlier, so buyer beware)

First things first. In order to use the latest Roslyn bits, you actually need a fairly recent version of Nuget. This caught me off guard, so hopefully I’ll save you some research and digging. Go to the “Tools” menu and choose “Extensions and Updates.” Click on the “Updates” section at the left, and then click on “Visual Studio Gallery.”

[Screenshot: NugetUpgrade]

If you’re like me, your version was 2.7.something and it needs to be 2.8.1something or higher. This update will get you where you need to be. Once you’ve done that, you can simply install the API libraries via Nuget command line.

With that done, you’re ready to download the necessary installation files from Microsoft. Go to http://aka.ms/roslyn to get started. If you’re not signed in, you’ll be prompted to sign in with your Microsoft ID (you’ll need to create one if you don’t have one) and then fill out a survey. If you get lost along the way, your ultimate destination is to wind up here.

At this point, if you follow the beaten path and click the “Download” button, you’ll get something called download.dlm that, if your environment is like mine, is completely useless. So don’t do that. Click the circled “download” link indicated below to get the actual Roslyn SDK.

[Screenshot: DownloadRoslyn]

Once that downloads, unpack the zip file and run “Roslyn End User Preview” to install Roslyn language features. Now you can access the APIs and try out interesting new language features, like this one:

That’s all well and good for dog-fooding IDE changes and previewing new language features, but if you want access to the coolness from an API perspective, it’s time to fire up Nuget. Open up a project, and then the Nuget command line and type “Install-Package Microsoft.CodeAnalysis -Pre”

Once that finishes up, make your main entry point consist of the following code:

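// Requires: using System; using System.Linq;
//           using Microsoft.CodeAnalysis.CSharp; using Microsoft.CodeAnalysis.CSharp.Syntax;
static void Main(string[] args)
{
    const string sourceCodePath = @"C:\Path\To\A\Csharp\Class\File.cs";

    // ParseFile comes from the preview API used here; later Roslyn releases
    // use CSharpSyntaxTree.ParseText(File.ReadAllText(path)) instead.
    var tree = CSharpSyntaxTree.ParseFile(sourceCodePath);
    var root = (CompilationUnitSyntax)tree.GetRoot();

    // Find every field declaration and print the last token of each (the field name in simple cases).
    foreach (var field in root.DescendantNodes().OfType<FieldDeclarationSyntax>().Select(f => f.Declaration.GetLastToken()))
        Console.WriteLine(field.ToString());

    Console.ReadLine();
}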

At this point, if you hit F5, what you’re going to see on the screen is a list of the fields contained in the class that you specify as your “sourceCodePath” variable (at least you will with the happy path — I haven’t tested this extensively to see if I can write classes that break it). Now, could you simply write a text parser (or, God forbid, some kind of horrible regex) to do this? Sure. Are there C# language modeling utilities like a Code DOM that would let you do this? Sure. Are any of these things the C# compiler? Nope. Just this.

So think about what this means. You’re not writing a utility that uses a popular C# source code modeling abstraction; you’re writing a utility that says, “hey, compiler, what are the fields in this source code?” And that’s pretty awesome.

My purpose here was to give you a path from “what’s this Roslyn thing anyway” to “wow, look at that, I can write a query against my own code.” Hopefully you’ve gotten that out of this, and hopefully you’ll go forth, tinker, and then you can come back and show me some cool tricks.

TDD Chess Game Part 3: Stumbling and Refactoring

My apologies. I meant to be a little more regular in this series, but I stumbled a bit out of the gate, as I got into the home stretch of my next Pluralsight course. Now that the course is delivered (not released yet – in the review/edit phase), I have some more time, so I’m planning to pick this back up and go with it a little more regularly.

One interesting thing that arises out of these “fits and starts” kind of passes at it is that it mimics an actual, common development scenario: spotty maintenance coding. What I mean is, so many TDD series that you’ll watch or coding dojo/exercises in which you’ll participate have a premise that you have some fixed length of time during which to pay complete attention. But in this series, I’m sort of poking at it for 10 minutes here and 20 minutes there, very seriously mimicking an environment where you’re plugging a lot of holes, thrashing a bit and saying, “where was I and what was I doing here?”

That’s evident in this clip, probably a little too much for me really to call it polished. And as such, I don’t accomplish a ton, but here’s what I did accomplish (not necessarily in order):

  • Tied up a loose end by getting rid of the last of the primitive obsession passing of x and y coordinate ints.
  • Implemented sanity precondition checks for the input Board’s AddPiece() method, in terms of where pieces could be placed.
  • Pushed functionality for validating coordinates into the coordinate itself.
  • Eliminated duplication in the validation with a refactoring.
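As a rough picture of the last three bullets, the validation could look something like the sketch below. The names BoardCoordinate, IsValidForBoardSize and the 1-based coordinate range are assumptions made for illustration, not necessarily what is in the repository.

```csharp
using System;

// Hypothetical value type replacing separate x and y ints.
public struct BoardCoordinate
{
    public BoardCoordinate(int x, int y) { X = x; Y = y; }
    public int X { get; }
    public int Y { get; }

    // The coordinate knows how to validate itself against a board size.
    public bool IsValidForBoardSize(int boardSize)
    {
        return IsDimensionValid(X, boardSize) && IsDimensionValid(Y, boardSize);
    }

    // One helper instead of repeating the same range check for X and Y.
    // Assumes coordinates run from 1 to boardSize inclusive.
    private static bool IsDimensionValid(int value, int boardSize)
    {
        return value >= 1 && value <= boardSize;
    }
}

public class Board
{
    private const int BoardSize = 8;

    // Precondition checks come first; storing the piece is omitted in this sketch.
    public void AddPiece(object piece, BoardCoordinate location)
    {
        if (piece == null)
            throw new ArgumentNullException("piece");
        if (!location.IsValidForBoardSize(BoardSize))
            throw new ArgumentException("Coordinate is off the board.", "location");
    }
}
```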

And, here are some lessons to take away from this, both instructional from me and by watching me make mistakes:

  • After a conceptual refactoring, such as replacing multiple primitives with a type, take a look around to make sure you cleaned up all instances of the former.
  • When you’re not really sure what to do next (i.e. “coder’s block” or “paralysis by analysis”), implement some sanity checks for preconditions/invariants. This might jolt you into some next steps as you do it.
  • Make ABSOLUTELY SURE that a test goes red when you think it should go red. Not understanding why a unit test is passing is just as bad as not understanding why it’s failing. In both cases, it means you don’t understand what your code is doing. Stop everything and get your brain in sync with the code immediately to save yourself a lot of frustration later. (See “programming by coincidence,” which I saw coined in the book “The Pragmatic Programmer” — and then, don’t do it!)
  • You’re going to make mistakes. Often dumb ones. The beauty of TDD and its fast feedback loop is to prevent them from festering and being worse later.
  • This is more of an editorial/opinion take, but I’ve more recently gravitated toward allowing my TDD to include what might be called “integration tests” (tests that exercise the interaction between two classes). As long as the test makes sense from a behavioral standpoint and provides clarity, I think it’s fine. Some, particularly those in the BDD camp, even argue that this is preferred, and that your tests should really only go through the outer API of your module/application.
  • Eliminate duplication, however trivial and however subtle. If you see repetition of any kind, you can probably extract a method. Some productivity tools and IDEs will even help you locate possible duplication.

Finally, a few notes on the video itself (and resultant code):

  • For those of you who suggested a larger font size, look for that in part 4. I apologize, but I had actually recorded the video for this already when I was taking suggestions. In the production, I did zoom to a slightly smaller area, so we’ll see if that helps any.
  • I had one commenter express a preference for a white background instead of the VS Dark theme that I use. White work-spaces give me a headache, so I darken all IDEs and things that I work in. For one person, I don’t think I’ll pull the trigger, but if more people start responding and expressing that preference, I’ll agree to suck it up and change colors.
  • The code is now on github. I’ll commit the code each time I record the video and tag it with a comment corresponding to the part of the series in question. The initial push to master just reads “Initial publish to Github” but it corresponds to the code at the end of this clip. From here forward, I’ll sync them, though if you check the repo, it’ll probably run slightly ahead of me publishing the videos because I record the audio and do these writeups after the fact.
  • Again, the higher res you view this in the better. I’d go for 1440P if you can.

What To Return: IEnumerable or IList?

I’ve received a couple of requests in various media to talk about this subject, with the general theme being “I want to return a bunch of things, so what type of bunch should I use?” I’m using the term “bunch” in sort of a folksy, tongue-in-cheek way, but also for a reason relating to precision — I can’t call it a list, collection or group without evoking specific connotations of what I’d be returning in the C# world (as those things are all type names or closely describe typenames).

So, I’m using “bunch” to indicate that you want to return a “possibly-more-than-one.”

I suspect that the impetus for this question arises from something like a curt code review or offhand comment from some developer along the lines of “you should never return a list when you could return an IEnumerable.” The advice lacks nuance for whatever reason and, really, life is full of nuance.

So when and where should you use what? Well, the stock consultant answer of “it depends” makes a good bit of sense. You’ll also probably get all kinds of different advice from different people, but I’ll describe how I decide and explain my reasoning.

First Of All, What Are These Things?

Before we go any further, it probably makes sense to describe quickly what each of these possible return values is.

IList is probably simpler to describe. It’s a collection (I can use this because it inherits from ICollection) of objects that can be accessed via indexers, iterated over and (usually) rearranged. Some implementations of IList are readonly, others are fixed size, and others are variable size. The most common implementation, List, is basically a dynamic array for the sake of quick, easy understanding.

I’ve blogged about IEnumerable in the past and talked about how this is really a unique concept. Tl;dr version is that IEnumerable is not actually a collection at all (and it does not inherit from ICollection), but rather a combination of an algorithm and a promise.

If I return an IEnumerable to you, what I’m really saying is “here’s something that when you ask it for the next element, it will figure out how to get it and then give you the element until you stop asking or there are none left.” In a lot of cases, something with return type IEnumerable will just be a list under the hood, in which case the “strategy” is just to give you the next thing in the list.

But in some cases, the IEnumerable will be some kind of lazy loading scheme where each iteration calls a web service, hits a database, or for some reason invokes a 45 second Thread.Sleep. IList is (probably) a data structure; IEnumerable is an algorithm.
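Here is a small sketch of that difference. Both methods advertise the same return type, but one hands back a finished data structure and the other hands back an algorithm. NumberSources, FromList and Squares are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

public static class NumberSources
{
    // Backed by a data structure: everything exists before the caller iterates.
    public static IEnumerable<int> FromList()
    {
        return new List<int> { 1, 2, 3 };
    }

    // Backed by an algorithm: each value is computed only when asked for,
    // and the sequence never ends on its own.
    public static IEnumerable<int> Squares()
    {
        for (int n = 1; ; n++)
            yield return n * n;
    }
}
```

A caller can safely write NumberSources.Squares().Take(3) and get three values, whereas calling ToList() on Squares() directly would never finish. Nothing in the signature tells you which kind of sequence you are holding.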

Since they’re different, there are cases when one or the other clearly makes sense.

When You’d Clearly Use IEnumerable

Given what I’ve said, IEnumerable (or perhaps IQueryable) is going to be your choice when you want deferred execution. You could theoretically implement IList in a way that provided deferred execution, but in my experience this would violate the “principle of least surprise” for people working with your code, and it would be ill-suited since you have to implement the “Count” property, which forces you to know how many items there are.

If you’re using Entity Framework or some other database loading scheme, and you want to leave it up to the code calling yours when the query gets executed, return IEnumerable. In this fashion, when a client calls the method you’re writing, you can return IEnumerable, build them a query (say with Linq), and say “here, you can have this immediately with incredible performance, and it’s up to you when you actually want to execute this thing and start hammering away at the database with retrieval tasks that may take milliseconds or seconds.”

Another time that you would clearly want IEnumerable is when you want to tell clients of your method, “hey, this is not a data structure you can modify — you can only peek at what’s there. If you want your own thing to modify, make your own by slapping what we give you in a list.”

To be less colloquial, you can return IEnumerable when you want to make it clear to consumers of your method that they cannot modify the original source of information. It’s important to understand that if you’re going to advertise this, you should probably exercise care in how the thing you’re returning will behave. What I mean is, don’t return IEnumerable and then give your clients something where they can modify the internal aggregation of the data (meaning, if you return IEnumerable don’t let them reorder their copy of it and have that action also reorder it in the place you’re storing it).
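To illustrate the trap described above, consider this sketch. Roster and its method names are hypothetical. The point is only that declaring IEnumerable as the return type does not by itself protect the underlying list.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Roster
{
    private readonly List<string> _names = new List<string> { "Janet", "Jim", "Steve" };

    // Leaky: the caller receives the very same List instance and can cast it back.
    public IEnumerable<string> GetNamesLeaky()
    {
        return _names;
    }

    // Safer: Select wraps the list in an iterator, so there is no List to cast to.
    public IEnumerable<string> GetNames()
    {
        return _names.Select(name => name);
    }
}
```

With the leaky version, a client can write ((List<string>)roster.GetNamesLeaky()).Reverse() and reorder the roster's internal storage. The Select version closes that door, though it is still a live view: later changes made inside Roster will show up the next time the client iterates.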

When You’d Clearly Use IList

By contrast, there are times when IList makes sense, and those are probably easier to understand.

If, for instance, your clients want a concrete, tangible, and (generally) modifiable list of items, IList makes sense.

  • If you want to return something with an ordering that matters and give them the ability to change that ordering, then give them a list.
  • If they want to be able to walk the items from front to back and back to front, give them a list.
  • Or, if they want to be able to look up items by their position, give them a list.
  • And if they want to be able to add or remove items, give them a list. If they need random access of any kind, you want to provide a list.

Clearly, it’s a data structure you can wrap your head around easily — certainly more so than IEnumerable.

Good Polymorphic Practice

With the low hanging fruit out of the way, let’s dive into grayer areas. A rule of thumb that has served me well in OOP is “accept as generic as possible, return as specific as possible.” This is being as cooperative with client code as possible.

Imagine if I write a method called “ScareBurglar()” that takes an Animal as argument and invokes the Animal’s “MakeNoise()” method. Now, imagine that instead of taking Animal as the parameter, ScareBurglar took Dog and invoked Dog.MakeNoise(). That works, I suppose, but what if I had a guard-bear? I think the bear could make some pretty scary noises, but I’ve pigeon-holed my clients by being too specific in what I accept.

If MakeNoise() is a method on the base class, accept the base class so you can serve as many clients as possible.

On the flip side, it’s good to return very specific types for similar kinds of reasoning. If I have a “GetDog()” method that instantiates and returns a Dog, why pretend that it’s a general Animal? I mean, it’s always going to be a Dog anyway, so why force my clients that are interested in Dog to take an Animal and cast it?

I’ve blogged previously about what I think of casting. Be specific. If your clients want it to be an animal, they can just declare the variable to which they’re assigning the return value as Animal.
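The rule of thumb is easy to see in code. In this sketch the noises, the Fetch() method, the HomeSecurity class and the string return values are additions for illustration; only Animal, Dog, the bear, ScareBurglar(), MakeNoise() and GetDog() come from the discussion above.

```csharp
using System;

public abstract class Animal
{
    public abstract string MakeNoise();
}

public class Dog : Animal
{
    public override string MakeNoise() { return "Woof!"; }

    // Something only a Dog can do (hypothetical).
    public string Fetch() { return "Fetched the stick."; }
}

public class Bear : Animal
{
    public override string MakeNoise() { return "ROAR!"; }
}

public static class HomeSecurity
{
    // Accept the most general type that has what we need: any Animal can make noise.
    public static string ScareBurglar(Animal guard)
    {
        return guard.MakeNoise();
    }

    // Return the most specific type we actually create: callers get a Dog without casting.
    public static Dog GetDog()
    {
        return new Dog();
    }
}
```

A client with a guard-bear can call HomeSecurity.ScareBurglar(new Bear()), and a client that only cares about animals can still write Animal pet = HomeSecurity.GetDog(); with no cast, while one that wants the dog can call Fetch() directly.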

So, with this rule of thumb in mind, it would suggest that returning lists is a good idea when you’re definitely going to return a list. If your implementation instantiates a list and returns that list, with no possibility of it being anything else, then you might want to return a list. Well, unless…

Understanding the Significance of Interfaces

A counter-consideration here is “am I programming to an interface or in a simple concrete type.” Why does this matter?

Well, it can push back on what I mentioned in the last section. If I’m programming a class called “RandomNumberProvider” with a method “GetMeABunchOfNumbers()” that creates a list, adds a bunch of random numbers to it, and returns that list, then I should probably return List<int>.

But what if I’m designing an interface called IProvideNumbers? Now there is no concrete implementation — no knowledge that what I’m returning is going to be implemented as List everywhere. I’m defining an abstraction, so perhaps I want to leave my options open. Sure RandomNumberProvider that implements the interface only uses a list. But how do I know I won’t later want a second implementation called “DeferredExecutionNumberProvider” that only pops numbers as they’re iterated by clients?

As a TDD practitioner, I find myself programming to interfaces. A lot. And so, I often find myself thinking, what are the postconditions and abilities I want to guarantee to clients across the board?

This isn’t necessarily, itself, a by-product of TDD, but of programming to interfaces. And, with programming to interfaces, specifics can bite you at times. Interfaces are meant to allow flexibility and future-proofing, so getting really detailed in what you supply can tie your hands. If I promise only an IEnumerable, I can later define implementers that do all sorts of interesting things, but if I promise an IList, a lot of that flexibility (such as deferred execution schemes) goes out the window.
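A sketch of that interface and its two implementers might look like the following. The type and method names come from the discussion above, but the bodies are assumptions: the count of five, the range of 100 and the multiples of ten are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Generic;

public interface IProvideNumbers
{
    // Promises only "something you can iterate", leaving implementers free.
    IEnumerable<int> GetMeABunchOfNumbers();
}

public class RandomNumberProvider : IProvideNumbers
{
    private readonly Random _random = new Random();

    // Builds the whole list up front and returns it.
    public IEnumerable<int> GetMeABunchOfNumbers()
    {
        var numbers = new List<int>();
        for (int i = 0; i < 5; i++)
            numbers.Add(_random.Next(100));
        return numbers;
    }
}

public class DeferredExecutionNumberProvider : IProvideNumbers
{
    // Produces each number only as the client iterates.
    public IEnumerable<int> GetMeABunchOfNumbers()
    {
        for (int i = 0; i < 5; i++)
            yield return i * 10;
    }
}
```

Had the interface promised IList<int>, the second implementer would have had to materialize everything up front, which defeats its purpose.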

The Client’s Burden

An interesting way to evaluate some of these tradeoffs is to contemplate what your client’s pain points might be if we guess wrong.

Let’s say we go with IEnumerable as a return type but the client really just wants an IList (or even just List). How bad is the client’s burden? Well, if the client only wants to access the objects, it can just awkwardly append .ToList() to the end of each call to the method and have exactly what it wants. If the client wants to modify the state of the grouping (e.g. put the items in a different order and have you cooperate), it’s pretty hosed and can’t really use your services. However, that latter case is addressed by the section on when you’d clearly use IList: if your clients want to do that, you need to not give them an IEnumerable.

What about the flip side? If the client really wants an IEnumerable and you give them a list? Most likely they want IEnumerable for deferred execution purposes, and you will fail at that. There may be other reasons I’m not thinking of off the top, but it seems that erring when the client wants an enumerable is kind of a deal-breaker for your code being useful.

Ugh, so what should I do?!?

Clear as mud?

Well, problem is, it’s a complicated subject and I can only offer you my opinion by way of heuristics (unless you want to send me code or gists, and then I can offer concrete opinions and I’m actually happy to do that).

At the broadest level, you should ask yourself what your client is going to be doing with the thing that you return and try to accommodate that. At the next broadest level, you should think to yourself, “do I want to provide the client a feature-rich experience at the cost of later flexibility or do I want to provide the client a more sparse set of behavior guarantees so that I can control more implementation details?”

It also pays to think of the things you’re returning in terms of what they should do (or have done to them), rather than what they are. This is the line of thinking that gets you to ask questions like “will clients need to perform random accesses or sorts,” but it lets you go beyond simple heuristics when engaged in design and really get to the heart of things. Think of what needs to be done, and then go looking for the data type that represents the smallest superset of those things (or, write your own, if nothing seems to fit).

I’ll leave off with what I’ve noticed myself doing in my own code. More often than not, when I’m communicating between application layers I tend to use a lot of interfaces and deal a lot in IEnumerable. When I’m implementing code within a layer, particularly the GUI/presentation layer in which ordering is often important, I favor collections and lists. This is especially true if there is no interface seam between the collaborating components. In these scenarios I’m more inclined to follow the “return the most specific thing possible” heuristic rather than the “be flexible in an interface” heuristic.

Another thing that I do is try to minimize the number of collections that I pass around an application. The most common use case for passing around bunches of things is collections of data transfer objects, such as some method like “GetCustomersWithFirstName(string firstName).” Clearly that’s going to return a bunch of things. But in other places, I try to make aggregation an internal implementation detail to a class. Command-Query Separation helps with this. If I can, I don’t ask you for a collection, do things to it and hand it back. Instead I say “do this to your collection.”
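As a sketch of what “do this to your collection” can look like, consider the following. Apart from GetCustomersWithFirstName, every name here (Customer, CustomerRegistry, IsActive, DeactivateCustomersWithFirstName) is hypothetical and chosen only to illustrate the idea.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public Customer(string firstName) { FirstName = firstName; IsActive = true; }
    public string FirstName { get; }
    public bool IsActive { get; set; }
}

public class CustomerRegistry
{
    // The aggregation stays an internal implementation detail.
    private readonly List<Customer> _customers = new List<Customer>();

    public void Add(Customer customer) { _customers.Add(customer); }

    // Query: hands back results and changes nothing.
    public IEnumerable<Customer> GetCustomersWithFirstName(string firstName)
    {
        return _customers.Where(c => c.FirstName == firstName).ToList();
    }

    // Command: "do this to your collection" instead of lending the collection out.
    public void DeactivateCustomersWithFirstName(string firstName)
    {
        foreach (var customer in _customers.Where(c => c.FirstName == firstName))
            customer.IsActive = false;
    }
}
```

The client never has to fetch the collection, loop over it and hand it back. It states what it wants done, and the registry stays free to change how it stores customers.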

And finally, when in doubt and all else seems to be a toss-up, I tend to favor promising the least (thus favoring future flexibility). So if I really can’t make a compelling case one way or the other for any reason, I’ll just say “you’re getting an IEnumerable because that makes maintenance programming likely to be less painful later.”
