DaedTech

Stories about Software

By

The Myth of Quick Copy-And-Paste Programming

Quick and Dirty?

“We’re really busy right now–just copy that customer form and do a few find and replaces. I know it’s not a great approach, but we don’t have time to build the new form from scratch.”

Ever heard that one? Ever said it? Once, twice, dozens, or hundreds of times?

This has the conversational ring of, “I know I shouldn’t eat that brownie, but it’s okay. I’m planning to start exercising one of these days.” The subtext of the message is a willingness to settle–to trade self esteem for immediate gratification with the disclaimer “the outcome just isn’t important enough to me to do the hard thing.”

If that sounds harsh to you when put in the context of programming instead of diet and exercise, there’s a reason: you’re used to fooling yourself into believing that copy-and-paste programming sacrifices good design to save time. But the reality is that the “time savings” is a voluntary self-delusion and that it’s really trading good design for being able to “program” mindlessly for a few minutes or hours. Considered in this light, it’s actually pretty similar to gorging yourself on chocolate while thinking that you’ll exercise later when you can afford that gym membership and whatnot: “I’ll space out and fantasize about my upcoming vacation now while I’m programming and clean up the mess when I come back, refreshed.”

Hey, Wait a Minute

I know what you’re thinking. You’re probably indignant right now because you’re thinking of a time that you copied a 50,000 line file and you changed only two things in it.

Re-typing all 50,000 lines would have taken days, and you got the work done in minutes. The same number of minutes, in fact, that you’d have spent parameterizing those two differences and updating clients to call the new method, thus achieving good design and efficiency. Okay, so bad example.

Wait, Now I’ve Got It

Now, you’re thinking about the time that you copied that 50,000 line file and there were about 300 differences–no way you could easily have parameterized all of that. Only a copy, paste, and a bunch of find and replace could do the trick there.

After that, you were up and running in about an hour. And everything worked.

Oh, except for that one place where the text was a little different. You missed that one, but you found it after ten minutes of debugging.

Oh, and except for five more of those at ten or fifteen minutes a pop.

Oh, and then there was that twenty minutes you spent after the architect pointed out that a bunch of methods needed to be renamed because they made no sense named what they were in the new class. Then you were truly done.

Except, oh, crap, runtime binding was failing with that other module since you changed those method names to please the stupid architect. That was a doozy because no one noticed it until QA a week later, and then you spent a whole day debugging it and another day fixing it.

Oh, and then there was a crazy deadlock issue writing to the log file that some beta customer found three months later. As it turns out, you completely forgot that if the new and old code file methods executed in just the right interleaving, wackiness might ensue. Ugh, that took a week to reproduce and then another two weeks to figure out.

Okay, okay, so maybe that was a bad example of the copy-and-paste time savings.

Okay, Those Don’t Count

But you’re still mad at me. Maybe those weren’t the best examples, but all the other times you do it are good examples.

You’re getting things done and cranking out code. You’re doing things that get you 80% of the way there and making it so that you only have to do 20% of the work, rather than doing all 100% from scratch. Every time you copy and paste, you save 80% of the manpower (minus, of course, the time spent changing the parts of the 80% that turned out not to be part of the 80% after all).

The important point is that as long as you discount all of the things you miss while copying and pasting and all of the defects you introduce while doing it and all of the crushing technical debt you accrue while doing it and all of the downstream time fixing errors in several places, you’re saving a lot of time. I mean, it’s the same as how that brownie you were eating is actually pretty low in calories if you don’t count the flour, sugar, chocolate, butter, nuts, and oil.

Come to think of it, you’re practically losing weight by eating it.

We Love What We Have

Hmm…apparently, it’s easy to view an activity as a net positive when you make it a point to ignore any negatives. And it’s also understandable.

My flippant tone here is more for effect than it is meant to be a scathing indictment of people for cutting corners. There’s a bit of human psychology known as the Endowment Effect that explains a lot of this tendency. We have a cognitive bias to favor what we already have over what’s new, hypothetical, or in the possession of others.

Humans are loss averse (we feel the pain of a loss of an item more than we experience pleasure at acquiring the item, on average), and this leads to a situation in which we place higher economic value on things that we have than things that we don’t have. You may buy a tchotchke on vacation from a vendor for $10 and wouldn’t have paid a dollar more, yet when someone comes along and offers you $12 or $20 or even $30 for it, you suddenly get possessive and don’t want to part with it. This is the Endowment Effect in action.

We Love Our Copy Paste Code

What does this have to do with copying and pasting code? Well, if you substitute time/effort as your currency, there will be an innate cognitive bias toward adapting work that you’ve already done as opposed to creating some theoretical new implementation. In other words, you’re going to say, “This thing I already have is a source of more efficiency than whatever else someone might do.”

This means that, assuming each would take equal time and that time is the primary motivator, you’re going to favor doing something with what you already have over implementing something new from scratch (in spite of how much we all love greenfield coding). Unfortunately, it’s both easy and common to conflate the Endowment Effect’s cognitive bias toward reuse with the sloppy practice of copy and paste.

At the point of having decided to adapt existing code to the new situation, things can break one of two ways. You can achieve this reuse by abstracting and factoring common logic, or you can achieve it by copy and paste.

Once you’ve already decided on reuse versus blaze a new trail, the efficiency-as-currency version of the Endowment Effect has already played out in your mind–you’ve already, for better or for worse, opted to re-appropriate existing work. Now you’re just deciding between doing the favorable-but-hard thing or the sloppy-and-easy thing. This is why I said it’s more like opting to stuff your face with brownies for promises of later exercise than it is to save time at the expense of good design.

Copy Paste Programming as Empty Calories

Think about it. Copy and paste is satisfying the way those empty calories are satisfying. Doing busy-work looks a lot like doing mentally taxing work and is often valued similarly in suboptimally incentivized environments.

So, to copy and paste or not to copy and paste is often a matter of, “Do I want to listen to talk radio and space out while getting work done, or do I want to concentrate and bang my head against hard problems?” And if you do a real, honest self-evaluation, I’m pretty sure you’ll come to the same conclusion.

Copying and pasting is the reality television of programming–completely devoid of meaningful substance in favor of predictable, mindless repetition.

Quick and Dirty is Just Dirty

In the end, you make two decisions when you go the copy-and-paste route. You decide to adapt what you’ve got rather than invent something new, and that’s where the substance of the time/planning/efficiency decision takes place. And once you’ve made the (perfectly fine) decision to use what you’ve got, the next decision is whether to work (factor into a common location) or tune out and malinger (copy and paste).

And in the end, they’re going to be a wash for time. The up front time saved by not thinking through the design is going to be washed out by the time wasted on the defects that coding without thinking introduces, and you’re going to be left with extra technical debt even after the wash for a net time negative.

So, in the context of “quick and dirty,” the decision of whether new or reused is more efficient (“quick”) is in reality separated from and orthogonal to the decision of factor versus copy and paste (“dirty”). Next time someone tells you they’re going with the “quick and dirty” solution of copy and paste, tell them, “You’re not opting for quick and dirty–you’re just quickly opting for dirty.”

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.

By

FluentSqlGenerator

A while back, I made a post about using string.Join() to construct SQL where clauses from collections of individual clauses. In that post, I alluded to playing with a “more sophisticated” where clause builder. I did just that here and there and decided to post the results to github. You can find at here if you want to check it out.

My implementation makes use of the Composite design pattern and the idea of well formed formula semantics in formal systems such as propositional and first order logic. The latter probably sounds a little stuffy and exposes my inner math geek, but that’s just a rigorous way of expressing the concept of building a statement from literals and basic operations on those literals. To pull it back yet another level of stuffiness, consider simple arithmetic, in which all of the following are valid expressions: “6”, “12 + 6”, “12 + (9 – 3)”. The first expression is an atomic literal and the second expression a binary operation. The third expression is interesting in that it shows that functions can have arguments that are literals or other expressions (if this still seems strange, think of these examples as “6”, “Add(12, 6)” and “Add(12, Subtract(9, 3))”.

Think of how this applies to the propositional semantics that make up SQL query where clauses. I can have “Column1 = 12” or I can have “Column1 = 12 AND Column2 = 13” or I can have “Column1 = 12 AND (Column2 = 13 OR Column3 = 4)”. When I want to model this concept in an object oriented sense, I need to represent the operators “AND” and “OR” as objects with two properties: left expression and right expression. I also need it to be possible that either of these properties is a “literal” of the form “col = val” or that it could be another expression as with the last example. Composite is thus a natural fit when you consider that these clauses are really expression trees, in a very real sense. So there is a “Component” base that’s abstract and then “Clause” and “Operation” objects that inherit from them and are fungible when constructing expressions.

This was the core of the implementation, but I also dressed it up a bit with some extension methods to support a discoverable, fluent interface, but optionally (I’m still very leery of this construct, but this seems like an appropriate and judicious use). Another nice feature, in my opinion is that it supports generic parameters so you don’t have massive overloads — you can set your columns equal to objects, strings, decimals, ints, etc. It makes heavy use of ToString() with these generic parameters, so use any type you please so long as what you want out of it is well represented by ToString().

A sample API is as follows:

var clause = Column.Named("Column1").IsEqualTo(123);
Console.WriteLine(clause);

clause = Column.Named("Column1").IsEqualTo(123).And(Column.Named("Column2").IsEqualTo("123456"));
Console.WriteLine(clause);

clause = Column.Named("Column1").IsOneOf(1, 2, 3).Or(Column.Named("Column2").IsGreaterThan(12));
Console.WriteLine(clause);

clause = Are.AnyOfTheseTrue(Column.Named("Column1").IsEqualTo(832), Column.Named("Column2").IsLessThan(25.30));
Console.WriteLine(clause);

clause = Are.AreAllOfTheseTrue(Column.Named("Column1").IsEqualTo(832), Column.Named("Column2").IsLessThan(25.30), Column.Named("Column3").IsOneOf("Current", "Valid"));
Console.WriteLine(clause);

Console.ReadLine();

Currently supported SQL operations include various comparison (equal, not equal, greater, less, etc) as well as “like” and “in()”. Expression operators “AND”, “OR” and “NOT” are supported. The utility is well covered by unit tests and a handful of integration tests too if you want to poke around but preserve functionality.

Feel free to download, use, fork, enhance, make fun of, etc, whatever. I’m not pretending this is a problem never before solved nor that this is the most elegant solution imaginable, but it was fun to write, code-kata style, and if someone can get some use out of it, great. If I wind up making significant modifications to it or extending it, I’ll post updates here as well as checking changes into github.

By

Casting is a Polymorphism Fail

Have you ever seen code that looked like the snippet here?

public class Menagerie
{
    private List _animals = new List();

    public void AddAnimal(Animal animal)
    {
        _animals.Add(animal);
    }

    public void MakeNoise()
    {
        foreach (var animal in _animals)
        {
            if (animal is Cat)
                ((Cat)animal).Meow();
            else if (animal is Dog)
                ((Dog)animal).Bark();
        }
    }
}

You probably have seen code like this, and I hope that it makes you sad. I know it makes me sad. It makes me sad because it’s clearly the result of a fundamental failure to understand (or at least implement) polymorphism. Code written like this follows an inheritance structure, but it completely misses the point of that structure, which is the ability to do this instead:

public class Menagerie
{
    private List _animals = new List();

    public void AddAnimal(Animal animal)
    {
        _animals.Add(animal);
    }

    public void MakeNoise()
    {
        foreach (var animal in _animals)
            animal.MakeNoise();
    }
}

What’s so great about this? Well, consider what happens if I want to add “Bird” or “Bear” to the mix. In the first example with casting, I have to add a class for my new animal, and then I have to crack open the menagerie class and add code to the MakeNoise() method that figures out how to tell my new animal to make noise. In the second example, I simply have to add the class and override the base class’s MakeNoise() method and Menagerie will ‘magically’ work without any source code changes. This is a powerful step toward the open/closed principle and the real spirit of polymorphism — the ability to add functionality to a system with a minimum amount of upheaval.

But what about more subtle instances of casting? Take the iconic:

public void HandleButtonClicked(object sender, EventArgs e)
{
    var button = (Button)sender;
    button.Content = "I was clicked!";
}

Is this a polymorphism failure? It can’t be, can it? I mean, this is the pattern for event subscription/handling laid out by Microsoft in the C# programming guide. Surely those guys know what they’re doing.

As a matter of fact, I firmly believe that they do know what they’re doing, but I also believe that this pattern was conceived of and created many moons ago, before the language had some of the constructs that it currently does (like generics and various frameworks) and followed some of the patterns that it currently does. I can’t claim with any authority that the designers of this pattern would ask for a mulligan knowing what they do now, but I can say that patterns like this, especially ones that become near-universal conventions, tend to build up quite a head of steam. That is to say, if we suddenly started writing even handlers with strongly typed senders, a lot of event producing code simply wouldn’t work with what we were doing.

So I contend that it is a polymorphism failure and that casting, in general, should be avoided as much as possible. However, I feel odd going against a Microsoft standard in a language designed by Microsoft. Let’s bring in an expert on the matter. Eric Lippert, principal developer on the C# compiler team, had this to say in a stack overflow post:

Both kinds of casts are red flags. The first kind of cast raises the question “why exactly is it that the developer knows something that the compiler doesn’t?” If you are in that situation then the better thing to do is usually to change the program so that the compiler does have a handle on reality. Then you don’t need the cast; the analysis is done at compile time.

The “first kind” of cast he’s referring to is one he defines earlier in his post as one where the developer “[knows] the runtime type of this expression but the compiler does not know it.” That is the kind that I’m discussing here, which is why I chose that specific portion of his post. In our case, the developer knows that “sender” is a button but the compiler does not know that. Eric’s point, and one with which I wholeheartedly agree, is “why doesn’t the compiler know it and why don’t we do our best to make that happen?” It just seems like a bad idea to run a reality deficit between yourself and the compiler as you go. I mean, I know that the sender is a button. You know the sender is a button. The method knows the sender is a button (if we take its name, containing “ButtonClicked” at face value). Maintainers know the sender is a button. Why does everyone know sender is a button except for the compiler, who has to be explicitly and awkwardly informed in spite of being the most knowledgeable and important party in this whole situation?

But I roll all of this into a broader point about a polymorphic approach in general. If we think of types as hierarchical (inheritance) or composed (interface implementation), then there’s some exact type that suits my needs. There may be more than one, but there will be a best one. When writing a method and accepting parameters, I should accept as general a type as possible without needing to cast so that I can be of the most service. When returning something, I should be as specific as possible to give clients the most options. But when I talk about “possible” I’m talking about not casting.

If I start casting, I introduce error possibilities, but I also necessarily introduce a situation where I’m treating an object as two different things in the same scope. This isn’t just jarring from a readability perspective — it’s a maintenance problem. Polymorphism allows me to care only about some public interface specification and not implementation details — as long as the thing I get has the public API I need, I don’t really care about any details. But as soon as I have to understand enough about an object to understand that it’s actually a different object masquerading as the one I want, polymorphism is right out the window and I suddenly depend on knowing the intricate relationship details of the class in question. Now I break not only if my direct collaborators change, but also if some inheritance hierarchy or interface hierarchy I’m not even aware of changes.

The reason I’m posting all of this isn’t to suggest that casting should never happen. Clearly sometimes it’s necessary, particularly if it’s forced on you by some API or framework. My hope though is that you’ll look at it with more suspicion — as a “red flag”, in the words of Eric Lippert. Are you casting because it’s forced on you by external factors, or are you casting to communicate with the compiler? Because if it’s the latter, there are other, better ways to achieve the desired effect that will leave your code more elegant, understandable, and maintainable.

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.

By

Static and New Are Like Inline

C++ Inline

Reaching back into my C++ days, a concept exists called “inline”. “Suggesting inline” is a concept where you tell the C++ compiler that you want to dispense with function call overhead and slam the code in question right into the code stream of the caller. (It’s a matter of suggesting because the compiler might ignore this request during its optimization such as if you decide to rip a hole in space-time by suggesting inline on a recursive method). So, for instance:


inline int multiply(int x, int y)
{
     return x*y;
}

int main(int argc, char** argv)
{
     int product = multiply(2,5);
}

is effectively transformed into:

int main(int argc, char** argv)
{
     int product = 2*5;
}

This is conceptually similar to the concept of macros in C/C++. The idea is simple — you may have some block of code that you want to abstract out for reuse or readability, but you’d prefer the performance to mimick what would happen if you just wrote the code right in the method in question. You want to have your cake and to eat it too. Understandable — I mean, who doesn’t and what’s the point of having cake if you can’t eat it?

In the .NET world of managed languages, we don’t have this concept. It wouldn’t make any sense anyway as our builds generate byte code, which is processor-agnostic. The actual object code is going to be a matter between the runtime and the OS. So we don’t see the option for inline in the sense of runtime performance.

Metrics in OOP

One of the metrics in OOP that I think is a generally fair one is lines of code as a liability. If you have a 20-100 line class then life is good, but as you start creeping toward 300 lines it starts to smell. If it’s over 500 lines, kill it with fire (or, break it up into reasonable classes). The reason for this is a nod toward the Single Responsibility Principle and the concept of code smell. There’s nothing inherently wrong with large classes, per se, but they’re almost always an indication that a bunch of responsibilities are all knotted together in one tightly coupled mess. The same goes for methods on a smaller scale and with smaller scope of responsibility.

So, your personal preferences (and mine) about class/method size notwithstanding, it bears mentioning that smaller and more focused is generally considered better. The result of this is that it’s fairly common to evaluate class and method sizes with fellow developers and see people getting antsy as sizes get larger. On the flip side, it’s common to see people feel good when they keep class sizes small and to feel particularly good when they slay some hulking 5000 line beast and see it divided up into 20 more reasonably sized classes.

Another fairly common metric that I like is the number of parameters passed to a method. Like lines of code and large method/class, parameters trend toward code smell. No parameters is great, one is fine, two is a little noisy, and three is pushing it. More than that and you’ve written a method I want nothing to do with. I think a lot of people feel this way in terms of code they write and almost everyone feels this way when they’re a client (if a method has a bunch of parameters, good luck figuring out how to use it). This is true not only because it’s hard to keep all of the parameters straight but also because a method that needs a whole bunch of things to do its job is likely doing too large a job or too many jobs.

We like our methods/classes small and our parameter lists short.

Static and New: Gaming the System

One way to accomplish these goals is to ensure that your scopes are well defined, factored, and focused. But, if you don’t feel like doing that, you can cheat. If, for instance, you come across a constructor that takes 5 dependency parameters, the best thing to do would be to rework the object graph to have a more cohesive class that needed fewer dependencies. But, if time is short, you can just kick the pile of debris under the bed by having the constructor reach out into the static universe and pull its dependencies from the ether. Now your class has the cologne of a blissfully simple constructor hiding its dependnecy smell, thanks to some static or singleton access. The same thing can be applied with the “new” keyword. If you don’t want to rip holes in the fabric of your object graph with static state and functionality, you can always instantiate your own dependencies to keep that constructor looking slim (this would be identical in concept to having a stateless static factory method).

This trick also applies to cutting down lines of code. If you have a gigantic class, you could always port out some of its state and behavior out into a static class with static state or to an instance class spun up by the beast in question. In either case, lines of code goes down and arguments to public APIs stays the same.

Inline Revisited

Consider the following code:

public class Foo
{
    public int GetTotal(int n)
    {
        int total = 0;
        for (int index = 0; index < n; index++)
        {
            total += index;
        }
        return total;
    }
}

Let’s say that we thought that GetTotal method looked way too long and we wanted to shorten it by kicking parts of it under the bed when company came by to look at the class. What about this:

public class Foo
{
    public int GetTotal(int n)
    {
        return Utils.RunningSum(n);   
    }
}

This is fewer lines of code, to be sure. We’ve created a static Utils class that handles the dirty work of getting the running sum of all numbers up to and including a given number and we’ve delegated the work to this class. Not bad — we’ve eliminated 5 lines of code from both Foo and GetTotal(). And RunningSum is stateless, so there’s no worry about the kind of weird behavior and dependencies that you get from static state. This is a good thing, right?

Well, no, not really. I’d argue that this is fool’s gold in terms of our metrics. In a very real conceptual sense, Foo has no fewer lines of code than it initially did. Sure, from the perspective of organizing code it does — I’ve separated the concern of RunningSum from Foo’s GetTotal and we might make an argument that this factoring is a good thing (it’d be more interesting in a less trivial example). But from the perspective of coupling in the object graph, I’ve done exactly nothing.

When you call GetTotal(n), all of the same code is going to be executed in either case. All of the same branching will occur, all of the same logic will guide that branching, and all of the same local variables will be declared. From a dependency perspective as expressed in source code, you might as well inline Utils.RunningSum() into GetTotal(). To put this another way, you might as well conclude that Foo and GetTotal() are just as many lines of code as they ever were.

And that’s my larger point here. When your code invokes a static method or instantiates an object, client code calling your stuff has no choice in the matter. If I’m calling Foo’s GetTotal() method, it doesn’t make any difference to me if you call Utils.RunningSum() or just do the work yourself. It’s not as though I have any say in the matter. It’s not as though I can specify a strategy for computing the sum myself.

Now, by contrast, consider this:

public interface IComputeTotals
{
    int RunningSum(int n);
}

public class Foo
{
    public int GetTotal(int n, IComputeTotals computer)
    {
        return computer.RunningSum(n);
    }
}

Here I have a method that allows me to specify a number and a strategy for totals computing and it returns the computed total. Is this like inline? No, of course not — as the client, I have a lot of control here. Foo isn’t inlining anything because it needs me to specify the strategy.

But what about encapsulation? With the code in the method or abstracted to a hidden collaborator (static or new) the details of computation are hidden from me as the client whereas here they may not be (depending on where/how I got ahold of an instance that implements that interface). To that I say that it’s admittedly a tradeoff. This latter implementation is providing a seam and giving more options to client code whereas the former implementation is hiding more but leaving client code with fewer options. Assuming that any static methods involved are stateless/functional, that’s really the main consideration here – how much to cover up/hide and how much to expose.

I certainly have a preference for inversion of control and an aversion to static implementations both because of my desire for decoupled flexibility and my preference for TDD. But your mileage may vary. All I’d ask of anyone is to make informed decisions with eyes wide open. Metrics/smells like those about parameters and class/method size don’t exist in a vacuum. “Fixing” your 5000 line class by creating a static class and delegating 4800 lines of work to it behind the scenes is not reducing the size of that class in any meaningful sense, and it’s not addressing the bad smell the class creates; you’re just spraying perfume on it and hoping no one notices. The real problem isn’t simply that 5000 lines of code exist in one source file but rather that 5000 lines exist with no way to pry the dependencies apart and swap in alternate functionality.

So when you’re coding and bearing in mind certain heuristics and metrics, think of using static method calls and instantiating collaborators as what they really are — cosmetic ways to move code out of a method. The fact that the code moves to another location doesn’t automatically make it any more flexible, easier to maintain or less smelly. It’s providing additional seams and flexibility that has the effect you’re looking for.

By

Favor Outcomes Over Rules

Cargo Cults and Angry Monkeys

If you have never heard the term “Cargo Cult Programming“, it’s an it’s an excellent way to describe blindly following processes without understanding the theory behind them. The term originates from a story about aboriginal islanders during World War II. During the war, cargo planes regularly (and from their perspective, magically) arrived with supplies and food that benefited these islanders and then after the war the planes stopped coming. This apparently didn’t stop the aboriginals from building ad-hoc landing strips and the like in hopes of ‘summoning’ more planes with supplies. If that story isn’t to your liking, there is another one of questionable accuracy about monkeys and bananas.

I’ve made a fair number of posts lately that address subjects near and dear to programmers while still having broader reach and this is certainly another such subject. Doing things without understand why is generally the province of the incurious or the busy, but I think it’s worth generally forcing yourself to ask why you’re doing pretty much anything, particularly if you’re in a fairly cerebral line of work. As professionals, it behooves us to realize that we want supplies or that we want not to get sprayed rather than thinking that building landing strips and beating up monkeys are just things that we do and such is life.

In fact, I’d venture to say that we pride ourselves in this sort of thing as we advance in our careers. As programmers, we even help our users understand this concept. “I understand that you normally push the green button three times and hit the backspace key while pressing the mute button, but what are you actually trying to do when you do all that?” How often have you asked a user something like this? In essence you’re saying “forget the process for a minute — what’s the larger goal here?”

Do Unto Others

It then bears asking, if we pride ourselves on critical thinking and we seek to help users toward that end, why do we seem to encourage each other to dance for supplies and spray our team members with cold water? Why do we issue cryptic cargo cult orders to other programmers when we understand our own rationale? Let me give an example.

A few years back I was setting up a new machine for work on a project to which I was new, and the person guiding me through the setup told me that I needed to install KDiff, a spectacularly average comparison tool that even then hadn’t been revved or maintained in a few years. Now, I have nothing, per se, against KDiff, and the price is right, but this struck me as a very… specific… order. What difference did it make what I used for diff? I had used a paid version of Beyond Compare prior to that and always liked it, but when it comes to tooling I do enjoy poking around and so I didn’t mind trying a different tool.

I’m pickier about some things than others, so I just said “Bob’s Your Uncle,” installed KDiff and didn’t think about it again for a long time. It was actually many months later that I was sitting in on a code review for a developer who had just started on the project and found out the reasoning behind this cargo cult practice. The developer whose code was being reviewed casually opened BeyondCompare and one of the more tenured reviewers than me promptly freaked out that he wasn’t using KDiff. Bemused, I tuned in to hear the answer to a question that I’d completely forgotten about. The issue is that the project had a handful of XML files containing application meta-data that were source controlled and that some unnamed developer had, at some point, done something, presumably with some sort of diff tool, that had allegedly hosed one of these files by screwing up the text encoding. Nobody in the room, including the freaker-outer, knew who this was or exactly how or when it had happened, but nonetheless, it had been written in stone that from that day, Thou Shalt Use KDiff if you want to avoid punishment in the form of Biblical plagues.

This certainly wasn’t my only experience with such a thing. More recently, I overheard that the procedure for editing a particular file in source control was to grab the latest, edit it really fast and then check it in immediately. I was a bit perplexed by this until I learned that the goal was to avoid merge conflicts as this was a commonly edited file. I thought, why not just say in the first place that a lot of people edit this file and that merge conflicts are bad, in the same way that I had previously thought “why not just tell people that you’re worried about messed up encodings?”

And that line of questioning really drives to the heart of the matter here. Developers are generally quite skilled and pretty intelligent people and, more importantly, they tend to be be people that like to solve problems. So why not give them the benefit of the doubt when you’re working with them? Instead of giving your peers rote procedures to follow because they happened to work for you once, why not explain the problem and say “this is how I solved it, but if you have a better idea, I’m all ears!”

And you know what? They might. What if instead of forcing people to use KDiff, someone had written a validation script for the offending file that prevented checkins of a badly formed edit? What if instead of having flash edit-checkins of a file, the design of the application were altered to eliminate the contention and potential for error around that file? I suggest these things because I’m a fan of making the bad impossible. Are they the greatest ideas? Are they even practical in those situations? I honestly don’t know, but they might be approaches that have the advantage of placing fewer restrictions on developers or meaning one less thing to remember.

I would encourage you to trust your peers to grasp the bigger picture whenever possible (unless perhaps they’ve demonstrated that this trust isn’t warranted). As I say in the title, it seems like a good idea to favor describing desired outcomes over inventing rules to be memorized. With the former approach you free people you work with to be creative. With the latter approach, you constrain them and possibly even stress them out or frustrate them. And who knows, they may even surprise you with ideas and solutions that make your life easier.