DaedTech

Stories about Software


Comments: Here’s my Code and I’m Sorry

Heavily Commented Code: The Awful Empathy

Imagine a feeling–that empathy you get when you’re watching someone fail in front of an audience. Perhaps it’s a comedian relying on awful puns and props or a person in a public speaking course who just freezes after forgetting some lines. Ugh. It’s so awkward that it hurts. It’s almost like you’re up there with them. You start mumbling to yourself, “please, just be funny,” or, “please, just remember your lines.”

The agony of sustained failure in such a situation doesn’t come on all at once. It creeps up on you in a crescendo of awful empathy and becomes memorable as it approaches the crest. But it starts subtly, before you realize it. There are warning signs that the train is unstable before it actually pops off the rails. A classic warning sign is the “pre-excuse.”

Think of how you feel when someone gets up to talk at a meeting or give a presentation, and they start off with something like, “okay, this isn’t really finished, and Jones was supposed to help, but he got assigned to the Smith contract, and I did things in here that I’m not proud of, and…” With every clause that the speaker tacks onto the mounting list of reasons that you’re going to hate it, you feel your discomfort grow–so much so that you may even get preemptively angry or impatient because you know he’s going to bomb and you’re about to feel that tooth-grinding failure-empathy.

Okay. Now that the stage is set and we’re imagining the same feeling, know that this is how I feel when I open a code file and see the green of comments (this is the color of comments in all my IDEs) prominently in a file. It’s as though the author of the code is on a stage in front of me, and he’s saying, “okay, so this is probably not very clear, and some of this is actually completely wrong, and changing this would be a nightmare, and you might want a beer or two, heh heh, but really, this will make more sense if you’re drunk, and, you know what, I’m sorry, really, really sorry because this is just, well, it’s just… it’s complete garbage. Sorry.”

That might seem a bit harsh, but think of what you’re looking at when you see code with a comment to actual code ratio approaching 1:1. You’re looking at a situation where someone needed to spend as much space and probably as much time trying to tell you what the code says as writing the code. Why would someone do that? Why would someone write a bunch of code and then write a bunch of English explaining to someone fluent in code what the code does? This is like me sending you an email in Spanish and putting the English equivalent after every sentence. I would do it if one of the two of us didn’t speak Spanish well or at all. And that’s how I feel when I see all those comments–either you don’t speak code very well or you think that I don’t speak code very well. The former occurs a lot with people who program haphazardly by coincidence. (“I better write this down in a comment because I had no idea that’s what an array was for. Who knew?”) The latter generates mind-numbing comments that rot. (“Declares an int called x and initializes it to 6.”) If you aren’t being forced to write comments by some kind of style policy and you’re not Zorro, you’re writing things in English because you’re not bothering to write and illuminate them in Code (I’m using the uppercase to distinguish simply writing some kind of compiling code from writing Code that communicates).

Self-Fulfilling Prophecy

There’s a substantial cross section of the developer world that thinks diligently commenting code is not only a best practice, but also table stakes for basic caring and being good at your craft. As I’ve previously explained, I used to be one of those people. I viewed writing comments the same way that I view shutting drawers when I’m done using them or making my bed–just grunt work that’s unavoidable in life if you want to be an organized person. Interestingly, I never really viewed them as particularly communicative, and, since adopting TDD and writing tiny methods, I have viewed them as largely superfluous except for occasionally explaining some kind of political decision that transcended code (documenting APIs that you’ll release for public consumption is also an exception, as that documentation becomes part of your deliverable product). But I started to get increasingly disillusioned with the way comments would look in group code.

I would return to a method that I’d written six months earlier and for which I’d put a very clear doc comment, only to see something like this:

/// Do something with X
/// Never ever ever ever accept a negative value X
/// or millions of the most adorable puppies you've ever seen
/// will be senselessly butchered!!!!!!!
private void DoSomethingWithX(int x)
{
    //if(x < 0)
    // throw new ArgumentException("x", "You monster.");

    int y = x; //No matter what, do not let y be 15!!!!
    y = 15;
    PossibleDoom(y);
    PossibleDoom(x);
}

Holy crap! We're clearly playing Russian Roulette here. Did the requirements change and we're no longer endangering puppies? Is this code causing terrible things to happen? Who wrote that comment about 15? Hopefully not the same person that wrote the next line! And what should this code do--what the various comments say or what it actually does? Do the people responsible even work here anymore?

I'm pretty sure that anyone reading is chuckling and nodding sympathetically right now. You can only return to a method that you've written and see this sort of thing so many times before you come to a few conclusions:

  1. People almost never read your comments or any comments--they're just a step above contract legalese as far as pointless noise in our lives goes.
  2. Even if someone does read your comments, they certainly won't fix them along with the code they're changing.
  3. On a long enough timeline, your comments will all become dirty, confusing lies.
  4. If you want to minimize the degree to which you're a liar, minimize the comments that you write.

Surely this isn't really news to anyone, and it's probably answered with admonishments and mental notes to be more diligent about updating comments that everyone knows won't actually happen. So why is it then considered good form and often mandated to put lies into source control for the later 'benefit' of others? Why do we do this, discounting the bed-making/diligence/good-citizen motivation?

To answer that question, think of the kind of code where you see comments and the kind of code where you don't. If you see a four-line functional method with a single nested loop and no local variables, do you generally see comments in there? Probably not. How about a fifty line method with so many nested control structures that you need some kind of productivity add-in to know if you're scoped inside that else you saw earlier or another if, or maybe a while loop? Bingo--that's where comments go to hang out. Giant methods. Classes with lots of responsibilities and confusing internal state. Cryptically-named local variables. These things are all like cool, dank yards after a storm, sprouting explanatory comments like so many mushrooms. They sit there in mute testimony to the mildew-ridden, fungus-friendly conditions around them.

To put it another way, comments become necessary because the author isn't speaking Code well and punts, using English instead of fixing the code to be clear and expressive. Thus the comments are compensation for a lack of clarity. But they're more than that. They're an implied apology for the code as well. They're an apology for writing code and not Code. They're an apology for the fact that writing code and not Code results in the project being a legacy project before it's even done being written. They're an implied apology for big, lumbering classes, winding methods, confusing state, and other obfuscations of intent. But most of all, they're the preemptive, awkward-empathy-inducing, "hang onto your hat because what I'm doing here is actually nuts" pre-apologies/excuses to anyone with the misfortune of reading the code.

So please, I beg you–next time you find yourself thinking, “dude, nobody’s ever going to figure this wackiness out unless I spend a few sentences explaining myself,” don’t bother with the explanation. Bother instead to correct the “nobody’s ever going to figure this out” part. Good Code speaks for itself so that you can focus on more important things.
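To make the “code versus Code” idea concrete, here’s a contrived sketch (the Order and Customer types and the discount rule are invented for illustration, not taken from any real codebase). The comment disappears because a well-named method now says the same thing, and unlike the English, the method name can’t quietly rot out of sync with the logic:

// Before: English compensating for unclear code.
public void ProcessOrder(Order order)
{
    // Check whether the customer qualifies for the loyalty discount
    if (order.Customer.YearsActive > 5 && order.Total > 100m && !order.Customer.HasOverdueBalance)
        order.ApplyDiscount(0.1m);
}

// After: the same logic, written in Code. No apology necessary.
public void ProcessOrder(Order order)
{
    if (QualifiesForLoyaltyDiscount(order))
        order.ApplyDiscount(0.1m);
}

private static bool QualifiesForLoyaltyDiscount(Order order)
{
    return order.Customer.YearsActive > 5
        && order.Total > 100m
        && !order.Customer.HasOverdueBalance;
}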


A Metaphor to Help You Suck at Writing Software

“No plan survives contact with the enemy” –Helmuth von Moltke the Elder

Bureaucracy 101

Let’s set the scene for a moment. You’re a workaday developer in a workman kind of shop. A “waterfall” shop. (For back story on why I put quotes around waterfall, see this post). There is a great show of force when it comes to building software. Grand plans are constructed. Requirements gathering happens in a sliding sort of way where there is one document for vague requirements, another document for more specific requirements, a third document for even more specific requirements than that, and repeat for a few more documents. Then, there is the spec, the functional spec, the design spec, and the design document. In fact, there are probably several design documents.

There aren’t just the typical “waterfall” phases of requirements->design->code->test->toss over the wall, but sub-phases and, when the organism grows large enough, sub-sub-phases. There are project managers and business managers and many other kinds of managers. There are things called change requests and those have their own phases and documents. Requirements gathering is different from requirements elaboration. Design sub-phases include high-level, mid-level and low-level. If you carefully follow the process, most likely published somewhere as a mural-sized state machine or possibly a Gantt chart unsurpassed in its perfect hierarchical beauty, you will achieve the BUFD nirvana of having the actual writing of the code require absolutely no brain power. Everything will be so perfectly planned that a trained ape could write your software. That trained ape is you, workaday developer. Brilliant business stakeholder minds are hard at work perfecting the process of planning software in such fine grained detail that you need not trouble yourself with much thinking or problem solving.

Dude, wait a minute. Wat?!? That doesn’t sound desirable at all! You wake up in the middle of the night one night, sit bolt upright, and are suddenly fundamentally unsure that this is really the best approach to building a thing with software. Concerned, you approach some kind of senior business project program manager and ask him about the meaning of developer life in your organization. He nods knowingly, understandingly, and puts one arm on your shoulders, extending the other out in a broad, professorial arc to help you share his vision. “You see my friend,” he says, “writing software is like building a skyscraper…” And the ‘wisdom’ starts to flow. Well, something starts to flow, at any rate.

Let’s Build a Software Skyscraper

Like a skyscraper, you can’t just start building software without planning and a lot of upfront legwork. A skyscraper can’t simply be assembled by building floors, rooms, walls, etc. independently and then slapping them all together, perhaps interchangeably. Everything is necessarily interdependent and tightly coupled. Just like your software. In the skyscraper, you simply can’t build the 20th floor before the 19th floor is built, and you certainly can’t interchange those ‘parts,’ much like in your software you can’t have a GUI without a database and you can’t just go swapping persistence models once you have a GUI. In both cases every decision at every point ripples throughout the project and necessarily affects every future decision. Rooms and floors are set in stone in both location and order of construction, just as your classes and modules in a software project have to be built in a certain order and can never be swapped out from then on.

But the similarities don’t end with the fact that both endeavors involve an inseparable web of complete interdependence. It extends to holistic approaches and cost as well. Since software, like a skyscraper, is so lumbering in nature and so permanent once built, the concept of prototyping it is prima facie absurd. Furthermore, in software and skyscrapers, you can’t have a stripped-down but fully functional version to start with — it’s all or nothing, baby. Because of this it’s important to make all decisions up-front and immediately even when you might later have more information that would lead to a better-informed decision. There’s no deferral of decisions that can be made — you need to lock your architecture up right from the get-go and live with the consequences forever, whatever and however horrible they might turn out to be.

And once your software is constructed, your customers had better be happy with it, because boy-oh-boy is it expensive, cumbersome and painful to change anything about it. Like replacing the fortieth floor of a skyscraper, refactoring your software requires months of business stoppage and a Herculean effort to get the new stuff in place. It soars over the budget set forth and slams through and past the target date, showering passersby with falling debris all the while.

To put it succinctly in list form:

  1. There is only one sequence in which to build software and very little opportunity for deviation and working in parallel.
  2. Software is not supposed to be modular or swappable — a place for everything and everything in its place.
  3. The concept of prototyping is nonsensical — you get one shot and one shot only.
  4. It is impossible to defer important decisions until more information is available. Pick things like database or markup language early and live with them forever.
  5. Changing anything after construction is exorbitantly expensive and quite possibly dangerous.

Or, to condense even further, this metaphor helps you build software that is brittle and utterly cross-coupled beyond repair. This metaphor is the perfect guide for anyone who wants to write crappy software.
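Of course, the swappable-parts premise is exactly where real software parts company with skyscrapers. A minimal sketch (the interface and class names here are invented for illustration): hide the persistence model behind an interface, and the “can never be swapped out” rule evaporates, GUI intact.

using System.Collections.Generic;

public class Customer { }

// The GUI depends on this abstraction, not on any particular database.
public interface ICustomerRepository
{
    void Save(Customer customer);
}

// Today's persistence decision...
public class SqlCustomerRepository : ICustomerRepository
{
    public void Save(Customer customer)
    {
        // Write to a relational database here.
    }
}

// ...and tomorrow's, swapped in without touching the GUI.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers = new List<Customer>();

    public void Save(Customer customer)
    {
        _customers.Add(customer);
    }
}

public class CustomerScreen
{
    private readonly ICustomerRepository _repository;

    // The persistence decision is injected, not set in stone.
    public CustomerScreen(ICustomerRepository repository)
    {
        _repository = repository;
    }

    public void OnSaveClicked(Customer customer)
    {
        _repository.Save(customer);
    }
}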

Let’s Build an Agile Building

Once you take the building construction metaphor to its logical conclusion, it seems fairly silly (as a lot of metaphors will if you lean too heavily on them in their weak spots). What’s the source of the disconnect here? To clarify a bit, let’s work backward into the building metaphor starting with good software instead of using it to build bad software.

A year or so ago, I went to a talk given by “Uncle” Bob Martin on software professionalism. If I could find a link to the text of what he said, I would offer it (and please comment if you have one), but lacking that, I’ll paraphrase. Bob invited the audience to consider a proposition where they were contracting to have a house built and maintained by a particular contractor. The way this worked was that you would give the contractor $100 and he would build you anything you wanted in a day. So, you could say “I want a two bedroom ranch house with a deck and a hot-tub and 1.5 bathrooms,” plop down your $100 and come back tomorrow to find the house built to your specification. If it turned out that you didn’t like something about it or your needs changed, the same deal applied. Want another wing? Want to turn the half bath into a full bath? Want a patio instead of a deck? Make your checklist, call the contractor, give him $100, and the next day your wish would be your house.

From there, Bob invited audience members to weigh two different approaches to house-planning: try-it-and-see versus waterfall’s “big design up front.” In this world, would you hire expert architects to form plans and carpenters to flesh them out? Would you spend weeks or months in a “planning phase”? Or would you plop down $100 and say, “well, screw it — I’ll just try it and change it if I don’t like it?” This was a rather dramatic moment in the talk as the listener realized just before Bob brought it home that given a choice between agile, “try it and see” and waterfall “design everything up front” nobody sane would choose the latter. The “waterfall” approach to houses (and skyscrapers) is used because a better approach isn’t possible and not because it’s a good approach when there are alternatives.

Whither the Software-Construction Canard?

Given the push toward Agile software development in recent years and the questionable parallels of the metaphor in the first place, why does it persist? There is no shortage of people who think this metaphor is absurd, or at least misguided:

  1. Jason Haley, “It’s not like Building a House”
  2. Terence Parr, “Why writing software is not like engineering”
  3. James Shore, “That Damned Construction Analogy”
  4. A whole series of people on Stack Overflow
  5. Nathaniel T. Schutta, Why Software Development IS Like Building a House (Don’t let the title fool you – give this one a detailed read)
  6. Thomas Guest, “Why Software Development isn’t Like Construction”

If you google things like “software construction analogy” you will find literally dozens of posts like these.

So why the persistence? Well, if you read the last article, by Thomas Guest, you’ll notice a reference to Steve McConnell’s iconic book “Code Complete.” This book has an early chapter that explores a variety of metaphors for software development and offers this one up. In my first DaedTech post I endorsed the metaphor but thought we could do better. I stand by that endorsement not because it’s a good metaphor for how software should be developed but because it’s a good metaphor for how it is developed. As in our hypothetical shop from the first section of the post, many places do use this approach to write (often bad) software. But the presence of the metaphor in McConnell’s book, and for years and years before that, highlights one of the main reasons for its persistence: inertia. It’s been around a long time.

But I think there’s another, more subtle reason it sticks around. Hard as it was to find posts in favor of the software-construction pairing, the ones I did find share an interesting trait. Take a look at this post, for instance. As “PikeWake” gets down to explaining the metaphor, the first thing that he does is talk about project managers and architects (well, the first thing is the software itself, but right after that come the movers and shakers). Somewhere below that, the low-skill grunts who actually write the software get a nod as well. Think about that for a moment. In this analogy, the most important people in the software process are the ones with corner offices, direct reports and spreadsheets, and the people who actually write the software are fungible drones paid to perform repetitive actions rather than knowledge work. Is it any wonder that ‘supervisors’ and other vestiges of the pre-Agile, command-and-control era love this metaphor? It might not make for good software, but it sure makes for good justification of roles. It’s comforting in a world where companies like GitHub are canning the traditional, hierarchical model, valuing the producers over the supervisors, and succeeding.

Perhaps that’s a bit cynical, but I definitely think there’s more than a little truth there. If you stripped out all of the word documents, Gantt charts, status meetings and other typical corporate overhead and embraced a world where developers could self-organize, prioritize and adapt, what would people with a lot of tenure but not a lot of desire or skill at programming do? If there were no actual need for supervision, what would happen? These can be unsettling, game changing questions, so it’s easier to cast developers as low-skill drones that would be adrift without clever supervisors planning everything for them than to dispense with the illusion and realize that developers are highly skilled, generally very intelligent knowledge workers quite capable of optimizing processes in which they participate.

In the end, it’s simple. If you want comfort food for the mid-level management set and mediocrity, then invite someone in to sweep his arm professorially and spin a feel-good tale about how building software is like building houses and managing people is like a father nurturing his children. If you want to be effective, leave the construction metaphor in the 1980s where it belongs.


Hilarious Conditional Bloopers!

For this Friday, I thought I’d do something a little more lighthearted and, in the tradition of bad television (or Robot Chicken’s satire thereof), post some programming bloopers. These are actual things that I’ve personally seen in source code, as opposed to some kind of specific sampling of CodeSOD from the Daily WTF. Doing this series of posts about Boolean algebra made me think of conditional logic fails I’ve seen, both recently and long in the past.

For each one, I’ve tried my best to give it a catchy name, an explanation of the problem, an example of what it translates to in simple English (i.e. why it doesn’t “read like well written prose”), and what it ought to look like. So, without further ado, here are the bloopers:

The Ingrown Branch

private int _numberOfMilkCartons;
private bool _amIOutOfMilk;

public void FigureOutWhatToDo()
{
    if(_numberOfMilkCartons != 12)
        _numberOfMilkCartons = 12;
}

I call this The Ingrown Branch because of what it does. It introduces a conditional — a fork in the road, if you will — and it winds up in the same spot no matter which branch you take. In conversational English, this says “if the number of milk cartons is not 12, make it 12”. While this doesn’t sound ridiculous in the same way that others here will, consider what’s being done. If x is equal to 12, well, then do nothing because x is equal to 12. Otherwise, if it’s not equal to 12, set it to 12. What would be a simpler way to do this?

private int _numberOfMilkCartons;
private bool _amIOutOfMilk;

public void FigureOutWhatToDo()
{
    _numberOfMilkCartons = 12;
}

The Tautology

private bool _amIOutOfMilk;

public void FigureOutWhatToDo()
{
    if(_amIOutOfMilk || !_amIOutOfMilk)
        GoToTheStore();
}

In conversational English, a tautology is something that is vacuously true or redundant. In logic, this is something that is always true, ipso facto, such as “A or NOT(A)”, for instance. In terms of conversational English, this is like saying “If I’m out of milk or if I’m not out of milk, I’m going to go buy some milk.” Why not drop the spurious conditionals and get to the point:

public void FigureOutWhatToDo()
{
    GoToTheStore();
}

The Contradiction

private bool _amIOutOfMilk;

public void FigureOutWhatToDo()
{
    if(_amIOutOfMilk == !_amIOutOfMilk)
        GoToTheStore();
}

The opposite of a tautology, a contradiction is something that is vacuously false, such as a primitive type not being equal to itself. With instances like this and the tautology, I usually see more complex incarnations that are harder to spot, or else I give the benefit of the doubt and assume that manipulation of a more complex conditional occurred in the past and the thing was accidentally left in this condition. But this doesn’t alter the fact that I have seen code like this and that, in plain English, it would translate to “If I’m both completely out of milk and I have some milk, I’m going to buy milk.” It’s mind-bending nonsense that would best be described as:

//Yep, nothing at all

The Double Negative

private bool _amINotOutOfMilk;

public void FigureOutWhatToDo()
{
    if(!_amINotOutOfMilk)
        GoToTheStore();
}

I realize that this may be largely a product of speaking English as a first language, since double (and more) negatives are acceptable in some other languages. But you have to look at code like this and think, did anyone read it to themselves? “If it’s false that I’m not out of milk, I will go to the store.” Wat? Okay, so not out of milk means that you have it, so if it’s false that you’re not out of milk, it’s false that you have it, and you are out of milk… aha! Why didn’t you just say so:

public void FigureOutWhatToDo()
{
    if(_amIOutOfMilk)
        GoToTheStore();
}

Ifception

public void FigureOutWhatToDo()
{
    if(_amIOutOfMilk)
        if(_amIOutOfEggs)
            if(_amIOutOfBeer)
                GoToTheStore();
}

An if within an if within an if… (credit to Dan Martin for this term). This is another mind-bending way of writing things that is rather jarring to the reader of the code, like saying “If I’m out of milk if I’m out of eggs if I’m out of beer, then I’m going to the store.” Dude, wat? Oh, you mean “If you’re out of milk AND you’re out of eggs AND you’re out of beer, then you’re going to the store? Well, nice to see what your breakfast priorities are, but at least that doesn’t read like the mumblings of a lunatic.”

The Robot

public void FigureOutWhatToDo()
{
    if(_amIOutOfMilk == true)
        GoToTheStore();
}

Perhaps this is a little nitpicky, but this explicit formation of Boolean conditionals bothers me. “If it equals true that I am out of milk, I will go to the store” sounds like some robot helper from the Jetsons or one of those shows that features a preposterous token “genius” whose intelligence is conveyed by having him speak like some robot helper from the Jetsons. Why not “If I’m out of milk, I will go to the store?”

public void FigureOutWhatToDo()
{
    if(_amIOutOfMilk)
        GoToTheStore();
}

The Yoda

private Milk _theMilk;

public void FigureOutWhatToDo()
{
    if(null == _theMilk)
        GoToTheStore();
}

If program in C you do, sense this makes, and a clever trick to avoid assignment-instead-of-comparison errors this is. If program in C you don’t, annoying and indicative that you’re not adopting the thinking of the language you’re working in this is. When you speak English and try to sound like a native speaker, you don’t say “If missing is the milk, go to the store.” You say “If the milk is missing, go to the store.”

public void FigureOutWhatToDo()
{
    if(_theMilk == null)
        GoToTheStore();
}

The Existential No-Op

public void FigureOutWhatToDo()
{
    if(_amIOutOfMilk)
    {
        //GoToTheStore();
    }
}

Or, see variations where the comment is replaced by “return;” or some other similar thing. This is a conditional where, true or false, you do nothing. It sort of makes you question the very nature of (its) existence. This is like me saying to you “If I’m out of milk…” When you wait patiently for a moment and say “yes…?” I then say “nothing — that was all.” What should this be replaced with? How about just deleting the whole bit of nonsense?

Growing Pains

public void FigureOutWhatToDo()
{
    if(_amIOutOfMilk && _amIOutOfEggs && _amIOutOfBeer && _amIOutOfCheese && _amIOutOfChips && _amIOutOfMilk)
        GoToTheStore();
}

See what’s going on here? This conditional is growing so unwieldy that by the end of it you forget that you already mentioned being out of milk. “If I’m out of milk, eggs, beer and milk, I’m going to the store.” “You said milk twice.” “I like milk.” How about dividing it up a bit and saying “If I am out of staples and I’m out of snacks, then I’m going to the store.”

public void FigureOutWhatToDo()
{
    if(AmIOutOfStaples() && AmIOutOfSnacks())
        GoToTheStore();
}

private bool AmIOutOfSnacks()
{
    return _amIOutOfBeer && _amIOutOfChips;
}

private bool AmIOutOfStaples()
{
    return _amIOutOfMilk && _amIOutOfEggs && _amIOutOfCheese;
}

The Mad Scoper

public void FigureOutWhatToDo()
{
    if(((AmIOutOfStaples())) && (AmIOutOfSnacks()))
        GoToTheStore();
}

I think we’ve all seen one of these — someone on the team or in the group has a few too many cups of coffee and really goes to town on the old 9 and 0 keys. This is probably done to make sure that order of operations is being followed when you don’t understand the order of operations. Conversational equivalent? “If I am out of staples, and I mean staples and not whatever is coming next until I’m ready to talk about that and now I’m ready so I’m going to talk about that and that is snacks and not staples we’re not talking about staples anymore we’re talking about snacks which if I’m out of I’m also not going to the store, okay done.” Let’s dial it back to the last solution:

public void FigureOutWhatToDo()
{
    if(AmIOutOfStaples() && AmIOutOfSnacks())
        GoToTheStore();
}

The Fly Swallower (aka The Train Wreck)

public class OldWoman
{
    public Horse _horse = new Horse();

    public object WhatDidYouSwallow()
    {
        if (_horse != null && _horse.Cow != null && _horse.Cow.Hog != null && _horse.Cow.Hog.Dog != null &&
            _horse.Cow.Hog.Dog.Cat != null && _horse.Cow.Hog.Dog.Cat.Bird != null &&
            _horse.Cow.Hog.Dog.Cat.Bird.Spider != null &&
                _horse.Cow.Hog.Dog.Cat.Bird.Spider.Fly != null)
            return _horse.Cow.Hog.Dog.Cat.Bird.Spider.Fly;

        return null;
    }
}

This is formally known as design with violations of the Law of Demeter, but it’s easier just to think of it as a train wreck. But the name I’m giving it comes from a nursery rhyme, which is how this starts to sound in conversational English. “There was an old lady who if she swallowed a horse who if it swallowed a cow who if it swallowed a hog who if it swallowed a dog…” How should this sound? There’s no easy fix. You need a different object model.
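For what it’s worth, here’s one possible direction for that different object model (a sketch of my own, not a canonical fix): each animal knows only its immediate neighbor and answers the question itself, so the old woman never reaches eight links down the chain.

public class OldWoman
{
    private readonly Horse _horse = new Horse();

    public object WhatDidYouSwallow()
    {
        // One question to one neighbor; the chain handles the rest.
        return _horse.DeepestSwallowedThing();
    }
}

public class Horse
{
    public Cow Cow { get; set; }

    public object DeepestSwallowedThing()
    {
        // If I swallowed nothing further, I'm the answer; otherwise, ask the cow.
        return Cow == null ? (object)this : Cow.DeepestSwallowedThing();
    }
}

public class Cow
{
    public object DeepestSwallowedThing()
    {
        return this; // The real version continues: Hog, Dog, Cat, Bird, Spider, Fly.
    }
}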

And with that, I’ll conclude my fun Friday post. This is meant to be light-hearted and in jest, but I’d say there’s definitely a good bit of truth here. You may not agree entirely with my assessment, but I think we’d all be well served to do the occasional double check to make sure we’re not leaving conditional bloopers in the code for others to read, triggering pointing and laughter. If you have other names for these or other conditional bloopers to mention (or you think I’m completely off base with one of these) please feel free to chime in.



How to Create Good Abstractions: Oracles and Magic Boxes

Oracles In Math

When I was a freshman in college, I took a class called “Great Theoretical Ideas In Computer Science”, taught by Professor Steven Rudich. It was in this class that I began to understand how fundamentally interwoven math and computer science are and to think of programming as a logical extension of math. We learned about concepts related to Game Theory, Cryptography, Logic Gates and P, NP, NP-Hard and NP-Complete problems. It is this last set that inspires today’s post.

This is not one of my “Practical Math” posts, so I will leave a more detailed description of these problems for another time, but I was introduced to a fascinating concept here: that you can make powerful statements about solutions to problems without actually having a solution to the problem. The mechanism for this was an incredibly cool-sounding concept called “The Problem Solving Oracle”. In the world of P and NP, we could make statements like “If I had a problem solving oracle that could solve problem X, then I know I could solve problem Y in polynomial (order n squared, n cubed, etc) time.” Don’t worry if you don’t understand the timing and particulars of the oracle — that doesn’t matter here. The important concept is “I don’t know how to solve X, but if I had a machine that could solve it, then I know some things about the solutions to these other problems.”

It was a powerful concept that intrigued me at the time, but more with grand visions of fast factoring and solving other famous math problems and making some kind of name for myself in the field of computer science related mathematics. Obviously, you’re not reading the blog of Fields Medal winning mathematician, Erik Dietrich, so my reach might have exceeded my grasp there. However, concepts like this don’t really fall out of my head so much as kind of marinate and lie dormant, to be re-appropriated for some other use.

Magic Boxes: Oracles For Programmers

One of the most important things that we do as developers and especially as designers/architects is to manage and isolate complexity. This is done by a variety of means, many of which have impressive-sounding terms like “loosely coupled”, “inverted control”, and “layered architecture”. But at their core, all of these approaches and architectural concepts have a basic underlying purpose: breaking problems apart into isolated and tackle-able smaller problems. To think of this another way, good architecture is about saying “assume that everything else in the world does its job perfectly — what’s necessary for your little corner of the world to do the same?”

That is why the Single Responsibility Principle is one of the most oft-cited principles in software design. This principle, in a way, says “divide up your problems until they’re as small as they can be while still being non-trivial”. But it also implies “and just assume that everyone else is handling their business.”

Consider this recent post by John Sonmez, where he discusses deconstructing code into algorithms and coordinators (things responsible for getting the algorithms their inputs, more or less). As an example, he takes a “Calculator” class where the algorithms of calculation are mixed up with the particulars of storage and separates these concerns. This separates the very independent calculation algorithm from the coordinating storage details (which are of no consequence to calculations anyway) in support of his point, but it also provides the more general service of dividing up the problem at hand into smaller segments that take one another for granted.

Another way to think of this is that his “Calculator_Mockless” class has two (easily testable) dependencies that it can trust to do their jobs. Going back to my undergrad days, Calculator_Mockless has two Oracles: one that performs calculations and the other that stores stuff. How do these things do their work? Calculator_Mockless doesn’t know or care about that; it just provides useful progress and feedback under the assumption that they do. This is certainly reminiscent of the criteria for “Oracle”, an assumption of functionality that allows further reasoning. However, “Oracle” has sort of a theoretical connotation in the math sense that I don’t intend to convey, so I’m going to adopt a new term for this concept in programming: “Magic Box”. John’s Calculator_Mockless says “I have two magic boxes — one for performing calculations and one for storing histories — and given these boxes that do those things, here’s how I’m going to proceed.”

How Spaghetti Code Is Born

It’s one thing to recognize the construction of Magic Boxes in the code, but how do you go about building and using them? Or, better yet, go about thinking in terms of them from the get-go? It’s a fairly sophisticated and not always intuitive method for deconstructing programming problems.

To see what I mean, think of being assigned a hypothetical task to read an XML file full of names (right), remove any entries missing information, alphabetize the list by last name, and print out “First Last” with “President” pre-pended onto the string. So, for the picture here, the first line of the output should be “President Grover Cleveland”. You’ve got your assignment, now, quick, go – start picturing the code in your head!
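The input file appears in the original post only as an image. A hypothetical version of it might look like the following (the element and attribute names here are my guesses, based on the description later in the post of reading Person nodes and their name attributes):

```xml
<?xml version="1.0" encoding="utf-8"?>
<People>
  <Person firstname="Grover" lastname="Cleveland" />
  <Person firstname="Rutherford" lastname="Hayes" />
  <Person firstname="James" lastname="Garfield" />
  <Person firstname="Chester" />
</People>
```

Note that the last entry is missing a last name, so per the requirements it should be dropped rather than printed.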

What did you picture? What did you say to yourself? Was it something like “Well, I’d read the file in using the XDoc API and I’d probably use an IList<> implementer instead of IEnumerable<> to store these things since that makes sorting easier, and I’d probably do a foreach loop for the person in people in the document and while I was doing that write to the list I’ve created, and then it’d probably be better to check for the name attributes in advance than taking exceptions because that’d be more efficient and speaking of efficiency, it’d probably be best to append the president as I was reading them in rather than iterating through the whole loop again afterward, but then again we have to iterate through again to write them to the console since we don’t know where in the list it will fall in the first pass, but that’s fine since it’s still linear time in the file size, and…”

And slow down there, Sparky! You’re making my head hurt. Let’s back up a minute and count how many magic boxes we’ve built so far. I’m thinking zero — what do you think? Zero too? Good. So, what did we do there instead? Well, we embarked on kind of a willy-nilly, scattershot, and most importantly, procedural approach to this problem. We thought only in terms of the basic sequence of runtime operations, and thus the concepts all got jumbled together. We were thinking about exception handling while thinking about console output. We were thinking about file reading while thinking about sorting strings. We were thinking about runtime optimization while we were thinking about the XDocument API. We were thinking about the problem as a monolithic mass of all of its smaller components, and we were thus about to lay the bedrock for some seriously weirdly coupled spaghetti code.

Cut that Spaghetti and Hide It in a Magic Box

Instead of doing that, let’s take a deep breath and consider what mini-problems exist in this little assignment. There’s the issue of reading files to find people. There’s the issue of sorting people. There’s the issue of pre-pending text onto the names of people. There’s the issue of writing people to console output. There’s the issue of modeling the people (a series of string tuples, a series of dynamic types, a series of anonymous types, a series of structs, etc?). Notice that we’re not solving anything — just stating problems. Also notice that we’re not talking at all about exception handling, O-notation or runtime optimization. We already have some problems to solve that are actually in the direct path of solving our main problem without inventing solutions to problems that no one yet has.

So what to tackle first? Well, since every problem we’ve mentioned has the word “people” in it and the “people” problem makes no mention of anything else, we probably ought to consider defining that concept first (reasoning this way will tell you what the most abstract concepts in your code base are — the ones that other things depend on while depending on nothing themselves). Let’s do that (TDD process and artifacts that I would normally use elided):

public struct Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public override string ToString()
    {
        return string.Format("{0} {1}", FirstName, LastName);
    }
}

Well, that was pretty easy. So, what’s next? Well, the context of our mini-application involves getting people objects, doing things to them, and then outputting them. Chronologically, it would make the most sense to figure out how to do the file reads at this point. But “chronologically” is another word for “procedurally” and we want to build abstractions that we assemble like building blocks rather than steps in a recipe. Another, perhaps more advantageous way would be to tackle the next simplest task or to let the business decide which functionality should come next (please note, I’m not saying that there would be anything wrong with implementing the file I/O at this point — only that the rationale “it’s what happens first sequentially in the running code” is not a good rationale).

Let’s go with the more “agile” approach and assume that the users want a stripped down, minimal set of functionality as quickly as possible. This means that you’re going to create a full skeleton of the application and do your best to avoid throw-away code. The thing of paramount importance is having something to deliver to your users, so you’re going to want to focus mainly on displaying something to the user’s medium of choice: the console. So what to do? Well, imagine that you have a magic box that generates people for you and another one that somehow gets you those people. What would you do with them? How ’bout this:

public class PersonConsoleWriter
{
    public void WritePeople(IEnumerable<Person> people)
    {
        foreach (var person in people)
            Console.WriteLine(person);
    }
}

Where does that enumeration come from? Well, that’s a problem to deal with once we’ve handled the console writing problem, which is our top priority. If we had a demo in 30 seconds, we could simply write a main that instantiated a “PersonConsoleWriter” and fed it some dummy data. Here’s a dead simple application that prints, well, nothing, but it’s functional from an output perspective:

public class PersonProcessor
{
    public static void Main(string[] args)
    {
        var writer = new PersonConsoleWriter();
        writer.WritePeople(Enumerable.Empty<Person>());
    }
}

What to do next? Well, we probably want to give this thing some actual people to shove through the pipeline or our customers won’t be very impressed. We could new up some people inline, but that’s not very impressive or helpful. Instead, let’s assume that we have a magic box that will fetch people out of the ether for us. Where do they come from? Who knows, and who cares — not main’s problem. Take a look:

public class PersonStorageReader
{
    public IList<Person> GetPeople()
    {
        throw new NotImplementedException();
    }
}

Alright — that’s pretty magic box-ish. The only way it could be more so is if we just defined an interface, but that’s overkill for the moment and I’m following YAGNI. We can add an interface if this thing later needs to pull people out of a web service or something. At this point, if we had to do a demo, we could simply return the empty enumeration instead of throwing an exception, or we could dummy up some people for the demo here. And the important thing to note is that now the thing that’s supposed to be generating people is the thing that’s generating the people — we just have to sort out the correct internal implementation later. Let’s take a look at main:

public static void Main(string[] args)
{
    var reader = new PersonStorageReader();
    var people = reader.GetPeople();

    var writer = new PersonConsoleWriter();
    writer.WritePeople(people);
}

Well, that’s starting to look pretty good. We get people from the people reader and feed them to the people console writer. At this point, it becomes pretty trivial to add sorting:

public static void Main(string[] args)
{
    var reader = new PersonStorageReader();
    var people = reader.GetPeople().OrderBy(p => p.LastName);

    var writer = new PersonConsoleWriter();
    writer.WritePeople(people);
}

But, if we were so inclined, we could easily look at main and say “I want a magic box that I hand a person collection to and it gives me back a sorted person collection,” and that could be stubbed out as follows:

public static void Main(string[] args)
{
    var reader = new PersonStorageReader();
    var people = reader.GetPeople();

    var sorter = new PersonSorter();
    var sortedPeople = sorter.SortList(people);

    var writer = new PersonConsoleWriter();
    writer.WritePeople(sortedPeople);
}
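The PersonSorter itself never gets shown in the post — only its class and method names appear above. A minimal sketch of what its internals might look like (the body here is my assumption):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Person is the struct defined earlier in the post, repeated here so the
// sketch compiles on its own.
public struct Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public override string ToString()
    {
        return string.Format("{0} {1}", FirstName, LastName);
    }
}

public class PersonSorter
{
    // Sorts by last name, per the assignment's requirement; returns a new
    // list rather than mutating the caller's collection.
    public IList<Person> SortList(IEnumerable<Person> people)
    {
        return people.OrderBy(p => p.LastName).ToList();
    }
}
```

The point isn't the one-liner inside — it's that main can now trust a box whose job is sorting, and the sorting details can change without touching anything else.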

The same kind of logic could also be applied for the “pre-pend the word ‘President'” requirement. That could pretty trivially go into the console writer or you could abstract it out. So, what about the file logic? I’m not going to bother with it in this post, and do you know why? You’ve created enough magic boxes here — decoupled the program enough — that it practically writes itself. You use an XDocument to pop all Person nodes and read their attributes into First and Last name, skipping any nodes that don’t have both. With Linq to XML, how many lines of code do you think you need? Even without shortcuts, the jump from our stubbed implementation to the real one is small. And that’s the power of this approach — deconstruct problems using magic boxes until they’re all pretty simple.

Also notice another interesting benefit which is that the problems of runtime optimization and exception handling now become easier to sort out. The exception handling and expensive file read operations can be isolated to a single class and console writes, sorting, and other business processing need not be affected. You’re able to isolate difficult problems to have a smaller surface area in the code and to be handled as requirements and situation dictate rather than polluting the entire codebase with them.

Pulling Back to the General Case

I have studiously avoided discussion of how tests or TDD would factor into all of this, but if you’re at all familiar with testable code, it should be quite apparent that this methodology will result in testable components (or endpoints of the system, like file readers and console writers). There is a deep parallel between building magic boxes and writing testable code — so much so that “build magic boxes” is great advice for how to make your code testable. The only leap from simply building boxes to writing testable classes is to remember to demand your dependencies in constructors, methods and setters, rather than instantiating them or going out and finding them. So if PersonStorageReader uses an XDocument to do its thing, pass that XDocument into its constructor.
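Putting that advice together with the file-read description from the previous section, a sketch of a real PersonStorageReader might look like this. (The element and attribute names are guesses on my part, since the post shows the input file only as an image, and the Person struct is the one defined earlier, repeated so the sketch stands alone.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Person is the struct defined earlier in the post.
public struct Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public override string ToString()
    {
        return string.Format("{0} {1}", FirstName, LastName);
    }
}

public class PersonStorageReader
{
    private readonly XDocument _document;

    // The XDocument dependency is demanded in the constructor rather than
    // instantiated here, which is what keeps this class testable.
    public PersonStorageReader(XDocument document)
    {
        _document = document;
    }

    public IList<Person> GetPeople()
    {
        // Pop all Person nodes, read their name attributes, and skip any
        // node that doesn't have both.
        return _document.Descendants("Person")
            .Where(n => n.Attribute("firstname") != null
                     && n.Attribute("lastname") != null)
            .Select(n => new Person
            {
                FirstName = (string)n.Attribute("firstname"),
                LastName = (string)n.Attribute("lastname")
            })
            .ToList();
    }
}
```

A test can now hand this class an XDocument built from an in-memory string instead of a file on disk, which is exactly the payoff of demanding the dependency.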

But this concept of magic boxes runs deeper than maintainable code, TDD, or any of the other specific clean coding/design principles. It’s really about chunking problems into manageable bites. If you’re ever writing code in a method and you find yourself thinking “ok, first I’ll do X, and then-” STOP! Don’t do anything yet! Instead, first ask yourself “if I had a magic box that did X so I didn’t have to worry about it here and now, what would that look like?” You don’t necessarily need to abstract every possible detail out of every possible place, but the exercise of simply considering it is beneficial. I promise. It will help you start viewing code elements as solvers of different problems and collaborators in a broader solution, instead of methodical, plodding steps in a gigantic recipe. It will also help you practice writing tighter, more discoverable and usable APIs, since your first conception of them would be “what would I most like to be a client of right here?”

So do the lazy and idealistic thing – imagine that you have a magic box that will solve all of your problems for you. The leap from “magic box” to “collaborating interface” to “implemented functionality” is much simpler and faster than it seems when you’re isolating your problems. The alternative is to have a system that is one gigantic, procedural “magic box” of spaghetti and the “magic” part is that it works at all.

The Myth of Quick Copy-And-Paste Programming

Quick and Dirty?

“We’re really busy right now–just copy that customer form and do a few find and replaces. I know it’s not a great approach, but we don’t have time to build the new form from scratch.”

Ever heard that one? Ever said it? Once, twice, dozens, or hundreds of times?

This has the conversational ring of, “I know I shouldn’t eat that brownie, but it’s okay. I’m planning to start exercising one of these days.” The subtext of the message is a willingness to settle–to trade self-esteem for immediate gratification with the disclaimer “the outcome just isn’t important enough to me to do the hard thing.”

If that sounds harsh to you when put in the context of programming instead of diet and exercise, there’s a reason: you’re used to fooling yourself into believing that copy-and-paste programming sacrifices good design to save time. But the reality is that the “time savings” is a voluntary self-delusion and that it’s really trading good design for being able to “program” mindlessly for a few minutes or hours. Considered in this light, it’s actually pretty similar to gorging yourself on chocolate while thinking that you’ll exercise later when you can afford that gym membership and whatnot: “I’ll space out and fantasize about my upcoming vacation now while I’m programming and clean up the mess when I come back, refreshed.”

Hey, Wait a Minute

I know what you’re thinking. You’re probably indignant right now because you’re thinking of a time that you copied a 50,000 line file and you changed only two things in it.

Re-typing all 50,000 lines would have taken days, and you got the work done in minutes. The same number of minutes, in fact, that you’d have spent parameterizing those two differences and updating clients to call the new method, thus achieving good design and efficiency. Okay, so bad example.

Wait, Now I’ve Got It

Now, you’re thinking about the time that you copied that 50,000 line file and there were about 300 differences–no way you could easily have parameterized all of that. Only a copy, paste, and a bunch of find and replace could do the trick there.

After that, you were up and running in about an hour. And everything worked.

Oh, except for that one place where the text was a little different. You missed that one, but you found it after ten minutes of debugging.

Oh, and except for five more of those at ten or fifteen minutes a pop.

Oh, and then there was that twenty minutes you spent after the architect pointed out that a bunch of methods needed to be renamed because they made no sense named what they were in the new class. Then you were truly done.

Except, oh, crap, runtime binding was failing with that other module since you changed those method names to please the stupid architect. That was a doozy because no one noticed it until QA a week later, and then you spent a whole day debugging it and another day fixing it.

Oh, and then there was a crazy deadlock issue writing to the log file that some beta customer found three months later. As it turns out, you completely forgot that if the new and old code file methods executed in just the right interleaving, wackiness might ensue. Ugh, that took a week to reproduce and then another two weeks to figure out.

Okay, okay, so maybe that was a bad example of the copy-and-paste time savings.

Okay, Those Don’t Count

But you’re still mad at me. Maybe those weren’t the best examples, but all the other times you do it are good examples.

You’re getting things done and cranking out code. You’re doing things that get you 80% of the way there and making it so that you only have to do 20% of the work, rather than doing all 100% from scratch. Every time you copy and paste, you save 80% of the manpower (minus, of course, the time spent changing the parts of the 80% that turned out not to be part of the 80% after all).

The important point is that as long as you discount all of the things you miss while copying and pasting and all of the defects you introduce while doing it and all of the crushing technical debt you accrue while doing it and all of the downstream time fixing errors in several places, you’re saving a lot of time. I mean, it’s the same as how that brownie you were eating is actually pretty low in calories if you don’t count the flour, sugar, chocolate, butter, nuts, and oil.

Come to think of it, you’re practically losing weight by eating it.

We Love What We Have

Hmm…apparently, it’s easy to view an activity as a net positive when you make it a point to ignore any negatives. And it’s also understandable.

My flippant tone here is more for effect than it is meant to be a scathing indictment of people for cutting corners. There’s a bit of human psychology known as the Endowment Effect that explains a lot of this tendency. We have a cognitive bias to favor what we already have over what’s new, hypothetical, or in the possession of others.

Humans are loss averse (we feel the pain of a loss of an item more than we experience pleasure at acquiring the item, on average), and this leads to a situation in which we place higher economic value on things that we have than things that we don’t have. You may buy a tchotchke on vacation from a vendor for $10 and wouldn’t have paid a dollar more, yet when someone comes along and offers you $12 or $20 or even $30 for it, you suddenly get possessive and don’t want to part with it. This is the Endowment Effect in action.

We Love Our Copy Paste Code

What does this have to do with copying and pasting code? Well, if you substitute time/effort as your currency, there will be an innate cognitive bias toward adapting work that you’ve already done as opposed to creating some theoretical new implementation. In other words, you’re going to say, “This thing I already have is a source of more efficiency than whatever else someone might do.”

This means that, assuming each would take equal time and that time is the primary motivator, you’re going to favor doing something with what you already have over implementing something new from scratch (in spite of how much we all love greenfield coding). Unfortunately, it’s both easy and common to conflate the Endowment Effect’s cognitive bias toward reuse with the sloppy practice of copy and paste.

At the point of having decided to adapt existing code to the new situation, things can break one of two ways. You can achieve this reuse by abstracting and factoring common logic, or you can achieve it by copy and paste.
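To make those two branches concrete with a deliberately tiny, invented example: suppose two report methods differ only by a single string. The copy-paste branch duplicates the method and tweaks the literal; the factoring branch makes the difference a parameter, so there is one copy of the logic to maintain:

```csharp
public class ReportPrinter
{
    // The copy-and-paste branch: near-identical methods, one string apart.
    public string BuildCustomerReportTitle() { return "Customer Report"; }
    public string BuildVendorReportTitle() { return "Vendor Report"; }

    // The factored branch: the difference becomes a parameter. A new report
    // type is now a new argument, not a new copy of the method.
    public string BuildReportTitle(string subject)
    {
        return subject + " Report";
    }
}
```

Both branches "reuse" the existing work; only one of them leaves a single place to fix when the format inevitably changes.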

Once you’ve decided between reusing and blazing a new trail, the efficiency-as-currency version of the Endowment Effect has already played out in your mind–you’ve already, for better or for worse, opted to re-appropriate existing work. Now you’re just deciding between doing the favorable-but-hard thing or the sloppy-and-easy thing. This is why I said it’s more like opting to stuff your face with brownies for promises of later exercise than it is to save time at the expense of good design.

Copy Paste Programming as Empty Calories

Think about it. Copy and paste is satisfying the way those empty calories are satisfying. Doing busy-work looks a lot like doing mentally taxing work and is often valued similarly in suboptimally incentivized environments.

So, to copy and paste or not to copy and paste is often a matter of, “Do I want to listen to talk radio and space out while getting work done, or do I want to concentrate and bang my head against hard problems?” And if you do a real, honest self-evaluation, I’m pretty sure you’ll come to the same conclusion.

Copying and pasting is the reality television of programming–completely devoid of meaningful substance in favor of predictable, mindless repetition.

Quick and Dirty is Just Dirty

In the end, you make two decisions when you go the copy-and-paste route. You decide to adapt what you’ve got rather than invent something new, and that’s where the substance of the time/planning/efficiency decision takes place. And once you’ve made the (perfectly fine) decision to use what you’ve got, the next decision is whether to work (factor into a common location) or tune out and malinger (copy and paste).

And in the end, they’re going to be a wash for time. The up front time saved by not thinking through the design is going to be washed out by the time wasted on the defects that coding without thinking introduces, and you’re going to be left with extra technical debt even after the wash for a net time negative.

So, in the context of “quick and dirty,” the decision of whether new or reused is more efficient (“quick”) is in reality separated from and orthogonal to the decision of factor versus copy and paste (“dirty”). Next time someone tells you they’re going with the “quick and dirty” solution of copy and paste, tell them, “You’re not opting for quick and dirty–you’re just quickly opting for dirty.”
