Stories about Software


Seeing the Value in Absolutes

The other day, I told a developer on my team that I wouldn’t write methods with more than three parameters. I said this in a context where many people would say, “don’t write code with more than three parameters in a method,” in that I am the project architect and coding decisions are mine to make. However, I feel that the way you phrase things has a powerful impact on people, and I believe code reviews that feature orders to change items in the code are creativity-killing and soul-sucking. So, as I’ve explained to people on any number of occasions, my feedback consists neither of statements like “that’s wrong” nor statements like “take that out.” I specifically and always say, “that’s not what I would do.” I’ve found that people listen to this the overwhelming majority of the time and, when they don’t, they often have a good reason I hadn’t considered. No barking of orders necessary.

But back to what I said a few days ago. I basically stated the opinion that methods should never have more than three parameters. And right after I had stated this, I was reminded of the way I’ve seen countless conversations go in person, on help sites like Stack Overflow, and in blog comments. Does this look familiar?

John: You should never have more than three parameters in a method call.
Jane: Blanket statements like that tend to be problematic. Three method parameters is really, technically, more of a “code smell” than necessarily a problem. It’s often a problem, but it might not be.
John: I think it’s necessarily a problem. I can’t think of a situation where that’s desirable.
Jane: How about when someone is holding a gun to your head and telling you to write a method that takes four parameters.
John: (Rolls his eyes)
Jane: Look, there’s probably a better example. All I’m saying is you should never use absolutes, because you never know.
John: “You should never use absolutes” is totally an absolute! You’re a hypocrite!
Both: (Devolves into pointless bickering)

A lot of times during debates, particularly when you have smart and/or exacting participants, the conversation is derailed by a sort of “gotcha” game of one-upsmanship. It’s as though they are at an impasse as to the crux of the matter, so the two begin sniping at one another about tangentially-related or totally non-related minutiae until someone makes a slip, and this somehow settles the debate. Of course, it’s an exercise in futility because both sides think their opponent is the first to slip up. Jane thinks she’s won this argument because John used an absolute qualifier and she pointed out some (incredibly preposterous and contrived) counter-example, and John thinks he won with his ad hominem right before the end about Jane’s hypocrisy.

In this debate, they both lose, in my opinion. I agree with John’s premise but not his justification, and the difference matters. And Jane’s semantic nitpicking doesn’t get us to the right justification (counter or pro), either. Prescriptive matters of canon when it comes to programming are troubling for the same reason that absolutes are often troubling in our day-to-day lives. Even the most clear-cut seeming things, like “it’s morally reprehensible to kill people,” wind up having many loopholes in their application (“it’s morally reprehensible to kill people — unless, of course, it’s war, self-defense, certain kinds of revenge for really bad things, accidental, state-sanctioned execution, etc., etc.”). So for non-important stuff like the number of parameters to a method, aren’t we kind of hosed and thus stuck in a relativistic quagmire?

I’d argue not, and furthermore, I’d argue that the fact of the rules is more important than the rules themselves. It’s more important to have a restriction like “don’t have more than three parameters to a method” than it is to have that specific restriction. If it were “don’t have more than two method parameters” or “don’t have more than four method parameters,” we’d still be sitting pretty. Why, you ask? Well, a man named Barry Schwartz coined this phrase: “the paradox of choice: why more is less.” Restrictions limit choice, which is merciful

Developers are smart, and they want to solve problems — often hard problems. But, really, they want to solve directed problems efficiently. To understand what I mean, ask yourself which of these propositions is more appealing to you: (1) make a website that does anything in any programming language with any framework or (2) use F# to parse a large text file and have the running process use no more than 1 gig of memory. The first proposition makes your head hurt while the second gets your mental juices flowing as you decide whether to try to solve the problem algorithmically or to cheat and write interim results to disk.

Well, the same thing happens with a lot of the “best practice” rules that surround us in software development. Don’t make your classes too big. Don’t make your methods too big. Don’t have too many parameters. Don’t repeat your code. While they can seem like (and be, if you don’t understand the purpose behind them) cargo-cult mandates if you simply focus on the matter of relativism vs absolutes, they’re really about removing (generally bad) options so that you can be creative within the context remaining, as well as productive and happy. Developers who practice DRY and who write small classes with small methods and small method signatures don’t have to spend time thinking “how many parameters should this method have” or “is this class getting too long?” Maybe this sounds restrictive or draconian to you, but think of how many options have been removed from you by others: “does the code have to compile,” or “is the code allowed to wipe out our production data?” If you’re writing code for any sort of business purposes, the number of things you can’t do dwarfs the number of things you can.

Of course, just having rules for the sake of rules is the epitome of dumb cargo cult activity. The restrictions have to be ones that contribute overall to a better code base. And while there may be some debate about this, I doubt that anyone would really argue with statements like “favor small methods over large ones” and “favor simple signatures over complex ones.” Architects (or self-organizing teams) need to identify general goals like these and turn them into liberating restrictions that remove paralysis by analysis while keeping the code base clean. I’ve been of the opinion for a while now that one of the core goals of an architect should be providing a framework that prevents ‘wrong’ decisions so that the developers can focus on being creative and solving problems rather than avoiding pitfalls. I often see this described as “making sure people fall into the pit of success.”


Going back to the “maximum of three parameters rule,” it’s important to realize that the question isn’t “is that right 99% of the time or 100% of the time?” While Jane and John argue over that one percent, developers on their team are establishing patterns and designs predicated upon methods with 20 parameters. Who cares if there’s some API somewhere that really, truly, honestly makes it better to user four parameters in that one specific case? I mean, great — you proved that on a long enough timeline, weird aberrations happen. But you’re missing out on the productivity-harnessing power of imposing good restrictions. The developers in the group might agree, or they might be skeptical. But if they care enough to be skeptical, it probably means that they care about their craft and enjoy a challenge. So when you present it to them as a challenge (in the same way speeding up runtime or reducing memory footprint is a challenge), they’ll probably warm to it.


How We Get Coding Standards Wrong

The other day, I sat in on a meeting where a large-ish group was discussing “standards” for their particular area of software development. I have the word standards in quotes because, by design, there wasn’t a clear definition as to what sorts of standards they would be; it was an open-ended exercise. The standard could cover anything from variable casing to development practices and principles to holistic approaches. When asked for my input, I was sort of bemused by the process, and I said that I didn’t really have much in the way of an answer. I declined to elaborate much more on that since I wasn’t interested in derailing the meeting in any way, but it did get me to thinking about why the exercise seemed sort of futile to me.

I generally have a natural leeriness when it comes to coding and development standards and especially activities designed to flesh those out, and in this post I’d like to explore why. It isn’t that I don’t believe standards should exist or that I believe they aren’t important. It’s just that I think we frequently miss the point and create standards out of some sense that it’s The Right Thing, and thus create standards that are pointless or even detrimental.

Standards by Committee Anti-Pattern

One problem with defining standards in a group setting is that any group containing some socially savvy people is going to gravitate toward diplomacy. Contentious and arbitrary subjects (so-called “religious wars”) like camel case versus Pascal case or where the bracket after a function goes will be avoided in favor or things upon which a consensus may be reached. But think about what’s actually happening–everyone’s agreeing that the things that everyone already does should be standardized. This is a fairly vacuous exercise in bureaucracy, useful only in the hypothetical realm where a new person comes on board and happens to disagree with something upon which twenty others agree.

People doing this are solving a problem that doesn’t exist: “how do we make sure everyone does this the same way when everyone’s currently doing it the same way?” It also tends to favor documenting current process rather than thinking critically about ideal process.

Let’s capture all of the stuff that we all do and write it down. Okay, so, coding standards. When working on a .NET project, first drive to the office. Then, have your keycard ready to get in the building. Next, enter the building…

Obviously this is silly, but hopefully the point hits home. The simple fact that you do something or that everyone in the group does something doesn’t mean that it’s worth capturing as trainable knowledge and enforcing on the group. And yet this is a direction I frequently see groups take as they get into a groove of “yes, and” when discussing standards. It can just turn into “let’s make a list of everything we do.”

Pointless Homogeneity

The concept of capturing the intersection of everyone’s approach and coding style dovetails into another problem with groups hashing out standards: a group-think bias. Slightly different from the notion that everything common should be documented, this is the notion that everything should be common. For instance, I once worked in a shop where developers were all mandated to use the same diff tool. I’m not kidding. If anyone bothered with a justification for this, I don’t recall what it was, other than some nod to pointless standards.


You can take this pretty far. Imagine demands that you use the same syntax highlighting colors as your peers or that you keep your file system organized in the same way as everyone else. What does this have to do with the code you’re producing? Who knows…

It might seem like the kind of thing where you should just indulge the the harmless control freak driving it or the group that dreams it up as a unit, but this runs the risk of birthing a toxic culture. With everything, however inconsequential, homogenized, there is no room for creative thinkers to innovate with new approaches.

Make-Work Tasks

Another risk you run when coming up with standards is to create so-called standards that amount to codifying and mandating busy-work. I’ve made my evolving opinion of comments in code quite clear on a few occasions, and I consider them to be an excellent example. When you make “comment every method” a standard, you’re standardizing the procedure (mindlessly adding comments) and not the goal (clarity and communication).

There are plenty of other examples one might dream up. The silly mandate of “sort and organize usings” that I blogged about some time back comes to mind. This is another example of standardizing pointless make-work tasks that provide no substantive benefit. In all cases, the problem is that you’re not only asking developers to engage in brainless busy-work–you’re codifying it as an official mandate.

Getting Too Specific

Another source of issues that I’ve seen in the establishment of standards is a tendency to get too specific. “What sort of convention should we use when we declare a generic interface below an enumeration inside of a nested class?” Really? Does that come up often enough that it’s important for everyone to get on the same page about how to approach it?

I recognize the human desire for set closure; we don’t like it when a dresser is missing a drawer or when we haven’t collected the whole set, but sometimes you’ve just got to let it go. We’re not the IRS–it’s going to be alright if there are contingencies that we haven’t covered and oddball loopholes that we haven’t addressed.

Missing the Point

For me, this is the biggest one. Usually standards discussions are about superficial programming concerns rather than substantive ones, and that’s unfortunate. It is the aforementioned camel vs Pascal case wars or whether to put brackets and which kinds to use. To var or not to var? Should constants be all caps? If an interface is in a forest and doesn’t have an “I” in front of its name, is it still an interface?

I understand the benefit of consistency in naming, casing, and other syntactic considerations. I really do, in spite of my tendency to be dismissive and iconoclast on this front when discussing them. But, first off, let’s not pretend that there really is a right way with these things–there’s just the way that you’re used to doing them. And, more importantly, let’s not pretend that this is really all that important in the grand scheme of things.

We use consistent casing and naming so that a reader of the code can tell at a glance whether something is a field or a local variable or whether something is a method or a property or a constant. It’s really about promoting readability, which, in turn, is about maximizing maintainability. But you know what’s much harder on maintainability than Jones’s great constant casing blunder of 2010 where he forgot to use ALL CAPS? Writing bad code.

If you’re banging out behemoth methods with control statements eight deep, all of the camel case in the world isn’t going to make your code readable. A standard mandating that all such methods be prepended with “yuck” might help, but the real thing that you need is some standards about writing clean code. Keeping methods and classes small and focused, principles like DRY and SOLID, and other good design principles are much more important standards to which to aspire, but they’re often less concrete and harder to enforce. It’s much easier and more rote for a code reviewer to look for casing issues or missing comments than to analyze code for good software practice and object-oriented design. The latter is often less cut-and-dry and more a matter of degrees, so it’s frequently glossed over in favor of more tangible, simple things. Problem is, those tangible, simple things really aren’t all that important to the health of your applications and projects over the long haul.

It’s All Just Premature Optimization

The common thread here is that all of these standards anti-patterns result from solving non-existent problems. If you have some collection of half-baked standards at your company that go on for some pages and then say, “after that, follow the Microsoft standards,” imagine how they came about. I bet a few of the group’s original engineers or most senior people had a conversation that went something like, “We should probably have some standards.” “Yeah, I guess… but why now?” “I dunno… I think it’s, like, what you’re supposed to do.”

I suspect that if you did a survey, a lot more standards documents have started with conversations like that than with conversations about hours lost to maintenance and difficulty reading code. They are born out of cargo-cult practice rather than a necessity to solve some problem. Philosophically, they start as solutions in search of a problem rather than solutions to actual problems.

The situation is complicated by the fact that adoption of certain standards may have solved real problems in the past for developers on the team, and they’re simply doing the smart thing and carrying their knowledge forward. The trouble is that not all projects face the same problems. When discussing approaches, start with abstract and general abiding principles like SOLID and DRY and take it from there. If half of your team uses camel case and the other half Pascal and it’s causing communication and maintenance difficulties, flip a coin and set a standard. Repeat as necessary with other standards to keep the project moving and humming. But don’t make them up just for the sake of doing so. You wouldn’t start writing random code that may never solve any actual problem, so why create a standard that way?


The Way We Write Code is Stupid: Source Code Files Considered Harmful

Order Doesn’t Matter

Please pardon the loaded phrasing in the title, but that’s how the message came to me from my subconscious brain: bluntly and without ceremony. I was doing a bit of work in Apex, the object-oriented language specific to Salesforce.com, and it occurred to me that I had no idea what idiomatic Apex looked like. (I still don’t.) In C++, the convention (last time I was using it much, anyway) is to first define public members in class headers and then the private members at the bottom. In C#, this is inverted. I’ve seen arguments of all sorts as to which approach is better and why. Declaring them at the top makes sense since the first thing you want to see in the class is what its state will consist of, but declaring the public stuff at the top makes sense since that’s what consumers will interact with and it’s like the above-water part of your code iceberg.

When programming in any of the various programming languages I know, I have this mental cache of what’s preferred in what language. I attempt to ‘speak’ it without an accent. But with Apex, I have no idea what the natives ‘sound’ like, not having seen it in use before. Do I declare instance variables at the bottom or the top? Which is the right way to eat bread: butter side up or butter side down? I started googling to see what the ‘best practice’ was for Apex when the buzzing in my subconscious reached some kind of protesting critical mass and morphed into a loud, clear message: “this is completely stupid.”

I went home for the day at that point–it was late anyway–and wondered what had prompted this visceral objection. I mean, it obviously didn’t matter from a compiled code perspective whether instance variables or public methods come first, but it’s pretty well established and encouraged by people as accomplished and prominent as “Uncle” Bob Martin that consistency of source code layout matters, if not the layout specifics (paraphrased from my memory of his video series on Clean Coders). I get it. You don’t want members of your team writing code that looks completely different from class to class because that creates maintenance headaches and obscures understanding. So what was my problem?

I didn’t know until the next morning in the shower, where I seem to do my most abstract thinking. I didn’t think it was stupid to try to make my Apex code look like ‘standard’ Apex. I thought it was stupid that I needed to do so at all. I thought it was stupid to waste any time thinking about how to order code elements in this file when the only one whose opinion really matters–the compiler–says, “I don’t care.” Your compiler is trying to tell you something. Order doesn’t matter to it, and you shouldn’t care either.

Use Cases: What OOP Developers Want

But the scope of my sudden, towering indignation wasn’t limited to the fact that I shouldn’t have to care about the order of methods and fields. I also shouldn’t have to care about camel or Pascal casing. I shouldn’t have to care about underscores in front of field names or inside of method names. It shouldn’t matter to me if public methods come before private or how much indentation is the right amount of indentation. Should methods be alphabetized or should they be in some other order? I don’t care! I don’t care about any of this.

Let’s get a little more orderly about this. Here are some questions that I ask frequently when I’m writing source code in an OOP language:

  • What is the public API of this type?
  • What private methods are in the ‘tree’ of this public method?
  • What methods of this type mutate or reference this field?
  • What are the types in this namespace?
  • What are the implementations of this interface in this code base?
  • Let’s see this method and any methods that it overrides.
  • What calls this method?

Here are some questions that I never ask out of actual interest when writing source code.  These I either don’t ask at all or ask in exasperation:

  • What’s the next method in this file?
  • How many line feed characters come before the declaration of this variable?
  • Should I use tabs or spaces?
  • In what region is this field’s declaration?
  • Did the author of this file alphabetize anything in it?
  • Does this source file have Windows or *NIX line break characters?
  • Is this a field or a method or what?

With the first set of questions, I ask them because they’re pieces of information that I want while reasoning about code.  With the second set of questions, they’re things I don’t care about.  I view asking these questions as an annoyance or failure.  Do you notice a common pattern?  I certainly do.  All of the questions whose answers interest me are about code constructs and all the ones that I don’t care about have to do with the storage medium for the code: the file.

But there’s more to the equation here than this simple pattern.  Consider the first set of questions again and ask yourself how many of the conventions that we establish and follow are simply ham-fisted attempts to answer them at a glance because the file layout itself is incapable of doing so.  Organizing public and private separately is a work-around to answer the first question, for example.  Regions in C#, games with variable and method naming, “file” vs “type” view, etc. are all attempts to overcome the fact that files are actually really poor communication media for object-oriented concepts.  Even though compilers are an awful lot different now than they were forty years ago, we still cling to the storage medium for source code best suited to those old compilers.

Not Taking our own Advice

If you think of an ‘application’ written in MS Access, what comes to mind?  How about when you open up an ASP web application and find wizard-generated data sources in the markup, or when you open up a desktop application and find SQL queries right in your code behind?  I bet you think “amateurs wrote this.”  You are filled with contempt for the situation–didn’t anyone stop to think about what would happen if data later comes in some different form?  And what about some kind of validation?  And, what the–ugh… the users are just directly looking at the tables and changing the column order and default sorting every time they look at the data.  Is everyone here daft?  Don’t they realize how ridiculous it is to alter the structure of the actual data store every time someone wants a different ordering of the data?

OldYoungAnd you should see some of the crazy work-arounds and process hacks they have in place. They actually have a scheme where the database records the name of everyone who opens up a table and makes any kind of change so that they can go ask that person why they did it.  And–get this–they actually have this big document that says what the order of columns in the table should be.  And–you can’t make this stuff up–they fight about it regularly and passionately.  Can you believe the developers that made this system and the suckers that use it? I mean, how backward are they?

In case you hadn’t followed along with my not-so-subtle parallel, I’m pointing out that we work this way ourselves even as we look with scorn upon developers who foist this sort of thing on users and users who tolerate it.  This is like when you finally see both women in the painting for the first time–it’s so clear that you’ll never un-see it again.  Why do we argue about where to put fields and methods and how to order things in code files when we refuse to write code that sends users directly into databases, compelling them to bicker over the order of column definition in the same?  RDBMS (or any persistence store) is not an appropriate abstraction for an end user–any end user–whether he understands the abstraction or not.  We don’t demand that users fight, decide that there is some ‘right’ way to order invoices to be printed, and then lock the Invoice table in place accordingly for all time and pain of shaming for violations of an eighty-page invoice standard guideline document.  So why do that to ourselves?  When we’re creating object-oriented code, sequential files, and all of the particular orderings, traversings and renderings thereof are wildly inappropriate abstractions for us.

What’s the Alternative?

Frankly, I don’t know exactly what the alternative is yet, but I think it’s going to be a weird and fun ride trying to figure that out.  My initial, rudimentary thoughts on the matter are that we should use some sort of scheme in which the Code DOM is serialized to disk for storage purposes.  In other words, the domain model of code is that there is something called Project, and it has a collection of Namespace.  Namespace has a collection of Type, which may be Interface, Enum, Struct, Class (for C# anyway–for other OOP languages, it’s not hard to make this leap).  Class has one collection each of Field, Method, Property, Event.  The exact details aren’t overly important, but do you see the potential here?  We’re creating a hierarchical model of code that could be expressed in nested object or relational format.

In other words, we’re creating a domain model entirely independent of any persistence strategy.  Can it be stored in files?  Sure. Bob’s your uncle.  You can serialize these things however you want.  And it’ll need to be written to file in some form or another for the happiness of the compiler (at least at first).  But those files handed over to the compiler are output transforms rather than the lifeblood of development.

Think for a minute of us programmers as users of a system with a proper domain, one or more persistence models, and a service layer.  Really, stop and mull that over for a moment.  Now, go back to the use cases I mentioned earlier and think what this could mean.  Here are some properties of this system:

  1. The basic unit of interaction is going to be the method, and you can request methods with arbitrary properties, with any filtering and any ordering.
  2.  What appears on your screen will probably be one or more methods (though this would be extremely flexible).
  3. It’s unlikely that you’d ever be interested in “show me everything in this type.”  Why would you?  The only reason we do this now is that editing text files is what we’re accustomed to doing.
  4. Tracing execution paths through code would be much easier and more visual and schemes that look like Java’s “code bubbles” would be trivial to create and manipulate.
  5. Most arguments over code standards simply disappear as users can configure IDE preferences such as “prepend underscores to all field variables you show me,” “show me everything in camel casing,” and, “always sort results in reverse alphabetical order.”
  6. Arbitrary methods from the same or different types could be grouped together in ad-hoc fashion on the screen for analysis or debugging purposes.
  7. Version/change control could occur at the method or even statement level, allowing expression of “let’s see all changes to this method” or “let’s see who made a change to this namespace” rather than “let’s see who changed this file.”
  8. Relying on IDE plugins to “hop” to places in the code automatically for things like “show all references” goes away in favor of an expressive querying syntax ala NDepend’s “code query language.”
  9. New domain model allows baked-in refactoring concepts and makes operations like “get rid of dead code” easier or trivial, in some cases.

Longer Reaching Impact

If things were to go in this direction, I believe that it would have a profound impact not just on development process but also on the character and quality of object oriented code that is written in general.  The inherently sequential nature of files and the way that people reason about file parsing, I believe, lends to or at least favors the dogged persistence of procedural approaches to object oriented programming (static methods, global state, casting, etc.).  I think that the following trends would take shape:

  1. Smaller methods.  If popping up methods one at a time or in small groups becomes the norm, having to scroll to see and understand a method will become an anomaly, and people will optimize to avoid it.
  2. Less complexity in classes.  With code operations subject to a validation of sorts, it’d be fairly easy to incorporate a setting that warns users if they’re adding the tenth or twentieth or whatever method to a class.  In extreme cases, it could even be disallowed (and not through the honor system or ex post facto at review or check in–you couldn’t do it in the first place).
  3. Better conformance to Single Responsibility Principle (SRP).  Eliminating the natural barrier of “I don’t want to add a new file to source control” makes people less likely awkwardly to wedge methods into classes in which they do not belong.
  4. Better cohesion.  It becomes easy to look for fields hardly used in a type or clusters of use within a type that could be separated easily into multiple types.
  5. Better test coverage.  Not only is this a natural consequence of the other items in this list, but it would also be possible to define “meta-data” to allow linking of code items and tests.

What’s Next?

Well, the first things that I need to establish is that this doesn’t already exist somewhere in the works and that I’m not a complete lunatic malcontent.  I’d like to get some feedback on this idea in general.  The people to whom I’ve explained a bit so far seem to find the concept a bit far-fetched but somewhat intriguing.

I’d say the next step, assuming that this passes the sanity check would be perhaps to draw up a white paper discussing some implementation/design strategies with pros and cons in a bit more detail.  There are certainly threats to validity to be worked out such as the specifics of interaction with the compiler, the necessarily radical change to source control approaches, the performance overhead of performing code transforms instead of just reading a file directly into memory, etc.  But off the top of my head, I view these things more as fascinating challenges than problems.

In parallel, I’d like to invite anyone who is at all interested in this idea to drop me an email or send me a tweet.  If there are others that feel the way I do, I think it’d be really cool to get something up on Github and maybe start brainstorming some initial work tasks or exploratory POCs for feasibility studies.  Also feel free to plus-like-tweet-whatever to others if you think they might be interested.

In conclusion I’ll just say that I feel like I’d really like to see this gain traction and that I’d probably ratchet this right to the top of my side projects list if people are interested (this being a bit large in scope for me alone in my spare time).  Now whenever I find myself editing source files in an IDE I feel like a bit of a barbarian, and I really don’t think we shouldn’t have to tolerate this state of affairs anymore.  Productivity tools designed to hide the file nature of our source code from us help, but they’re band-aids when we need disinfectants, antibiotics, and stitches.  I don’t know about you, but I’m ready to start writing my object-oriented code using an IDE paradigm that doesn’t support GOTO Line as if we were banging out QBasic in 1986.


A Better Metric than Code Coverage

My Chase of Code Coverage

Perhaps it’s because fall is upon us and this is the first year in a while that I haven’t been enrolled in a Master’s of CS program (I graduated in May), I’m feeling a little academic. As I mentioned in my last post, I’ve been plowing through following TDD by the letter, and if nothing else, I’m pleased that my code coverage is more effortlessly at 100%. I try to keep my code coverage around 100% whether or not I do TDD, so the main difference I’ve noticed is that TDD versus retrofitted tests seems to hit my use cases a lot harder, instead of just going through the code at least once.

Now, it’s important to me to get close to or hit that 100% mark, because I know that I’m at least touching everything going into production, meaning that I don’t have anything that would blow up if the stack pointer ever got to it, and I’m saved only by another bug preventing it from executing. But, there is a difference between covering code and exercising it.

More than 100% Code Coverage?

As I was contemplating this last night, I realized that some lines of my TDD code, especially control flow statements, were really getting pounded. There are lines in there that are covered by dozens of tests. So, the first flicker of an idea popped into my head — what if there were two factors at play when contemplating coverage: LOC Covered/Total LOC (i.e. our current code coverage metric) and Covering tests/LOC (I’ll call this coverage density).

High coverage is a breadth-oriented thing, while high density is depth — casting a wide net versus a narrow one deeply. And so, the ultimate solution would be to cast a wide net, deeply (assuming unlimited development time and lack of design constraints).

Are We Just Shifting the Goalposts?

So, Code Density sounded like sort of a heady concept, and I thought I might be onto something until I realized that this suffered the same potential for false positive feedback as code coverage. Specifically, I could achieve an extremely high density by making 50 copies of all of my unit tests. All of my LOC would get hit a lot more but my test suite would be no better (in fact, it’d be worse since it’s now clearly less efficient). So code coverage is weaker as a metric when you cheat by having weak asserts, and density is weaker when you cheat by hitting the same code with identical (or near identical) asserts.

Is there a way to use these two metrics in combination without the potential for cheating? It’s an interesting question and it’s easy enough to see that “higher is better” for both is generally, but not always true, and can be perverted by developers working under some kind of management edict demanding X coverage or, now, Y density.

Stepping Back a Bit

Well, it seems that Density is really no better than Code Coverage, and it’s arguably more obtuse, or at least it has the potential to be more obtuse, so maybe that’s not the route to go. After all, what we’re really after here is how many times a line of code is hit in a different scenario. For instance, hitting the line double result = x/y is only interesting when y is zero. If I hit it 45,000 times and achieve high density, I might as well just hit it once unless I try y at zero.

Now, we have something interesting. This isn’t a control flow statement, so code coverage doesn’t tell the whole story. You can cover that line easily without generating the problematic condition. Density is a slightly (but not much) better metric. We’re really driving after program correctness here, but since that’s a bit of a difficult problem, what we’ll generally settle for is notable, or interesting scenarios.

A Look at Pex

Microsoft Research made a utility called Pex (which I’ve blogged about here). Pex is an automated test generation utility that “finds interesting input-output values of your methods”. What this means, in practice, is that Pex pokes through your code looking for edge cases and anything that might be considered ‘interesting’. Often, this means conditions that causes control flow branching, but it also means things like finding our “y” div by zero exception from earlier.

What Pex does when it finds these interesting paths is it auto-generates unit tests that you can add to your suite. Since it finds hard-to-find edge cases and specializes in branching through your code, it boasts a high degree of coverage. But, what I’d really be interested in seeing is the stats on how many interesting paths your test suite cover versus how many there are or may be (we’d likely need a good approximation as this problem quickly becomes computationally unfeasible to know for certain).

I’m thinking that this has the makings of an excellent metric. Forget code coverage or my erstwhile “Density” metric. At this point, you’re no longer hoping that your metric reflects something good — you’re relatively confident that it must. While this isn’t as good as some kind of formal method that proves your code, you can at least be confident that critical things are being exercised by your test suite – manual, automated or both. And, while you can achieve this to some degree by regularly using Pex, I don’t know that you can quantify it other than to say, “well, I ran Pex a whole bunch of times and it stopped finding new issues, so I think we’re good.” I’d like a real, numerical metric.

Anyway, perhaps that’s in the offing at some point. It’d certainly be nice to see, and I think it would be an advancement in the field of static analysis.


App Development Strategy

At the moment, I own an Android phone and an IPod Touch.  I do a lot of work on home automation and have begun to integrate both devices into what I do, envisioning them as essentially remote controls for operating the various automated appliances and articles in my house.   Presently, this is done using the browsers on both devices, but I thought it couldn’t hurt to dip my toe into the waters of “app” development to better understand how to leverage those technologies.  I don’t, personally, think that the notion of “apps” will continue to be as en vogue over the next decade, as we’ve done this dance before in the late 90’s with shrinkwrap software on the PC versus web applications, but I digress. If downloadable applications for the phone are funneled toward the phone browser like desktop applications to the desktop browser, that’s at least a few years out and will be heavily influenced by the current state of today’s apps.

Because my IPod was closer when I decided to play around with app development, I decided to set up to write apps for it first.  I was and am not interested in publishing to Apple’s App store, but since I have jail-broken the device, I just wanted to run my code on it alone.  I was surprised to find out that Apple has made no provisions whatsoever to allow app developers to use any sort of development environment outside of the Mac suite.  That is to say, Apple’s official stance appears to be that if you are interested in developing apps for IDevices (including just your own), you need to pay Apple $100 per year for membership, and, assuming you don’t own a Mac (I don’t — all of my computers run Windows or various flavors of Linux), you need to purchase one.  For those keeping track at home, that means that a developer would need to pay at least $700 for the privilege of enriching the device experience.

Apparently, I’m not the only coder whose reaction to this was something short of sprinting to the nearest Apple Store waving my credit card. This site offers five work-arounds for the limitation. Dragonfire offers a pay-to-play system that will set you back a much more reasonable $50. No doubt, there are others as well.

None of that really appealed to me, so I put my IPod back in its charging spot and pulled out my Android phone. The experience was a full 180. I googled “develop android apps” or some such thing, and the first site that came up was the one offering a free download of the Android SDK, asking me whether I wanted to use Windows, Linux, or Mac, and then providing detailed instructions as to how to setup the IDE and get started. So, I did all of the above and shortly had my first real, live app running on my Android phone.

Now, I have my own opinions about various technologies, companies, and practices, but the purpose of this blog is not to engage in the typical “fanboy” debates, proselytize, or anything of that nature. I am generally pretty agnostic in such discussions and willing to use whatever gets the job done. So, what I’m saying here isn’t a knock on Apple or their products, but rather an explanation of why I find this disparity between accommodating developers to be rather curious.

In the early 2000’s, I was fresh out of college and struggling to find a job in the .com bubble burst. Anxious to keep my skills relevant, I decided to write some code on Windows XP, using Visual C++ 6.0. Much to my chagrin as an unemployed kid, I learned it would cost me at least $300 that I didn’t have. My solution? I formatted half of my Windows hard drive and dual booted Linux, where I did all of my development with GCC and friends for free.

A lot of people went this route–so many, in fact, that Java took off, and Linux took a huge bite out of Windows’ domination on the server front. Nothing gets an operating platform moving faster than a lot of people creating software for it, and this ushered in a Linux golden age of sorts. Granted, Linux isn’t rivaling Windows for end-user desktops by any stretch, but it’s a lot more prevalent than it would have been had Microsoft not discouraged developers from writing software to run on its OSs.

Microsoft tacitly recognized its former stance as a mistake and introduced developer express versions of Visual Studio, allowed open source plugins to the same, and generally made developing Windows applications a pleasure rather than an expensive chore. As a result, C# has gone from being Microsoft’s cute imitation of Java to a bona-fide and robust development option with substantial market share.

Fast forward to the present, and Apple seems to be imitating Microsoft’s blunder. And that’s what I find curious. To make matters more interesting, Microsoft did this when they had a monopoly on the desktop. Apple is doing it without even a majority on the smart phone. I understand that the strategy might be to boost Mac sales, but at what cost? It’s now established that alienating developers can give a toehold to otherwise irrelevant competitors. So, what happens in the long term when you alienate developers while already having very viable competitors–competitors who, I might add, are welcoming them with open arms?

I personally find it somewhat annoying that I can’t write apps for my IPod without buying hardware or software, but I don’t presume to think that this matters to anyone but me. But, it doesn’t really even matter to me all that much. I’ll just write apps for the Android and use my IPod’s browser in the future. And so will others. Android’s marketplace of apps will grow as quickly and robustly as the wild world allows while Apple’s grows as quickly as Apple’s current cachet allows. And history has shown that an operating platform is only as good as the software written for it. As the developers go, so goes the product. And Apple, in my opinion, would be wise to stop letting the developers go.