DaedTech

Stories about Software

A Better Metric than Code Coverage

My Chase of Code Coverage

Perhaps it’s because fall is upon us and this is the first year in a while that I haven’t been enrolled in a Master’s of CS program (I graduated in May), but I’m feeling a little academic. As I mentioned in my last post, I’ve been plowing ahead with following TDD to the letter, and if nothing else, I’m pleased that my code coverage sits at 100% with less effort. I try to keep my code coverage around 100% whether or not I do TDD, so the main difference I’ve noticed is that TDD, compared to retrofitted tests, seems to hit my use cases a lot harder rather than just running through the code at least once.

Now, it’s important to me to get close to or hit that 100% mark, because then I know that I’m at least touching everything going into production, meaning that I don’t have anything that would blow up if the instruction pointer ever got to it, where I’m saved only by another bug preventing it from executing. But there is a difference between covering code and exercising it.

More than 100% Code Coverage?

As I was contemplating this last night, I realized that some lines of my TDD code, especially control flow statements, were really getting pounded. There are lines in there that are covered by dozens of tests. So, the first flicker of an idea popped into my head — what if there were two factors at play when contemplating coverage: LOC Covered/Total LOC (i.e. our current code coverage metric) and Covering tests/LOC (I’ll call this coverage density).

High coverage is about breadth, while high density is about depth: casting a wide net shallowly versus casting a narrow one deep. And so, the ultimate solution would be to cast a wide net, deeply (assuming unlimited development time and no design constraints).
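
To make the distinction concrete, here’s a minimal sketch of how the two numbers might be computed from per-line hit counts. The dictionary and its values are made up for illustration; no coverage tool exposes exactly this.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class CoverageMetrics
    {
        static void Main()
        {
            // Hypothetical data: each executable line number mapped to the
            // number of tests that executed it.
            var hitsPerLine = new Dictionary<int, int>
            {
                { 10, 12 }, // a control flow line pounded by a dozen tests
                { 11, 12 },
                { 12, 1 },  // covered, but only touched once
                { 13, 0 }   // never executed at all
            };

            int totalLines = hitsPerLine.Count;
            int coveredLines = hitsPerLine.Count(kvp => kvp.Value > 0);

            // Traditional code coverage: LOC covered / total LOC (breadth).
            double coverage = (double)coveredLines / totalLines;

            // "Coverage density": covering tests / LOC (depth).
            double density = hitsPerLine.Values.Sum() / (double)totalLines;

            Console.WriteLine("Coverage: {0:P0}, density: {1:F1} tests per line", coverage, density);
        }
    }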

Are We Just Shifting the Goalposts?

So, coverage density sounded like sort of a heady concept, and I thought I might be onto something until I realized that it suffers from the same potential for false positive feedback as code coverage. Specifically, I could achieve extremely high density by making 50 copies of all of my unit tests. All of my LOC would get hit a lot more, but my test suite would be no better (in fact, it’d be worse, since it’s now clearly less efficient). So code coverage is weaker as a metric when you cheat by having weak asserts, and density is weaker when you cheat by hitting the same code with identical (or nearly identical) asserts.

Is there a way to use these two metrics in combination without the potential for cheating? It’s an interesting question, and it’s easy enough to see that “higher is better” for both is generally, but not always, true, and that either one can be perverted by developers working under some kind of management edict demanding X coverage or, now, Y density.

Stepping Back a Bit

Well, it seems that density is really no better than code coverage, and it’s arguably more obtuse, or at least has the potential to be, so maybe that’s not the route to go. After all, what we’re really after here is whether a line of code gets hit under different scenarios. For instance, hitting the line double result = x/y is only interesting when y is zero. If I hit it 45,000 times and achieve high density, I might as well have just hit it once unless one of those times tries y at zero.

Now we have something interesting. This isn’t a control flow statement, so code coverage doesn’t tell the whole story: you can cover that line easily without ever generating the problematic condition. Density is a slightly (but not much) better metric. We’re really driving at program correctness here, but since that’s a difficult problem, what we’ll generally settle for is notable or interesting scenarios.
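
As a concrete sketch of that (a hypothetical class with MSTest-style tests; I’ve used integer division so that the zero denominator actually throws), both of the tests below “cover” the division line, but only the second one exercises the scenario that matters:

    using Microsoft.VisualStudio.TestTools.UnitTesting;

    public class Calculator
    {
        // Integer division: blows up when the denominator is zero.
        public int Divide(int x, int y)
        {
            return x / y;
        }
    }

    [TestClass]
    public class CalculatorTests
    {
        // Covers the line, but says nothing about the interesting case.
        [TestMethod]
        public void Divide_Returns_Quotient()
        {
            Assert.AreEqual(5, new Calculator().Divide(10, 2));
        }

        // Same line of code, but this is the scenario worth knowing about.
        [TestMethod]
        [ExpectedException(typeof(System.DivideByZeroException))]
        public void Divide_By_Zero_Throws()
        {
            new Calculator().Divide(10, 0);
        }
    }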

A Look at Pex

Microsoft Research made a utility called Pex (which I’ve blogged about here). Pex is an automated test generation utility that “finds interesting input-output values of your methods.” What this means, in practice, is that Pex pokes through your code looking for edge cases and anything that might be considered “interesting.” Often, this means conditions that cause control flow branching, but it also means things like finding the divide-by-zero case on “y” from earlier.

When Pex finds these interesting paths, it auto-generates unit tests that you can add to your suite. Since it finds hard-to-find edge cases and specializes in branching through your code, it boasts a high degree of coverage. But what I’d really be interested in seeing is a stat on how many interesting paths your test suite covers versus how many there are or may be (we’d likely need a good approximation, as knowing that for certain quickly becomes computationally infeasible).
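
I don’t know of anything that reports that number today, but here’s a minimal sketch of the stat I have in mind; the scenario labels and both sets are entirely hypothetical stand-ins for whatever an exploration tool and a test suite actually produce.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class InterestingPathMetric
    {
        static void Main()
        {
            // Interesting scenarios discovered by an exploration tool,
            // identified here by simple labels for the sake of illustration.
            var discovered = new HashSet<string>
            {
                "Divide: y == 0",
                "Divide: x == int.MinValue, y == -1",
                "Parse: input is null",
                "Parse: input is empty"
            };

            // Interesting scenarios your existing test suite actually exercises.
            var exercised = new HashSet<string>
            {
                "Divide: y == 0",
                "Parse: input is null"
            };

            double interestingPathCoverage =
                (double)discovered.Intersect(exercised).Count() / discovered.Count;

            Console.WriteLine("Interesting path coverage: {0:P0}", interestingPathCoverage);
        }
    }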

I’m thinking that this has the makings of an excellent metric. Forget code coverage or my erstwhile density metric. At this point, you’re no longer hoping that your metric reflects something good; you’re relatively confident that it must. While this isn’t as good as some kind of formal method that proves your code correct, you can at least be confident that critical things are being exercised by your test suite, whether manual, automated, or both. And while you can achieve this to some degree by regularly running Pex, I don’t know that you can quantify it other than to say, “well, I ran Pex a whole bunch of times and it stopped finding new issues, so I think we’re good.” I’d like a real, numerical metric.

Anyway, perhaps that’s in the offing at some point. It’d certainly be nice to see, and I think it would be an advancement in the field of static analysis.

Static Analysis — Spell Check for Code

A lot of people have caught on to certain programming trends: some agility in the process generally makes things better, unit testing a code base tends to make it more reliable, etc. One thing that, in my experience, lags behind in popularity is the use of static analysis tools. If these are used at all, it’s usually for something like enforcing capitalization schemes for variables or some other check that guarantees code that is uniform in appearance.

I think this non-use or under-use of such tools is a shame. I recently gave a presentation on using some tools in C# and Visual Studio 2010 for static analysis, and thought I’d share my experience with those tools and the benefits I perceive here. In this development environment, there are six tools that I use, all to varying degrees and for different purposes. They are MS Analysis, StyleCop, CodeRush, Code Contracts, NDepend, and Nitriq.

Before I get into that, I’ll offer a little background on the idea of static analysis. The age-old and time-tested way to write code is that you write some code, you compile it, and then you run it. If it does what you expected when you run it, then you declare yourself victorious and move on. If it doesn’t, then you write some more code and repeat.

This is all fine and good until the program starts getting complicated — interacting with users, performing file I/O, making network requests, etc. At that point, you get a lot of scenarios. In fact, you get more scenarios than you could have anticipated in one sitting. You might run it and everything looks fine, but then you hand it to a user who runs it, unplugs the computer, and wonders why his data wasn’t saved.

At some point, you need to be able to reason about how components of your code would behave in various scenarios, even if you might not easily be able to recreate these scenarios. Unit testing is helpful with this, but unit testing is just an automated way of saying, “run the code.” Static analysis automates the process of reasoning about the code without running it. It’s like you looking at the code, but exponentially more efficient and much less likely to make mistakes.
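
For example, a static analyzer can look at something like the following (a contrived snippet of my own, not output from any particular tool) and flag the possible null dereference without anyone ever running the code:

    public class OrderProcessor
    {
        public string DescribeFirstItem(string[] items)
        {
            string first = null;
            if (items != null && items.Length > 0)
            {
                first = items[0];
            }

            // A static analyzer can reason that "first" is still null whenever
            // the array is null or empty, and flag this line as a possible
            // null dereference without any test or repro scenario.
            return first.ToUpper();
        }
    }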

Doing static analysis adds an extra step to your development process. Make no mistake about that. It’s like unit testing in that the largest objection is going to be the “extra” time that it takes. But it’s also like unit testing in that it saves you time downstream because it makes defects less likely to come back to bite you later. These two tasks are also complementary, not stand-ins for one another. Unit testing clarifies and solidifies requirements and forces you to reason about your code. Static analysis lets you know whether that clarification and reasoning have led you to do something that isn’t good.

As I said in the title, it’s like a spell checker for your code. It prevents you from making silly and embarrassing mistakes (and often costly ones). To continue the metaphor, unit testing is more like getting someone bright to read your document. He’ll catch some mistakes and give you important feedback for how to improve the document, but he isn’t a spell checker.

So, that said, I’ll briefly describe each one and why I use and endorse it.

MS Analysis

MS Analysis encapsulates FxCop in the weightier versions of Visual Studio 2010 (Premium and up, I think). It runs a series of checks, such as whether parameters are validated by methods, whether you’re throwing Exception instead of SomeSpecificException, and whether your classes have excessive coupling. There are probably a few hundred checks in all. When you do a build with this enabled, it populates the error list with violations in the form of warnings.
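
For a flavor of what those checks look for, here’s a contrived method of my own (not lifted from any rule documentation) that would trip a couple of the ones I just mentioned:

    using System;
    using System.IO;

    public class SettingsReader
    {
        // No null or empty check on "path" or "key": the sort of thing the
        // "validate arguments of public methods" style of rule flags.
        public string ReadSetting(string path, string key)
        {
            foreach (string line in File.ReadAllLines(path))
            {
                if (line.StartsWith(key))
                {
                    return line.Substring(key.Length + 1);
                }
            }

            // Throwing the base Exception type instead of something specific
            // is another classic violation.
            throw new Exception("Setting not found: " + key);
        }
    }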

On the plus side, this integrates seamlessly with Visual Studio since it’s a Microsoft product, and it catches a lot of stuff. On the downside, it can be noisy, and customizing it isn’t particularly straightforward. You can turn rules on and off, but if you want to tweak existing ones or create your own, things get a little more complicated. It also isn’t especially granular: you configure it per project and can’t get more fine-grained feedback than that (i.e., per namespace, class, or method).

My general use of it is to run it periodically to see if my code is violating any of the rules that I care about. I usually turn off a lot of rules, and I have a few different rulesets that I plug in and out so that I can do more directed searches.

StyleCop

StyleCop is designed to be run between writing and building. Instead of using the VM/framework to reflect on your compiled code, it just parses the source file, looking for stylistic concerns (are all of your fields camel-cased and documented, and are you still using Hungarian notation? if so, stop) and very basic mistakes (like an empty method). It’s lightning fast, and it runs on a per-class basis, which is cool.
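
As an illustration (contrived code, and the exact rule wording varies), a class like this would draw several of the complaints I’m describing:

    public class Inventory
    {
        // Hungarian notation and a non-private field: both the kind of thing
        // StyleCop-style rules call out.
        public int intItemCount;

        // An undocumented public member with no XML doc comment above it.
        public void AddItem()
        {
        }

        // An empty private method that does nothing at all.
        private void Recalculate()
        {
        }
    }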

On the downside, it can be a little annoying and invasive, but the designers are obviously aware of this. I recall reading some kind of caveat stating that these types of rules tend to be arbitrary and to get opinionated developers into shouting matches.

I find it useful for letting me know if I’ve forgotten to comment things, if I’ve left fields as anything other than private, and if I have extra parentheses somewhere. I run StyleCop occasionally, but not as often as the other tools. Swapping between the rule sets is a little annoying.

CodeRush

CodeRush is awesome for a lot of things, and its static analysis is really an ancillary benefit. It maintains an “issues list” for each file and highlights these issues in real time, right in the IDE. A few of them are a little bizarre (suggesting that you always use the “var” keyword where possible), but most of them are actually really helpful and not suggested by the MS tools or anything else I use. It does occasionally false-flag dead code and get a few things wrong, but it’s fairly easy to configure it to ignore issues on a per-file, per-namespace, or per-solution basis.
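
A couple of the kinds of issues it highlights look roughly like this (illustrative code of my own, not CodeRush’s actual wording or rule names):

    using System.Collections.Generic;

    public class ReportBuilder
    {
        public IList<string> BuildLines()
        {
            // Redundant explicit type: the sort of spot where the "use var"
            // suggestion shows up, since the type is obvious from the
            // right-hand side.
            List<string> lines = new List<string>();
            lines.Add("header");
            return lines;

            // Dead code: nothing after the return can ever execute.
            lines.Add("footer");
        }
    }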

The only real downside here is that CodeRush has a per-seat licensing cost, which, along with its other overhead, makes it a bit of overkill if you’re just interested in static analysis. I fully endorse getting CodeRush in general, however, for all of its features.

Code Contracts

Like CodeRush, this tool is really intended for something else, and it provides static analysis as a side effect. Code Contracts is an academically developed tool that facilitates design by contract. Pre- and post-conditions as well as class invariants can be enforced at build time. Probably because of the nature of doing this, it also just so happens to offer a feature wherein you can have warning squigglies pop up anywhere you might be dereferencing null, violating array bounds, or making invalid arithmetic assumptions.
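
Here’s a small sketch of what that looks like, using the System.Diagnostics.Contracts API (the Account class itself is made up):

    using System.Diagnostics.Contracts;

    public class Account
    {
        private decimal _balance;

        public void Withdraw(decimal amount)
        {
            // Preconditions: callers must pass a sane amount.
            Contract.Requires(amount > 0);
            Contract.Requires(amount <= _balance);

            // Postcondition: the balance never goes negative as a result.
            Contract.Ensures(_balance >= 0);

            _balance -= amount;
        }

        [ContractInvariantMethod]
        private void ObjectInvariant()
        {
            // Class invariant: checked at the end of every public method.
            Contract.Invariant(_balance >= 0);
        }
    }

As I understand it, the same static checker that verifies contracts like these is what surfaces the null dereference, array bounds, and arithmetic warnings.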

To me, this is awesome, and I don’t know of other tools that do this. The only downside is that, on larger projects, this takes a long time to run. However, getting an automatic check for null dereferences is worth the wait!

I use it explicitly for the three things I mentioned, though if I get the chance and enough time, I’d like to explore its design-by-contract capabilities as well.

NDepend

NDepend is really kind of an architectural tool. It lets you make assessments about different dependencies in your code, and it provides you with all kinds of neat graphs that report on them. But my favorite feature of NDepend is the static analysis in the form of Code Querying. It exposes SQL-like semantics that let you write queries against your code base, such as “SELECT ALL Methods WHERE CyclomaticComplexity > 25” (paraphrase). You can tweak these things, write your own, or go with the ones out of the box. They’re all commented, too, in ways that are helpful for understanding and modifying.

There is really no downside to NDepend aside from the fact that it costs some money. But if you have the money to spare, I highly recommend it. I use this all the time for querying my code bases in whatever manner strikes my fancy.

Nitriq

I think that Nitriq and NDepend are probably competitors, but I won’t do any kind of comparison evaluation because I only have the free version of Nitriq. Nitriq has the same kind of querying paradigm as NDepend, except that it uses LINQ semantics instead of SQL. That’s probably a little more C# developer friendly, as I suppose not everyone that writes C# code knows SQL (although it strikes me that all programmers ought to have at least passing familiarity with SQL).
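
To give a feel for the querying paradigm, here’s something in its spirit using plain reflection and LINQ; this is not Nitriq’s actual API, whose property names and query surface differ.

    using System;
    using System.Linq;
    using System.Reflection;

    class CodeQueryExample
    {
        static void Main()
        {
            // In the same spirit as a Nitriq or NDepend query: find public
            // instance methods with a suspiciously long parameter list in the
            // currently executing assembly.
            var suspects =
                from type in Assembly.GetExecutingAssembly().GetTypes()
                from method in type.GetMethods(
                    BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
                where method.GetParameters().Length > 5
                select new { Type = type.Name, Method = method.Name };

            foreach (var suspect in suspects)
            {
                Console.WriteLine("{0}.{1}", suspect.Type, suspect.Method);
            }
        }
    }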

In the free version you can only assess one assembly at a time, though I don’t consider that a weakness. I use Nitriq a lot less than NDepend, but when I do fire it up, the interface is a little less cluttered and it’s perhaps a bit more intuitive. Though, for all I know, the paid version may get complicated.

Conclusion

So, that’s my pitch for static analysis. The tools are out there, and I suspect that they’re only going to become more and more common. If this is new to you, check these tools out and try them! If you’re familiar with static analysis, hopefully there’s something here that’s new and worth investigating.