Your Code Is Data
This is a post that I originally wrote for the NDepend blog. If you haven’t already, go check it out! We’re building out some good content over there around static analysis, with lots more to follow.
A lot of programmers have some idea of what static analysis is, at least superficially. If I mention the term, what pops into your head? Automatic enforcement of coding standards? StyleCop or FxCop? Cyclomatic complexity and Visual Studio’s “maintainability index”? Maybe you’re deeply familiar with all of the subtleties and nuances of the technique.
Whatever your level of familiarity, I’d like to throw what might be a bit of a curve ball at you. Static analysis is the idea of analyzing source code and byte code for various properties and reporting on those properties, but it’s also, philosophically, the idea of treating code as data. This is deeply weird to us as application developers, since we’re very much used to thinking of source code as instructions, procedures, and algorithms. But it’s also deeply powerful.
When you think of source code this way, typical static analysis use cases make sense. FxCop asks questions along the lines of “How many private fields are not prefixed with an underscore?” or, perhaps, “SELECT COUNT(*) FROM Fields WHERE IsPrivate = 1 AND Name NOT LIKE '[_]%'”. More design-focused source code analysis tools ask questions like “What is the cyclomatic complexity of my methods?” or, perhaps, “SELECT cyclomatic_complexity FROM Methods.”
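To make the “code as data” idea literal, here is a small sketch in Python rather than C#, chosen only so that it runs anywhere. It parses source with the standard `ast` module, loads class-level fields into an in-memory SQLite table, and then asks the naming-convention question as real SQL. Python has no `private` keyword, so the sketch simply asks which fields lack the underscore prefix. The sample source, the `fields` table, and `load_fields` are all hypothetical, and the code illustrates the idea only. It does not show how FxCop or NDepend work internally.

```python
import ast
import sqlite3

# Hypothetical sample source to analyze; any module text would do.
SOURCE = '''
class Account:
    _balance = 0
    owner = ""
    limit = 100

class Ledger:
    _entries = []
'''

def load_fields(source: str) -> sqlite3.Connection:
    """Parse source code and store each class-level field as a row in an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fields (class_name TEXT, field_name TEXT)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for stmt in node.body:
                # Only plain `name = value` assignments are treated as fields here.
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        if isinstance(target, ast.Name):
                            db.execute(
                                "INSERT INTO fields VALUES (?, ?)",
                                (node.name, target.id),
                            )
    return db

db = load_fields(SOURCE)

# "Which fields are not prefixed with an underscore?"
# '_' is itself a LIKE wildcard, so it is escaped to match a literal underscore.
violations = db.execute(
    r"SELECT class_name, field_name FROM fields "
    r"WHERE field_name NOT LIKE '\_%' ESCAPE '\' "
    r"ORDER BY class_name, field_name"
).fetchall()
```

Here `violations` holds the rows for `Account.limit` and `Account.owner`. Once the code is in a table, any other ad-hoc question is just another `SELECT` against `db`.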
But if code is data, and static analysis tools are sets of queries against that data, doesn’t it seem strange that we can’t put together and execute ad-hoc queries the way that you would with a relational (or other) database? I mean, imagine if you built out some persistence store using SQL Server, and the only queries you were allowed were SELECT * from the various tables and a handful of others. Anything beyond that, and you would have to inspect the data manually and make notes by hand. That would seem arbitrarily and even criminally restrictive. So why doesn’t it seem that way with our source code? Why are we content not having the ability to execute arbitrary queries?
I say “we” but the reality is that I can’t include myself in that question, since I have that ability and I would consider having it taken away from me to be crippling. My background is that of a software architect, but beyond that, I’m also a software craftsmanship coach, teacher, and frequent analyzer of codebases in a professional capacity, auditing a wide variety of them for various properties, characteristics, and trends. If I couldn’t perform ad-hoc, situation-dependent queries against the source code, I would be far less effective in these roles.
My tools of choice for doing this are NDepend and its cousin JArchitect (for Java code bases). Out of the box, they’re standard static analysis and architecture tools, but they also offer an incredibly powerful feature called CQLinq that is, for all intents and purposes, SQL for the ‘schema’ of source code. Strictly speaking, CQLinq is a LINQ provider for writing declarative code queries, but anyone who knows SQL (or functional programming or lambda expressions) will feel quite at home creating queries.
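CQLinq itself runs inside NDepend, so it can't be demonstrated standalone here. As a loose analogy for the declarative, lambda-style querying described above, this Python sketch treats functions parsed with the standard `ast` module as rows. It then filters and sorts them with a generator expression, roughly the way “SELECT cyclomatic_complexity FROM Methods” would. The `complexity` function is a deliberately rough approximation (1 plus the number of branching constructs), and the threshold of 2 is arbitrary. Neither one is NDepend's definition, and the sample source is hypothetical.

```python
import ast

def decision_points(node: ast.AST) -> int:
    """Rough count of branches a node adds (an approximation, not NDepend's metric)."""
    if isinstance(node, ast.BoolOp):
        # `a and b and c` is one BoolOp with three operands: two extra branches.
        return len(node.values) - 1
    return int(isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)))

def complexity(func: ast.FunctionDef) -> int:
    """Approximate cyclomatic complexity: 1 + decision points (nested functions included)."""
    return 1 + sum(decision_points(n) for n in ast.walk(func))

# Hypothetical code base to query.
SOURCE = '''
def simple(x):
    return x + 1

def branchy(x, y):
    if x > 0 and y > 0:
        return 1
    for i in range(x):
        if i % 2:
            continue
    return 0
'''

# The "Methods table": every function definition in the parsed source.
methods = [n for n in ast.walk(ast.parse(SOURCE)) if isinstance(n, ast.FunctionDef)]

# Roughly: SELECT name, complexity FROM Methods WHERE complexity > 2 ORDER BY complexity DESC
too_complex = sorted(
    ((m.name, complexity(m)) for m in methods if complexity(m) > 2),
    key=lambda row: row[1],
    reverse=True,
)
```

With this sample, `too_complex` comes out as `[("branchy", 5)]`, because the two `if` statements, the `for` loop, and the extra `and` operand each add one to the base of 1.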
Let’s say, for instance, that you’re the architect for a C# code base and you notice a disturbing trend wherein the developers have taken to communicating between classes using global variables. What course of action would you take to nip this in the bud? I bet it would be something annoying for both you and them. Perhaps you’d set a policy for a while where you audited literally every commit and read through to make sure they weren’t doing it. Maybe you’d be too pressed for time and you’d appoint designated globals cops. Or, perhaps you’d just send out a lot of angry, threatening emails?
Do you know what I would do? I’d just write a single CQLinq query and add it to a step in my automated team build that executed static analysis code rules against all commits. If a commit increased the number of global variables in the code base, the build would fail. No need for anger, emails, or time wasted checking over people’s shoulders, metaphorically or literally.
Want to see how easy a query like this would be to write? Why don’t I show you…
// Avoid global variables
warnif count > 0
JustMyCode.Fields.Where(f =>
    f.WasAdded() &&   // added since the baseline analysis
    f.IsStatic &&
    f.IsPublic)
That’s it. I write that query, set the build to run NDepend’s static analysis, and fail if there are warnings. No more sending out emails, pleading, nagging, threatening, wheedling, coaxing, or bottleneck code reviewing. And, most important of all, no more doing all of that and having problems anyway. One simple little piece of code, and you can totally automate preventing badness. On top of that, the developers get quick feedback and learn on their own.
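The before/after build gate described above can be sketched outside NDepend as well. This is not CQLinq, and it is not how NDepend evaluates rules. It is a minimal Python illustration that assumes Python source and counts each name in a `global` statement (a function writing module-level state) as one use of a global variable. The names `count_global_writes` and `gate` and the sample commits are hypothetical.

```python
import ast

def count_global_writes(source: str) -> int:
    """Count names declared with `global`, i.e. functions that write module-level state."""
    tree = ast.parse(source)
    return sum(len(node.names) for node in ast.walk(tree) if isinstance(node, ast.Global))

def gate(before: str, after: str) -> int:
    """Return a process exit code: 1 if the commit added global writes, else 0."""
    old, new = count_global_writes(before), count_global_writes(after)
    if new > old:
        print(f"FAIL: global variable writes rose from {old} to {new}")
        return 1
    print(f"OK: {new} global variable writes (was {old})")
    return 0

# Hypothetical source before and after a commit.
BEFORE = '''
counter = 0

def bump():
    global counter
    counter += 1
'''

AFTER = BEFORE + '''
def reset():
    global counter
    counter = 0
'''

exit_code = gate(BEFORE, AFTER)  # prints the FAIL line; exit_code == 1
```

In a real pipeline, `before` and `after` would come from the parent commit and the new commit, and the return value would be passed to `sys.exit` so that a nonzero code fails the build step.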
As I’ve said, code is data at its core. This is especially important to recognize if you’re an architect responsible for the long-term health of the code base. You need to be able to assess characteristics and properties of that code, make decisions about it, and set precedent. To accomplish this, you need powerful tooling for querying your code, and NDepend, with its CQLinq, provides exactly that.