How to Evaluate Software Quality from the Outside In
Editorial note: I originally wrote this post for the Stackify blog. You can check out the original here, at their site. While you’re there, take a look at Prefix and Retrace, for all of your prod support needs.
In a sense, application code serves as the great organizational equalizer. Large or small, complex or simple, enterprise or startup, all organizations with in-house software wrangle with similar issues. Oh, don’t misunderstand. I realize that some shops write web apps, others mobile apps, and still others a hodgepodge of line-of-business software. But when you peel away domain, interaction, and delivery mechanism, software is, well, software.
And so I find myself having similar conversations with a wide variety of people, in a wide variety of organizations. I should probably explain just a bit about myself. I work as an independent IT management consultant. But these days, I have a very specific specialty. Companies call me to do custom static analysis on their codebases or application portfolios, and then to present the findings to leadership as the basis for important strategic decisions.
As a simple example, imagine a CIO contemplating the fate of a 10-year-old Java codebase. She might call me and ask, “Should I evolve this to meet our upcoming needs, or would starting from scratch prove more cost-effective in the long run?” I would then do an assessment where I treated the code as data and quantified things like dependence on outdated libraries (as an over-simplified example). From there, I’d present a quantitatively-driven recommendation.
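To make “treating the code as data” concrete, here is a minimal sketch of the kind of scan involved, written in Python for Python sources. The list of “outdated” libraries is purely hypothetical; a real assessment would draw that list from the client’s platform and support policies.

```python
import ast
from pathlib import Path

# Hypothetical set of libraries considered outdated for this assessment.
OUTDATED = {"optparse", "asyncore", "imp"}

def outdated_imports(root: str) -> dict:
    """Map each source file under root to the outdated libraries it imports."""
    findings = {}
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        # Catch plain "import X" statements.
        hits = [
            alias.name
            for node in ast.walk(tree)
            if isinstance(node, ast.Import)
            for alias in node.names
            if alias.name.split(".")[0] in OUTDATED
        ]
        # Catch "from X import Y" statements as well.
        hits += [
            node.module
            for node in ast.walk(tree)
            if isinstance(node, ast.ImportFrom) and node.module
            and node.module.split(".")[0] in OUTDATED
        ]
        if hits:
            findings[str(path)] = hits
    return findings
```

The point is not the specific check but the shape of the exercise: parse the code, count something the business cares about, and aggregate it into a number leadership can act on.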
So you can probably imagine how I might call code a great equalizer. It may do different things, but it has common structural underpinnings that I can quantify. When I show up, it also has another commonality. Something about it prompts the business to feel dissatisfied. I only get calls when the business has experienced bad outcomes as measured by software quality from the outside in.
You might wonder whether such calls to me always stem from issues of code quality. The clients don’t necessarily think so. But in my experience, they usually do. Undesirable business outcomes, such as missed deadlines or high cost of change, can certainly arise from personnel or process issues. But I rarely find these without also finding issues of code quality.
So, let’s define software quality from the business’s perspective. If you write code for a living, forget for a moment about how you might measure quality at a granular level. Put aside your linting tools, static analyzers, and notions about cyclomatic complexity and percent test coverage. Forget even about defect counts for definition purposes. The business only cares about these things if you make it care about them. Enough milestone misses or disastrous releases, and someone comes along to mandate 70% unit test coverage.
So let’s zoom out to the business’s level of understanding and look at how it reasons about code quality. I submit that all business concerns boil down to two main questions.
- Does the software do what it’s supposed to do?
- Can I easily change what the software does?
That’s it. Other outcomes derive from these core concerns. Huge defect lists happen because the software behaves in a way not intended. Code gets the pejorative label of legacy when everyone fears to touch it. It’s not doing what it’s supposed to and it’s hard to change.
Working from this definition, let’s then look at some heuristics for evaluating software quality from the business’s perspective — from the outside in. What can the business take as a sign of poor software quality?
Large, Diverse Defect Counts
Earlier, I described defect counts as a symptom of poor quality rather than a defining element. So it stands to reason that I should mention them when listing symptoms of underlying quality issues. Of course, anyone with common sense will tell you that high defect rates correlate with poor quality.
But look beyond the obvious examples of “doesn’t do what it should.” Also consider how it does with so-called non-functional requirements. Does it do the correct thing but exhibit abysmal performance? Does it exhibit ridiculous behavior in the event of an unexpected hardware failure? Teams tend to fix these things when they come up. So, if you see many examples in the wild, you can take it as a sign that the team can’t. And consider this as a sign of hard-to-change software.
Rising Cost of Features
Developers tend to think of feature implementation in units of time or effort. And, reasonably so. The business, however, translates these into money without a second thought. So whether you quantify features by time or money, it all winds up in the same bucket.
You can often tell that a codebase has software quality issues by looking at the cost per feature as a function of time. In a really well-crafted codebase, the cost of features remains relatively flat. I would argue that complete flatness is pretty much impossible because complexity inevitably grows with scale. But feature cost, for similarly sized features, should rise very gradually.
When you evaluate software quality for a less than stellar codebase, you will see sharp upticks in feature cost. As time goes by, the expense of a feature will grow more than linearly. In less than ideal situations, look for polynomial rise. In catastrophic codebases, you might see exponential cost growth. Those codebases generally don’t last very long.
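The distinction between flat, polynomial, and exponential cost growth can be checked with simple trend analysis. As a hedged sketch, the snippet below fits the slope of log(cost) over successive periods: roughly zero means flat, and a clearly positive log-slope means costs are compounding. The threshold values are illustrative assumptions, not an industry standard.

```python
import math

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

def cost_trend(costs):
    """Classify growth of average per-feature cost over successive periods.

    Fitting log(cost) makes exponential growth show up as a straight,
    steep line. Thresholds (0.05, 0.25) are arbitrary illustrations.
    """
    periods = list(range(len(costs)))
    log_slope = slope(periods, [math.log(c) for c in costs])
    if log_slope < 0.05:
        return "flat-ish"
    if log_slope < 0.25:
        return "rising"
    return "compounding"
```

For example, `cost_trend([10, 10.1, 10.2, 10.3, 10.4])` classifies as flat-ish, while `cost_trend([10, 15, 23, 34, 51])`, costs growing by roughly 50% per period, classifies as compounding.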
Team Reaction to Feature Requests
I’ll switch gears for a moment. So far, I’ve talked about antiseptically quantifiable ways to evaluate software quality from the outside. Now I’ll offer a very human-oriented one. To get a feel for the nature of the codebase, see how the development team reacts to change.
Maybe you have a Scrum shop and developers pull cards out of the backlog. Or, perhaps you have more traditional assignment methodology and you assign them tasks. In either case, reactions give valuable information. If they refuse to give an estimate or give extremely high estimates, it signals a lack of confidence in accomplishing the task. Likewise, if they balk or suggest not implementing that feature, you should worry.
I see a lot of codebases with large pockets of metaphorical Jenga code. The developers understand this, and will react accordingly when you prompt them to touch the mess. You’ll get a truer reaction in this scenario, on average, than by simply asking them. I say this not because teams dissemble, but because they tend to feel the same sense of ownership and pride as anyone else. They may not tell you outright that you have a software quality problem, but if they avoid feature development like the plague, you can infer it.
Inexplicable or Weird Application Behaviors
When I do a codebase assessment, I always look at cohesion and coupling in detail. I find these to represent a codebase’s overall quality quite well. In layman’s terms, think of them this way.
- Cohesion answers the question, “do things that belong together actually appear together?”
- Coupling answers the question, “do things that don’t belong together appear together?”
From those working definitions, you can reasonably conclude that these two properties tell you how well (or poorly) the codebase is organized. High quality means high cohesion and relatively low coupling. Codebases exhibiting these properties lend themselves to easy change.
On the flip side, codebases with low cohesion and high coupling become expensive to change. But, beyond that, they do weird things. To understand why, imagine an inappropriate coupling in real life. Say I took your oven and your car and turned them into a “super-device.” This does you no good. And now, when you have to take your car to the shop, you can’t make dinner. If you tried to explain this to someone, they would think you insane.
Look for this in code as a sign of software quality issues. “We changed the font on the login screen and that somehow deleted a bunch of production customer accounts.” Crazy stories like that indicate code quality problems.
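Coupling, at least, lends itself to cheap measurement. As a hedged sketch, the snippet below computes the classic afferent/efferent coupling counts and Robert Martin’s instability metric I = Ce / (Ca + Ce) from a module dependency map. The input format (a dict of module names to dependencies) and the example module names are assumptions for illustration; a real tool would build that map by parsing the code.

```python
def coupling_metrics(deps):
    """Compute coupling metrics from {module: set of modules it depends on}.

    Ca (afferent): how many modules depend on this one.
    Ce (efferent): how many modules this one depends on.
    I = Ce / (Ca + Ce): 0.0 = maximally stable, 1.0 = maximally unstable.
    """
    modules = set(deps) | {d for targets in deps.values() for d in targets}
    ca = {m: 0 for m in modules}
    for src, targets in deps.items():
        for t in targets:
            ca[t] += 1  # each incoming edge raises afferent coupling
    metrics = {}
    for m in modules:
        ce = len(deps.get(m, set()))
        total = ca[m] + ce
        metrics[m] = {"Ca": ca[m], "Ce": ce,
                      "I": ce / total if total else 0.0}
    return metrics
```

A module with high Ca and high Ce at once, everything depends on it and it depends on everything, is the oven-welded-to-the-car scenario: touch it for one reason and you break something unrelated.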
Catch It Early
As I said before, software quality gives off signals (symptoms) to non-technical stakeholders. Some, like software defects, are obvious. But with others, you must look for subtler signs. With something like fear of changing the code or weird defects, you may chalk them up to something else.
Resist this impulse. Regardless of the symptoms, the underlying causes — wrongness and change resistance — calcify as time goes on. As with planting a tree, the best time to fix it is years ago, and the second best time is now. So when evaluating software quality from the outside in, don’t ignore symptoms. At worst, everyone has a clarifying conversation and perhaps wastes a bit of time. And, at best, you catch a minor problem before it becomes the sort of thing someone calls me to come look at.