Is Your Code Hard to Understand?
Editorial Note: I originally wrote this post for the Infragistics blog.
I’ve heard a bit of conventional wisdom in the software industry. It holds that people will read a given line of code an order of magnitude more times than they’ll modify it. Whether that wisdom results from a rigorous study or not, I do not know. But it certainly squares with my experience. And it also seems to square with the experience of just about everyone else.
Accepting this wisdom as axiomatic, code readability becomes an important business concern. If you optimize for ease of writing at the cost of ease of reading, your business will lose money. Better to spend some extra time in writing your code to ensure that future readers have an easy go of it. And easy for them means that they understand the code.
You know of obvious ways to promote reader understanding in code. Don’t give variables cryptic names or names that cannot be pronounced. Don’t write gigantic methods and classes. Limit method parameters and local declarations. Read through the code out loud to see if it seems clear. These guidelines will bring you a long way toward readability.
But other, subtle concerns can also chip away at your code’s readability. I’ll list some of these here, today. These are generally C# specific, but some are more broadly applicable than that. What all of them have in common is that they constitute sources of confusion for readers that may not seem immediately obvious.
Magic Numbers
A magic number in your code is a numeric literal that exists without any explanation or hint as to its meaning. These can feel like the most natural things in the world to write. But they stymie maintenance programmers searching for meaning in the code. Consider the following snippet.
public void ProcessInvoices(IList<Invoice> invoices) { if(invoices.Count > 7) { foreach(var invoice in invoices) { invoice.Tabulate(); _fileWriter.Write($"Invoice total is {invoice.Total} dollars."); } } }
This method doesn’t present any obvious comprehension problems, with one glaring exception. Cycle through all of the invoices, having them do internal tabulation and then writing the total out to a file. And, do all of that if there are at least 7 invoices. What, what? Why 7? Why is that particular number significant?
For a maintenance programmer, magic numbers render inscrutable the motivations for decisions made in code. To make matters worse, the rationale for such decisions may get lost over time, leaving everyone to guess what, if anything, was so special about 7. When you find yourself writing a magic number, spend a minute and declare a constant or a variable with a name like, “ProcessingBatchMinimum.”
Casting
While it may be controversial, one can certainly argue that casting is a code smell. As former C# language team member Eric Lippert pointed out in this StackOverflow question, casting can indicate that you’re writing code where you have special knowledge that the compiler lacks. And I don’t recommend pitting your C# knowledge against that of the C# compiler.
But, beyond possible maintenance and runtime correctness issues, casting adds a barrier to understanding. The operation makes no conceptual, domain sense. Rather, it represents language minutiae. “Take this thing, turn it into this other thing for some reason, and then, if that works, do what matters to the new thing.” That doesn’t exactly roll off the tongue in explanation. Avoid casting as much as you can.
Lengthy Linq Chains
Linq gives C# programmers a great deal of power — power to create terse, functional-style expressions. Linq brings a declarative paradigm to C# code, allowing developers to move away from comparably verbose imperative programming. I strongly encourage you to take advantage of that.
But, at the same time, chaining Linq extension method calls together can lead to some pretty dense, hard to read code. Have you seen declarations like this?
var distinctCoordinatesBelow = _window.GetAll<T>().Where(c => c.Location.Y > LabelYCoordinate).OrderBy(c => c.Location.Y).Select(c => c.Location.Y).Distinct();
Okay, let’s see. Get all of something from the window, where the Y location is greater than something called “LabelYCoordinate.” Then, order it by y coordinate. Then, select Y coordinate. Then, keep the ones that are distinct.
When you unpack this and read it aloud, you find yourself making several different statements. Squashing them all into a single, dense line does not change that fact. Linq is powerful, but I encourage you to break these sort of statements up using descriptive intermediate variable names.
Nested Conditionals
Have you ever read through code that contained an if within an else within another if, inside of a case statement? Did you find that you needed to start jotting things down on paper to keep track of what was going on? If so, rest assured that the fault lay with the author and not with your lack of understanding.
The human brain can store only so many “chunks” of information at one time, making it possible for you to remember a 7 digit phone number, perhaps recall a 10 digit one, but struggle with much beyond that. That free context space fills quickly.
When you write code with nested if conditionals, you ask readers to fill that space remembering how they go to the current line. “Oh, right, if x was equal to 7, but there are less than 15 entries in the list, and there are the global variable Y is equal to 4, and the op code Z is ‘ABC’, then…. wait, what where was I?” Keep that to a minimum by keeping the number of nested conditionals to a minimum. Try not to ask your reader to keep track of more than 1 reason for being in a particular context within a method.
The Ternary Operator
You probably know what I mean when I say “ternary operator,” but, in case you don’t, consider the following example code.
public void ProcessInvoices(IList<Invoice> invoices) { int x = invoices.Count > 7 ? invoices.Sum(i => i.Total) : 0; }
The ternary operator involves the ? and the : and it gets its name from the fact that it takes 3 arguments: a boolean conditional, a value if true, and a value if false. So, reading this as a story, we have “Set x equal to, if there are more than 7 invoices, the sum of the totals of the invoices, otherwise, 0.” Another example that fails to roll nicely off of the tongue.
The ternary operator offers the handy ability to compact code and save yourself a verbose conditional. Without it here, you would need to declare x, set it equal to 0, then update it to be equal to the sum of the invoice totals if the invoice count was greater than 7. But in its own way, that verbose construct would present a more readable situation for the reader.
The ternary operator’s organization categorically eliminates it from reading left to right like common English in the way that many other statements do. I am not advocating that you avoid it altogether, but I do suggest you bend over backward to make it obvious. Extract the conditional into a method and add an extension method for the invoices to give you something that eliminates all noise.
public void ProcessInvoices(IList<Invoice> invoices) { int x = ThereAreEnough(invoices) ? invoices.Total() : 0; }
It still fails to read like conversational English, but now all that remains is single, clear arguments to the ternary operator. Set x equal to, enough invoices, then total, otherwise 0. You avoid the distraction presented by a bunch of other operators sprinkled among the ternary’s two.
All Food for Thought
I mentioned this when talking about the ternary operator, but it bears expanding to the whole conversation. I am not trying to tell you exactly how to code, nor am I suggesting that your code is somehow irretrievably bad if it contains these things. I am, however, suggesting that you keep an eye out for these concerns and understand the subtle drag effect they have on your code’s readability. Because in a world where code is read so frequently, every little bit of readability counts.
Great article. It’s good to have some reinforcement from time to time on things we tend to forget.
Thanks — glad you liked!
I’d sort of disagree regarding your LINQ advice, specifically about the part where you say to give descriptive intermediate variable names. I submit intermediate variables can certainly increase code comprehension under normal circumstances but I also believe that they *can* create extra noise and distract from the truly important variables. Stepping outside of LINQ for a moment, have you ever read code with far too many intermediate variables? It leads to those “ah-ha” moments where you say, “oh, amid a sea of declarations, *this* is the variable I actually care about.” In your example, you can say “I need all… Read more »
I agree, I’ve had plenty of times where I made this issue (hey, linq is cool but easy to abuse), and I’ve solved it by using a couple intermediate variables, but that just made it harder to read.
I’d advocate for making a dedicated method to do think linq for you, exactly like his solution for ternary. You could also pass that variable in via the filter method, for example:
`IEnumerable DistinctYCordsBelow(int value, Window window)`
then you could call that inline in your method to get what you need.
I find a reasonable compromise is to just split the chain into several lines, this makes it pretty readable and understandable in my opinion:
var distinctCoordinatesBelow =
_window.GetAll()
.Where(c => c.Location.Y > LabelYCoordinate)
.OrderBy(c => c.Location.Y)
.Select(c => c.Location.Y)
.Distinct();
I don’t know how different my take actually is than yours. As best I can recall, when writing this, I was thinking of my own tendency to string a lot of Linq together in a single statement. So, if you will, I was kind of admonishing myself to go back through every now and then and consider the person coming along after me and that maybe an intermediate variable declaration might help their comprehension. I *definitely* empathize with your story about a “sea of variable” declarations. In fact, I usually process such code by using the automated “inline local” refactoring… Read more »
I think my take was a bit reactionary. I see a lot of people say things like, “LINQ is great, but don’t abuse it,” then label *every* use as abuse. While I know you were trying to create an overwrought case for the purpose of an example, you actually stumbled upon a really great use of LINQ [in my mind]: var distinctCoordinatesBelow = _window.GetAll() .Select(c => c.Location.Y) .Where(y => y > LabelYCoordinate) .Distinct() .OrderBy(y => y); That seems to me to be very straightforward and expressive, exactly what LINQ is meant for, and saves you from having to declare yet-another… Read more »
Perhaps you should mention the Specification pattern for capturing business rules. Specifically… the LinqSpecs library is very useful (although not complete) for this. By composing the final rule from fragments you ensure each small chunk is understandable. Specs also help to make your code DRYer. In one project I work on we put the specs in a separate class from the domain object. Say the domain object is called “Car” then the Specs for it go in a static class called TheCar. You can now reuse rules in code that read like… ” if(TheCar.IsFast.IsSatisfiedBy(aParticularCar)”. Or more important in the case… Read more »
I know Roedy Green’s classic “How To Write Unmaintainable Code” is a little Java-centric, but it still seems oh, so relevant…
https://www.thc.org/root/phun/unmaintain.html
FYI, I edited the link you supplied to a different one containing the Roedy Green piece. The one you supplied was linking to something google was kicking back as a phishing site.
That’s interesting. I used that URL because Bing or Google (don’t remember which I used) had it higher on the list than the copy on Roedy’s own MindProducts website (where it’s buried as an item in his Java glossary).
¯_(ツ)_/¯
Does the link I put in go to the right place? (I wouldn’t want to change the intent of your comment)
Close enough — Roedy started with a 1-page essay, but it grew with all sorts of perverse suggestions over the years, so there are multiple versions out there.
As for the intent of my comment, most any version will do. Only the most severely humor-challenged can make it through even a few paragraphs without shaking your head and either grinning or groaning.
Great article! Especially the point regarding linq.
Thanks! The Linq one seems to be prompting the most discussion, anyway.
Thanks for sharing your ideas on improving code readibility. These are good suggestions that should be in the back of the mind of all software developers. There is nothing more dreadful is coming across a magic number. Even with my disdain for magic numbers, I find myself naturally creating them when not paying attention. It takes a certain amount of discipline to write code that both the computer and a human can understand. Sometimes you have to pick your poison. I’ve seen many immediate variables cause bugs. I try to avoid immediate variable at all cost. The innocent variable place… Read more »
I definitely agree about the tradeoff and balance theme. I think about that anytime I discuss or write about the idea of “readability” of code, and I think I have to try to go after “what 90% of people will find readable 90% of the time.” After all, someone could come along and say, “I find it easier to read classes without any newlines in them at all because they’re nice and vertically compact.” Phrased this way, all I could really respond with would be, “wow, I don’t think that’s for everyone, but I’m in no position to tell you… Read more »
I really like that motto: go after “what 90% of people will find readable 90% of the time.” I guess we’ve all known “That guy” who complains “Why do you break your code into all these classes and short methods? It makes me waste time jumping around. It’s so much easier to read if you just put it all in one class and a few subroutines. That way, I can find anything I want by just hitting Ctrl+f in notepad. And make sure you declare all your variables in a #region at the top.” But kidding aside, there does seem… Read more »
Excellent point about the “magic” numbers! I be guilty of that habit way too often. My recommendation would be advocate “do not skimp on the whitespace” such as blank lines and spaces before/after a language’s keywords (when allowable).
Is spaces around keywords a physical optics thing? (As in, you have to squint less to read it?) Just idle curiosity — not making any kind of rhetorical point.
Dear Erik Dietrich,
ALL code is hard to understand. Picking apart language idiomatic syntax is irrelevant because any and all language syntax obscures, dissects, decouples, entangles, abstracts, simplifies, modifies, and otherwise complicates the initial intent, which is usually relatively simple to state in any spoken language of your choice.
If it were not so, we wouldn’t need unit testing, QA departments, and lawyers (at least for software bugs that kill people and waste millions of dollars of tax payer’s money when Mars satellites and rovers go boom or missing.)
Marc
I’m trying to find common ground with what you’ve said here, and I just can’t get there. I wouldn’t dispute that it’s harder on average to express concepts in code than in spoken language. But the idea that language semantics in the code you write don’t matter sounds hyper-relativistic to the point complete fatalism. Are you really saying that there’s no point to making your code clearer because all code is all code is equally inscrutable? If you encountered this legal C# method in your codebase, would you just shrug, leave it, and say, “meh, no worse than anything else?”… Read more »
Article is okay, but it seems to pay attention to its word count more than it should. Luckily, this is a damn superficial topic: “How to write better code.” No one can learn imitation as quickly as they can gain experience.
With regard to magic numbers, aside from declaring the constant, I’d want the developer to add a small comment line, explaining why that constant has to have that value. For example:
// invoices lists larger than 7 are treated separately (@See FooClass)
As to Linq, or rather it’s java equivalent (object::stream), there I’d always advise extracting methods and using multilining wisely. For example:
window.getAll().stream()
.filter(this::isLabeledButton)
.filter(this::sufficientlyHigh)
.filter(this::sufficientlyToTheRight)
.map(Button::getLabel)
.sort()
.collect(Collectors.toList);
My experience with “if (length>7)” renders the comment redundant, (c.f. ‘X := 1 // set X to 1’ and avoiding the real question: why 7?
Granted, the comment saying what the code does is exceptionally pointless, but I don’t see how that would generally render a comment redundant. A useful comment for “if (length>7)” would not be to simply say “// length needs to be larger than 7” but would be “// lengths less than 8 do not apply because of foo and bar”.
The named constant idea _is_ useful, it covers the ‘Why 7?’ question very well. I’m not sure where the general case entered the discussion, however.
The general case was there from the start, since the example in the post is exactly that: an example.
ProcessingBatchMinimum says _what_ that 7 is, but says nothing about _why_ someone can’t simple change it to 9, or what _would_ allow it to be changed to another value.
The general case, many general cases, may be inferred. None of them are here.
Personally, I leave the rationale for the 7 to the TDD phase.
There is an example here, of the general case. That can be easily inferred from the fact that the section title is “Magic numbers” rather than “7” or “ProcessingBatchMinimum”.
Having it in the tests is also not bad, even though I myself have a preference for keeping those things in the business code: the unit tests will make sure that it works as expected; they won’t inherently give information about why it should work that way (e.g. that that concern is handled elsewhere).
The why, in my experience, is a better place for the why`s than the code. IMO, a why in the code is a smell.
The Ternary Operator is better know as the Elvis operator, and it has a ring to it.
..perhaps it should leave the building?
I’m not sure those are synonyms. If I recall correctly, the Elvis operator takes the form x = y ?: z whereas the ternary operator takes the form x = if(condition()) ? y : z
I think a more apples to apples comparison in C# would be the null coalescing operator x = y ?? z, but that’s a special case of the concept, I think.
?. is also called the Elvis operator.