DaedTech

Stories about Software

Please Don’t Recycle Local Variables

I think there’s a lot of value to the conservation angle of the green movement. In general, it’s a matter of efficiency–if you can heat/light/whatever your house with the same quality of life, using less energy and fewer resources, that’s a win for everyone. This applies to a whole lot of things beyond just eco-concerns, however. Conserving heat when you’re cold, conserving energy when you’re running a marathon, conserving your dollars when making a budget–all good ideas. Cut down, conserve, reuse when you can.

Except please don’t do it with your local variables. For example:
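The listing that belongs at this point did not survive, so what follows is only a minimal sketch of the kind of method being described. The Customer and Machine types, their IsActive and IsOnline properties, and the Summarize method are hypothetical names chosen for illustration; only the recycled count local comes from the text.

```csharp
using System.Collections.Generic;

// Hypothetical domain types, invented for this sketch.
public class Customer { public bool IsActive { get; set; } }
public class Machine { public bool IsOnline { get; set; } }

public static class RecycledCountReport
{
    public static string Summarize(IEnumerable<Customer> customers, IEnumerable<Machine> machines)
    {
        int count = 0;
        foreach (Customer customer in customers)
        {
            if (customer.IsActive)
            {
                count++;
            }
        }
        string customerLine = "Active customers: " + count;

        // The same local is reset and reused for an unrelated concept.
        count = 0;
        foreach (Machine machine in machines)
        {
            if (machine.IsOnline)
            {
                count++;
            }
        }
        string machineLine = "Online machines: " + count;

        return customerLine + "; " + machineLine;
    }
}
```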

Here, we initialize a local variable, count, and use it to keep track of the results of some processing of customers. When we’re done, we reset count and use it to keep track of the apparently unrelated concept of machines. What I’m saying is that there shouldn’t be just one count, but rather customerCount and machineCount.

Does this seem like nitpicking? You could certainly make that argument, but this code is not going to age well. First of all, this method should clearly be two methods, so we’re starting right off the bat with a bit of technical debt. It would be cleaner if each loop had its own method.

But an interesting thing happens if we use the refactoring tools to try to do that–the refactoring tool wants return values or input parameters. Yikes, that was unexpected, so we just move on. Later, when the time comes to iterate over movies, we see that there’s a ‘design pattern’ in place, so we modify the code to look like this:

Now this thing should really be split up, so we start selecting parts of it to see what we can refactor. Ew, now we’re getting ref parameters to boot. This thing is getting even more painful to try to refactor, and we’re in a hurry, so no time for that. And to make matters worse, if you add in a few other aggregator variables this way, you’ll start to have all kinds of barriers in place when you want to pull this thing apart, such as crazy sets of out parameters. I’ve posted before about how I feel about ref and out.

All of this mounting technical debt could easily be avoided by giving each loop its own count variable. Having them recycle the same one creates a compile-time dependency of what’s going on in each loop with what happened in the loop before, even though there are no other similar dependencies in evidence. In other words, recycling this local variable is the only thing that’s creating a coupling in your code–there’s no logical reason to do it.
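As a rough sketch of where separate counters lead, assume a method that tallies active customers and online machines. The Customer and Machine types, their properties, and every method name below are hypothetical. Because each loop owns its counter, each one extracts into a method with a plain return value, and no ref or out parameters appear.

```csharp
using System.Collections.Generic;

// Hypothetical domain types, invented for this sketch.
public class Customer { public bool IsActive { get; set; } }
public class Machine { public bool IsOnline { get; set; } }

public static class SeparateCountReport
{
    public static string Summarize(IEnumerable<Customer> customers, IEnumerable<Machine> machines)
    {
        return "Active customers: " + CountActiveCustomers(customers)
            + "; Online machines: " + CountOnlineMachines(machines);
    }

    private static int CountActiveCustomers(IEnumerable<Customer> customers)
    {
        int customerCount = 0;
        foreach (Customer customer in customers)
        {
            if (customer.IsActive)
            {
                customerCount++;
            }
        }
        return customerCount;
    }

    private static int CountOnlineMachines(IEnumerable<Machine> machines)
    {
        int machineCount = 0;
        foreach (Machine machine in machines)
        {
            if (machine.IsOnline)
            {
                machineCount++;
            }
        }
        return machineCount;
    }
}
```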

This is the height of procedural programming and of baking in the temporal dependencies that I’ve cautioned you to avoid in an earlier post. It’s a completely useless dependency that will inhibit refactoring and dirty up your code in a hurry. It may not seem like much yet, but this will be a huge pain point later as the lines of code in this method balloon from the dozens to the hundreds, and you rely heavily on automated tools to help with cleanup. Flag variables used over and over in sequence throughout a method are like pebbles in your shoe when you’re trying to refactor.

So my advice is to avoid this practice completely. There’s really no advantage to coding this way and the potential downside is enormous.

How to Keep Method Size Under Control

Do you ever open a source code file and see a method that starts at the top of your screen and kind of oozes its way to the bottom with no end in sight? When you find yourself in that situation, imagine that you’re reading a ticker tape and try to guess at where the method actually ends. Is it a foot below the monitor? Three feet? Does it plummet through the floor and into the basement, perhaps down past the water table and into the earth’s mantle?

Visualized like this, I think everyone might agree that there’s some point at which the drop is too far, though there’s likely some disagreement on where exactly this is. Personally, I used to subscribe to the “fits on a screen” heuristic and would only start looking to pull out methods if it got beyond that. But in more recent years, I think even smaller. How small? I dunno–five or six lines, max. Small enough that you’ll only ever see one try-catch or control flow statement in there. Yeah, seriously, that small. If you’re thinking it sounds kind of crazy, I get that, but give it a try for a while. I can almost guarantee that you’ll lose your patience for looking at methods that cause you to think, “wait, where was loopCounter declared again–before the second or third while loop?”

If you accept the premise that this is a good way to do things or that it might at least be worth a try, the first thing you’ll probably wonder is how to go about doing this from a practical standpoint. I’ve definitely encountered people and even whole groups who considered method sizes like this to be impractical. The first thing you have to do is let go of the notion that classes are in some kind of limited supply and you have to be careful not to use too many. Same with modules, if your project gets big enough. The reason I say this is that having small methods means that you’re going to have a lot of them. This in turn means that they’re going to need to be spread to multiple classes, and those classes will occupy more namespaces and modules. But that’s okay. If you encounter a large application that’s well designed and factored, it’s that way because the application is actually a series of small, focused components working together. Monolithic doesn’t scale well.

Getting Down to Business

If you’ve prepared yourself for the reality of needing more classes organized into more namespaces and modules, you’ve really overcome the biggest obstacle to being a small-method coder. Now it’s just a question of mechanics and practice. And this is actually important–it’s not sufficient to just say, “I’m going to write a lot of methods by stopping at the fifth line, no matter what.” I guarantee you that this is going to create a lot of weird cross-coupling, unnecessary state, and ugly things like out parameters. Nobody wants that. So it’s time to look to the art of creating abstractions.

As a brief digression, I’ve recently picked up a copy of Uncle Bob Martin’s Clean Code: A Handbook of Agile Software Craftsmanship and been tearing my way through it pretty quickly. I’d already seen most of the Clean Coder video series, which covers some similar ground, but the book is both a good review and a source of new and different information. To be blunt, if you’re ever going to invest thirty or forty bucks in getting better at your craft, this is the thing to buy. It’s opinionated, sometimes controversial, incredibly specific, and absolutely mandatory reading. It will change your outlook on writing code and make you better at what you do, even if you don’t agree with every single point in it (though I don’t find much with which to take issue, personally).

The reason I mention this book and series is that there is an entire section in the book about functions/methods, and two of its fundamental points are that (1) functions should do one thing and one thing only, and (2) that functions should have one level of abstraction. To keep those methods under control, this is a great place to start. I’d like to dive a little deeper, however, because “do one thing” and “one level of abstraction per function” are general instructions. It may seem a bit like hand-waving without examples and more concrete heuristics.

Extract Finer-Grained Details

What Uncle Bob is saying about mixed abstractions can be demonstrated in this code snippet:
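The original snippet is missing here. A minimal sketch of a method that mixes two levels of abstraction might look like the following; the class name, the method names, and the Steps list (added only so the sketch has observable behavior) are all assumptions rather than the original code.

```csharp
using System.Collections.Generic;

public class MixedDoorOpener
{
    // Records each step as it happens.
    public List<string> Steps { get; } = new List<string>();

    public void OpenDoor()
    {
        // Person-and-door level.
        GraspHandle();
        TurnHandle();
        // Muscle-and-joint level: a different layer of abstraction.
        ContractBiceps();
        BendElbow();
        RotateShoulder();
    }

    private void GraspHandle() { Steps.Add("grasp handle"); }
    private void TurnHandle() { Steps.Add("turn handle"); }
    private void ContractBiceps() { Steps.Add("contract biceps"); }
    private void BendElbow() { Steps.Add("bend elbow"); }
    private void RotateShoulder() { Steps.Add("rotate shoulder"); }
}
```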

Do you see what the issue is? We have a method here that describes (via sub-methods that are not pictured) how to open a door. The first two calls talk in terms of actions between you and the door, but the next three calls suddenly dive into the specifics of how to pull the door open in terms of actions taken by your muscles, joints, tendons, etc. These are two different layers of abstractions: one about a person interacting with his or her surroundings and the other detailing the mechanics of body movement. To make it consistent, we could get more detailed in the first two actions in terms of extending arms and tightening fingers. But we’re trying to keep methods small and focused, so what we really want is to do this:
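One way to sketch the consistent version, with hypothetical names throughout, is to push the anatomical details down into their own method so that OpenDoor speaks only about the person and the door. The Steps list exists only to make the behavior observable.

```csharp
using System.Collections.Generic;

public class DoorOpener
{
    // Records each step as it happens.
    public List<string> Steps { get; } = new List<string>();

    // One level of abstraction: a person interacting with a door.
    public void OpenDoor()
    {
        GraspHandle();
        TurnHandle();
        PullDoorOpen();
    }

    // One level down: the mechanics of body movement.
    private void PullDoorOpen()
    {
        ContractBiceps();
        BendElbow();
        RotateShoulder();
    }

    private void GraspHandle() { Steps.Add("grasp handle"); }
    private void TurnHandle() { Steps.Add("turn handle"); }
    private void ContractBiceps() { Steps.Add("contract biceps"); }
    private void BendElbow() { Steps.Add("bend elbow"); }
    private void RotateShoulder() { Steps.Add("rotate shoulder"); }
}
```

The observable behavior is the same five steps in the same order; only the shape of the call tree changes.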

Create Coarser Grained Categories

What about a different problem? Let’s say that you have a method that’s long, but it isn’t because you are mixing abstraction levels:

These items are all at the same level of abstraction, but there are an awful lot of them. In the previous example, we were able to tighten up the method by making the abstraction levels consistent, but here we’re going to actually need to add a layer of abstraction. This winds up looking a little better:
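Since the listing is missing, here is a sketch of what the categorized version could look like. PrepareIngredients, PrepareEquipment, and PerformActualCooking are names the text itself uses; MakeQuesadilla, FinishUp, and all thirteen leaf steps are invented for illustration, as is the Steps list that makes the behavior observable.

```csharp
using System.Collections.Generic;

public class QuesadillaMaker
{
    public List<string> Steps { get; } = new List<string>();

    // Root of the tree: four children instead of thirteen.
    public void MakeQuesadilla()
    {
        PrepareIngredients();
        PrepareEquipment();
        PerformActualCooking();
        FinishUp();
    }

    private void PrepareIngredients()
    {
        GetOutTortillas();
        ShredCheese();
        ChopOnions();
        DicePeppers();
        SliceChicken();
    }

    private void PrepareEquipment()
    {
        GetOutPan();
        HeatPan();
    }

    private void PerformActualCooking()
    {
        PlaceTortillaInPan();
        AddFillings();
        FoldTortilla();
        FlipWhenBrowned();
    }

    private void FinishUp()
    {
        TurnOffStove();
        PlateQuesadilla();
    }

    // Leaf steps: each one just records its own name.
    private void GetOutTortillas() { Steps.Add(nameof(GetOutTortillas)); }
    private void ShredCheese() { Steps.Add(nameof(ShredCheese)); }
    private void ChopOnions() { Steps.Add(nameof(ChopOnions)); }
    private void DicePeppers() { Steps.Add(nameof(DicePeppers)); }
    private void SliceChicken() { Steps.Add(nameof(SliceChicken)); }
    private void GetOutPan() { Steps.Add(nameof(GetOutPan)); }
    private void HeatPan() { Steps.Add(nameof(HeatPan)); }
    private void PlaceTortillaInPan() { Steps.Add(nameof(PlaceTortillaInPan)); }
    private void AddFillings() { Steps.Add(nameof(AddFillings)); }
    private void FoldTortilla() { Steps.Add(nameof(FoldTortilla)); }
    private void FlipWhenBrowned() { Steps.Add(nameof(FlipWhenBrowned)); }
    private void TurnOffStove() { Steps.Add(nameof(TurnOffStove)); }
    private void PlateQuesadilla() { Steps.Add(nameof(PlateQuesadilla)); }
}
```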

In essence, we’ve created categories and put the actions from the long method into them. What we’ve really done here is create (or add to) a tree-like structure of methods. The public method is the root, and it had thirteen children. We gave it instead four children, and each of those children has between two and five children of its own. To tighten up methods, it’s perfectly viable to add “nodes” to the “tree” of your call stack. While “do one thing” is still a little elusive, this seems to be carrying us in that direction. There’s no individual method that you look at and think, “boy, that’s a lot of stuff going on.” Certainly it’s a matter of some art and taste, but this is probably a good way to think of it–organize stuff into hierarchical method categories until you look at each method and think, “I could probably memorize what that does if I needed to.”

Recognize that Control Flow Uses Up an Abstraction

So far we’ve been conceptually figuring out how to organize families of methods into well-balanced tree structures, and that’s taken us through some pretty mundane code. This code has involved none of the usual stuff that sends apps careening off the rails into bug land, such as conditionals, loops, assignment, etc. Let’s correct that. Looking at the code above, think of how you’d modify this to allow for the preparation of an arbitrary number of quesadillas. Would it be this?

Well, that makes sense, right? Just like the last version, this is something you could read conversationally while in the kitchen just as easily as you do in the code. Prep your ingredients, then prep your equipment, then for some integer index equal to zero and less than the number of quesadillas you want to cook, increment the integer by one. Each time you do that, cook the quesadilla. Oh, wait. I think we just went careening into the nerdiest kitchen narrative ever. If Gordon Ramsay were in charge, he’d have strangled you with your apron for that. Hmm… how ’bout this?

Well, I’d say that the CookQuesadillas method looks a lot better, but do we like “PerformActualCooking?” The whole situation is an improvement, but I’m not a huge fan, personally. I’m still mixing control flow with a series of domain concepts. PerformActualCooking is still both a story about for-loops and about cooking. Let’s try something else:
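The listing for this last attempt is missing, so here is a sketch under stated assumptions. CookQuesadillas and PerformActualCooking are names from the text; CookOneQuesadilla, FinishUp, and the leaf steps are hypothetical, and the preparation methods are reduced to single recorded steps to keep the sketch short. The point is that the for-loop lives in exactly one small bridging method.

```csharp
using System.Collections.Generic;

public class QuesadillaCook
{
    public List<string> Steps { get; } = new List<string>();

    // Domain story only: no loops or counters here.
    public void CookQuesadillas(int numberOfQuesadillas)
    {
        PrepareIngredients();
        PrepareEquipment();
        PerformActualCooking(numberOfQuesadillas);
        FinishUp();
    }

    // Bridging method: the only place that knows about the for-loop.
    private void PerformActualCooking(int numberOfQuesadillas)
    {
        for (int index = 0; index < numberOfQuesadillas; index++)
        {
            CookOneQuesadilla();
        }
    }

    // Domain story only again (hypothetical name).
    private void CookOneQuesadilla()
    {
        PlaceTortillaInPan();
        AddFillings();
        FlipWhenBrowned();
    }

    // Stubs that record their own names.
    private void PrepareIngredients() { Steps.Add(nameof(PrepareIngredients)); }
    private void PrepareEquipment() { Steps.Add(nameof(PrepareEquipment)); }
    private void FinishUp() { Steps.Add(nameof(FinishUp)); }
    private void PlaceTortillaInPan() { Steps.Add(nameof(PlaceTortillaInPan)); }
    private void AddFillings() { Steps.Add(nameof(AddFillings)); }
    private void FlipWhenBrowned() { Steps.Add(nameof(FlipWhenBrowned)); }
}
```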

We’ve added a node to the tree that some might say is one too many, but I disagree. What I like is the fact that we have two methods that contain nothing but abstractions about the domain knowledge of cooking and we have a bridging method that brings in the detailed realities of the programming language. We’re isolating things like looping, counting, conditionals, etc. from the actual problem solving and story telling that we want to do here. So when you have a method that does a few things and you think about adding some kind of control flow to it, remember that you’re introducing a detail to the method that is at a lower level of abstraction and should probably have its own node in the tree.

Adrift in a Sea of Tiny Methods

If you’re looking at this cooking example, it probably strikes you that there are no fewer than eighteen methods in this class, not counting any additional sub-methods or elided properties (which are really just methods in C# anyway). That’s a lot for a class, and you may think that I’m encouraging you to write classes with dozens of methods. That isn’t the case. So far what we’ve done is started to create trees of many small methods with a public method and then a ton of private methods, which is a code smell called “Iceberg Class.” What’s the cure for iceberg classes? Extracting classes from them. Maybe you turn the first two methods that prepare ingredients and equipment into a “Preparer” class with two public methods, “PrepareIngredients” and “PrepareEquipment.” Or maybe you extract a quesadilla cooking class.
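A sketch of that extraction, assuming the Preparer class suggested above: PrepareIngredients and PrepareEquipment become its public surface, and their former siblings become its private details. QuesadillaChef, the shared steps list, and the leaf steps are hypothetical names for illustration.

```csharp
using System.Collections.Generic;

// Extracted class: two public methods, the rest hidden below the waterline.
public class Preparer
{
    private readonly List<string> _steps;

    public Preparer(List<string> steps) { _steps = steps; }

    public void PrepareIngredients()
    {
        ShredCheese();
        ChopOnions();
    }

    public void PrepareEquipment()
    {
        GetOutPan();
        HeatPan();
    }

    private void ShredCheese() { _steps.Add(nameof(ShredCheese)); }
    private void ChopOnions() { _steps.Add(nameof(ChopOnions)); }
    private void GetOutPan() { _steps.Add(nameof(GetOutPan)); }
    private void HeatPan() { _steps.Add(nameof(HeatPan)); }
}

// The original class shrinks to a story told in terms of its collaborator.
public class QuesadillaChef
{
    private readonly Preparer _preparer;
    private readonly List<string> _steps;

    public QuesadillaChef(Preparer preparer, List<string> steps)
    {
        _preparer = preparer;
        _steps = steps;
    }

    public void CookQuesadilla()
    {
        _preparer.PrepareIngredients();
        _preparer.PrepareEquipment();
        _steps.Add("Cook");
    }
}
```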

It’s really going to vary based on your situation, but the point is that you take this opportunity to pick nodes in your growing tree of methods and sub-methods and convert them into roots by turning them into classes. And if doing this leads you to having what seems to be too many classes in your namespace? Create more namespaces. Too many of those in a module? Create more modules. Too many modules/projects in a solution? More solutions.

Here’s the thing: the complexity exists no matter how many or few methods/classes/namespaces/modules/solutions you have. Slamming them all into monolithic constructs together doesn’t eliminate or even hide that complexity, though many seem to take the ostrich approach and pretend that it does. Your code isn’t somehow ‘simpler’ because you have one solution with one project that has ten classes, each with 300 methods of 7,000 lines. Sure, things look simple when you fire up the IDE, but they sure won’t be simple when you try to debug. In fact, they’ll be much more complicated because your functionality will be hopelessly interwoven with weird temporal couplings, ad-hoc references, and hidden dependencies.

If you create large trees of functionality, you have the luxury of making the structure of the tree the representative of the application’s complexity, with each node an island of simplicity. It is in these node-methods that the business logic takes place and that getting things right is most important. And by managing your abstractions, you keep these nodes easy to reason about. If you structure the tree correctly and follow good OOP design and practice, you’ll find that even the structure of the tree is not especially complicated since each node provides a good representative abstraction for its sub-tree.

Having small, readable, self-documenting methods is no pipe dream. Really, with a bit of practice, it’s not even very hard. It just requires you to see code a little bit differently. See it as a series of hierarchical stories and abstractions rather than as a bunch of loops, counters, pointers, and control flow statements, and the people that maintain what you write, including yourself, will thank you for it.

The Way We Write Code is Stupid: Source Code Files Considered Harmful

Order Doesn’t Matter

Please pardon the loaded phrasing in the title, but that’s how the message came to me from my subconscious brain: bluntly and without ceremony. I was doing a bit of work in Apex, the object-oriented language specific to Salesforce.com, and it occurred to me that I had no idea what idiomatic Apex looked like. (I still don’t.) In C++, the convention (last time I was using it much, anyway) is to first define public members in class headers and then the private members at the bottom. In C#, this is inverted. I’ve seen arguments of all sorts as to which approach is better and why. Declaring the private fields at the top makes sense since the first thing you want to see in the class is what its state will consist of, but declaring the public stuff at the top also makes sense since that’s what consumers will interact with and it’s like the above-water part of your code iceberg.

When programming in any of the various programming languages I know, I have this mental cache of what’s preferred in what language. I attempt to ‘speak’ it without an accent. But with Apex, I have no idea what the natives ‘sound’ like, not having seen it in use before. Do I declare instance variables at the bottom or the top? Which is the right way to eat bread: butter side up or butter side down? I started googling to see what the ‘best practice’ was for Apex when the buzzing in my subconscious reached some kind of protesting critical mass and morphed into a loud, clear message: “this is completely stupid.”

I went home for the day at that point–it was late anyway–and wondered what had prompted this visceral objection. I mean, it obviously didn’t matter from a compiled code perspective whether instance variables or public methods come first, but it’s pretty well established and encouraged by people as accomplished and prominent as “Uncle” Bob Martin that consistency of source code layout matters, if not the layout specifics (paraphrased from my memory of his video series on Clean Coders). I get it. You don’t want members of your team writing code that looks completely different from class to class because that creates maintenance headaches and obscures understanding. So what was my problem?

I didn’t know until the next morning in the shower, where I seem to do my most abstract thinking. I didn’t think it was stupid to try to make my Apex code look like ‘standard’ Apex. I thought it was stupid that I needed to do so at all. I thought it was stupid to waste any time thinking about how to order code elements in this file when the only one whose opinion really matters–the compiler–says, “I don’t care.” Your compiler is trying to tell you something. Order doesn’t matter to it, and you shouldn’t care either.

Use Cases: What OOP Developers Want

But the scope of my sudden, towering indignation wasn’t limited to the fact that I shouldn’t have to care about the order of methods and fields. I also shouldn’t have to care about camel or Pascal casing. I shouldn’t have to care about underscores in front of field names or inside of method names. It shouldn’t matter to me if public methods come before private or how much indentation is the right amount of indentation. Should methods be alphabetized or should they be in some other order? I don’t care! I don’t care about any of this.

Let’s get a little more orderly about this. Here are some questions that I ask frequently when I’m writing source code in an OOP language:

  • What is the public API of this type?
  • What private methods are in the ‘tree’ of this public method?
  • What methods of this type mutate or reference this field?
  • What are the types in this namespace?
  • What are the implementations of this interface in this code base?
  • Let’s see this method and any methods that it overrides.
  • What calls this method?

Here are some questions that I never ask out of actual interest when writing source code.  These I either don’t ask at all or ask in exasperation:

  • What’s the next method in this file?
  • How many line feed characters come before the declaration of this variable?
  • Should I use tabs or spaces?
  • In what region is this field’s declaration?
  • Did the author of this file alphabetize anything in it?
  • Does this source file have Windows or *NIX line break characters?
  • Is this a field or a method or what?

With the first set of questions, I ask them because they’re pieces of information that I want while reasoning about code.  With the second set of questions, they’re things I don’t care about.  I view asking these questions as an annoyance or failure.  Do you notice a common pattern?  I certainly do.  All of the questions whose answers interest me are about code constructs and all the ones that I don’t care about have to do with the storage medium for the code: the file.

But there’s more to the equation here than this simple pattern.  Consider the first set of questions again and ask yourself how many of the conventions that we establish and follow are simply ham-fisted attempts to answer them at a glance because the file layout itself is incapable of doing so.  Organizing public and private separately is a work-around to answer the first question, for example.  Regions in C#, games with variable and method naming, “file” vs “type” view, etc. are all attempts to overcome the fact that files are actually really poor communication media for object-oriented concepts.  Even though compilers are an awful lot different now than they were forty years ago, we still cling to the storage medium for source code best suited to those old compilers.

Not Taking our own Advice

If you think of an ‘application’ written in MS Access, what comes to mind?  How about when you open up an ASP web application and find wizard-generated data sources in the markup, or when you open up a desktop application and find SQL queries right in your code behind?  I bet you think “amateurs wrote this.”  You are filled with contempt for the situation–didn’t anyone stop to think about what would happen if data later comes in some different form?  And what about some kind of validation?  And, what the–ugh… the users are just directly looking at the tables and changing the column order and default sorting every time they look at the data.  Is everyone here daft?  Don’t they realize how ridiculous it is to alter the structure of the actual data store every time someone wants a different ordering of the data?

And you should see some of the crazy work-arounds and process hacks they have in place. They actually have a scheme where the database records the name of everyone who opens up a table and makes any kind of change so that they can go ask that person why they did it.  And–get this–they actually have this big document that says what the order of columns in the table should be.  And–you can’t make this stuff up–they fight about it regularly and passionately.  Can you believe the developers that made this system and the suckers that use it? I mean, how backward are they?

In case you hadn’t followed along with my not-so-subtle parallel, I’m pointing out that we work this way ourselves even as we look with scorn upon developers who foist this sort of thing on users and users who tolerate it.  This is like when you finally see both women in the painting for the first time–it’s so clear that you’ll never un-see it again.  Why do we argue about where to put fields and methods and how to order things in code files when we refuse to write code that sends users directly into databases, compelling them to bicker over the order of column definition in the same?  RDBMS (or any persistence store) is not an appropriate abstraction for an end user–any end user–whether he understands the abstraction or not.  We don’t demand that users fight, decide that there is some ‘right’ way to order invoices to be printed, and then lock the Invoice table in place accordingly for all time, on pain of shaming for violations of an eighty-page invoice standard guideline document.  So why do that to ourselves?  When we’re creating object-oriented code, sequential files and all of the particular orderings, traversings, and renderings thereof are wildly inappropriate abstractions for us.

What’s the Alternative?

Frankly, I don’t know exactly what the alternative is yet, but I think it’s going to be a weird and fun ride trying to figure that out.  My initial, rudimentary thoughts on the matter are that we should use some sort of scheme in which the Code DOM is serialized to disk for storage purposes.  In other words, the domain model of code is that there is something called Project, and it has a collection of Namespace.  Namespace has a collection of Type, which may be Interface, Enum, Struct, Class (for C# anyway–for other OOP languages, it’s not hard to make this leap).  Class has one collection each of Field, Method, Property, Event.  The exact details aren’t overly important, but do you see the potential here?  We’re creating a hierarchical model of code that could be expressed in nested object or relational format.
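To make the hierarchy concrete, here is a rough sketch of that domain model as plain classes. The Code prefix on every type name and the shape of each property are assumptions for illustration, not a proposed design; the containment relationships are the ones described above.

```csharp
using System.Collections.Generic;

// Members of a class.
public class CodeField { public string Name { get; set; } }
public class CodeMethod { public string Name { get; set; } }
public class CodeProperty { public string Name { get; set; } }
public class CodeEvent { public string Name { get; set; } }

// A type may be an interface, enum, struct, or class.
public abstract class CodeType { public string Name { get; set; } }
public class CodeInterface : CodeType { }
public class CodeEnum : CodeType { }
public class CodeStruct : CodeType { }

// A class has one collection each of field, method, property, and event.
public class CodeClass : CodeType
{
    public List<CodeField> Fields { get; } = new List<CodeField>();
    public List<CodeMethod> Methods { get; } = new List<CodeMethod>();
    public List<CodeProperty> Properties { get; } = new List<CodeProperty>();
    public List<CodeEvent> Events { get; } = new List<CodeEvent>();
}

public class CodeNamespace
{
    public string Name { get; set; }
    public List<CodeType> Types { get; } = new List<CodeType>();
}

public class CodeProject
{
    public string Name { get; set; }
    public List<CodeNamespace> Namespaces { get; } = new List<CodeNamespace>();
}
```

Nothing in these classes says anything about files, which is the point: how the graph gets persisted is a separate decision.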

In other words, we’re creating a domain model entirely independent of any persistence strategy.  Can it be stored in files?  Sure. Bob’s your uncle.  You can serialize these things however you want.  And it’ll need to be written to file in some form or another for the happiness of the compiler (at least at first).  But those files handed over to the compiler are output transforms rather than the lifeblood of development.

Think for a minute of us programmers as users of a system with a proper domain, one or more persistence models, and a service layer.  Really, stop and mull that over for a moment.  Now, go back to the use cases I mentioned earlier and think what this could mean.  Here are some properties of this system:

  1. The basic unit of interaction is going to be the method, and you can request methods with arbitrary properties, with any filtering and any ordering.
  2.  What appears on your screen will probably be one or more methods (though this would be extremely flexible).
  3. It’s unlikely that you’d ever be interested in “show me everything in this type.”  Why would you?  The only reason we do this now is that editing text files is what we’re accustomed to doing.
  4. Tracing execution paths through code would be much easier and more visual and schemes that look like Java’s “code bubbles” would be trivial to create and manipulate.
  5. Most arguments over code standards simply disappear as users can configure IDE preferences such as “prepend underscores to all field variables you show me,” “show me everything in camel casing,” and, “always sort results in reverse alphabetical order.”
  6. Arbitrary methods from the same or different types could be grouped together in ad-hoc fashion on the screen for analysis or debugging purposes.
  7. Version/change control could occur at the method or even statement level, allowing expression of “let’s see all changes to this method” or “let’s see who made a change to this namespace” rather than “let’s see who changed this file.”
  8. Relying on IDE plugins to “hop” to places in the code automatically for things like “show all references” goes away in favor of an expressive querying syntax à la NDepend’s “code query language.”
  9. New domain model allows baked-in refactoring concepts and makes operations like “get rid of dead code” easier or trivial, in some cases.
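As a hedged illustration of requesting methods by arbitrary properties, here is a small sketch that answers two of the earlier questions (“What is the public API of this type?” and “What methods of this type mutate or reference this field?”) with ordinary LINQ over an in-memory model. MethodInfoModel, CodeQueries, and their members are hypothetical; this is not NDepend’s query language or any existing tool’s API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of a method in the code model.
public class MethodInfoModel
{
    public string TypeName { get; set; }
    public string Name { get; set; }
    public bool IsPublic { get; set; }
    public List<string> ReferencedFields { get; } = new List<string>();
}

public static class CodeQueries
{
    // "What is the public API of this type?" sorted by name.
    public static List<string> PublicApiOf(IEnumerable<MethodInfoModel> methods, string typeName)
    {
        return methods
            .Where(m => m.TypeName == typeName && m.IsPublic)
            .OrderBy(m => m.Name, StringComparer.Ordinal)
            .Select(m => m.Name)
            .ToList();
    }

    // "What methods of this type mutate or reference this field?"
    public static List<string> MethodsReferencing(
        IEnumerable<MethodInfoModel> methods, string typeName, string fieldName)
    {
        return methods
            .Where(m => m.TypeName == typeName && m.ReferencedFields.Contains(fieldName))
            .Select(m => m.Name)
            .ToList();
    }
}
```

Ordering and filtering become query parameters rather than properties of a file, so two developers can look at the same type sorted two different ways without arguing about it.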

Longer Reaching Impact

If things were to go in this direction, I believe that it would have a profound impact not just on development process but also on the character and quality of object oriented code that is written in general.  The inherently sequential nature of files and the way that people reason about file parsing, I believe, lends itself to, or at least favors, the dogged persistence of procedural approaches to object oriented programming (static methods, global state, casting, etc.).  I think that the following trends would take shape:

  1. Smaller methods.  If popping up methods one at a time or in small groups becomes the norm, having to scroll to see and understand a method will become an anomaly, and people will optimize to avoid it.
  2. Less complexity in classes.  With code operations subject to a validation of sorts, it’d be fairly easy to incorporate a setting that warns users if they’re adding the tenth or twentieth or whatever method to a class.  In extreme cases, it could even be disallowed (and not through the honor system or ex post facto at review or check in–you couldn’t do it in the first place).
  3. Better conformance to Single Responsibility Principle (SRP).  Eliminating the natural barrier of “I don’t want to add a new file to source control” makes people less likely to awkwardly wedge methods into classes in which they do not belong.
  4. Better cohesion.  It becomes easy to look for fields hardly used in a type or clusters of use within a type that could be separated easily into multiple types.
  5. Better test coverage.  Not only is this a natural consequence of the other items in this list, but it would also be possible to define “meta-data” to allow linking of code items and tests.

What’s Next?

Well, the first thing that I need to establish is that this doesn’t already exist somewhere in the works and that I’m not a complete lunatic malcontent.  I’d like to get some feedback on this idea in general.  The people to whom I’ve explained a bit so far seem to find the concept a bit far-fetched but somewhat intriguing.

I’d say the next step, assuming that this passes the sanity check, would be perhaps to draw up a white paper discussing some implementation/design strategies with pros and cons in a bit more detail.  There are certainly threats to validity to be worked out such as the specifics of interaction with the compiler, the necessarily radical change to source control approaches, the performance overhead of performing code transforms instead of just reading a file directly into memory, etc.  But off the top of my head, I view these things more as fascinating challenges than problems.

In parallel, I’d like to invite anyone who is at all interested in this idea to drop me an email or send me a tweet.  If there are others that feel the way I do, I think it’d be really cool to get something up on Github and maybe start brainstorming some initial work tasks or exploratory POCs for feasibility studies.  Also feel free to plus-like-tweet-whatever to others if you think they might be interested.

In conclusion I’ll just say that I feel like I’d really like to see this gain traction and that I’d probably ratchet this right to the top of my side projects list if people are interested (this being a bit large in scope for me alone in my spare time).  Now whenever I find myself editing source files in an IDE I feel like a bit of a barbarian, and I really don’t think we should have to tolerate this state of affairs anymore.  Productivity tools designed to hide the file nature of our source code from us help, but they’re band-aids when we need disinfectants, antibiotics, and stitches.  I don’t know about you, but I’m ready to start writing my object-oriented code using an IDE paradigm that doesn’t support GOTO Line as if we were banging out QBasic in 1986.

Hilarious Conditional Bloopers!

For this Friday, I thought I’d do something a little more lighthearted and, in the tradition of bad television (or Robot Chicken’s satire thereof) post some programming bloopers. These are actual things that I’ve personally seen in source code as opposed to some kind of specific sampling of CodeSOD from the Daily WTF. Doing this series of posts about Boolean algebra made me think of conditional logic fails I’ve seen both recently and long in the past.

For each one, I’ve tried my best to give it a catchy name, an explanation of the problem, an example of what it translates to in simple English (i.e. why it doesn’t “read like well written prose”), and what it ought to look like. So, without further ado, here are the bloopers:

The Ingrown Branch

I call this The Ingrown Branch because of what it does. It introduces a conditional — a fork in the road, if you will — and it winds up in the same spot no matter which branch you take. In conversational English, this says “if the number of milk cartons is not 12, make it 12”. While this doesn’t sound ridiculous in the same way that others here will, consider what’s being done. If x is equal to 12, well, then do nothing because x is equal to 12. Otherwise, if it’s not equal to 12, set it to 12. What would be a simpler way to do this?
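The original snippets for this blooper are missing, so this is a reconstruction under assumptions: the text’s x is called milkCartonCount here, and the class and method names are hypothetical. Both methods end in the same place for every input.

```csharp
public static class IngrownBranch
{
    // The blooper: a fork in the road where both paths arrive at 12.
    public static int WithBranch(int milkCartonCount)
    {
        if (milkCartonCount != 12)
        {
            milkCartonCount = 12;
        }
        return milkCartonCount;
    }

    // The simpler way: just make it 12.
    public static int WithoutBranch(int milkCartonCount)
    {
        milkCartonCount = 12;
        return milkCartonCount;
    }
}
```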

The Tautology

In conversational English, a tautology is something that is vacuously true or redundant. In logic, this is something that is always true, ipso facto, such as “A or NOT(A)”, for instance. In terms of conversational English, this is like saying “If I’m out of milk or if I’m not out of milk, I’m going to go buy some milk.” Why not drop the spurious conditionals and get to the point:
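A sketch of the pattern, with hypothetical names and a returned string standing in for whatever the original code actually did:

```csharp
public static class Tautology
{
    // The blooper: "A or NOT(A)" is true for every value of A.
    public static string WithSpuriousConditional(bool isOutOfMilk)
    {
        if (isOutOfMilk || !isOutOfMilk)
        {
            return "buy milk";
        }
        return "stay home"; // never reached
    }

    // Getting to the point: the condition contributed nothing.
    public static string Simplified()
    {
        return "buy milk";
    }
}
```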

The Contradiction

The opposite of a tautology, a contradiction is something that is vacuously false, such as a primitive type not being equal to itself. With instances like this and the tautology, I usually see more complex incarnations that are harder to spot or else I give the benefit of the doubt and assume that manipulation of a more complex conditional occurred in the past and the thing was accidentally left in this condition. But this doesn’t alter the fact that I have seen code like this and that, in plain English, this would translate to “If I’m both completely out of milk and I have some milk, I’m going to buy milk.” It’s mind-bending nonsense that would best be described as:
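Since the original code isn't shown in this text, here is a hedged sketch of the idea with assumed names. The blooper guards its body with a condition that can never hold, so the honest version of the code is one where the milk purchase simply never happens.

```csharp
public static class Contradiction
{
    // Blooper: "A and NOT(A)" is false for every value of isOutOfMilk,
    // so the body is dead code.
    public static bool ShouldBuyMilkBlooper(bool isOutOfMilk)
    {
        if (isOutOfMilk && !isOutOfMilk)
        {
            return true;
        }
        return false;
    }

    // What the code actually says once the nonsense is removed.
    public static bool ShouldBuyMilk()
    {
        return false;
    }
}
```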

The Double Negative

I realize that this may be largely a product of speaking English as a first language, since double (and more) negatives are acceptable in some other languages. But you have to look at code like this and think, did anyone read this to themselves? “If it’s false that I’m not out of milk, I will go to the store.” Wat? Okay, so not out of milk means that you have it, so if it’s false that you’re not out of milk, it’s false that you have it, and you are out of milk… aha! Why didn’t you just say so:
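A sketch of how this could look in code, with hypothetical names standing in for the missing original:

```csharp
public static class DoubleNegative
{
    // Blooper: a negated flag that is then negated again at the point of use.
    public static bool ShouldGoToStoreBlooper(bool isNotOutOfMilk)
    {
        if (!isNotOutOfMilk)
        {
            return true;
        }
        return false;
    }

    // Just saying so: name the flag positively and test it directly.
    public static bool ShouldGoToStore(bool isOutOfMilk)
    {
        return isOutOfMilk;
    }
}
```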

Ifception

An if within an if within an if… (credit to Dan Martin for this term). This is another mind-bending way of writing things that is rather jarring to the reader of the code, like saying “If I’m out of milk if I’m out of eggs if I’m out of beer, then I’m going to the store.” Dude, wat? Oh, you mean “If you’re out of milk AND you’re out of eggs AND you’re out of beer, then you’re going to the store”? Well, nice to see what your breakfast priorities are, but at least that doesn’t read like the mumblings of a lunatic.
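Here is roughly what Ifception and its flattened equivalent might look like. The names are assumptions, since the original snippet isn't included in this text.

```csharp
public static class Ifception
{
    // Blooper: three nested ifs that together express a single AND.
    public static bool ShouldGoToStoreBlooper(bool outOfMilk, bool outOfEggs, bool outOfBeer)
    {
        if (outOfMilk)
        {
            if (outOfEggs)
            {
                if (outOfBeer)
                {
                    return true;
                }
            }
        }
        return false;
    }

    // Flattened: one condition that reads the way you would say it out loud.
    public static bool ShouldGoToStore(bool outOfMilk, bool outOfEggs, bool outOfBeer)
    {
        return outOfMilk && outOfEggs && outOfBeer;
    }
}
```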

The Robot

Perhaps this is a little nitpicky, but this explicit formation of Boolean conditionals bothers me. “If it equals true that I am out of milk, I will go to the store” sounds like some robot helper from the Jetsons or one of those shows that features a preposterous token “genius” whose intelligence is conveyed by having him speak like some robot helper from the Jetsons. Why not “If I’m out of milk, I will go to the store?”
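For illustration only, with names I've assumed in place of the original snippet:

```csharp
public static class Robot
{
    // Blooper: comparing a bool to true is redundant; it is already a condition.
    public static bool ShouldGoToStoreBlooper(bool isOutOfMilk)
    {
        if (isOutOfMilk == true)
        {
            return true;
        }
        return false;
    }

    // Human version: the flag speaks for itself.
    public static bool ShouldGoToStore(bool isOutOfMilk)
    {
        return isOutOfMilk;
    }
}
```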

The Yoda

If program in C you do, sense this makes and a clever trick to avoid assignment instead of comparison errors this is. If program in C you don’t, annoying and indicative that you’re not adopting the thinking of the language you’re working in this is. When you speak English and try to sound like a native speaker, you don’t say “If missing is the milk, go to the store”. You say “If the milk is missing, go to the store.”
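A small sketch of the Yoda condition in C#, using made-up names. The C trick guards against typing `=` instead of `==`, but in C# an `if` needs a `bool`, so `if (milkCartonCount = 0)` with an `int` won't compile in the first place.

```csharp
public static class Yoda
{
    // Blooper: constant on the left, "if zero is the milk carton count".
    public static bool ShouldGoToStoreBlooper(int milkCartonCount)
    {
        if (0 == milkCartonCount)
        {
            return true;
        }
        return false;
    }

    // Native speaker: "if the milk carton count is zero".
    public static bool ShouldGoToStore(int milkCartonCount)
    {
        return milkCartonCount == 0;
    }
}
```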

The Existential No-Op

Or, see variations where the comment is replaced by “return;” or some other similar thing. This is a conditional where, true or false, you do nothing. It sort of makes you question the very nature of (its) existence. This is like me saying to you “If I’m out of milk…” When you wait patiently for a moment and say “yes…?” I then say “nothing — that was all.” What should this be replaced with? How about just deleting the whole bit of nonsense?
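Something along these lines is what I have in mind. This is a contrived sketch with assumed names, where the return value exists only to show that the conditional changes nothing.

```csharp
public static class ExistentialNoOp
{
    // Blooper: a conditional whose body does nothing, so the result
    // is the same whether or not we are out of milk.
    public static int Blooper(bool isOutOfMilk, int tripsToStore)
    {
        if (isOutOfMilk)
        {
            // Do nothing.
        }
        return tripsToStore;
    }

    // Fixed: the conditional is simply gone.
    public static int Fixed(int tripsToStore)
    {
        return tripsToStore;
    }
}
```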

Growing Pains

See what’s going on here? This conditional is growing so unwieldy that by the end of it you forget that you already mentioned being out of milk. “If I’m out of milk, eggs, beer and milk, I’m going to the store.” “You said milk twice.” “I like milk.” How about dividing it up a bit and saying “If I am out of staples and I’m out of snacks, then I’m going to the store.”
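A hedged sketch of the idea follows. Which items count as staples and which as snacks is my own assumption for the example, as are all the names.

```csharp
public static class GrowingPains
{
    // Blooper: the condition has grown long enough that outOfMilk appears twice.
    public static bool ShouldGoToStoreBlooper(bool outOfMilk, bool outOfEggs, bool outOfBeer)
    {
        if (outOfMilk && outOfEggs && outOfBeer && outOfMilk)
        {
            return true;
        }
        return false;
    }

    // Divided up: named intermediate conditions keep each piece short
    // enough that a repeat would be easy to spot.
    public static bool ShouldGoToStore(bool outOfMilk, bool outOfEggs, bool outOfBeer)
    {
        bool isOutOfStaples = outOfMilk && outOfEggs; // Assumed grouping.
        bool isOutOfSnacks = outOfBeer;               // Assumed grouping.
        return isOutOfStaples && isOutOfSnacks;
    }
}
```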

The Mad Scoper

I think we’ve all seen one of these — someone on the team or in the group has a few too many cups of coffee and really goes to town on the old 9 and 0 keys. This is probably done to make sure that order of operations is being followed when you don’t understand the order of operations. Conversational equivalent? “If I am out of staples, and I mean staples and not whatever is coming next until I’m ready to talk about that and now I’m ready so I’m going to talk about that and that is snacks and not staples we’re not talking about staples anymore we’re talking about snacks which if I’m out of I’m also not going to the store, okay done.” Let’s dial it back to the last solution:
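In code, a Mad Scoper might look roughly like the first method below, with the dialed-back version after it. The names are assumptions carried over from the staples-and-snacks wording above.

```csharp
public static class MadScoper
{
    // Blooper: parentheses piled on until the condition is hard to read.
    public static bool ShouldGoToStoreBlooper(bool isOutOfStaples, bool isOutOfSnacks)
    {
        if ((((isOutOfStaples)) && (((isOutOfSnacks)))))
        {
            return true;
        }
        return false;
    }

    // Dialed back: a single && needs no grouping at all.
    public static bool ShouldGoToStore(bool isOutOfStaples, bool isOutOfSnacks)
    {
        return isOutOfStaples && isOutOfSnacks;
    }
}
```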

The Fly Swallower (aka The Train Wreck)

This is formally known as design with violations of the Law of Demeter, but it’s easier just to think of it as a train wreck. But the name I’m giving it comes from a nursery rhyme, which is how this starts to sound in conversational English. “There was an old lady who if she swallowed a horse who if it swallowed a cow who if it swallowed a hog who if it swallowed a dog…” How should this sound? There’s no easy fix. You need a different object model.
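To make the train wreck concrete, here is a sketch with an invented object model borrowed from the nursery rhyme. None of these classes come from the original post, and I'm only showing the blooper, since the real fix lives in the design rather than in the conditional.

```csharp
// Hypothetical object model, invented purely for illustration.
public class Hog
{
    public bool HasSwallowedDog { get; set; }
}

public class Cow
{
    public Hog Hog { get; set; } = new Hog();
}

public class Horse
{
    public Cow Cow { get; set; } = new Cow();
}

public class OldLady
{
    public Horse Horse { get; set; } = new Horse();
}

public static class FlySwallower
{
    // Blooper: the caller reaches through four objects to ask one question,
    // so it now depends on the internals of every class along the chain.
    public static bool Blooper(OldLady lady)
    {
        if (lady.Horse.Cow.Hog.HasSwallowedDog)
        {
            return true;
        }
        return false;
    }
}
```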

And with that, I’ll conclude my fun Friday post. This is meant to be light-hearted and in jest, but I’d say there’s definitely a good bit of truth here. You may not agree entirely with my assessment, but I think we’d all be well served to do the occasional double check to make sure we’re not leaving conditional bloopers in the code for others to read, triggering pointing and laughter. If you have other names for these or other conditional bloopers to mention (or you think I’m completely off base with one of these) please feel free to chime in.

By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.


Casting is a Polymorphism Fail

Have you ever seen code that looked like the snippet here?

You probably have seen code like this, and I hope that it makes you sad. I know it makes me sad. It makes me sad because it’s clearly the result of a fundamental failure to understand (or at least implement) polymorphism. Code written like this follows an inheritance structure, but it completely misses the point of that structure, which is the ability to do this instead:

What’s so great about this? Well, consider what happens if I want to add “Bird” or “Bear” to the mix. In the first example with casting, I have to add a class for my new animal, and then I have to crack open the menagerie class and add code to the MakeNoise() method that figures out how to tell my new animal to make noise. In the second example, I simply have to add the class and override the base class’s MakeNoise() method and Menagerie will ‘magically’ work without any source code changes. This is a powerful step toward the open/closed principle and the real spirit of polymorphism — the ability to add functionality to a system with a minimum amount of upheaval.
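The two snippets aren't reproduced in this text, so here is a side-by-side sketch of the contrast as I understand it. The class names, the noises, and the two wrapper classes that keep the versions apart are all assumptions for illustration.

```csharp
using System.Collections.Generic;

// Version 1: inheritance is present, but the menagerie casts to find behavior.
public static class WithCasting
{
    public class Animal { }
    public class Dog : Animal { public string Bark() { return "Woof"; } }
    public class Cat : Animal { public string Meow() { return "Meow"; } }

    public class Menagerie
    {
        // Every new animal type means another branch in this method.
        public List<string> MakeNoise(IEnumerable<Animal> animals)
        {
            var noises = new List<string>();
            foreach (var animal in animals)
            {
                if (animal is Dog) noises.Add(((Dog)animal).Bark());
                else if (animal is Cat) noises.Add(((Cat)animal).Meow());
            }
            return noises;
        }
    }
}

// Version 2: each animal overrides MakeNoise(), so the menagerie never casts.
public static class WithPolymorphism
{
    public abstract class Animal { public abstract string MakeNoise(); }
    public class Dog : Animal { public override string MakeNoise() { return "Woof"; } }
    public class Cat : Animal { public override string MakeNoise() { return "Meow"; } }

    public class Menagerie
    {
        // Adding Bird or Bear requires no change here.
        public List<string> MakeNoise(IEnumerable<Animal> animals)
        {
            var noises = new List<string>();
            foreach (var animal in animals)
            {
                noises.Add(animal.MakeNoise());
            }
            return noises;
        }
    }
}
```

In the second version, a new `Bird` class only has to derive from `Animal` and override `MakeNoise()`. The `Menagerie` source stays untouched.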

But what about more subtle instances of casting? Take the iconic:

Is this a polymorphism failure? It can’t be, can it? I mean, this is the pattern for event subscription/handling laid out by Microsoft in the C# programming guide. Surely those guys know what they’re doing.

As a matter of fact, I firmly believe that they do know what they’re doing, but I also believe that this pattern was conceived of and created many moons ago, before the language had some of the constructs that it currently does (like generics and various frameworks) and followed some of the patterns that it currently does. I can’t claim with any authority that the designers of this pattern would ask for a mulligan knowing what they do now, but I can say that patterns like this, especially ones that become near-universal conventions, tend to build up quite a head of steam. That is to say, if we suddenly started writing event handlers with strongly typed senders, a lot of event-producing code simply wouldn’t work with what we were doing.

So I contend that it is a polymorphism failure and that casting, in general, should be avoided as much as possible. However, I feel odd going against a Microsoft standard in a language designed by Microsoft. Let’s bring in an expert on the matter. Eric Lippert, principal developer on the C# compiler team, had this to say in a Stack Overflow post:

Both kinds of casts are red flags. The first kind of cast raises the question “why exactly is it that the developer knows something that the compiler doesn’t?” If you are in that situation then the better thing to do is usually to change the program so that the compiler does have a handle on reality. Then you don’t need the cast; the analysis is done at compile time.

The “first kind” of cast he’s referring to is one he defines earlier in his post as one where the developer “[knows] the runtime type of this expression but the compiler does not know it.” That is the kind that I’m discussing here, which is why I chose that specific portion of his post. In our case, the developer knows that “sender” is a button but the compiler does not know that. Eric’s point, and one with which I wholeheartedly agree, is “why doesn’t the compiler know it and why don’t we do our best to make that happen?” It just seems like a bad idea to run a reality deficit between yourself and the compiler as you go. I mean, I know that the sender is a button. You know the sender is a button. The method knows the sender is a button (if we take its name, containing “ButtonClicked” at face value). Maintainers know the sender is a button. Why does everyone know sender is a button except for the compiler, who has to be explicitly and awkwardly informed in spite of being the most knowledgeable and important party in this whole situation?
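To show what closing that reality deficit could look like, here is a sketch that puts the conventional handler next to one with a strongly typed sender. The `Button` and `ClickRecorder` classes are hypothetical stand-ins rather than any real UI framework's types, and the typed event is my own illustration, not a pattern the C# programming guide prescribes.

```csharp
using System;

// Hypothetical stand-in for a UI button; not a real framework class.
public class Button
{
    public string Text { get; set; } = "";

    // Conventional event: the sender arrives as object.
    public event EventHandler Clicked;

    // Assumed alternative: the delegate type itself says the sender is a Button.
    public event Action<Button, EventArgs> TypedClicked;

    public void Click()
    {
        Clicked?.Invoke(this, EventArgs.Empty);
        TypedClicked?.Invoke(this, EventArgs.Empty);
    }
}

public class ClickRecorder
{
    public string LastClickedText { get; private set; } = "";

    // Everyone knows sender is a Button except the compiler, hence the cast.
    public void HandleButtonClicked(object sender, EventArgs e)
    {
        var button = (Button)sender;
        LastClickedText = button.Text;
    }

    // Here the compiler knows too, so no cast is needed.
    public void HandleTypedButtonClicked(Button sender, EventArgs e)
    {
        LastClickedText = sender.Text;
    }
}
```

The typed version only works because I control both the event and the handler here. Code that has to plug into existing `EventHandler`-based events is stuck with the cast, which is exactly the head-of-steam problem mentioned earlier.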

But I roll all of this into a broader point about a polymorphic approach in general. If we think of types as hierarchical (inheritance) or composed (interface implementation), then there’s some exact type that suits my needs. There may be more than one, but there will be a best one. When writing a method and accepting parameters, I should accept as general a type as possible without needing to cast so that I can be of the most service. When returning something, I should be as specific as possible to give clients the most options. But when I talk about “possible” I’m talking about not casting.
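As a small illustration of accepting the general and returning the specific, consider the sketch below. The class and method names are invented for the example.

```csharp
using System.Collections.Generic;

public static class Doubler
{
    // Accepts IEnumerable<int>, the most general type that supports what the
    // method needs (iteration), so arrays, lists, and sets all work without casts.
    // Returns List<int>, a specific type, so callers get indexing, Count, and
    // mutation without having to cast the result.
    public static List<int> Doubled(IEnumerable<int> numbers)
    {
        var result = new List<int>();
        foreach (var number in numbers)
        {
            result.Add(number * 2);
        }
        return result;
    }
}
```

A caller can pass `new[] { 1, 2, 3 }` or a `HashSet<int>` and index straight into the returned list. Nobody on either side of the call has to tell the compiler something it doesn't already know.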

If I start casting, I introduce error possibilities, but I also necessarily introduce a situation where I’m treating an object as two different things in the same scope. This isn’t just jarring from a readability perspective — it’s a maintenance problem. Polymorphism allows me to care only about some public interface specification and not implementation details — as long as the thing I get has the public API I need, I don’t really care about any details. But as soon as I have to understand enough about an object to understand that it’s actually a different object masquerading as the one I want, polymorphism is right out the window and I suddenly depend on knowing the intricate relationship details of the class in question. Now I break not only if my direct collaborators change, but also if some inheritance hierarchy or interface hierarchy I’m not even aware of changes.

The reason I’m posting all of this isn’t to suggest that casting should never happen. Clearly sometimes it’s necessary, particularly if it’s forced on you by some API or framework. My hope though is that you’ll look at it with more suspicion — as a “red flag”, in the words of Eric Lippert. Are you casting because it’s forced on you by external factors, or are you casting to communicate with the compiler? Because if it’s the latter, there are other, better ways to achieve the desired effect that will leave your code more elegant, understandable, and maintainable.
