DaedTech

Stories about Software

Subtle Naming and Declaration Violations of DRY

It’s pretty likely that, as a software developer who reads blogs about software, you’ve heard of the DRY Principle. In case you haven’t, DRY stands for “Don’t Repeat Yourself,” and the gist of it is that there should only be one source of a given piece of information in a system. You’re most likely to hear about this principle in the context of copy-and-paste programming or other, similar rookie programming mistakes, but it also crops up in more subtle ways. I’d like to address a few of those today, and, more specifically, I’m going to talk about violations of DRY in naming things while coding.

Type Name in the Instance Name

Since the toppling of the evil Hungarian regime from the world of code notation, people rarely do things like naming arrays of integers “intptr_myArray,” but more subdued forms of this practice still exist. You often see them appear in server-templated markup. For instance, how many codebases contain text box tags with names/ids like “CustomerTextBox?” In my experience, tons.

What’s wrong with this? Well, the same thing that’s wrong with declaring an integer by saying “int customerCountInteger = 6.” In static typing schemes, the compiler can do a fine job of keeping track of data types, and the IDE can do a fine job, at any point, of showing you what type something is. Neither of these things nor anyone maintaining your code needs any help in identifying what the type of the thing in question is. So, you’ve included redundant information to no benefit.

If it comes time to change the data type, things get even worse. The best case scenario is that maintainers do twice the work, diligently changing the type and name of the variable. The worst case scenario is that they change the type only and the name of the variable now actively lies about what it points to. Save your maintenance programmers headaches and try to avoid this sort of thing. If you’re having trouble telling at a glance what the datatype of something is, download a plugin or a productivity tool for your IDE or even write one yourself. There are plenty of options out there without baking redundant and eventually misleading information into your code.
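Here is a minimal sketch of how such a name drifts out of sync. The class and variable names (`CustomerStats`, `customerCountInteger`) are hypothetical and exist only to illustrate the point:

```csharp
public static class CustomerStats
{
    // Before: the variable name repeats the type.
    public static int CountBefore()
    {
        int customerCountInteger = 6;
        return customerCountInteger;
    }

    // After a maintainer widens the type to long but forgets the rename,
    // the name still says "Integer" while the variable holds a long.
    public static long CountAfterTypeChange()
    {
        long customerCountInteger = 6;
        return customerCountInteger;
    }

    // With the type left out of the name, a type change touches one token.
    public static long CountWithPlainName()
    {
        long customerCount = 6;
        return customerCount;
    }
}
```

Nothing stops the second version from compiling, which is exactly the problem: the compiler tracks the type, and nobody tracks the name.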

Inheritance Structure Baked Into Type Names

In situations where object inheritance is used, it can be tempting to name types according to where they appear in the hierarchy. For instance, you might define a base class named BaseFoo and then a child of that named SpecificFoo, and a child of that named EvenMoreSpecificFoo. So EvenMoreSpecificFoo : SpecificFoo : BaseFoo. But what happens if during a refactor cycle you decide to break the inheritance hierarchy or rework things a bit? Well, there’s a good chance you’re in the same awkward position as the variable renaming in the last section.

Generally you’ll want inheritance schemes to express “is a” relationships. For instance, you might have Sedan : Car : Vehicle as your grandchild : child : parent relationship. Notice that what you don’t have is SedanCarVehicle : CarVehicle : Vehicle. Why would you? Everyone understands these objects and how they relate to one another. If you find yourself needing to remind yourself and maintainers of that relationship, there’s a good chance that you’d be better off using interfaces and composition rather than inheritance.

Obviously there are some exceptions to this concept. A SixCylinderEngine class might reasonably inherit from Engine and you might have a LoggingFooRetrievalService that does nothing but wrap the FooRetrievalService methods with calls to a logger. But it’s definitely worth maintaining an awareness as to whether you’re giving these things the names that you are because those are the best names and/or the extra coupling is appropriate or whether you’re codifying the relationship into the names to demystify your design.
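As a rough sketch of the logging wrapper exception, here is one way it might look. The `IFooRetrievalService` interface, the `Foo` type, the `GetFoo` method, and the `Action<string>` logger are all assumptions made for illustration:

```csharp
using System;

public class Foo
{
    public int Id { get; set; }
}

public interface IFooRetrievalService
{
    Foo GetFoo(int id);
}

public class FooRetrievalService : IFooRetrievalService
{
    public Foo GetFoo(int id)
    {
        return new Foo { Id = id };
    }
}

// The name repeats "FooRetrievalService" because wrapping that service
// with logging is all this class does.
public class LoggingFooRetrievalService : IFooRetrievalService
{
    private readonly IFooRetrievalService _inner;
    private readonly Action<string> _log;

    public LoggingFooRetrievalService(IFooRetrievalService inner, Action<string> log)
    {
        _inner = inner;
        _log = log;
    }

    public Foo GetFoo(int id)
    {
        _log("Retrieving Foo " + id);
        return _inner.GetFoo(id);
    }
}
```

In this sketch the wrapper composes the inner service rather than inheriting from it, so the shared name describes behavior and not a position in a hierarchy.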

Explicit Typing in C#

This one may spark a bit of outrage, but there’s no denying that the availability of the var keyword creates a situation where having the code “Foo foo = new Foo()” isn’t DRY. If you practice TDD and find yourself doing a lot of directed or exploratory refactoring, explicit typing becomes a real drag. If I want to generalize some type reference to an interface reference, I have to do it and then track down the compiler errors for its declarations. With implicit typing, I can just generalize and keep going.
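To make the refactoring scenario concrete, here is a small sketch. The `IFoo` interface, `FooFactory`, and `Consumer` are hypothetical names, and I'm assuming the factory's return type is the thing being generalized:

```csharp
public interface IFoo
{
    string Describe();
}

public class Foo : IFoo
{
    public string Describe()
    {
        return "foo";
    }
}

public static class FooFactory
{
    // Suppose this originally returned Foo and was later generalized to IFoo.
    public static IFoo Create()
    {
        return new Foo();
    }
}

public static class Consumer
{
    public static string UseImplicit()
    {
        // Compiles before and after the change: the compiler infers the type.
        var foo = FooFactory.Create();
        return foo.Describe();
    }

    public static string UseExplicit()
    {
        // Written as "Foo foo = FooFactory.Create();" this declaration would
        // stop compiling after the change and need a manual edit to IFoo.
        IFoo foo = FooFactory.Create();
        return foo.Describe();
    }
}
```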

I do recognize that this is a matter of opinion when it comes to readability and that some developers are haunted by the Variant type in VB6 or are reminded of dynamic typing in JavaScript, but there’s really no arguing that this is technically a needless redundancy. For the readability concern, my advice would be to focus on writing code where you don’t need the crutch of specific type reminders inline. For the bad memories of other languages concern, I’d suggest trying to be more idiomatic with languages that you use.

Including Namespace in Declarations

A thing I’ve seen done from time to time is fully qualifying types as return values, parameters, or locals. This usually seems to occur when some automating framework or IDE goody does it for speculative disambiguation in scoping. (In other words, it doesn’t know what namespaces you’ll have so it fully qualifies the type during code generation to minimize the chance of potential namespace collisions.) What’s wrong with that? You’re preemptively avoiding naming problems and making your dependencies very obvious (one might say beating readers over the head with them).

Well, one (such as me) might argue that you could avoid namespace collisions just as easily with good naming conventions and organization and without a DRY violation in your code. If you’re fully scoping all of your types every time you use them, you’re repeating that information everywhere in a file that you use the type, when stating it just once with an include/using/import statement at the top would suffice. What happens if you have some very oft-used type in your codebase and you decide to move it up a level of namespace? A lot more pain if you’ve needlessly repeated that information all over the place. Perhaps enough extra pain to make you live with a bad organization rather than trying to fix it.
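A quick side-by-side sketch of the two styles, using a hypothetical `NameFormatter` class written only for this comparison:

```csharp
using System.Collections.Generic;
using System.Text;

public static class NameFormatter
{
    // Fully qualified everywhere: the namespace is restated at every use.
    public static string JoinQualified(System.Collections.Generic.List<string> names)
    {
        System.Text.StringBuilder builder = new System.Text.StringBuilder();
        foreach (string name in names)
        {
            builder.Append(name).Append(';');
        }
        return builder.ToString();
    }

    // Same behavior, with the namespaces stated once in the using directives.
    public static string Join(List<string> names)
    {
        var builder = new StringBuilder();
        foreach (var name in names)
        {
            builder.Append(name).Append(';');
        }
        return builder.ToString();
    }
}
```

If one of those types moved to a different namespace, the second method would need no edits at all; only the using directive at the top of the file would change.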

Does It Matter?

These all probably seem fairly nit-picky, and I wouldn’t really dispute it for any given instance of one or even for the practices themselves across a codebase. But practices like these are death by a thousand cuts to the maintainability of a code base. The more you work on fast feedback loops, tight development cycles, and getting yourself in the flow of programming, the more that you notice little things like these serving as the record skip in the music of your programming.

When NCrunch has just gone green, I’m entering the refactor portion of red-green-refactor, and I decide to change the type of a variable or the relationship between two types, you know what I don’t want to do? Stop my thought process related to reasoning about code, start wondering if the names of things are in sync with their types, and then do awkward find-alls in the solution to check and make sure I’m keeping duplicate information consistent. I don’t want to do that because it’s an unwelcome and needless context shift. It wastes my time and makes me less efficient.

You don’t go fast by typing fast. You don’t go fast, as Uncle Bob often points out, by making a mess (i.e. deciding not to write tests and clean code). You really don’t go fast by duplicating things. You go fast by eliminating all of the noise in all forms that stands between you and managing the domain concepts, business logic, and dependencies in your application. Redundant variable name changes, type swapping, and namespace declaring are all voices that contribute to that noise that you want to eliminate.

How to Make Your Code More Readable

When I was in high school, I had a tendency to procrastinate when it came to assignments and, well, pretty much everything in life. I think that this is sort of a staple of the teenage condition. As a result, I’d often stay up late writing some paper the night before it was due and turn it in.

Working my way through life, I became less of a procrastinator and started to complete tasks on or ahead of schedule and without heroic all-night efforts at the 11th hour. This gave rise to a situation in which I’d frequently have something finished but not turned in for days or even weeks. And I discovered an interesting consequence of this situation–re-reading a paper or presentation after some time had elapsed allowed me to read the paper almost as if someone else had written it. To put it a more computer-science-y way, it nudged me off of the happy path of thinking “of course all of this phrasing makes sense exactly as is, or I wouldn’t have written it.”

Of late, I’ve been steadily working two or three projects and dividing my time up on a per-day basis to still allow myself to get in “the flow.” That is, I spend Monday on project A, Tuesday on project B, Wednesday back on A, and Thursday on C, for instance. The actual number of days and motivation for switching varies, but you get the idea. And like my improved work ethic in school as I grew up, I’ve noticed an interesting consequence: more readable code.

What seems to happen is that I get into a sweet spot between unconscious familiarity and complete lack of insight with the code. Enough time has elapsed since I last looked at it that I don’t simply take it for granted and see without looking, but not enough time has elapsed that I have no idea what’s going on. To illustrate what I mean, consider the following gist (modified from actual code that I was writing for the sake of anonymizing):

using System.Collections.Generic;
using System.Linq;

public class SomeClass
{
    private readonly List<Customer> _cusotomers = new List<Customer>();

    public void AddToQueue(CustomerModel cm)
    {
        if (cm.IsActive && !_cusotomers.Any(c => c.Name == cm.Name))
            _cusotomers.Add(new Customer() { Name = cm.Name });
    }
}

As I write code these days, I rely increasingly on various refactorings during the course of TDD: extract method, extract local variable, add to new class, declare type, etc. In this case, during a refactor cycle, I had extracted the method “AddToQueue” from another method where the “CustomerModel” argument had been a very tightly scoped local variable, such as the “c” variable in the “Any()” call. The result is a method that’s nicely short in scope, but suffers from some naming and semantics that tend to hide a bit about what this method is actually doing. If I’m going to put my money where my mouth is in my various posts deriding comments, I need to do better when it comes to making code self-documenting.

What could be improved here? Well, for starters, the name of the method is a little misleading since the result isn’t always an add. Secondly, what is “cm”? And, if you’ll notice, there’s a typo in the “customers” field–not particularly important, but sloppy. In short, this could use a bit of tightening up–a bit of editing for readability, if you will. How about this:

using System.Collections.Generic;
using System.Linq;

public class SomeClass
{
    private readonly List<Customer> _customers = new List<Customer>();

    public void AddCustomerForModelIfValid(CustomerModel modelOfCustomer)
    {
        if (IsModelValidForAdding(modelOfCustomer))
            AddCustomerForModel(modelOfCustomer);
    }

    private bool IsModelValidForAdding(CustomerModel modelOfCustomer)
    {
        return modelOfCustomer.IsActive && !_customers.Any(c => c.Name == modelOfCustomer.Name);
    }

    private void AddCustomerForModel(CustomerModel modelOfCustomer)
    {
        var customerBasedOnModel = new Customer()
        {
            Name = modelOfCustomer.Name
        };
        _customers.Add(customerBasedOnModel);
    }
}

I don’t know about you, but I’d say the latter version is a lot clearer. In fact, I think it’s so clear with the naming of variables and methods that adding comments would just be awkward. While the first one wasn’t terrible (it’s hard to be too confusing with methods that short), it definitely left something to be desired in the “self-documenting” category. The latter, I’d say, gets there.

The “sweet spot” that I mentioned earlier is the best time to make that kind of thing happen. It’s a little cumbersome in the middle of a refactoring to stop and say, “okay, let’s focus on naming and spelling.” That’s sure to interrupt your flow. But let that code sit for, say, four to seven days, and then give it another look. If you find yourself mumbling things like “what’s cm… oh, that’s right,” then you can be pretty sure others will mumble that as well, but without the “that’s right.” Don’t let that moment slip away–it’s an opportunity that only happens when you read your code a few days after the fact. Use it to fix the code right then and there when you’re reading for understanding instead of creating. You won’t regret it, and maintainers of your code will thank you.

A Group Interview With Some OOP Compilers

I started this as an aside to my post about source files as harmful, but axed it as sort of awkward and too much of a digression. Like a weird sweater that you get for your birthday, though, I couldn’t bring myself to toss it, so I saved it as its own draft. Good thing too because it turns out that my surplus of ready drafts has run dry, and I have nothing else to publish. So, without further ado, please enjoy an admittedly weird bit of Friday humor.

Interviewer:     Thanks for joining me, guys. Today I have with me, in alphabetical order, the C#, C++, and Java Compilers.
C++ Compiler:     How does a pound come before a plus?
Java Compiler:    (Guffaws)
C# Compiler:     Don’t hate.

Interviewer:     Because it has fewer characters…? Ok, amend that to in no particular order.
C++ Compiler:     That’s better.
Java Compiler:    I was last and I didn’t complain.
C# Compiler:     You’re a good citizen.

Interviewer:     So let’s get right to it. Should methods go above variable declarations or below?
C# Compiler:     It doesn’t matter to me.
Java Compiler:    Me neither.
C++ Compiler:     Same here.

Interviewer:     Okay, but surely you have some preference.
(Pause)
C# Compiler:     I mean, our users seem to get pretty worked up about it, but we really don’t care.

Interviewer:     What about declaring variables all at the beginning of a method or just declaring them where you need them?
Java Compiler:    Up to the user, but I prefer the latter because users that do this have less of a tendency to get angry at me due to their mistakes cropping up at runtime.
C++ Compiler:     Well, I don’t care these days, but that issue has some history for me.
C# Compiler:      History?
C++ Compiler:     Yeah… I don’t care, but my dad gets pretty worked up–
(Muffled voice from the other room): I heard that! You kids with your multi-pass compiling. Back in my day, we only needed one pass to compile and we liked it!
C# Compiler:     lol…C
C++ Compiler:     Dude, not cool–that’s my dad.

Interviewer:     So what about having multiple classes defined in a single file?
C# Compiler:     Whatever, I don’t care.
Java Compiler:    Not allowed. I don’t like this unless they’re nested or non-public.
C# Compiler:     That’s pretty anal of you.
Java Compiler:    I have standards, which is probably why my users don’t use awful variable names like lpstr_theThing.
C# Compiler:     Yeah, the kind of standards that are responsible for checked exceptions.
Java Compiler:    Wow, ouch. That was a pretty low blow.
C# Compiler:     I’m sorry, that was uncalled for.

Interviewer:     How about Camel casing versus Pascal casing? Underscores in method names? Underscores in front of fields? How methods and such are ordered?
Java Compiler:    We don’t care about any of those things.
C++ Compiler:     Yeah, as long as you don’t use illegal tokens that would confuse us, we really don’t care what you call things.

Interviewer:     So why do people spend so much time arguing about which way is the right way?
C# Compiler:     I can interpret millions of lines of code, but that’s beyond me.

Interviewer:     Is it possible for those arguments to be productive?
Java Compiler:    No —
C++ Compiler:     Look what they did to poor C#!
C# Compiler:     What’s that supposed to mean?
C++ Compiler:     Regions, that’s what. I mean, dude, really?
Java Compiler:    Yeah, I don’t mean to be rude, but that’s kind of gross.
C# Compiler:     Yeah… I feel like the closet that a kid crams all of his stuff into when he’s supposed to clean his room.

Interviewer:     How so?
C# Compiler:     So I provide a language feature that I thought users would like, and they do. It lets them hide code they don’t want to look at, but I didn’t really anticipate all the consequences of that…

Interviewer:     Meaning what, exactly?
(Pause)
Java Compiler:    (Chuckles)
C# Compiler:     Well, meaning people stuff tons of terrible code into my regions and then point at how nice and clean things look, like the kid with his room. He didn’t clean the room — he just crammed everything in the closet.
C++ Compiler:     I don’t know if I have anything that egregious, but it does seem that we do a lot of weird things and field a lot of weird work-arounds because of the file-based nature of code storage and interaction.

Interviewer: So would you say that the continued use of files as the principal means for storing code is a problem?
C# Compiler: I don’t think it matters how the code is stored, but rather how users interact with it.
Java Compiler: Yeah, I mean the way the actual text of the code is stored is an implementation detail, but it doubles as data and user interface, even in sophisticated IDEs like Eclipse.
C# Compiler: (Snorts)
Java Compiler: Oh, sorry – I forgot that someone here likes to be invoked from IDEs that require no less than 12 gigs of memory.
C++ Compiler: You guys sicken me.

Interviewer: What do you think the future of source code is?
C++ Compiler: Separation of the storage details from the presentation to the user.
Java Compiler: Yeah, we’ve beaten around the bush long enough: syntax highlighting, quick navigation, automated refactoring–
C++ Compiler: Most IDEs and productivity tools do everything they can to hide the fact that you’re editing a file, like MS Word hides the fact that you’re more or less editing text.
C# Compiler: It might get more complicated with certain advanced language features: closures, lambda expressions–
Java Compiler: We’ve got those things too now, buddy. Quit showing off.
C# Compiler: What?
C++ Compiler: Those things don’t make it any more complicated — you just wanted to bring them up.
C# Compiler: They create scoping challenges! If you’re storing hierarchical structures of code elements, that stuff matters!
Java Compiler: Sure there are complexities with all kinds of features, but those aren’t significant barriers to the effort. This really should go forward.

Interviewer: Well, that’s about all of the time that we have, so I’d like to thank our interviewees for their time. Gentle-compilers, always a pleasure.

The Way We Write Code is Stupid: Source Code Files Considered Harmful

Order Doesn’t Matter

Please pardon the loaded phrasing in the title, but that’s how the message came to me from my subconscious brain: bluntly and without ceremony. I was doing a bit of work in Apex, the object-oriented language specific to Salesforce.com, and it occurred to me that I had no idea what idiomatic Apex looked like. (I still don’t.) In C++, the convention (last time I was using it much, anyway) is to first define public members in class headers and then the private members at the bottom. In C#, this is inverted. I’ve seen arguments of all sorts as to which approach is better and why. Declaring the private fields at the top makes sense since the first thing you want to see in the class is what its state will consist of, but declaring the public stuff at the top makes sense since that’s what consumers will interact with and it’s like the above-water part of your code iceberg.

When programming in any of the various programming languages I know, I have this mental cache of what’s preferred in what language. I attempt to ‘speak’ it without an accent. But with Apex, I have no idea what the natives ‘sound’ like, not having seen it in use before. Do I declare instance variables at the bottom or the top? Which is the right way to eat bread: butter side up or butter side down? I started googling to see what the ‘best practice’ was for Apex when the buzzing in my subconscious reached some kind of protesting critical mass and morphed into a loud, clear message: “this is completely stupid.”

I went home for the day at that point–it was late anyway–and wondered what had prompted this visceral objection. I mean, it obviously didn’t matter from a compiled code perspective whether instance variables or public methods come first, but it’s pretty well established and encouraged by people as accomplished and prominent as “Uncle” Bob Martin that consistency of source code layout matters, if not the layout specifics (paraphrased from my memory of his video series on Clean Coders). I get it. You don’t want members of your team writing code that looks completely different from class to class because that creates maintenance headaches and obscures understanding. So what was my problem?

I didn’t know until the next morning in the shower, where I seem to do my most abstract thinking. I didn’t think it was stupid to try to make my Apex code look like ‘standard’ Apex. I thought it was stupid that I needed to do so at all. I thought it was stupid to waste any time thinking about how to order code elements in this file when the only one whose opinion really matters–the compiler–says, “I don’t care.” Your compiler is trying to tell you something. Order doesn’t matter to it, and you shouldn’t care either.

Use Cases: What OOP Developers Want

But the scope of my sudden, towering indignation wasn’t limited to the fact that I shouldn’t have to care about the order of methods and fields. I also shouldn’t have to care about camel or Pascal casing. I shouldn’t have to care about underscores in front of field names or inside of method names. It shouldn’t matter to me if public methods come before private or how much indentation is the right amount of indentation. Should methods be alphabetized or should they be in some other order? I don’t care! I don’t care about any of this.

Let’s get a little more orderly about this. Here are some questions that I ask frequently when I’m writing source code in an OOP language:

  • What is the public API of this type?
  • What private methods are in the ‘tree’ of this public method?
  • What methods of this type mutate or reference this field?
  • What are the types in this namespace?
  • What are the implementations of this interface in this code base?
  • Let’s see this method and any methods that it overrides.
  • What calls this method?

Here are some questions that I never ask out of actual interest when writing source code.  These I either don’t ask at all or ask in exasperation:

  • What’s the next method in this file?
  • How many line feed characters come before the declaration of this variable?
  • Should I use tabs or spaces?
  • In what region is this field’s declaration?
  • Did the author of this file alphabetize anything in it?
  • Does this source file have Windows or *NIX line break characters?
  • Is this a field or a method or what?

With the first set of questions, I ask them because they’re pieces of information that I want while reasoning about code.  With the second set of questions, they’re things I don’t care about.  I view asking these questions as an annoyance or failure.  Do you notice a common pattern?  I certainly do.  All of the questions whose answers interest me are about code constructs and all the ones that I don’t care about have to do with the storage medium for the code: the file.

But there’s more to the equation here than this simple pattern.  Consider the first set of questions again and ask yourself how many of the conventions that we establish and follow are simply ham-fisted attempts to answer them at a glance because the file layout itself is incapable of doing so.  Organizing public and private separately is a work-around to answer the first question, for example.  Regions in C#, games with variable and method naming, “file” vs “type” view, etc. are all attempts to overcome the fact that files are actually really poor communication media for object-oriented concepts.  Even though compilers are an awful lot different now than they were forty years ago, we still cling to the storage medium for source code best suited to those old compilers.

Not Taking our own Advice

If you think of an ‘application’ written in MS Access, what comes to mind?  How about when you open up an ASP web application and find wizard-generated data sources in the markup, or when you open up a desktop application and find SQL queries right in your code behind?  I bet you think “amateurs wrote this.”  You are filled with contempt for the situation–didn’t anyone stop to think about what would happen if data later comes in some different form?  And what about some kind of validation?  And, what the–ugh… the users are just directly looking at the tables and changing the column order and default sorting every time they look at the data.  Is everyone here daft?  Don’t they realize how ridiculous it is to alter the structure of the actual data store every time someone wants a different ordering of the data?

And you should see some of the crazy work-arounds and process hacks they have in place. They actually have a scheme where the database records the name of everyone who opens up a table and makes any kind of change so that they can go ask that person why they did it.  And–get this–they actually have this big document that says what the order of columns in the table should be.  And–you can’t make this stuff up–they fight about it regularly and passionately.  Can you believe the developers that made this system and the suckers that use it? I mean, how backward are they?

In case you hadn’t followed along with my not-so-subtle parallel, I’m pointing out that we work this way ourselves even as we look with scorn upon developers who foist this sort of thing on users and users who tolerate it.  This is like when you finally see both women in the painting for the first time–it’s so clear that you’ll never un-see it again.  Why do we argue about where to put fields and methods and how to order things in code files when we refuse to write code that sends users directly into databases, compelling them to bicker over the order of column definition in the same?  RDBMS (or any persistence store) is not an appropriate abstraction for an end user–any end user–whether he understands the abstraction or not.  We don’t demand that users fight, decide that there is some ‘right’ way to order invoices to be printed, and then lock the Invoice table in place accordingly for all time, on pain of shaming for violations of an eighty-page invoice standard guideline document.  So why do that to ourselves?  When we’re creating object-oriented code, sequential files and all of the particular orderings, traversings, and renderings thereof are wildly inappropriate abstractions for us.

What’s the Alternative?

Frankly, I don’t know exactly what the alternative is yet, but I think it’s going to be a weird and fun ride trying to figure that out.  My initial, rudimentary thoughts on the matter are that we should use some sort of scheme in which the Code DOM is serialized to disk for storage purposes.  In other words, the domain model of code is that there is something called Project, and it has a collection of Namespace.  Namespace has a collection of Type, which may be Interface, Enum, Struct, Class (for C# anyway–for other OOP languages, it’s not hard to make this leap).  Class has one collection each of Field, Method, Property, Event.  The exact details aren’t overly important, but do you see the potential here?  We’re creating a hierarchical model of code that could be expressed in nested object or relational format.
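To give the hierarchy some shape, here is a very rough sketch of what such a model might look like. The `Code` prefix on the type names is my own invention to avoid clashing with framework types, and properties, events, enums, and structs are left out for brevity:

```csharp
using System.Collections.Generic;

public class CodeField
{
    public string Name { get; set; }
    public string TypeName { get; set; }
}

public class CodeMethod
{
    public string Name { get; set; }
    public bool IsPublic { get; set; }
}

// Base for anything a namespace can contain (class, interface, and so on).
public abstract class CodeType
{
    public string Name { get; set; }
}

public class CodeClass : CodeType
{
    public List<CodeField> Fields { get; } = new List<CodeField>();
    public List<CodeMethod> Methods { get; } = new List<CodeMethod>();
}

public class CodeInterface : CodeType
{
    public List<CodeMethod> Methods { get; } = new List<CodeMethod>();
}

public class CodeNamespace
{
    public string Name { get; set; }
    public List<CodeType> Types { get; } = new List<CodeType>();
}

public class CodeProject
{
    public string Name { get; set; }
    public List<CodeNamespace> Namespaces { get; } = new List<CodeNamespace>();
}
```

Notice that nothing in this sketch has a line number, a position in a file, or an ordering that anyone would need to argue about.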

In other words, we’re creating a domain model entirely independent of any persistence strategy.  Can it be stored in files?  Sure. Bob’s your uncle.  You can serialize these things however you want.  And it’ll need to be written to file in some form or another for the happiness of the compiler (at least at first).  But those files handed over to the compiler are output transforms rather than the lifeblood of development.
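As a sketch of what “output transform” could mean in practice, here is a toy renderer that turns a stored model into compiler-ready text. Everything here (`StoredClass`, `StoredField`, `SourceRenderer`) is hypothetical, and a real version would obviously need to handle far more than private fields:

```csharp
using System.Collections.Generic;
using System.Text;

public class StoredField
{
    public string Name { get; set; }
    public string TypeName { get; set; }
}

public class StoredClass
{
    public string Name { get; set; }
    public List<StoredField> Fields { get; } = new List<StoredField>();
}

public static class SourceRenderer
{
    // Produces text for the compiler from the model. The text is an output,
    // so its layout is decided here rather than by whoever edited last.
    public static string Render(StoredClass model)
    {
        var text = new StringBuilder();
        text.Append("public class ").Append(model.Name).Append("\n{\n");
        foreach (var field in model.Fields)
        {
            text.Append("    private ").Append(field.TypeName).Append(' ')
                .Append(field.Name).Append(";\n");
        }
        text.Append("}\n");
        return text.ToString();
    }
}
```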

Think for a minute of us programmers as users of a system with a proper domain, one or more persistence models, and a service layer.  Really, stop and mull that over for a moment.  Now, go back to the use cases I mentioned earlier and think what this could mean.  Here are some properties of this system:

  1. The basic unit of interaction is going to be the method, and you can request methods with arbitrary properties, with any filtering and any ordering.
  2.  What appears on your screen will probably be one or more methods (though this would be extremely flexible).
  3. It’s unlikely that you’d ever be interested in “show me everything in this type.”  Why would you?  The only reason we do this now is that editing text files is what we’re accustomed to doing.
  4. Tracing execution paths through code would be much easier and more visual and schemes that look like Java’s “code bubbles” would be trivial to create and manipulate.
  5. Most arguments over code standards simply disappear as users can configure IDE preferences such as “prepend underscores to all field variables you show me,” “show me everything in camel casing,” and, “always sort results in reverse alphabetical order.”
  6. Arbitrary methods from the same or different types could be grouped together in ad-hoc fashion on the screen for analysis or debugging purposes.
  7. Version/change control could occur at the method or even statement level, allowing expression of “let’s see all changes to this method” or “let’s see who made a change to this namespace” rather than “let’s see who changed this file.”
  8. Relying on IDE plugins to “hop” to places in the code automatically for things like “show all references” goes away in favor of an expressive querying syntax à la NDepend’s “code query language.”
  9. New domain model allows baked-in refactoring concepts and makes operations like “get rid of dead code” easier or trivial, in some cases.
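A couple of the questions from my earlier list could be sketched as plain LINQ queries over such a model. The `MethodRecord` shape and the `CodeQueries` helpers below are assumptions for illustration, not a proposal for the actual query syntax:

```csharp
using System.Collections.Generic;
using System.Linq;

public class MethodRecord
{
    public string TypeName { get; set; }
    public string Name { get; set; }
    public bool IsPublic { get; set; }
    public List<string> FieldsUsed { get; } = new List<string>();
}

public static class CodeQueries
{
    // "What is the public API of this type?"
    public static List<string> PublicApi(IEnumerable<MethodRecord> methods, string typeName)
    {
        return methods
            .Where(m => m.TypeName == typeName && m.IsPublic)
            .OrderBy(m => m.Name)
            .Select(m => m.Name)
            .ToList();
    }

    // "What methods of this type mutate or reference this field?"
    public static List<string> MethodsTouching(
        IEnumerable<MethodRecord> methods, string typeName, string fieldName)
    {
        return methods
            .Where(m => m.TypeName == typeName && m.FieldsUsed.Contains(fieldName))
            .Select(m => m.Name)
            .ToList();
    }
}
```

The ordering in `PublicApi` is just one choice; the point is that ordering becomes a property of the query rather than of the stored code.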

Longer Reaching Impact

If things were to go in this direction, I believe that it would have a profound impact not just on development process but also on the character and quality of object oriented code that is written in general.  The inherently sequential nature of files and the way that people reason about file parsing, I believe, lends to or at least favors the dogged persistence of procedural approaches to object oriented programming (static methods, global state, casting, etc.).  I think that the following trends would take shape:

  1. Smaller methods.  If popping up methods one at a time or in small groups becomes the norm, having to scroll to see and understand a method will become an anomaly, and people will optimize to avoid it.
  2. Less complexity in classes.  With code operations subject to a validation of sorts, it’d be fairly easy to incorporate a setting that warns users if they’re adding the tenth or twentieth or whatever method to a class.  In extreme cases, it could even be disallowed (and not through the honor system or ex post facto at review or check in–you couldn’t do it in the first place).
  3. Better conformance to Single Responsibility Principle (SRP).  Eliminating the natural barrier of “I don’t want to add a new file to source control” makes people less likely to awkwardly wedge methods into classes in which they do not belong.
  4. Better cohesion.  It becomes easy to look for fields hardly used in a type or clusters of use within a type that could be separated easily into multiple types.
  5. Better test coverage.  Not only is this a natural consequence of the other items in this list, but it would also be possible to define “meta-data” to allow linking of code items and tests.

What’s Next?

Well, the first thing that I need to establish is that this doesn’t already exist somewhere in the works and that I’m not a complete lunatic malcontent.  I’d like to get some feedback on this idea in general.  The people to whom I’ve explained a bit so far seem to find the concept a bit far-fetched but somewhat intriguing.

I’d say the next step, assuming that this passes the sanity check, would be perhaps to draw up a white paper discussing some implementation/design strategies with pros and cons in a bit more detail.  There are certainly threats to validity to be worked out such as the specifics of interaction with the compiler, the necessarily radical change to source control approaches, the performance overhead of performing code transforms instead of just reading a file directly into memory, etc.  But off the top of my head, I view these things more as fascinating challenges than problems.

In parallel, I’d like to invite anyone who is at all interested in this idea to drop me an email or send me a tweet.  If there are others that feel the way I do, I think it’d be really cool to get something up on Github and maybe start brainstorming some initial work tasks or exploratory POCs for feasibility studies.  Also feel free to plus-like-tweet-whatever to others if you think they might be interested.

In conclusion, I’ll just say that I’d really like to see this gain traction and that I’d probably ratchet this right to the top of my side projects list if people are interested (this being a bit large in scope for me alone in my spare time).  Now whenever I find myself editing source files in an IDE I feel like a bit of a barbarian, and I really don’t think we should have to tolerate this state of affairs anymore.  Productivity tools designed to hide the file nature of our source code from us help, but they’re band-aids when we need disinfectants, antibiotics, and stitches.  I don’t know about you, but I’m ready to start writing my object-oriented code using an IDE paradigm that doesn’t support GOTO Line as if we were banging out QBasic in 1986.

Switch Statements are Like Ants

Switch statements are often (and rightfully, in my opinion) considered to be a code smell. A code smell, if you’ll recall, is a superficial characteristic of code that is often indicative of deeper problems. It’s similar in concept to the term “red flag” for interpersonal relationships. A code smell is like someone you’ve just met asking you to help them move and then getting really angry when you don’t agree to do it. This behavior is not necessarily indicative of deep-seated psychological problems, but it frequently is.

Consequently, the notion that switch statements are a code smell indicates that if you see switch statements in code, there’s a pretty good chance that design tradeoffs with decidedly negative consequences have been made. The reason I say this is that switch statements are often used to simulate polymorphism for those not comfortable with it:

public void Attack(Animal animal)
{
    switch (animal.Type)
    {
        case AnimalType.Cat: 
            Console.WriteLine("Scratch"); 
            break;
        case AnimalType.Dog: 
            Console.WriteLine("Bite"); 
            break;
        case AnimalType.Wildebeest: 
            Console.WriteLine("Headbutt"); 
            break;
        case AnimalType.Landshark: 
            Console.WriteLine("Running-bite");
            break;
        case AnimalType.Manticore: 
            Console.WriteLine("Eat people");
            break;
    }
}

Clearly a better design from an OOP perspective would be an Animal base class/interface and an Attack() method on that, overridable by children/implementers. This design has the advantage of requiring less code, scaling better, and conforming to the Open/Closed principle–if you want to add some other animal later, you just add a new class to the code base and probably tweak a factory method somewhere and you’re done.
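
As a rough illustration of that polymorphic alternative, here is a minimal sketch. It assumes an abstract Animal base class with one subclass per animal; the class names, and the choice to return the attack as a string instead of writing to the console, are assumptions made for the example rather than code from any real application:

```csharp
using System;

// Hypothetical base type: every animal supplies its own attack.
public abstract class Animal
{
    public abstract string Attack();
}

public class Cat : Animal
{
    public override string Attack() => "Scratch";
}

public class Dog : Animal
{
    public override string Attack() => "Bite";
}

public class Wildebeest : Animal
{
    public override string Attack() => "Headbutt";
}

public static class AttackDemo
{
    // No switch statement: the runtime type of each animal picks the behavior.
    public static string DescribeAll(params Animal[] animals)
    {
        return string.Join(", ", Array.ConvertAll(animals, a => a.Attack()));
    }
}
```

Adding a Landshark or a Manticore then means adding one more subclass; DescribeAll and any other caller that only knows about Animal stay untouched.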

The switch-based Attack() method isn’t really that bad, though, compared to how bad it could be. The design is decidedly procedural, but its consequences aren’t far reaching. From an execution perspective, the switch statement is hidden as an implementation detail. If you were to isolate the console (or wrap and inject it), you could unit test this method pretty easily. It’s ugly and it’s a smell, but it’s sort of like seeing an ant crawling in your house–a little icky and potentially indicative of an infestation, but sometimes life deals you lemons.
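
To make the “wrap and inject” idea concrete, here is a minimal sketch, trimmed to two cases. It assumes the output destination is passed in as a TextWriter; the AttackReporter class name and its constructor are hypothetical, introduced only for this example:

```csharp
using System;
using System.IO;

public enum AnimalType { Cat, Dog }

public class Animal
{
    public AnimalType Type { get; set; }
}

// Hypothetical wrapper: the writer is injected instead of hard-coding Console.
public class AttackReporter
{
    private readonly TextWriter _output;

    public AttackReporter(TextWriter output)
    {
        _output = output ?? throw new ArgumentNullException(nameof(output));
    }

    public void Attack(Animal animal)
    {
        // The switch is still here, but it stays an implementation detail.
        switch (animal.Type)
        {
            case AnimalType.Cat:
                _output.WriteLine("Scratch");
                break;
            case AnimalType.Dog:
                _output.WriteLine("Bite");
                break;
        }
    }
}
```

Production code could pass Console.Out to the constructor, while a unit test could pass a StringWriter and inspect what was written to it.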

But what about this:

public string Attack(Animal animal)
{
    switch (animal.Type)
    {
        case AnimalType.Cat:
            return GetAttackFromCatModule();
        case AnimalType.Dog:
            return GetAttackFromDogModule();
        case AnimalType.Wildebeest:
            return GetAttackFromWildebeestModule();
        case AnimalType.Landshark:
            return GetAttackFromLandsharkModule();
        case AnimalType.Manticore:
            return GetAttackFromManticoreModule();
        default:
            // Without this, C# rejects the method: not all code paths return a value.
            throw new System.ArgumentOutOfRangeException(nameof(animal), "Unknown animal type");
    }
}

Taking the code at face value, this method figures out what the animal’s attack is and returns it, but it does so by invoking a different module for each potential case. As a client of this code, your path of execution can dive into any of five different libraries (and more if this ‘pattern’ is followed for future animals). The fanout here is out of control. Imagine trying to unit test this method or isolate it somehow. Imagine if you need to change something about the logic here. The violence to the code base is profound–you’d be changing execution flow at the module level.

If the first switch statement was like an ant in your code house, this one is like an ant with telephone poles for legs, carving a swath of destruction. As with that happening in your house, the best short-term strategy is scrambling to avoid this code and the best long-term strategy is to move to a new application that isn’t a disaster zone.

Please be careful with the switch statements that you use. Think of them as ants crawling through your code–ants whose legs can be tiny ant legs or giant tree trunk legs. If they have giant tree trunk legs, then you’d better make sure they’re the entirety of your application–that the “ant” is the brains à la an IoC container–because those massive swinging legs will level anything that isn’t part of the ant. If they swing tiny legs, then the damage only occurs when they come in droves and infest the application. But either way, it’s helpful to think of switch statements as ants (or millipedes, depending on the number of cases) because this forces you to think of their execution paths as tendrils fanning out through your application and creating couplings and dependencies.
