“Classworthiness” and Poor Cohesion
Finding Seams in Code
The other day, someone asked me about how to “get at” certain things that he wanted to unit test in a class that he was writing. That is, he had some code that he wanted to extract from a public method and test, but his only solution was to expose the new helper method as public in order to do this. My response to this strategy is that it doesn’t make sense to alter the public interface of your class to accommodate testing. Your class has some ‘natural’ set of inputs and outputs, and those are what you want to test – elevating visibility, reflection, and other trickery is not the way to test the class, since that’s not how it will be used, eventually.
When asked what I would do instead, I replied that for simple helper methods, I just let the tests of their callers stand. For more complex ones, I pull functionality out into a new class and give the old class a private reference to the new class. This is where I got an interesting response — he didn’t think that there was “enough” to justify a new class.
I asked what constituted “enough”, and prefaced the question by saying that his was a common and understandable sentiment. Over the course of time, I’ve grown used to people having certain heuristics for when code is “classworthy”. Perhaps if it’s somewhere between 3 methods and 20 methods. Or, maybe it’s a matter of lines of code — 50 lines is too little, but 1000 is too many. With enough of these heuristics in place, we could say that all methods should be 20 lines and all classes should be 10 methods and therefore 200 lines.
His response was what people’s response always seems to be (paraphrased): I can’t describe “classworthiness”, but I know it when I see it, and this isn’t it. I countered by offering a different kind of heuristic for “classworthiness”, but I’ll get to that later.
Cleaving Classes in Two
The next day, I found myself looking at the following code structure:
public class MyClass : IDoSomething, IDoSomethingUnrelated
{
#region IDoSoemthingStuff
//IDoSomething fields
//IDoSomething properties
//IDoSomething methods
#endregion
#region IDoSomethingUnrelated
//IDoSomethingUnrelated fields
//IDoSomethingUnrelated properties
//IDoSomethingUnrelated methods
#endregion
}
What we had here was two classes inexplicably crammed into a single class. But, that wasn’t what was concerning me (though I wasn’t thrilled about it). My task was to add the IDoSomethingUnrelated interface implementation to the following:
public class MySecondClass : IDoSomething
{
#region IDoSoemthingStuff
//IDoSomething fields
//IDoSomething properties
//IDoSomething methods
#endregion
}
My first thought was the same thing that I hope yours is. Since there were two completely separate concerns in the first class, I would simply extract one of them to its own class and take it from there. Both classes still had to implement both interfaces (this was a non-negotiable constraint imposed on me), but at least I wouldn’t need to duplicate code, and I could make the internals of the class cleaner. The Boy Scout Rule for clean coding is an excellent one to follow.
However, there was a snag. Over the course of time, a handful of the fields had bled between the two classes masquerading as one. Not enough to give you the sense that it was anything but an accident and certainly not enough to make the concerns even remotely cohesive. It appeared to have happened by accident – expediency, one flag serving two purposes, laziness, that kind of thing.
As I was scowling at this and realizing that my exorcism of the possessed class was going to be more involved than I thought, a relationship between this task and my earlier conversation struck me.
A Root Cause of Poor Cohesion
So, how did this state of affairs come to be? Surely nobody looked at this code and thought, as they added the most recent handful of lines, “this is clean and awesome — go me!” Instead, somebody probably thought, “yuck, this is a pretty big class, but I’ll hold my nose and pile on just a touch” (the class was in the 1500 line neighborhood). Was I the first boy scout to get here and say “enough!” Apparently, I was, or it wouldn’t have looked like that.
But, surely, someone else must have considered it. My guess is that they considered extracting a class, but that they decided against it when they found, as I had, that the mess wasn’t quite as easy to detangle as it appeared. This class had become just cohesive enough that refactoring was a pain, but not cohesive enough to be considered anything other than “almost completely lacking cohesion”.
It wasn’t always like that. At some point, this thing was easily separable (if at no other time, at least when it was mashed together initially). But, between then and now, regioning was created, and fields/flags were carelessly and haphazardly re-appropriated depending on context. So, the real question is why someone didn’t see this as two separate classes right from the get-go. I traced the history of it back some and saw that when the two-classes-as-one paradigm was introduced, the class was relatively small in size compared to the rest of the code base. It has since grown steadily.
To think of it another way, when the class was first conceived, neither of its main responsibility was “classworthy” on its own. Not enough methods in the interfaces, or not enough lines of code, or something. Better to reduce the class footprint and combine them, and we can always separate them when the chimera gets too big. Of course, the best laid plans of mice and men… the person who thought that was probably not the same person that started introducing unnecessary cross-coupling, and neither of those people was me, looking at this class thinking that it should be two classes, and realizing that performing the separation surgery wasn’t going to be as easy as one might think.
Getting “Classworthy” Wrong
To return to my conversation about when to break out a new class, I responded that, to me, “classworthiness” had nothing to do with the number of methods, properties or fields, nothing to do with too many or too few lines of code, and nothing to do with how many classes exist in the solution at the moment. These are all red herrings as far as the decision as to what a class should be. A concept is “classworthy” to me if it encapsulates behavior and (usually) state, does one thing and does that thing well, and has only one reason to change (see Single Responsibility Princple).
If you read about the SRP, you’ll notice that there’s nothing in there about how many lines of code or how many methods are in the class. Don’t get me wrong — I’m very leery of classes that are 500+ lines of code and have 20+ methods, but this is a symptom rather than the illness. Those classes are (usually) bad not because of any number, but because something with that much code and that many methods generally isn’t focused and cohesive — it’s generally not doing one thing and doing it well. It’s a Swiss Army Knife Class.
(Image from Derick Bailey at Los Techies).
I’ve never really understood the reluctance to create a new class. It’s as if they were a limited resource like oil, and people want to conserve them. This isn’t limited to the person that I was talking to — it’s fairly pervasive, in my experience. I do understand that you can take my sentiment to an absurd extreme and create a class for every line of code or some such thing (though this would result in different metrics problems by raising coupling), but that isn’t what I’m talking about. I’m talking about reluctance to factor out a class because it will only have 70 lines of code or it will only have two methods.
If you find yourself thinking this way, pull back for a minute and ask yourself whether the code you’re considering pulling out is a separate behavior or not. If you have a class that receives data transfer objects from the GUI and validates them, the “and” tells you that you are, in fact, doing two separate things. Perhaps the validation logic just checks two fields to see if they’re empty or not. Is it really worth a separate class for that?
Yes! Why not decouple validation from staging? You’re not going to deplete the world’s class reserves with that one class creation. But, what you will get is a scheme where the validation of your object is decoupled from the staging of that object and where, if one of those things changes, the other will not be affected. And, to tie this all back together, you also get a nice, new seam for unit testing.
[…] desire to use polymorphism at all levels, even when the object I’m making doesn’t seem “classworthy”. That is, why would I create a “CardSuit” enum when I can just create a first class […]
I must admit that I have had the tendency to create a lot of classes with way too many “ands”. I feel very liberated now that I’ve read this post (and your blog in general–your wisdom and points of view have radically changed my way of doing things).
Glad to hear it and that the blog is helpful for you. I appreciate the feedback!