Recall, Retrieval, and the Scientific Method

In my series on building a Chess game using TDD I’ve defined a value type called BoardCoordinate that I introduced instead of passing around X and Y coordinate integer primitives everywhere. It’s a simple enough construct:

```
public struct BoardCoordinate
{
    // Backing fields; their declarations are missing from the original listing (readonly is assumed).
    private readonly int _x;
    private readonly int _y;

    public int X { get { return _x; } }

    public int Y { get { return _y; } }

    public BoardCoordinate(int x, int y)
    {
        _x = x;
        _y = y;
    }

    public bool IsCoordinateValidForBoardSize(int boardSize)
    {
        return IsDimensionValidForBoardSize(X, boardSize) && IsDimensionValidForBoardSize(Y, boardSize);
    }

    // Coordinates are 1-based: valid values run from 1 through boardSize.
    private static bool IsDimensionValidForBoardSize(int dimensionValue, int boardSize)
    {
        return dimensionValue > 0 && dimensionValue <= boardSize;
    }
}
```

This was a win early in the series to get me away from a trend toward Primitive Obsession, and I haven't really revisited it since. However, I've found myself in the series starting to think that I want a semantically intuitive way to express equality among BoardCoordinates. Here's why:

```
[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Returns_1_2_For_1_1()
{
    Assert.IsTrue(MovesFrom11.Any(bc => bc.X == 1 && bc.Y == 2));
}

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Returns_2_2_For_1_1()
{
    Assert.IsTrue(MovesFrom11.Any(bc => bc.X == 2 && bc.Y == 2));
}

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Returns_3_3_For_1_1()
{
    Assert.IsTrue(MovesFrom11.Any(bc => bc.X == 3 && bc.Y == 3));
}

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Does_Not_Return_0_0_From_1_1()
{
    Assert.IsFalse(MovesFrom11.Any(bc => bc.X == 0 || bc.Y == 0));
}
```

This is a series of unit tests of the "Queen" class that represents, not surprisingly, the Queen piece in chess. The definition of "MovesFrom11" is elided, but it's a collection of BoardCoordinate that represents the possible moves a queen has from square (1, 1) on the chess board.

This series of tests was my TDD footprint for driving the functionality of determining the queen's moves. So, I started out saying that she should be able to move from (1,1) to (1,2), then had her also able to move to (2,2), etc. If you read the test, what I'm doing is saying that this collection of BoardCoordinates to which she can move should have in it one that has X coordinate of 1 and Y coordinate of 2, for instance.

What I don't like here, and am making a mental note to change, is this "and". That's not as clear as it could be. I don't want to say, "there should be a coordinate in this collection with X property of such and such and Y property of such and such." I want to say, "the collection should contain this coordinate." This may seem like a small semantic difference, but I value readability to the utmost. And readability is a journey, not a destination -- the more you practice it, the more naturally you'll write readable code. So, I never let my foot off the gas.
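A minimal sketch of the two phrasings side by side may make the difference concrete. The struct is trimmed to its coordinates, and `HypotheticalMovesFrom11` is a made-up stand-in for the elided MovesFrom11, not the series' actual code. The `Contains` call assumes nothing beyond the struct's default equality.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copy of the struct from earlier in the post.
public struct BoardCoordinate
{
    private readonly int _x;
    private readonly int _y;

    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    public BoardCoordinate(int x, int y)
    {
        _x = x;
        _y = y;
    }
}

public static class ContainsSketch
{
    // Hypothetical stand-in for the elided MovesFrom11 collection.
    public static IEnumerable<BoardCoordinate> HypotheticalMovesFrom11()
    {
        return new[]
        {
            new BoardCoordinate(1, 2),
            new BoardCoordinate(2, 2),
            new BoardCoordinate(3, 3)
        };
    }

    public static void Main()
    {
        IEnumerable<BoardCoordinate> moves = HypotheticalMovesFrom11();

        // Current phrasing: "some coordinate has X of 1 and Y of 2".
        bool foundByProperties = moves.Any(bc => bc.X == 1 && bc.Y == 2);

        // Desired phrasing: "the collection contains this coordinate".
        bool foundByContains = moves.Contains(new BoardCoordinate(1, 2));

        Console.WriteLine(foundByProperties); // True
        Console.WriteLine(foundByContains);   // True
    }
}
```

In a test, the second form would read as `Assert.IsTrue(MovesFrom11.Contains(new BoardCoordinate(1, 2)))`, which states the intent without spelling out each property.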

During the course of the series, this nagging readability hiccup has caused me to note and refer to a TODO of implementing some kind of concept of equals. In the latest post, Sten asks in the comments, referring to my desire to implement equals, "isn't that unnecessary since structs that doesn't contain reference type members does a byte-by-byte in-memory comparison as default Equals implementation?" It is this question I'd like to address here in this post.

Not directly, mind you, because the assessment is absolutely spot on. According to MSDN:

If none of the fields of the current instance and obj are reference types, the Equals method performs a byte-by-byte comparison of the two objects in memory. Otherwise, it uses reflection to compare the corresponding fields of obj and this instance.

So, the actual answer to that question is simply, "yes," with nothing more to say about it. But I want to provide my answer to that question as it occurred to me off the cuff. I'm a TDD practitioner and a C# veteran, for context.

My answer, when I read the question, was, "I don't remember what the default behavior of Equals is for value types -- I have to look that up." What surprised me wasn't my lack of knowledge on this subject (I don't find myself using value types very often), but rather my lack of any feeling that I should have known that. I mean, C# has been my main language for the last 4 years, and I've worked with it for more years than that besides. Surely, I just failed some hypothetical job interview somewhere, with a cabal of senior developers reviewing my quiz answers and saying, "for shame, he doesn't even know the default Equals behavior for value types." I'd be laughed off of Stack Overflow's C# section, to be certain.

And yet, I don't really care that I don't know that (of course, now I do know the answer, but you get what I'm saying). I find myself having an attitude of "I'll figure things out when I need to know them, and hopefully I'll remember them." Pursuing encyclopedic knowledge of a language's behavior doesn't much interest me, particularly since those goalposts may move, or I may wind up coding in an entirely different language next month. But there's something deeper going on here because I don't care now, but that wasn't always true -- I used to.

The Scientific Method

When I began to think back on this, I think the drop off in valuing this type of knowledge correlated with my adoption of TDD. It then became obvious to me why my attitude had changed. One of the more subtle value propositions of TDD is that it basically turns your programming into an exercise in the Scientific Method with extremely rapid feedback. Think of what TDD has you doing. You look at the code and think something along the lines of, "I want it to do X, but it doesn't -- why not?" You then write a test that fails. Next, you look at the code and hypothesize about what would make it pass. You then do that (experimentation) and see if your test goes green (testing). Afterward, you conduct analysis (do other tests pass, do you want to refactor, etc).

Now you're probably thinking (and correctly) that this isn't unique to TDD. I mean, if you write no unit tests ever, you still presumably write code for a while and then fire up the application to see if it's doing what you hypothesized that it would while writing it. Same thing, right?

Well, no, I'd argue. With TDD, the feedback loop is tight and the experiments are more controlled and, more importantly, isolated. When you fire up the GUI to check things out after 10 minutes of coding, you've doubtless economized by making a number of changes. When you see a test go green in TDD, you've made only one specific, focused change. The modify-the-code-and-verify-the-running-application method has too many simultaneous variables to be scientific in approach.

Okay, fine, but what does this have to do with whether or not I value encyclopedic language knowledge? That's a question with a slightly more nuanced answer. After years of programming according to this mini-scientific method, what's happened is that I've devalued anything but "proof is in the pudding" without even realizing it. In other words, I sort of think to myself, "none of us really knows the answer until there's a green test proving it to all of us." So, my proud answer to questions like, "wouldn't it work to use the default equals method for value types" has become, "dunno for certain, let's write a test and see."
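As an illustration of "write a test and see," here is a minimal throwaway experiment for the default Equals question. It is a sketch rather than code from the series: the struct is trimmed to its coordinates, the `EqualsExperiment` class and its method names are invented for this example, and plain methods are used so that it runs without a test framework. In an MSTest project, each check would sit inside a `[TestMethod]` with `Assert.IsTrue`.

```csharp
using System;

// Trimmed-down copy of the struct; note that it does not override Equals.
public struct BoardCoordinate
{
    private readonly int _x;
    private readonly int _y;

    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    public BoardCoordinate(int x, int y)
    {
        _x = x;
        _y = y;
    }
}

// Hypothetical experiment class, named here for illustration only.
public static class EqualsExperiment
{
    // Hypothesis 1: two coordinates with the same values are equal by default.
    public static bool SameValuesAreEqual()
    {
        BoardCoordinate first = new BoardCoordinate(1, 2);
        BoardCoordinate second = new BoardCoordinate(1, 2);
        return first.Equals(second);
    }

    // Hypothesis 2: coordinates with different values are not equal by default.
    public static bool DifferentValuesAreNotEqual()
    {
        BoardCoordinate first = new BoardCoordinate(1, 2);
        BoardCoordinate second = new BoardCoordinate(2, 1);
        return !first.Equals(second);
    }

    public static void Main()
    {
        Console.WriteLine(SameValuesAreEqual());         // True
        Console.WriteLine(DifferentValuesAreNotEqual()); // True
    }
}
```

Both hypotheses hold, which matches the MSDN description of the default behavior. The experiment takes a minute to write and can be deleted once the question is answered.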

False Certainty

Why proud? Well, I'll tell you a brief story about a user group I attended a while back. The presenter was doing a demonstration on LINQ, closures, and deferred execution and he made the presentation interactive. He'd show us methods that exposed subtle, lesser-known behaviors of the language in this context and the (well made) point was that these things were complex and trying to get the answers right was humbling.

It's generally knowledgeable people that attend user groups and often even more knowledgeable people that brave the crowd to go out on a limb and answer questions. So, pretty smart C# experts were shouting out their answers to "what will this method return" and they were getting it completely wrong because it was hard and it required too much knowledge of too many edge cases in too short a period of time. A friend of mine said something like, "man, I don't know -- slap a unit test on it and see." And... he's absolutely right, in my opinion. We're not language authors, much less compilers and runtimes, and thus the most expedient answer to the question comes not from applying amassed language knowledge but from experimentation.
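For a sense of the kind of question involved, here is a small made-up puzzle in the same spirit. It is not the presenter's code, which isn't reproduced in this post, and `DeferredExecutionPuzzle` is a hypothetical name. The question is what `Run` returns, given that both the captured variable and the source list change after the query is defined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionPuzzle
{
    public static List<int> Run()
    {
        var numbers = new List<int> { 1, 2, 3 };
        int threshold = 1;

        // Where does not filter anything yet. It builds a query whose lambda
        // captures the threshold variable itself, not its current value.
        IEnumerable<int> query = numbers.Where(n => n > threshold);

        // Both changes happen before the query is enumerated.
        threshold = 2;
        numbers.Add(4);

        // The query runs here, against the updated list and the updated threshold.
        return query.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 3, 4
    }
}
```

Someone guessing "2, 3" has assumed the filter ran when it was defined. Running the code settles the matter faster than reasoning it out in front of a crowd.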

Think now of the world of programming over the last 50 years. In times where compiles and executions were extremely costly or lengthy, you needed to be quite sure that you got everything right ahead of time. And doing so required careful analysis that could only be done well with a lot of knowledge. Without prodigious knowledge of the libraries and languages you were using, you would struggle mightily. But that's really no longer true. We're living in an age of abundant hardware power and lightning fast feedback where knowing where to get the answers quickly and accurately is more valuable than knowing them. It's like we've been given the math textbook with the answers in the back and the only thing that matters is coming up with the answers. Yeah, it's great that you're enough of a hotshot to get 95% of the answers right by hand, but guess what -- I can get 100% of them right and much, much faster than you can. And if the need to solve new problems arises, it's still entirely possible for me to work out a good way to do it by using the answer to deduce how the calculation process works.

Caveats

In the course of writing this, I can think of two valid objections/comments that people might have critiquing what I'm saying, so I'd like to address them. First of all, I'm not saying that you should write production unit tests to answer questions about how the framework/language works. Unit testing the libraries and languages that you use is an anti-pattern. I'm talking about writing tests to see how your code will behave as it uses the frameworks and languages. (Although, a written and then deleted unit test is a great, fast-feedback way to clarify language behavior to yourself.)

Secondly, I'm not devaluing knowledge of the language/framework, nor am I taking pride in my ignorance of it. I didn't know how the default Equals behavior worked for value types yesterday and today I do. That's an improvement. The reason it's an improvement is that the knowledge is now stored in a more responsive cache. I maintain that having the knowledge is trumped by knowing how to acquire it. I think of reaching into my own memory as a hit in the CPU cache, writing a quick test to see as a trip to main memory, and looking it up on the internet or asking a friend as a read from disk.

The more knowledge you have of the way the languages and frameworks you use work, the less time you'll have to sink into proving behaviors to yourself, so that's clearly a win. To continue the metaphor, what I'm saying is that there's no value or sense in going out preemptively and loading as much as you can from disk into the CPU cache so that you can show others that it's there. In our world, memory and disk lookups are just no longer expensive enough to make that desirable.

Comments
sten
9 years ago

Nice post! To be honest I did not know about the default Equals behavior for value types until just recently when I was doing some operator overloading (==) and sort of stumbled upon it. There are many tutorials out there that overrides Equals for structures. I am pro-overriding-Equals for 2 reasons: #1 I want to be 100% clear with how I compare values for my types and I want it to show in code. Not many people know about the default Equals behavior so why not eliminate the “magic”? #2 MSDN states that overriding Equals and implementing an Equals(specificType) increases…

9 years ago

If memory serves, I think that string is an immutable reference type, by framework implementation. I agree on the desire for expressing intention explicitly when reasonable. I think this goes along the lines of “principle of least surprise.” If you’re defining small, data-oriented value types, it makes sense to express what you mean when you say that two of them are “equal” rather than relying on what the framework means. I do remember that things can get a little dicey if you don’t also override GetHashCode() along with equality, because of the way that using it as an index in…

Medo
9 years ago

Good argument, but I still think there is a lot of value in deep knowledge of the language you are working with. I made the mistake in one of my jobs to basically learn a language only by basic tutorials and looking at the code that was already there, and that I had to build on. I would only look up enough to get the job done. Months later, I learned about features of the language that I never suspected, and that would have made my life easier and my code cleaner in many places. So, knowing more about the…

9 years ago

You bring up the logical counter-point to a lot of what I’m saying, which is that, while knowledge for the sake of oneupsmanship is silly, deep knowledge of a language or framework will probably lead to more elegant solutions, all things being equal. It will certainly lead to more idiomatic ones, and I’m a big proponent of seeking to become idiomatic in languages that I use. I think, perhaps, that the line to walk lies in continuously asking, “is there a better/more elegant way to do what I’m doing.” By going this route of never being satisfied with your solutions,…

Cameron
9 years ago

This, this and this. It’s important to know basic language concepts, but what I like most about this approach is that it boils the question down to a “does it work / does it not work” response.
