Getting Too Cute with C# Yield Return
I ran across a method that returned an IEnumerable<T> recently, and I implicitly typed its return value. During the course of a series of method extractions, code movement, and general refactoring, I wound up with some code that passed the various unit tests in place but failed curiously at runtime. After peering at it for a few minutes and going through once in the debugger, I traced it to a problem that you don’t see every day, and one that probably would have had me tearing my hair out if I didn’t have a good working understanding of what the “yield” keyword in C# does. So today, I’ll present the essence of this problem in the hopes that, if you weren’t aware of it, you are now.
Here is an entire class that contains a nested type and a couple of methods, for illustration purposes. At the bottom is a unit test that will, if you copy this into your scratchpad, fail.
public class MiscTest
{
public class Point
{
public int X { get; set; }
public int Y { get; set; }
}
private IEnumerable GetPoints()
{
for (int index = 1; index < 20; index++)
yield return new Point() { X = index, Y = index * 2 };
}
private void DoubleXValue(IEnumerable points)
{
foreach (var point in points)
point.X *= 2;
}
[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Asdf()
{
var points = GetPoints();
DoubleXValue(points);
Assert.AreEqual(2, points.ElementAt(0).X);
}
}
It seems pretty straightforward. You have some method that returns a bunch of points, and then you take those points and pass them to a method that iterates through them, performing an operation on each one. So what gives? Why does this fail? Everything looks pretty simple (unlike my situation, where this became removed through a few layers of indirection), and yet we get back 1 when we’re expecting 2.
To understand this, it’s important to understand what yield actually does. At its core, the yield keyword is syntactic sugar that tells the compiler to generate a state machine under the hood. Let that sink in for a moment, because it’s actually kind of a wild concept. You’re used to methods that return references to object instances or primitives or collections, but this is something fundamentally different. A method that returns an IEnumerable and does so using yield return isn’t defining a return value–it’s defining a protocol for interacting with client code.
Consider the code example above. The obvious (and, as it turns out, wrong) way to understand the GetPoints() method is, “it generates a collection of points from (1, 2) to (19, 38) and returns it.” But GetPoints() doesn’t return any such thing. In fact, it doesn’t return anything but a promise–a promise to generate points later if asked. So when we say “var points = GetPoints();” what we’re actually saying is, “the points variable references some kind of points machine that will generate points when I ask for them.”
If we think of it this way, we start to get to the bottom of what’s going wrong here. On the next line, we pass this oracle into the DoubleXValue() method. The DoubleXValue() method iterates through all of the states of the points (state) machine, retrieving points as per the promise. Once it retrieves the point, it does something to the X coordinate and then promptly discards the point. Why? Because nothing else refers to it. When you change one of the points that the points machine spits out, you’re not changing anything about the points machine–you’re not feeding it some kind of new mechanism for point generation. You could think of this as being similar to a method that takes a class factory, requests a bunch of instances from it, modifies them, and then returns. Nothing about the factory is different, and you wouldn’t expect the factory to behave differently if the caller subsequently passed it to another method.
So once the DoubleXValue() method gets done doing, well, nothing of significance, the Assert() call requests the first sequential element–the first state–from the points machine. The points machine dutifully spits out its first state, (1, 2), and the unit test fails. So how do we get it to pass? Well, here’s one way:
[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Asdf()
{
var points = GetPoints().ToList();
DoubleXValue(points);
Assert.AreEqual(2, points.ElementAt(0).X);
}
Notice the added ToList() call. This is very important because it means that we’re no longer storing a reference to some kind of points machine but rather to a list of points. This line now says, “Go get me a points machine, iterate through all the states of it, and store those states locally in a list.” Now, the rest of the code behaves in a way that you’re used to because you’re storing an actual, tangible collection instead of a promise to generate a sequence.
There is no shortage of posts, documents, and articles explaining the yield return state machine concept or the idea of deferred execution. I encourage you to read those to get a better understanding of the inner mechanics and usage scenarios, respectively. But hopefully this gives you a bit of practical insight that’s easy to wrap your head around into (1) why the code behaves this way and (2) why you have to be careful of providing and consuming IEnumerables. It can be tempting to get too cute with how you provide IEnumerables or too careless with how you consume them, particularly when usage and implementation are separated by inversion of control. So be aware when using IEnumerables that you may not have a list/collection, and be aware when providing them that you’re leaving it up to your clients to decide when to get and store sequence members.
By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.