DaedTech

Stories about Software

Getting Too Cute with C# Yield Return

I ran across a method that returned an IEnumerable<T> recently, and I implicitly typed its return value. During the course of a series of method extractions, code movement, and general refactoring, I wound up with some code that passed the various unit tests in place but failed curiously at runtime. After peering at it for a few minutes and going through once in the debugger, I traced it to a problem that you don’t see every day, and one that probably would have had me tearing my hair out if I didn’t have a good working understanding of what the “yield” keyword in C# does. So today, I’ll present the essence of this problem in the hopes that, if you weren’t aware of it, you are now.

Here is an entire class that contains a nested type and a couple of methods, for illustration purposes. At the bottom is a unit test that will, if you copy this into your scratchpad, fail.

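using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MiscTest
{
    public class Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    // Iterator method: every enumeration re-runs this loop and creates new Point objects.
    private IEnumerable<Point> GetPoints()
    {
        for (int index = 1; index < 20; index++)
            yield return new Point() { X = index, Y = index * 2 };
    }

    // Doubles X on whichever objects this particular enumeration produces.
    private void DoubleXValue(IEnumerable<Point> points)
    {
        foreach (var point in points)
            point.X *= 2;
    }

    [TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
    public void Asdf()
    {
        var points = GetPoints();
        DoubleXValue(points);

        // Fails: ElementAt(0) starts a new enumeration, so X is 1 here.
        Assert.AreEqual(2, points.ElementAt(0).X);
    }
}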

It seems pretty straightforward. You have some method that returns a bunch of points, and then you take those points and pass them to a method that iterates through them, performing an operation on each one. So what gives? Why does this fail? Everything looks pretty simple (unlike my situation, where the problem was buried under a few layers of indirection), and yet we get back 1 when we’re expecting 2.

To understand this, it’s important to understand what yield actually does. At its core, the yield keyword is syntactic sugar that tells the compiler to generate a state machine under the hood. Let that sink in for a moment, because it’s actually kind of a wild concept. You’re used to methods that return references to object instances or primitives or collections, but this is something fundamentally different. A method that returns an IEnumerable and does so using yield return isn’t defining a return value–it’s defining a protocol for interacting with client code.
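To make the state machine idea a bit more tangible, here is a rough, hand-written approximation of the kind of object the compiler builds for a method like GetPoints(). Treat it as a sketch under assumptions: the names PointsMachine, PointsEnumerator, and StateMachineDemo are hypothetical, and the class the compiler actually generates is named and structured differently.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

// Hypothetical stand-in for what "yield return" gives you: the enumerable
// stores no points at all, only the recipe for creating an enumerator.
public class PointsMachine : IEnumerable<Point>
{
    public IEnumerator<Point> GetEnumerator() => new PointsEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class PointsEnumerator : IEnumerator<Point>
    {
        private int _index = 0; // the "state": how far the loop has advanced

        public Point Current { get; private set; }
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            _index++;
            if (_index >= 20)
                return false;

            // A new Point is built on demand, each time the caller asks for the next one.
            Current = new Point { X = _index, Y = _index * 2 };
            return true;
        }

        public void Reset() => throw new NotSupportedException();
        public void Dispose() { }
    }
}

public static class StateMachineDemo
{
    public static void Main()
    {
        int count = 0;
        foreach (Point point in new PointsMachine())
            count++;

        Console.WriteLine(count); // 19
    }
}
```

The part worth noticing is that PointsMachine has no field holding points. Everything a caller receives is manufactured inside MoveNext() at the moment it is requested.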

Consider the code example above. The obvious (and, as it turns out, wrong) way to understand the GetPoints() method is, “it generates a collection of points from (1, 2) to (19, 38) and returns it.” But GetPoints() doesn’t return any such thing. In fact, it doesn’t return anything but a promise–a promise to generate points later if asked. So when we say “var points = GetPoints();” what we’re actually saying is, “the points variable references some kind of points machine that will generate points when I ask for them.”
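If the promise idea sounds abstract, a counter makes it visible. The following is a minimal sketch, not code from the project that inspired this post; DeferredDemo, GetNumbers(), and BodyRuns are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Counts how many times the iterator body has started running.
    public static int BodyRuns = 0;

    public static IEnumerable<int> GetNumbers()
    {
        BodyRuns++; // executes only once someone starts enumerating
        for (int index = 1; index < 4; index++)
            yield return index;
    }

    public static void Main()
    {
        var numbers = GetNumbers();    // no code in the body has run yet
        Console.WriteLine(BodyRuns);   // 0

        foreach (var n in numbers) { } // first enumeration runs the body
        foreach (var n in numbers) { } // second enumeration runs it again
        Console.WriteLine(BodyRuns);   // 2
    }
}
```

Calling the method costs nothing and produces nothing. The work happens once per enumeration, which is exactly what sets up the surprise in the unit test.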

If we think of it this way, we start to get to the bottom of what’s going wrong here. On the next line, we pass this oracle into the DoubleXValue() method. The DoubleXValue() method iterates through all of the states of the points (state) machine, retrieving points as per the promise. Once it retrieves the point, it does something to the X coordinate and then promptly discards the point. Why? Because nothing else refers to it. When you change one of the points that the points machine spits out, you’re not changing anything about the points machine–you’re not feeding it some kind of new mechanism for point generation. You could think of this as being similar to a method that takes a class factory, requests a bunch of instances from it, modifies them, and then returns. Nothing about the factory is different, and you wouldn’t expect the factory to behave differently if the caller subsequently passed it to another method.
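The factory analogy can be checked directly. This is a small sketch with hypothetical names (FreshInstancesDemo and a static copy of GetPoints()) showing that two passes over the same lazy sequence hand back two different objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class FreshInstancesDemo
{
    public static IEnumerable<Point> GetPoints()
    {
        for (int index = 1; index < 20; index++)
            yield return new Point { X = index, Y = index * 2 };
    }

    public static void Main()
    {
        var points = GetPoints();

        Point first = points.First(); // one enumeration, one new object
        first.X *= 2;                 // changes only that object

        Point again = points.First(); // a separate enumeration, a separate object

        Console.WriteLine(ReferenceEquals(first, again)); // False
        Console.WriteLine(first.X);                       // 2
        Console.WriteLine(again.X);                       // 1
    }
}
```

The doubled point still exists as long as something holds a reference to it. The sequence itself just never hands that particular object out again.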

So once the DoubleXValue() method gets done doing, well, nothing of significance, the ElementAt(0) call inside the assertion requests the first sequential element, the first state, from the points machine. That request starts a fresh enumeration, so the points machine dutifully builds a brand-new first point, (1, 2), and the unit test fails. So how do we get it to pass? Well, here’s one way:

[TestMethod, Owner("ebd"), TestCategory("Proven"), TestCategory("Unit")]
public void Asdf()
{
    var points = GetPoints().ToList();
    DoubleXValue(points);
            
    Assert.AreEqual(2, points.ElementAt(0).X);
}

Notice the added ToList() call. This is very important because it means that we’re no longer storing a reference to some kind of points machine but rather to a list of points. This line now says, “Go get me a points machine, iterate through all the states of it, and store those states locally in a list.” Now, the rest of the code behaves in a way that you’re used to because you’re storing an actual, tangible collection instead of a promise to generate a sequence.

There is no shortage of posts, documents, and articles explaining the yield return state machine concept or the idea of deferred execution. I encourage you to read those to get a better understanding of the inner mechanics and usage scenarios, respectively. But hopefully this gives you a bit of practical insight that’s easy to wrap your head around into (1) why the code behaves this way and (2) why you have to be careful of providing and consuming IEnumerables. It can be tempting to get too cute with how you provide IEnumerables or too careless with how you consume them, particularly when usage and implementation are separated by inversion of control. So be aware when using IEnumerables that you may not have a list/collection, and be aware when providing them that you’re leaving it up to your clients to decide when to get and store sequence members.
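On the consuming side, one possible way to act on that advice is to walk the incoming sequence exactly once and hand back the objects you actually touched. The sketch below is an assumption about how that might look, not the code from my project; ConsumerDemo and DoubleXValues() are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class ConsumerDemo
{
    // Hypothetical lazy producer with the same shape as GetPoints() in this post.
    public static IEnumerable<Point> GetPoints()
    {
        for (int index = 1; index < 20; index++)
            yield return new Point { X = index, Y = index * 2 };
    }

    // Enumerates the input exactly once and returns the very objects it modified.
    public static IReadOnlyList<Point> DoubleXValues(IEnumerable<Point> points)
    {
        List<Point> materialized = points.ToList();
        foreach (Point point in materialized)
            point.X *= 2;
        return materialized;
    }

    public static void Main()
    {
        IReadOnlyList<Point> doubled = DoubleXValues(GetPoints());
        Console.WriteLine(doubled[0].X);  // 2
        Console.WriteLine(doubled.Count); // 19
    }
}
```

Returning IReadOnlyList<Point> instead of IEnumerable<Point> also tells the caller that it is holding a real collection rather than a promise to generate one.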

26 Comments
trackback

[…] C# developers, a quick reminder of the dangers of deferred execution and the yield keyword. […]

Steve Gilham
11 years ago

It’s not just yield return that does this — anything out of a LINQ expression is similarly lazily evaluated. And if your enumeration had been something stateful, like reading bytes from a stream, or a random number generator, the second evaluation would not give the same results as the first.

In general, data qualified as just IEnumerable, regardless of source, should be regarded as a read-once data structure — so transform it through LINQ to your heart’s content, but reify it as an array or a list before handing it on.

Erik Dietrich
11 years ago
Reply to  Steve Gilham

Good point about the broader based applicability and yield returning something beyond the creation scope of the method. The inspiration for this particular post started out as “this is something specific that happened and here’s why,” but there are certainly more far-reaching complexities with the deferred evaluation paradigm.

dasjestyr
7 years ago
Reply to  Steve Gilham

It’s an implementation of the iterator pattern. LINQ is literally just a builder pattern that builds decorators which gate the result set. The IEnumerable that is returned from Where(…) is a custom iterator that wraps itself around the GetEnumerator() method of the source collection. So in the end, the logic used to deliver the next element of the collection is still defined within the source collection. I think what a method that yields actually does is return an anonymous iterator; basically the heart of an IEnumerable implementation, so as you iterate over that iterator, it just continues to deliver elements as…

Timothy Boyce
11 years ago

Deferred execution can certainly cause some problems if you aren’t careful. ReSharper is great at warning you about most cases where there could be a problem. When I pasted in your code, it warned me about possible multiple enumerations of an IEnumerable.

Erik Dietrich
11 years ago
Reply to  Timothy Boyce

That’s really cool. Another piece of feature envy that I have for R#. Fingers crossed that it makes the Code Rush issues list in an upcoming release.

Michael Paterson
11 years ago
Reply to  Erik Dietrich

What is the Code Rush issue?

James Curran
11 years ago

It’s the “issue list” (bug reports and feature requests) for Code Rush (Developers’ Express’s alternative to Resharper)

Toni Petrina
11 years ago
Reply to  Timothy Boyce

R# pointed out immediately that the enumeration is enumerated multiple times, a general no-no 🙂

James Curran
11 years ago

The ToList() is merely a band-aid. The problem is with DoubleXValue(), which modifies the values and then throws them away. The “correct” solution would be:

var points = GetPoints();
points = DoubleXValue(points);
// :
// :
private IEnumerable<Point> DoubleXValue(IEnumerable<Point> points)
{
    foreach (var point in points)
    {
        point.X *= 2;
        yield return point;
    }
}

Alternately:

private IEnumerable<Point> DoubleXValue(IEnumerable<Point> points)
{
    return points.Select(p => new Point { X = p.X * 2, Y = p.Y });
}

or we could componentize it:

private Point DoubleXValue(Point p) { return new Point { X = p.X * 2, Y = p.Y }; }
// :
// :
var points…

Erik Dietrich
11 years ago
Reply to  James Curran

The ToList() call was purely instructional — to highlight the difference between storing a deferred execution enumerable as a local and storing the list resulting from walking the enumeration (I thought that would be the best way to contrast them). I definitely like your solution with the return enumeration that also uses yield return — that’s what I wound up doing in the actual code that inspired this post 🙂

Jonathan C Dickinson
11 years ago

This does have quite a bit to do with `yield`, agreed – but I think it’s also about understanding pointers correctly (pointers in C# you exclaim? Yes guys, reference types are pointers).

James Curran
11 years ago

Reference types are IMPLEMENTED AS pointers (but as is the case with all of OO design — Implementation Is Irrelevant)

Jonathan C Dickinson
11 years ago
Reply to  James Curran

Actually implementation is not irrelevant, hence the reason for this blog post. A developer needs to understand that passing reference values around is passing the same piece of memory around. Making a toy OO system in plain ol’ C is a must for any developer (even if it lands up being bad, leaky and whatnot). You need to **understand** the systems that lie underneath your abstraction level, so that you don’t get bitten by issues like this one (and potentially waste time with them).

Carsten König
11 years ago

well, this is what you get if you mix “side effects” with stuff from functional programming … you see: just don’t mess with this stuff (use immutable data and pure functions) and you would not run into trouble …

Erik Dietrich
11 years ago
Reply to  Carsten König

Agreed. That’s the approach I take and prefer to take in reality here, myself. Unfortunately, we don’t always have complete control over the APIs and libraries that we use…. 🙁

Justin
11 years ago

Part of the problem is use of the ‘var’ keyword masking types. We are so comfortable with ‘Lists are IEnumerables’ and treating them interchangeably as such, but if you actually had to write IEnumerable as the declared type of a variable, that should immediately give you pause to think very carefully about what you’re doing.

Erik Dietrich
11 years ago
Reply to  Justin

I can’t speak for anyone else, but I’m not sure if the act of typing the type (as opposed to using CodeRush to flip between explicit/implicit or hovering the mouse over var) would really have an effect on my thinking. Typing the first “Foo” in “Foo foo = GetFoo()” doesn’t really engage my brain to think of the ramifications of the type; it’s just noise. That said, if I’m reading someone else’s code (or leaving this code for someone else I suppose), I see your point: you have a better piece of self-documenting code for someone who understands…

Firehawk70
8 years ago
Reply to  Justin

I agree with Justin. I know this is old, but if anyone else comes across this article, refer to Microsoft’s coding conventions regarding “var” – https://msdn.microsoft.com/en-us/library/ff926074.aspx. Your usage is not compliant with “Do not use var when the type is not apparent from the right side of the assignment.”.

I work with someone who lazily uses “var” for everything now and it’s truly annoying. It makes code harder to read because I can’t figure out what type I’m dealing with, or be able to evaluate what methods or properties might be more appropriate per the code written.

Erik Dietrich
8 years ago
Reply to  Firehawk70

I won’t argue about personal readability preferences, since I’m not really in a position to do that, obviously. But I will offer a devil’s advocate argument as food for thought, using the MS coding standards you linked to. Their “don’t use var” examples are “int var4 = ExampleClass.ResultSoFar();” and “var inputInt = Console.ReadLine();” Neither of those lines is anything I would write. What if, instead, these read: var countOfCustomerRecords = ExampleClass.CustomerRecordsSoFar(); and var lineReadFromConsole = Console.ReadLine(); When writing code, I always strive to make the member names as clear as possible. Personally, I’d argue that both are easier to read…

Erik Dietrich
8 years ago
Reply to  Firehawk70

As an aside, the Microsoft code example just gave me an interesting idea for a new blog post. So, thanks 🙂

trackback

[…] I just read an interesting article called Getting too cute with c# yield return […]

Michiel Staessen
10 years ago

Working with IEnumerable and yield return can be tricky and one should indeed understand the mechanics of deferred execution. I experienced this yesterday. I started with .NET only a couple of months ago. I come from Java, so for me, yield return is quite “magical” in the awesome kind of way. I started playing around with it and used it in a performance test where I need to do a nested iteration of 15M and 80 entities. Running the test took very, very long (I started with a smaller number of entities) and I had no clue what was going…

Erik Dietrich
10 years ago

Hi Michiel, Thanks for reading. Like you, I came to C# from Java (and C/C++ before that), but some years ago now, back when the current version of C# was 2.0. My personal impression over these years has been to fall in love with C# since it seems to be identical to Java but time-warped about 2 years in the future. I believe Java just recently introduced lambdas and closures with Java 8, whereas C# has had these built in since 3.0 a few years ago, IIRC. Your tale does seem to serve as a good cautionary tale for transplants from other…

Michiel Staessen
10 years ago
Reply to  Erik Dietrich

Using other return types than IEnumerable is indeed the best solution. It is also a more specific contract for your code. In Java, I would have never used the Collection interface (Java’s equivalent for IEnumerable) as a return type but rather used the List or Set interface. Seems like I should correct myself and start using IList and ISet instead… 🙂

trackback

[…] blogged about IEnumerable in the past and talked about how this is really a unique concept. Tl;dr version is that IEnumerable […]