Tuesday, June 9, 2009

How do closures / captured variables work under the hood?

First, a little disclaimer: I'll be using C# 3.0 lambda syntax, but closures have been around since C# 2.0 (.NET 2.0) in the form of anonymous methods.

Ok, so first a little demonstration:



using System;

namespace ClosureDemo
{
    public delegate void StuffDoer();

    class Program
    {
        static void Main(string[] args)
        {
            var stuffDoerDelegate = GetDelegate(); // prints 2
            stuffDoerDelegate();                   // prints 3
            stuffDoerDelegate();                   // prints 4

            Console.ReadKey(true);
        }

        private static StuffDoer GetDelegate()
        {
            int counter = 1;
            StuffDoer result = () =>
            {
                counter++;
                Console.WriteLine(counter);
            };
            result();
            return result;
        }
    }
}


Simple little app. First, I create a delegate type called "StuffDoer" that has a return type of void and takes no parameters. Then I have a private method that creates a StuffDoer delegate from an anonymous method (written here as a lambda), calls it once, and returns it. Here's the interesting thing to note, though: the variable counter is declared in GetDelegate, outside of the anonymous method, yet the anonymous method reads and modifies it. The obvious question is: how? Shouldn't counter be out of scope? If you wrote this as a concrete method instead of an anonymous one:



private static StuffDoer GetDelegate()
{
    int counter = 1;
    StuffDoer result = new StuffDoer(DoStuff);
    return result;
}

private static void DoStuff()
{
    // Does NOT compile: counter is a local of GetDelegate, not visible here.
    counter++;
    Console.WriteLine(counter);
}


This obviously wouldn't compile, because counter is not defined in the DoStuff method. So how can it work in the anonymous method? Furthermore, if you run the original program, you'll notice that it prints 2, 3 and 4 (the first call happens inside GetDelegate, the other two in Main), which shows that the counter variable somehow stuck around. How? We declared it once inside our GetDelegate method; shouldn't it have been lost once we left that method? How did it retain its state?
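To make the "retained state" concrete, here's a small sketch that isn't from the original demo: the MakeCounter name and the switch to Func<int> (so the value is returned instead of printed) are my own. Each call to MakeCounter gets a brand-new counter variable, and each delegate keeps updating its own one between calls:

```csharp
using System;

public static class IndependentCounters
{
    // Hypothetical helper: same shape as GetDelegate, but returns Func<int>
    // (System.Func, .NET 3.5+) so the current value can be inspected.
    public static Func<int> MakeCounter()
    {
        int counter = 1; // a fresh local variable on every call to MakeCounter
        return () =>
        {
            counter++;       // updates the captured variable itself
            return counter;
        };
    }

    public static void Main()
    {
        Func<int> first = MakeCounter();
        Func<int> second = MakeCounter();

        Console.WriteLine(first());  // 2
        Console.WriteLine(first());  // 3
        Console.WriteLine(second()); // 2: second captured a separate counter
    }
}
```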

This is known as a "closure" or "captured variables", and it was always one of those "black magic, it just works" kind of things for me. I researched it and thought I kind of understood it, but never QUITE got it. Recently, however, I picked up a copy of Jon Skeet's C# in Depth (I CANNOT stress enough how great this book is! I'm only halfway through, haven't even started on the C# 3.0 / .NET 3.5 material yet, and I've already learned tons. I HIGHLY recommend it!), and it opened my eyes to what exactly is going on under the covers. I'll admit it may not really matter at the end of the day; you could just stick with the "it just works" attitude. But I like understanding how stuff works under the covers.

The first thing to understand is that the compiler is smart. Very smart. It notices that the counter variable is being "captured" and does some funky stuff for us. If you open the compiled assembly in Reflector, you can see exactly what happened.



You'll notice a class in the assembly that I never created. It shows up as "<>c__DisplayClass1", and if you look at its code, it looks something like this:



[CompilerGenerated]
private sealed class <>c__DisplayClass1
{
    // Fields
    public int counter;

    // Methods
    public void <GetDelegate>b__0()
    {
        this.counter++;
        Console.WriteLine(this.counter);
    }
}



OK, we're getting somewhere: a class was created for us with counter as a public field, along with a public method that looks just like our anonymous method. Very cool... but how exactly does that help us? Now, if we look back at our GetDelegate method in Reflector, this is what we see:



private static StuffDoer GetDelegate()
{
    <>c__DisplayClass1 CS$<>8__locals2 = new <>c__DisplayClass1();
    CS$<>8__locals2.counter = 1;
    StuffDoer result = new StuffDoer(CS$<>8__locals2.<GetDelegate>b__0);
    result();
    return result;
}


It looks like a mess with all the compiler-generated names, but we can finally see the big picture: counter is no longer a local variable at all, but a field on an instance of the generated class. I'll rewrite the code with readable names so you can see what's going on:

First, I create a CounterClass (this stands in for the "<>c__DisplayClass1" the compiler generated):



using System;

public class CounterClass
{
    public int counter;

    public void DoSomething()
    {
        counter++;
        Console.WriteLine(counter);
    }
}


Then, back in Program.cs, let's change the GetDelegate method to this:



private static StuffDoer GetDelegate()
{
    CounterClass c1 = new CounterClass();
    c1.counter = 1;
    StuffDoer result = new StuffDoer(c1.DoSomething);
    result();
    return result;
}


Here's the key. As I've pointed out in the past, delegates are objects, and they can hold references to other objects. What's happening here is that an instance of CounterClass is created, and we pass one of its instance methods to a new StuffDoer delegate. That makes the delegate hold a reference to this CounterClass object (exposed as the delegate's Target property). Every time you invoke the delegate after that, it's still holding a reference to the same object it started with, so you're always calling the method on the same object, and counter keeps its value between calls!
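You can check this from code, too. The sketch below is my own (the TargetDemo class and GetLambdaDelegate name are hypothetical; CounterClass and the two method bodies come from this post). It uses the Delegate.Target and Delegate.Method properties to look at the object a delegate is bound to. For the lambda version it assumes the compiler keeps the local's name for the generated field, as in the Reflector output; the generated class name itself varies between compiler versions, so it's printed rather than hard-coded:

```csharp
using System;
using System.Reflection;

public delegate void StuffDoer();

// CounterClass as written in the post.
public class CounterClass
{
    public int counter;

    public void DoSomething()
    {
        counter++;
        Console.WriteLine(counter);
    }
}

public static class TargetDemo
{
    // The hand-written version of GetDelegate.
    public static StuffDoer GetDelegate()
    {
        CounterClass c1 = new CounterClass();
        c1.counter = 1;
        StuffDoer result = new StuffDoer(c1.DoSomething);
        result();
        return result;
    }

    // The original lambda version, for comparison.
    public static StuffDoer GetLambdaDelegate()
    {
        int counter = 1;
        StuffDoer result = () =>
        {
            counter++;
            Console.WriteLine(counter);
        };
        result();
        return result;
    }

    public static void Main()
    {
        StuffDoer manual = GetDelegate();             // prints 2
        // Target is the object the delegate calls Method on.
        CounterClass target = (CounterClass)manual.Target;
        Console.WriteLine(manual.Method.Name);        // DoSomething
        manual();                                     // prints 3
        Console.WriteLine(target.counter);            // 3: same object every call

        StuffDoer lambda = GetLambdaDelegate();       // prints 2
        // The lambda's Target is an instance of the compiler-generated class.
        Type generated = lambda.Target.GetType();
        Console.WriteLine(generated.Name);            // name depends on the compiler
        // Assumption: the hoisted field is still called "counter".
        FieldInfo field = generated.GetField("counter");
        lambda();                                     // prints 3
        Console.WriteLine(field.GetValue(lambda.Target)); // 3
    }
}
```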

Now it all makes sense. That's how the anonymous method can reference the counter variable: it's actually a method on a class that's referencing its own public field. It also explains how the state is maintained: counter lives in a regular object, and that object stays alive as long as the delegate holds a reference to it. Smart, smart compiler :)
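One consequence worth seeing: because the compiler moves the variable itself into that object, the declaring method and every lambda that captures the variable share the same field. Here's a hypothetical sketch (MakePair and its out parameter are my own names, not from the post) with two lambdas capturing one local:

```csharp
using System;

public static class SharedCapture
{
    // Both lambdas capture the same local, so the compiler hoists 'counter'
    // into ONE generated object that both delegates reference.
    public static Func<int> MakePair(out Action reset)
    {
        int counter = 1;
        Func<int> increment = () => ++counter;
        reset = () => counter = 0;
        counter = 10; // assigned after the lambdas were created; they still see it
        return increment;
    }

    public static void Main()
    {
        Action reset;
        Func<int> increment = MakePair(out reset);

        Console.WriteLine(increment()); // 11: saw the later assignment of 10
        reset();
        Console.WriteLine(increment()); // 1: reset wrote to the same variable
    }
}
```

This is why the feature is described as capturing variables rather than copying their values.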
