Thursday, July 30, 2009

Passing objects using the ref keyword...Wait, aren't objects *always* passed by reference???

For the past month I've been on the job market and have been exposed to all kinds of technical questions. Most of them were run of the mill, but every once and a while I'd get asked a question that made me stop and think. Perfect for a guy with a blog, because now you have tons more stuff to blog about right?

P.S. For those interested, I accepted an offer earlier this week at a company called BIA. They are a computer forensics firm and I'm real excited to start!

So one question that I really liked, was the point of this blog post. The interviewer actually informed me that I was only one out of one hundred that got this question right. Not believing that these numbers were true, I asked everyone on my team at my old job, and only one of them sort of half knew. The others were completely stumped. I guess it is something that many developers just glance over.

OK, so what is this question already?? Well it goes something like this: "What's the difference between passing an object to a method the standard way, and passing an object to a method using the ref keyword?"

Let's break it down. Let's first talk about what the ref keyword does when it comes to value types. Simple example:



static void Main(string[] args)
{
int number = 10;

Add5(number);
Console.WriteLine(number);

Add5(ref number);
Console.WriteLine(number);

Console.ReadKey(true);
}

public static void Add5(int x)
{
x += 5;
}

public static void Add5(ref int x)
{
x += 5;
}


Before trying to run this, see if you can guess what the output would be.....


Answer: The first Console.Writline outputs 10 and the second one outputs 15. Why? Well, the definition of a "value type" is that when passing it to a method, it's passed by value. Meaning, not the actual variable itself is passed to the method, rather a copy of the value is passed to the method. Therefore, when calling the first method, just adding 5 to the variable that was passed in, does NOT affect the variable in the main method, because we only added 5 to the COPY of the x variable not the actual one.

In the second method call though, we're passing the variable using the ref keyword. This changes the behavior and actually DOES pass the actual variable itself to the method. Therefore, in the second method, when adding 5 to the value being passed in, you're actually messing with the same exact variable that's in the main method. Therefore, it outputs 15 showing the changes DID take effect.

So now we have an understanding of how value types work, and how ref changes the behavior. Let's talk about objects now, which are passed by reference. I'll start with another example:

First a simple Person class:



public class Person
{
public Person(string name, int age)
{
this.Name = name;
this.Age = age;
}

public override string ToString()
{
return String.Format("{0} is {1} years old.", this.Name, this.Age);
}

public string Name { get; set; }
public int Age { get; set; }

}


Simple class with two properties. Name and Age. Also, the ToString is overridden to make it easier to demonstrate.

Now, let's say we had something like this in our Program.cs:



static void Main(string[] args)
{
var alex = new Person("Alex", 27);

ChangePerson(alex);
Console.WriteLine(alex);

Console.ReadKey(true);
}


public static void ChangePerson(Person p)
{
p.Age += 5;
}


Try guessing what the output would be for this program.

Answer: Alex is 32 years old.

If you're paying attention you'll notice that this is different than how it was with value types. The change in the method DID affect the one in the main method! That's because objects are passed by reference, meaning that a reference (pointer) to the SAME object is being passed to the method. So in the method itself, you still have a reference to the same object that exists in the main method. Therefore, when you make changes, it does show up back in the main.

So back to the original question: If all objects are passed by reference, what's the point of the ref keyword when passing an object to a method??

So here's the deal. Let's talk in terms of the stack and the heap. In the current version of the CLR value types are stored on the stack, and reference types are stored on the heap. However, and this is key, for reference types, a *pointer* to that object is ALSO stored on the stack! Ok, so why is this so important? Well, I sort of mispoke earlier when I said objects are passed by refernece. I didn't give you the whole picture. What's actually happening is that a copy of the pointer is being passed to the the method. So while you are referencing the same object in memory in the method and the one in the main, the POINTER variable on the stack, is NOT the same. HOWEVER, when using the ref keyword, the pointer itself is passed to the method, not a copy of it! So when you're inside the method itself, you're dealing with the exact same pointer variable that's in the main.

The only way to explain is with an example:



static void Main(string[] args)
{
var alex = new Person("Alex", 27);

ChangePerson(alex);
Console.WriteLine(alex);

ChangePerson(ref alex);
Console.WriteLine(alex);

Console.ReadKey(true);
}


public static void ChangePerson(Person p)
{
p = new Person("Alex", 35);
}

public static void ChangePerson(ref Person p)
{
p = new Person("Alex", 45);
}


Here we have two methods that look the same. The only difference is the ref keyword. What the method is doing, is it's assigning a NEW person object to the Person p (the pointer) that was passed in to the method. However, if you run this program you'll see that the first Console.Writeline still outputs 27 years old, even though we assigned it to a person object that's 35 years old! The reason for this is because the pointer itself was passed by VALUE so when you're assigning a new person object, you're not assigning it to the same pointer referenced in the main.

In the second case however, since we're using the ref keyword, the pointer in the method is the SAME one that's in the main method. Therefore, the second Console.Writeline outputs 45 years old, because the pointer in the main, is now pointing to the object that was assigned to it in the method.

Personally I've never used this yet in production code, but if you understand this, then that means you understand the nitty gritty details of how parameters are passed around. Very impressive on interviews :-)

4 comments:

Jon Skeet said...

It's quite sad if understanding something as fundamental as how parameters are passed truly counts as "impressive".

However, your blog post incorrectly states that "objects are passed by reference". They're not. The reference is passed by value. Also, value types *aren't* always stored on the stack.

See http://pobox.com/~skeet/csharp/parameters.html and http://pobox.com/~skeet/csharp/memory.html for more details.

A.Friedman said...

Wow the great Jon Skeet criticized my blog, I'm honored. It's like in Hollywood, you know you've made it when there's a huge smear story about you on "Page Six" :)

Jon Skeet said...

I didn't mean to be as harsh as that came out in the end. (My excuse is that I'm not well right now :)

The first bit really was a comment on the software industry in general. I'm genuinely saddened by how often I find developers who have been working with C# for ages, but don't really understand such core things :(

The other two points just happen to be my pet peeves...

A.Friedman said...

No problem :)

In my defense though, while I did say that "objects are passed by reference" if you read the whole thing I wrote, I think I did a decent enough job of explaining what exactly that means, that yes the reference to the object is being passed by value.

As for the point about value types not always being stored on the stack, point well taken, but I think that's kind of outside the scope of this blog post anyway.