Why do we need boxing and unboxing in C#?

In the .NET framework, there are two species of types--value types and reference types. This is relatively common in OO languages.

One of the important features of object oriented languages is the ability to handle instances in a type-agnostic manner. This is referred to as polymorphism. Since we want to take advantage of polymorphism, but we have two different species of types, there has to be some way to bring them together so we can handle one or the other the same way.

Now, back in the olden days (1.0 of Microsoft.NET), there weren't this newfangled generics hullabaloo. You couldn't write a method that had a single argument that could service a value type and a reference type. That's a violation of polymorphism. So boxing was adopted as a means to coerce a value type into an object.

If this wasn't possible, the framework would be littered with methods and classes whose only purpose was to accept the other species of type. Not only that, but since value types don't truly share a common type ancestor, you'd have to have a different method overload for each value type (bit, byte, int16, int32, etc etc etc).

Boxing prevented this from happening. And that's why the British celebrate Boxing Day.


Why

To have a unified type system and allow value types to have a completely different representation of their underlying data from the way that reference types represent their underlying data (e.g., an int is just a bucket of thirty-two bits which is completely different than a reference type).

Think of it like this. You have a variable o of type object. And now you have an int and you want to put it into o. o is a reference to something somewhere, and the int is emphatically not a reference to something somewhere (after all, it's just a number). So, what you do is this: you make a new object that can store the int and then you assign a reference to that object to o. We call this process "boxing."

So, if you don't care about having a unified type system (i.e., reference types and value types have very different representations and you don't want a common way to "represent" the two) then you don't need boxing. If you don't care about having int represent their underlying value (i.e., instead have int be reference types too and just store a reference to their underlying value) then you don't need boxing.

where should I use it.

For example, the old collection type ArrayList only eats objects. That is, it only stores references to somethings that live somewhere. Without boxing you cannot put an int into such a collection. But with boxing, you can.

Now, in the days of generics you don't really need this and can generally go merrily along without thinking about the issue. But there are a few caveats to be aware of:

This is correct:

double e = 2.718281828459045;
int ee = (int)e;

This is not:

double e = 2.718281828459045;
object o = e; // box
int ee = (int)o; // runtime exception

Instead you must do this:

double e = 2.718281828459045;
object o = e; // box
int ee = (int)(double)o;

First we have to explicitly unbox the double ((double)o) and then cast that to an int.

What is the result of the following:

double e = 2.718281828459045;
double d = e;
object o1 = d;
object o2 = e;
Console.WriteLine(d == e);
Console.WriteLine(o1 == o2);

Think about it for a second before going on to the next sentence.

If you said True and False great! Wait, what? That's because == on reference types uses reference-equality which checks if the references are equal, not if the underlying values are equal. This is a dangerously easy mistake to make. Perhaps even more subtle

double e = 2.718281828459045;
object o1 = e;
object o2 = e;
Console.WriteLine(o1 == o2);

will also print False!

Better to say:

Console.WriteLine(o1.Equals(o2));

which will then, thankfully, print True.

One last subtlety:

[struct|class] Point {
    public int x, y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

Point p = new Point(1, 1);
object o = p;
p.x = 2;
Console.WriteLine(((Point)o).x);

What is the output? It depends! If Point is a struct then the output is 1 but if Point is a class then the output is 2! A boxing conversion makes a copy of the value being boxed explaining the difference in behavior.


The best way to understand this is to look at lower-level programming languages C# builds on.

In the lowest-level languages like C, all variables go one place: The Stack. Each time you declare a variable it goes on the Stack. They can only be primitive values, like a bool, a byte, a 32-bit int, a 32-bit uint, etc. The Stack is both simple and fast. As variables are added they just go one on top of another, so the first you declare sits at say, 0x00, the next at 0x01, the next at 0x02 in RAM, etc. In addition, variables are often pre-addressed at compile-time, so their address is known before you even run the program.

In the next level up, like C++, a second memory structure called the Heap is introduced. You still mostly live in the Stack, but special ints called Pointers can be added to the Stack, that store the memory address for the first byte of an Object, and that Object lives in the Heap. The Heap is kind of a mess and somewhat expensive to maintain, because unlike Stack variables they don't pile linearly up and then down as a program executes. They can come and go in no particular sequence, and they can grow and shrink.

Dealing with pointers is hard. They're the cause of memory leaks, buffer overruns, and frustration. C# to the rescue.

At a higher level, C#, you don't need to think about pointers - the .Net framework (written in C++) thinks about these for you and presents them to you as References to Objects, and for performance, lets you store simpler values like bools, bytes and ints as Value Types. Underneath the hood, Objects and stuff that instantiates a Class go on the expensive, Memory-Managed Heap, while Value Types go in that same Stack you had in low-level C - super-fast.

For the sake of keeping the interaction between these 2 fundamentally different concepts of memory (and strategies for storage) simple from a coder's perspective, Value Types can be Boxed at any time. Boxing causes the value to be copied from the Stack, put in an Object, and placed on the Heap - more expensive, but, fluid interaction with the Reference world. As other answers point out, this will occur when you for example say:

bool b = false; // Cheap, on Stack
object o = b; // Legal, easy to code, but complex - Boxing!
bool b2 = (bool)o; // Unboxing!

A strong illustration of the advantage of Boxing is a check for null:

if (b == null) // Will not compile - bools can't be null
if (o == null) // Will compile and always return false

Our object o is technically an address in the Stack that points to a copy of our bool b, which has been copied to the Heap. We can check o for null because the bool's been Boxed and put there.

In general you should avoid Boxing unless you need it, for example to pass an int/bool/whatever as an object to an argument. There are some basic structures in .Net that still demand passing Value Types as object (and so require Boxing), but for the most part you should never need to Box.

A non-exhaustive list of historical C# structures that require Boxing, that you should avoid:

  • The Event system turns out to have a Race Condition in naive use of it, and it doesn't support async. Add in the Boxing problem and it should probably be avoided. (You could replace it for example with an async event system that uses Generics.)

  • The old Threading and Timer models forced a Box on their parameters but have been replaced by async/await which are far cleaner and more efficient.

  • The .Net 1.1 Collections relied entirely on Boxing, because they came before Generics. These are still kicking around in System.Collections. In any new code you should be using the Collections from System.Collections.Generic, which in addition to avoiding Boxing also provide you with stronger type-safety.

You should avoid declaring or passing your Value Types as objects, unless you have to deal with the above historical problems that force Boxing, and you want to avoid the performance hit of Boxing it later when you know it's going to be Boxed anyway.

Per Mikael's suggestion below:

Do This

using System.Collections.Generic;

var employeeCount = 5;
var list = new List<int>(10);

Not This

using System.Collections;

Int32 employeeCount = 5;
var list = new ArrayList(10);

Update

This answer originally suggested Int32, Bool etc cause boxing, when in fact they are simple aliases for Value Types. That is, .Net has types like Bool, Int32, String, and C# aliases them to bool, int, string, without any functional difference.

Tags:

C#

.Net

Boxing