Value and reference are a fundamental division in the way C# treats data. It is important that you understand the differences and, above all, when to use a struct and when to use a class. These aren't just differences in efficiency; they affect the semantics too.
There is a newer version of this article:
Deep C# - Value And Reference
Value and Reference
C# is an object-oriented, strongly typed language. The “object-oriented” part of the definition simply means that, in principle at least, everything is derived from the object class. However, all object-oriented languages suffer from the need to treat some types of very simple values as something other than objects.
The reason for this is mostly efficiency – do you really want to wrap a simple integer value in all the machinery needed to implement a full class? But there is also a fundamental difference in the way programmers think about simple data types and class based types. C# tackles this distinction in a very reasonable way that combines good efficiency, a natural approach and an object-oriented outlook.
Pointers to references
C# defines two different types of variable: value and reference. (There is also a pointer type, which acts much like an unmanaged reference, but this needs special treatment - see later.)
A value type usually corresponds to the simplest of data types, e.g. an int or a float, and for a local variable the data is stored on the stack or within the code itself. The amount of storage used by a value type depends on its exact type.
On the other hand, a reference type is essentially a sophisticated pointer to the actual data and so always takes the same amount of storage, usually eight bytes on a 64-bit system. A reference variable is also stored on the stack, but the object that it “points” at is allocated on the heap. You will notice the use of the terms “point” and “pointer” in connection with reference types, and this isn’t strictly correct. Pointers are variables that generally contain the raw address of another variable. This is a very low-level concept and in practice it leads to all sorts of problems and bad practices because it allows uncontrolled access to the machine’s entire memory.
A reference type is a way of providing the same behaviour but in a controlled or managed way. Put simply, references are safe - pointers are not!
However, despite all of the reservations, thinking about a reference as a sort of typed pointer does help you understand how everything works, and thinking about references as pointers is recommended - just don’t admit to it in polite company!
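The storage sizes mentioned above can be checked from code. The following is a minimal sketch, not part of the original article: the class name StorageSizes and its methods are hypothetical, and it assumes, as is the case on current .NET runtimes, that an object reference occupies the same space as a native pointer, which IntPtr.Size reports.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class StorageSizes
{
    // For the built-in value types, sizeof is a compile-time constant
    // and can be used without an unsafe context.
    public static int IntSize() { return sizeof(int); }       // 4 bytes
    public static int DoubleSize() { return sizeof(double); } // 8 bytes
    public static int ByteSize() { return sizeof(byte); }     // 1 byte

    // Assumption: a reference takes the same space as a native pointer,
    // i.e. 8 bytes in a 64-bit process and 4 bytes in a 32-bit one.
    public static int ReferenceSize() { return IntPtr.Size; }
}
```

The value type sizes differ from type to type, whereas the size of a reference depends only on the process, not on the type of object it refers to.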
What you should have in mind is a picture in which a value type stores its value and a reference type stores a “pointer” to its value. For example:
int a;
This declares and creates an integer variable which, in common with all value types, is initialised to a sensible default value, zero in this case. C# enforces the rule that you can’t make use of an uninitialised value type, but nevertheless the integer variable exists and is initialised.
However, if you declare a reference type, e.g. a class:
class Point { public int x, y; }
you can then create a reference variable of the same type:
Point b;
This declares a reference variable b which can reference an object of the type Point, but at the moment no such object exists and the reference is set to its default value, null. This is a clear difference between the types: you can create a reference variable that doesn’t actually reference anything, i.e. doesn’t correspond to any valid data that you can work with.
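The two default values can be observed directly. This is a minimal sketch under stated assumptions: the Holder and DefaultsDemo classes are hypothetical, and fields are used rather than local variables because the compiler refuses to read a local that has not been assigned.

```csharp
using System;

class Point { public int x, y; }

// Hypothetical container used only to expose the default field values.
class Holder
{
    public int a;    // value type field: starts out as 0
    public Point b;  // reference type field: starts out as null
}

static class DefaultsDemo
{
    public static bool ValueIsZero() { return new Holder().a == 0; }

    public static bool ReferenceIsNull() { return new Holder().b == null; }

    // Following a null reference does not read random memory;
    // the runtime raises a NullReferenceException instead.
    public static bool UseOfNullThrows()
    {
        Holder h = new Holder();
        try
        {
            int x = h.b.x; // b references nothing, so this line throws
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

All three methods return true: the integer exists with the value zero, while the reference exists but has nothing to refer to.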
To create a Point object we need the additional step:
b = new Point();
Now we have a Point object created on the heap, and b is set to reference or point at it. Notice that the reference variable b is just like the value variable a in that they are both stored on the stack and both store immediate values - the difference is that a’s value is the data and b’s value is a reference to the data.
Of course we often combine these two steps together to create the familiar idiom:
Point b = new Point();
This often seems redundant to the beginner because of the way it uses “Point” twice.
The first use of Point declares a reference to a Point object, i.e. b, and the “new Point” part actually creates the Point object. It doesn’t take long for this to seem so familiar that you don’t give it a second thought.
Another important difference is that a single object can be referenced by multiple reference variables. For example:
Point b = new Point();
Point c = b;
This creates a single Point object but two reference variables both of which “point” at the same object.
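The practical consequence shows up as soon as you modify the object through one of the variables. The sketch below contrasts the Point class with a struct; the struct PointValue and the class CopyDemo are hypothetical names introduced only for this comparison.

```csharp
class Point { public int x, y; }        // reference type, as above
struct PointValue { public int x, y; }  // hypothetical value type counterpart

static class CopyDemo
{
    // Assignment copies the reference: b and c share one object.
    public static int ReferenceCopy()
    {
        Point b = new Point();
        Point c = b;
        c.x = 10;
        return b.x; // 10, the change made through c is visible through b
    }

    // Assignment copies the data: b and c are independent values.
    public static int ValueCopy()
    {
        PointValue b = new PointValue();
        PointValue c = b;
        c.x = 10;
        return b.x; // 0, b is unaffected by the change to c
    }
}
```

This is the semantic difference between struct and class: the same two assignment statements behave differently depending on which kind of type is involved.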
It is often said that an important difference between value and reference types is their life cycle. In fact, both types of variable have exactly the same behaviour with respect to when they are created and destroyed.
That is, a value or a reference variable is destroyed as soon as it is clearly no longer accessible, i.e. out of scope. This means, for example, that a variable defined in a method is destroyed as soon as the method returns. It is this behaviour that makes local variables truly local to the method or block that they were declared in. Notice that there can be exceptions to this rule, such as static variables, which aren’t destroyed until the application terminates. However, it is true to say that the vast majority of variables do behave in this way.
What is different between value and reference types is what happens to their data when the variable is destroyed. In the case of a value type variable, the variable and its data are one and the same, and so when a value type variable is destroyed so is its data.
However, a reference type variable only contains a reference to its data, and while the variable and the reference it contains are destroyed, the object that it references isn’t. This is the source of the statement that value and reference variables have different lifetimes - they don’t, but the data associated with them can.
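A minimal sketch of an object outliving the variable that first referred to it; the class LifetimeDemo and its method names are hypothetical.

```csharp
class Point { public int x, y; }

static class LifetimeDemo
{
    static Point Create()
    {
        Point local = new Point(); // object allocated on the heap
        local.x = 3;
        return local; // the variable "local" ends here; the object does not
    }

    public static int Run()
    {
        // A different variable now holds a reference to the same object.
        Point survivor = Create();
        return survivor.x; // 3
    }
}
```

The variable local is gone by the time Create returns, yet the data it referred to is still intact because the reference was copied to the caller.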
Obviously we can’t leave unwanted objects on the heap forever, and this is where the system garbage collector comes in. This is a service that periodically scans the heap looking for objects that no longer have any references to them. An object with no references to it is clearly no longer required, and using this fact the garbage collector eventually gets round to clearing up the heap.
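You can watch the collector's decision using a WeakReference, which tracks an object without keeping it alive. This is only a sketch: the class CollectorDemo is hypothetical, and exactly when an unreferenced object is reclaimed is up to the runtime, so the second method's result is typical rather than guaranteed.

```csharp
using System;

class Point { public int x, y; }

static class CollectorDemo
{
    // While an ordinary reference exists, a collection must leave the object alone.
    public static bool AliveWhileReferenced()
    {
        Point p = new Point();
        WeakReference weak = new WeakReference(p);
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(p); // ensures p counts as in use until this point
        return alive;    // true
    }

    // With no ordinary references left, the object is eligible for collection.
    public static bool AliveAfterDrop()
    {
        WeakReference weak = MakeUnreferenced();
        GC.Collect();
        return weak.IsAlive; // usually false, but the timing is not guaranteed
    }

    static WeakReference MakeUnreferenced()
    {
        return new WeakReference(new Point());
    }
}
```

Forcing a collection with GC.Collect is done here purely for demonstration; in normal code the collector is left to run on its own schedule.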
Notice that this difference in lifetime is entirely to do with the way that things are stored. The value and reference variables are stored on the stack, and this is naturally self-managing in the sense that when a method returns all of its local variables are destroyed by the adjustment of the stack pointer. Anything stored on the heap has no such natural cleaning process, and we have to implement a garbage collection system to determine when it is no longer required and when it should be removed.
How and when to tidy the heap is entirely a matter of efficiency - garbage collect too often and you use up processor power when there is plenty of heap waiting to be used. Garbage collect too little and you risk bringing the application to a halt while the garbage collector has to work overtime freeing up memory by deleting objects and consolidating free space.
See “Stack and Heap”.