Deep C# - Value And Reference
Written by Mike James   
Monday, 04 October 2021
Heap and Stack

The beauty of stack allocation is that cleaning up is almost free: when a method terminates it simply discards its stack frame, i.e. the stack pointer is moved back to where it was before the frame was created, so returning the memory for reuse. Of course, when a method ends, control returns to the method that called it, which finds the stack in just the state it needs to access its own local variables.

In this way each method gets access to its local variables while it is running, without any need to keep track of where they are. The stack is also used to store parameters and return values passed between methods in a fairly obvious way.
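A small sketch can make the idea of one frame per call concrete. The class and method names here are invented for illustration, and the comments describe the conceptual model only, since the JIT compiler is free to keep locals in registers instead of on the stack:

```csharp
using System;

static class StackDemo
{
    // Conceptually, each call to Factorial gets its own stack frame
    // holding its own copy of the parameter n and the local result.
    public static long Factorial(int n)
    {
        long result;                       // local to this call only
        if (n <= 1)
        {
            result = 1;
        }
        else
        {
            // The nested call works in a new frame. When it returns,
            // that frame is discarded and this call's n is untouched.
            result = n * Factorial(n - 1);
        }
        return result;                     // return value passed back to the caller
    }

    static void Main()
    {
        Console.WriteLine(Factorial(5));   // prints 120
    }
}
```

Nothing in the code has to track where n or result live; each call simply finds them in its own frame.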


The stack in action


The stack works well for storing all manner of data that is created and destroyed following the pattern of method calls, but for global data and for very large or complex data we need something else.

The alternative is the "heap". This is a very descriptive name for an area of memory that is set aside simply for the purpose of storing global data. When a global object is created, by the use of new in C#, an area of the heap big enough to store it is allocated and a reference to its location is returned - often to be stored in a local variable on the stack. However, unlike local variables, which are created and destroyed along with the method they belong to, there is no clear signal that a global object is finished with. It doesn't necessarily become redundant when the local variable that it was initially created with goes out of scope because there could be many variables referencing it. To deal with this problem a system-wide garbage collector is provided, which keeps track of what is stored on the heap and works out which items can still be reached through a reference. The .NET collector does this by periodically tracing references from live variables rather than by keeping a running count for each object. When an object on the heap has no references to it, it becomes eligible for collection and its memory is recovered the next time the collector deals with it.
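The following sketch, with hypothetical names, shows an object outliving both the method that created it and the first variable that referenced it. It deliberately says nothing about when the collector runs, because that isn't predictable:

```csharp
using System;
using System.Collections.Generic;

static class HeapDemo
{
    // The local variable list vanishes when Create returns, but the
    // List object it referenced is on the heap and lives on.
    public static List<int> Create()
    {
        var list = new List<int> { 1, 2, 3 };
        return list;                      // only the reference is returned
    }

    public static int CountAfterRetargeting()
    {
        List<int> first = Create();
        List<int> second = first;         // a second reference to the same object
        first = new List<int>();          // first now refers to a different object
        // The original list is still reachable through second, so it
        // cannot be collected yet.
        return second.Count;
    }

    static void Main()
    {
        Console.WriteLine(CountAfterRetargeting());   // prints 3
    }
}
```

Only when second also goes away, or is pointed elsewhere, does the original list become unreachable and hence eligible for collection.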

This is the general idea of heap management, but in practice there are many subtle problems. For example, as the heap is used and released, it slowly becomes fragmented, with small blocks in use separating blocks of free space. The solution is to make the garbage collector consolidate, i.e. compact, memory every now and again.

It is generally better to adopt a throwaway approach to heap management. For example, if an object, a string say, needs to increase in size then, rather than try to open up some space to make the current allocation bigger, it is generally better to mark the current allocation as garbage ready for collection and allocate a whole new block of memory, even though this involves copying the existing object. This strange fact, that it is faster to create new storage rather than extend the existing, leads on to other ideas. For example, in many languages, including C#, strings are immutable. That is, once defined you cannot change a string. All you can do is apply operations that make new strings. You can think of immutability as a high-level concept motivated by philosophical considerations or just a good idea given the way storage allocation and deallocation behave.
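String immutability is easy to observe. In this sketch (names are illustrative only) extending a string leaves the original object exactly as it was, which a second reference to it confirms:

```csharp
using System;

static class StringDemo
{
    public static (string original, string extended) Extend()
    {
        string s = "heap";
        string t = s;          // t references the same string object as s
        s += " and stack";     // builds a NEW string and makes s reference it
        return (t, s);         // t still references the unchanged original
    }

    static void Main()
    {
        var (original, extended) = Extend();
        Console.WriteLine(original);   // prints heap
        Console.WriteLine(extended);   // prints heap and stack
    }
}
```

The old string is simply abandoned once nothing references it, which is the throwaway approach in action.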

Storage on the stack fits in with the idea of local variables and the call and return pattern of methods. Storage on the heap gives rise to objects that are regarded as global, but with local references to them. When all of the references to an object are destroyed the object is no longer of any use and may be garbage collected.


Thinking About References

What you should have in mind is the idea that a value type stores its value and a reference type stores a “pointer” to its value.


int a;

This declares and creates an integer variable which, like any local variable, isn't initialized to a sensible value. However, for the sake of a simple explanation let's assume it is set to zero. C# enforces the rule that you can’t read a local variable before it has been assigned a value, but nevertheless the integer variable exists and is ready to store something.



In contrast, if you declare a reference type, e.g. a class:

class Point
{
    public int x, y;
}

you can then create a reference variable of the same type:

Point b;

This declares a reference variable b which can reference an object of the type Point, but at the moment no such object exists and you can think of the reference as being set to its default value, null.




This way of thinking has a nice tidy symmetry, even if it is spoiled by C#'s insistence on not letting you access an undefined variable - which is very reasonable.
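As a rough sketch of both halves of this picture, the following shows the defaults that the mental model assumes and the compiler rule that stops you relying on them for locals. DefaultsDemo and its helper methods are invented for the example:

```csharp
using System;

class Point
{
    public int x, y;
}

static class DefaultsDemo
{
    // default(T) gives the "zero" value of a type: 0 for int,
    // null for a reference type such as Point.
    public static int DefaultInt() => default(int);
    public static bool DefaultPointIsNull() => default(Point) == null;

    static void Main()
    {
        int a;
        Point b;
        // Console.WriteLine(a);    // compile error CS0165: use of unassigned local variable
        // Console.WriteLine(b.x);  // same error, b has not been assigned

        a = 0;                      // now a can be read
        b = new Point();            // now b references a real object
        Console.WriteLine(a);       // prints 0
        Console.WriteLine(b.x);     // prints 0, fields start at their default value
        Console.WriteLine(DefaultPointIsNull());   // prints True
    }
}
```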

To create a Point object we need the additional step:

b = new Point();

Now we have a Point object created on the heap and b is set to reference or “point” at it.



Notice that the reference variable b is just like the value variable a in that they are both stored on the stack and both store immediate values - the difference is that a’s value is the data and b’s value is a reference to the data.

Of course we often combine these two steps together to create the familiar idiom:

Point b=new Point();

This often seems redundant to the beginner because of the way it uses “Point” twice.

The first use of Point declares a reference to a point object, i.e. b, and the “new Point” part actually creates the point object. It doesn’t take long for this to seem so familiar that you don’t give it a second thought.

Another important difference is that a single object can be referenced by multiple reference variables. For example:

Point b=new Point();
Point c = b;

This creates a single Point object but two reference variables both of which “point” at the same object.
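The practical consequence shows up as soon as you modify the object. This sketch (the AliasDemo class and its methods are illustrative, not part of the original article) contrasts copying a reference with copying a value:

```csharp
using System;

class Point
{
    public int x, y;
}

static class AliasDemo
{
    public static int ChangeThroughAlias()
    {
        Point b = new Point();
        Point c = b;      // copies the reference, not the object
        c.x = 42;         // changes the one shared Point
        return b.x;       // so the change is visible through b
    }

    public static int ChangeCopyOfValue()
    {
        int a = 1;
        int d = a;        // copies the value itself
        d = 42;           // changes only d's own storage
        return a;         // a is unaffected
    }

    static void Main()
    {
        Console.WriteLine(ChangeThroughAlias());   // prints 42
        Console.WriteLine(ChangeCopyOfValue());    // prints 1
    }
}
```

In both cases assignment copies what the variable stores; the difference is that for b and c what is stored is a reference.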


Last Updated ( Monday, 04 October 2021 )