Deep C# - Value And Reference
Written by Mike James   
Thursday, 27 August 2015

Value and reference are a fundamental division in the way C# treats data. It is important that you understand the differences and, most importantly, when to use a struct and when to use a class. These aren't just differences in efficiency; they affect the semantics too.

This is Chapter 1 of our ebook on C# - a work in progress.

Value and Reference

C# is an object-oriented, strongly typed language.

The “object-oriented” part of the definition simply means that, in principle at least, everything is derived from the object class. However, many object-oriented languages need to treat some types of very simple values as something other than objects.

The reason for this is mostly efficiency – do you really want to wrap a simple integer value in all the machinery needed to implement a full class?

But there is also a fundamental difference in the way programmers think about simple data types and class based types. C# tackles this distinction in a very reasonable way that combines good efficiency, a natural approach and an object-oriented outlook.

Pointers Become References

C# defines two different types of variable, value and reference.

(There is also a pointer type, usable only in unsafe code, which acts much like an unmanaged reference, but this needs special treatment - see later.)

A value type usually corresponds to the simplest of data types, e.g. an int or a float. When it is a local variable, its data is typically stored on the stack or within the code itself; when it is a field of an object, it is stored inside that object. The amount of storage used by a value type depends on its exact type.

On the other hand, a reference type is essentially a sophisticated pointer to the actual data and so always takes the same amount of storage, usually eight bytes on a 64-bit system.

A local reference variable is also stored on the stack, but the object that it “points” at is allocated on the heap. You will notice the use of the terms “point” and “pointer” in connection with reference types. This isn’t strictly correct, even if it is so much easier to say.
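The practical consequence of this difference shows up on assignment. The following is a minimal sketch, not taken from the original text: PointValue, PointRef and ValueVsReferenceDemo are hypothetical names invented for the illustration. It assumes only that a struct is a value type and a class is a reference type.

```csharp
using System;

// Hypothetical types for illustration: the same two fields, once as a
// struct (value type) and once as a class (reference type).
struct PointValue
{
    public int X;
    public int Y;
}

class PointRef
{
    public int X;
    public int Y;
}

static class ValueVsReferenceDemo
{
    // Copies a struct, changes the copy, and returns the original's X.
    public static int CopyStructThenModify()
    {
        PointValue a = new PointValue { X = 1, Y = 2 };
        PointValue b = a;   // copies the data itself
        b.X = 99;           // only the copy changes
        return a.X;
    }

    // Copies a reference, changes the object through the copy, and
    // returns X as seen through the original variable.
    public static int CopyReferenceThenModify()
    {
        PointRef a = new PointRef { X = 1, Y = 2 };
        PointRef b = a;     // copies the reference, not the object
        b.X = 99;           // the single shared object changes
        return a.X;
    }

    static void Main()
    {
        Console.WriteLine(CopyStructThenModify());     // prints 1
        Console.WriteLine(CopyReferenceThenModify());  // prints 99
    }
}
```

Assigning a value type duplicates the data, so the two variables are independent. Assigning a reference type duplicates only the reference, so both variables see the same object on the heap.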

Pointers are variables that generally contain the raw address of another variable. This is a very low-level concept and in practice it leads to all sorts of problems and bad practices because it allows uncontrolled access to the machine’s entire memory.

A reference type is a way of providing the same behaviour but in a controlled or managed way.

Put simply references are safe - pointers are not!

However, despite all of these reservations, thinking about a reference as a sort of typed pointer does help you understand how everything works. Thinking of references as pointers, preferably as some sort of abstract pointer that has nothing to do with an "address", is recommended - just don’t admit to it in polite company!
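To make the "abstract pointer" picture concrete, here is a small sketch. Box and ReferenceIdentityDemo are hypothetical names used only for this illustration. It shows that a reference identifies an object rather than exposing an address: you can ask whether two references identify the same object, and a reference can be null, but that is about all you can do with the reference itself.

```csharp
using System;

// Hypothetical class used only for this illustration.
class Box
{
    public int Value;
}

static class ReferenceIdentityDemo
{
    // Two variables that "point at" one object.
    public static bool SameObject()
    {
        Box first = new Box { Value = 5 };
        Box second = first;                    // copy of the reference
        return object.ReferenceEquals(first, second);
    }

    // Two objects with the same contents are still different objects.
    public static bool DistinctObjects()
    {
        Box first = new Box { Value = 5 };
        Box second = new Box { Value = 5 };
        return object.ReferenceEquals(first, second);
    }

    static void Main()
    {
        Console.WriteLine(SameObject());       // prints True
        Console.WriteLine(DistinctObjects());  // prints False

        Box nothing = null;                    // a reference to no object
        Console.WriteLine(nothing == null);    // prints True

        // Arithmetic on a reference, such as "first + 1", does not compile.
        // That restriction is part of what keeps references safe.
    }
}
```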

Before we move on it is vital that you are 100% clear about what stack and heap storage are.

Heap And Stack 

If you do know about heap and stack then skip to the next section.  

When you declare a new variable the compiler has to generate code that allocates sufficient memory to store the data it is to hold.

The whole subject of memory allocation is a complicated and interesting one but every programmer should know about the two very general approaches - the stack and the heap.

Stack based memory is a natural match for the way that variables are allocated and created by a program constructed as a set of nested method or function calls - which most are.

What happens is that when a method is called all of its local variables are created on the top of the stack - creating its so-called stack frame.

While the method executes it has direct access only to its local variables, i.e. its entire environment is limited to the data in its stack frame. Exceptions to this local-only rule are global variables and objects, but these are allocated on the heap.

If the method calls another method then the new method creates its stack frame on the top of the stack. In this way each new method can allocate its own local variables in its own portion of the memory allocated to the stack.

The beauty of stack allocation is that cleaning up is trivial: when each method terminates it simply clears its stack frame from the stack, i.e. it resets the stack pointer to where it was before the frame was created, so returning the memory for reuse. Of course, when a method ends, control returns to the method that called it, which finds the stack in just the state it needs to access its own local variables.

In this way each method gets access to its local variables while it is running without any need to keep track of where they are. The stack is also used to store parameters and return values passed between methods in a fairly obvious way.
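The stack-frame idea can be seen in a short sketch. StackFrameDemo, SumTo and TryToChange are hypothetical names for this illustration, and the comments describe the conceptual model given above; a real JIT compiler may well keep some locals in registers rather than in memory.

```csharp
using System;

static class StackFrameDemo
{
    // Each call gets its own n, local and rest in its own stack frame.
    public static int SumTo(int n)
    {
        if (n == 0) return 0;
        int local = n;              // belongs to this call only
        int rest = SumTo(n - 1);    // deeper calls cannot alter our local
        return local + rest;        // local is still intact after the call
    }

    // The parameter is a copy held in the callee's frame, so changing
    // it has no effect on the caller's variable.
    public static void TryToChange(int value)
    {
        value = 100;
    }

    static void Main()
    {
        Console.WriteLine(SumTo(4));    // prints 10

        int x = 1;
        TryToChange(x);
        Console.WriteLine(x);           // prints 1
    }
}
```

Each recursive call to SumTo stacks a new frame on top of its caller's, and each return discards one, which is why every level finds its own value of local unchanged.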

If you would like to know more about the theory of how a stack works then see "Stacks" in Babbage's Bag.


[Figure: The stack in action]


The stack works well for storing all manner of data that is created and destroyed following the pattern of method calls - but for global data and for very large or complex data we need something else.

The alternative is the "heap".

This is a very descriptive name for an area of memory that is set aside simply for the purpose of storing global data. When a global object is created, by the use of new on a class in C#, an area of the heap big enough to store it at its initial size is allocated and a reference to its location is returned - often to be stored in a local variable on the stack.

However, unlike local variables, which are created and destroyed along with the method they belong to, there is no clear signal that a global object is finished with. It doesn't necessarily become redundant when the local variable that it was initially created with goes out of scope, because there could be many variables referencing it.
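A minimal sketch of an object outliving the local variable it was created with follows. Counter and LifetimeDemo are hypothetical names chosen for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class used only for this illustration.
class Counter
{
    public int Count;
}

static class LifetimeDemo
{
    public static Counter Create()
    {
        Counter c = new Counter();  // c is a local holding the reference
        c.Count = 1;
        return c;                   // c goes out of scope, the object does not
    }

    public static int ShareAndIncrement()
    {
        Counter first = Create();                 // object is still alive here
        var list = new List<Counter> { first };   // a second reference to it
        list[0].Count++;                          // change made via the list
        return first.Count;                       // seen via the first reference
    }

    static void Main()
    {
        Console.WriteLine(ShareAndIncrement());   // prints 2
    }
}
```

The local variable c disappears when Create returns, but the Counter object carries on because the caller, and later the list, still hold references to it.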

So to deal with this problem a system-wide garbage collector is provided, which works out which objects on the heap can still be reached through references. When an object on the heap can no longer be reached by any reference it becomes eligible for collection and its memory is recovered.
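You can watch this happen, with some caveats, using a weak reference, i.e. one that does not keep its target alive. The sketch below uses the standard WeakReference, GC.Collect and GC.KeepAlive calls; ReachabilityDemo and its methods are hypothetical names. Forcing a collection like this is for demonstration only, and exactly when an unreachable object is collected is up to the runtime, so the last result is not guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

static class ReachabilityDemo
{
    // While a strong reference is in use, the object must survive a collection.
    public static bool AliveWhileReferenced()
    {
        object strong = new object();
        WeakReference weak = new WeakReference(strong);
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(strong);       // strong counts as in use up to this point
        return alive;
    }

    // Creates an object and returns only a weak reference to it, so no
    // strong reference survives the call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference MakeUnreferenced()
    {
        return new WeakReference(new object());
    }

    static void Main()
    {
        Console.WriteLine(AliveWhileReferenced());   // prints True

        WeakReference weak = MakeUnreferenced();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // Typically False by now, but the timing is not guaranteed.
        Console.WriteLine(weak.IsAlive);
    }
}
```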

This is the general idea of heap management but in practice there are many subtle problems. For example, as the heap is used and released it slowly becomes fragmented into small blocks of heap in use separating blocks of free space.

The solution is to make the garbage collector consolidate memory every now and again.

Notice also that it is generally better to adopt a throw away approach to heap management. For example, if an object, a string say, needs to increase in size then, rather than try to open up some space to make the current allocation bigger, it is generally better to mark the current allocation as garbage ready for collection and allocate a whole new block of memory even though this involves copying the existing object.
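C# strings follow exactly this throw-away pattern because a string object cannot be changed once it has been created. A small sketch, in which StringGrowthDemo and its methods are hypothetical names:

```csharp
using System;

static class StringGrowthDemo
{
    // "Growing" a string really allocates a new string object.
    public static bool GrowingCreatesNewObject()
    {
        string original = "abc";
        string grown = original;    // both refer to the same string object
        grown += "def";             // new object allocated, contents copied
        return !object.ReferenceEquals(original, grown);
    }

    // The old string object is left exactly as it was.
    public static string OriginalAfterGrowth()
    {
        string original = "abc";
        string grown = original;
        grown += "def";
        return original;
    }

    static void Main()
    {
        Console.WriteLine(GrowingCreatesNewObject());   // prints True
        Console.WriteLine(OriginalAfterGrowth());       // prints abc
    }
}
```

Once nothing refers to the old string any more it is simply left for the garbage collector to recover.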

Storage on the stack fits in with the idea of local variables and the call and return pattern of methods. Storage on the heap gives rise to objects that are better regarded as global but with local references to them. When all of the references to an object are destroyed the object is no longer of any use and may be garbage collected. 



Last Updated ( Wednesday, 18 November 2015 )