Programming, and computer science in particular, has a tendency to use other people's jargon. Often this makes things seem more difficult. You may have heard of covariance and contravariance and wondered what they were all about. If you want a simple explanation that applies to any computer language, here it is.
Covariance and contravariance are terms that are used in different ways in the theory of object-oriented programming. They sound advanced and difficult, but in fact the idea they encapsulate is very, very simple. It is an idea that seems to originate in physics, but it is really more of a mathematical concept.
Let's find out.
Functions - the start of co and contra
Covariance and contravariance occur all over mathematics - in vector spaces, differential geometry and so on.
They derive from a very general idea based on observing what happens when you make a change. Usually the result goes in the same direction as the change, i.e. it is covariant, but sometimes it goes in the opposite direction, i.e. it is contravariant.
The most elementary example I can find of the co and contra behavior is the simple mathematical function.
A function has an input and an output and these behave differently if you try to change them.
For example, suppose you have the function:
$$y = f(x) = \sin(x)$$
Then let's first see what happens to the function if we add a constant to x. You can think of this as defining a new function F(x)=f(x+a).
A graph of sin(x)
If you draw a graph of the function sin(x+a), you can easily see that the effect of the +a is to move the graph by -a units, i.e. a units to the left. That is, x is translated by a units but the function is translated by -a units.
The function moves in the opposite direction to x and so we say that it is contravariant in x.
The reason this happens is simply that if you add a to x, the function value that was at x, i.e. f(x), is no longer produced when you enter x, because x now produces f(x+a). The value f(x) is now produced when you enter x-a, as f(x-a+a) is just f(x) again. Thus shifting x by +a moves all the function values by -a.
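The shift can also be checked numerically. Here is a minimal Python sketch that uses only the standard math module; the helper name shift_input and the value a = 0.5 are hypothetical choices for illustration. It builds F(x)=f(x+a) and confirms that the value f had at x0 now turns up at x0-a.

```python
import math


def f(x):
    return math.sin(x)


def shift_input(func, a):
    """Hypothetical helper: returns F(x) = func(x + a), a change made to the input."""
    return lambda x: func(x + a)


a = 0.5
F = shift_input(f, a)

for x0 in [0.0, 1.0, 2.0]:
    # The value f produced at x0 is now produced at x0 - a,
    # so adding +a to the input moves the graph by -a.
    assert math.isclose(F(x0 - a), f(x0))

# At x0 itself F gives a different value, namely f(x0 + a).
print(f(1.0), F(1.0))
```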
Changing x to x+a moves the graph in the -a direction
Now consider adding a constant to the output of the function, y. You can see that y+a is given by sin(x)+a, which moves the graph of the function up by a units.
The function moves in the same direction as the constant and so we say it is covariant in y.
The reason for this is much easier to see because the transformation is applied after the function, i.e. F(x)=f(x)+a, and the new function is just the old function moved up by a.
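The same kind of check works for a change applied to the output. Again this is only a sketch; shift_output and a = 0.5 are hypothetical names and values chosen for illustration.

```python
import math


def f(x):
    return math.sin(x)


def shift_output(func, a):
    """Hypothetical helper: returns F(x) = func(x) + a, a change made to the output."""
    return lambda x: func(x) + a


a = 0.5
F = shift_output(f, a)

for x in [0.0, 1.0, 2.0]:
    # Every point on the graph moves up by a: the same direction as the change.
    assert math.isclose(F(x) - f(x), a)
```

Compared with the previous sketch, the only difference is whether a is added before or after func is applied, and that alone decides whether the graph moves against the change or with it.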
Changing y to y+a moves the graph up a units
In general, changes to the inputs of things tend to move the function in the opposite direction, and so they are contravariant, but changes to the outputs move it in the same direction, and so they are covariant.
This is where the idea comes from. It turns up, with modifications, in all sorts of places, and it can seem much more sophisticated than this simple example, but in all cases it is contravariant if the change you make results in the opposite change in what you are considering and covariant if it results in the same change.
Covariant and Contravariant Vectors
Note: if you are not into physics, skip to the next section.
If you encounter contravariance and covariance in, say, physics or math, it might not be obvious that the difference is related to changes in inputs (contravariant) or outputs (covariant), but if you dig deep enough you will find that this is exactly what it is.
For example, a contravariant vector is just a normal standard vector, and when you change, say, the scale of the axes, its coordinates change in the opposite direction. The same rules work for a general transformation of the basis vectors, but it is much easier to understand if you first restrict your attention to a simple change of scale.
For example, the 2D vector (1,1) with the axes marked up in meters becomes (100,100) when you reduce the scale to centimeters - because what was 1m long is now 100cm long.
Notice that you scale the axis unit by 1/100, i.e. meters to centimeters, and the vector coordinates change by a factor of 100.
Thus standard vectors are contravariant with respect to changes in the basis.
The more complicated case is a dual vector or functional, where things work the opposite way because a dual vector is a function. If x is a vector and y is a dual vector, then xy (also written x.y or (x,y); the notation varies a lot) is a numeric value. To get the numeric value you simply multiply each coordinate of the vector by the corresponding coordinate of the dual and sum. That is, if the vector is (a,b) and the dual is (c,d) then:
$$(a,b)(c,d) = ac + bd$$
This is usually called the scalar or inner product.
In other words a dual vector is a linear function that maps a standard vector to a number.
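Treating a dual vector literally as a function makes this concrete. In this minimal Python sketch, make_dual is a hypothetical helper that turns the coordinates (c,d) into a function computing ac+bd; the values 2 and 3 are arbitrary.

```python
def make_dual(c, d):
    """Hypothetical helper: the dual vector (c, d) as a function on 2D vectors."""
    def apply(vector):
        a, b = vector
        return a * c + b * d   # the scalar product ac + bd
    return apply


phi = make_dual(2, 3)
print(phi((1, 1)))   # prints 5
# Linear: doubling the input vector doubles the output number.
print(phi((2, 4)) == 2 * phi((1, 2)))   # prints True
```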
Now consider the effect of the same change of scale on the dual vector. After the change to centimeters, the vector that was (a,b) becomes (100a,100b). If the dual kept its old coordinates (c,d), applying it would give (100a)c+(100b)d=100(ac+bd), which is not the value it gave before the transformation.
For example, if you have (a,b)(c,d)=ac+bd before the transformation, the dual vector that produces the same result afterwards is (c/100,d/100), because (100a)(c/100)+(100b)(d/100)=ac+bd.
Because the dual is a function, its coordinates change covariantly with a change in the basis: scale the axes by 1/100 and the dual's coordinates are scaled by 1/100 too.
It is a wonderful result that if you change the basis of the vectors and the dual vectors in the same way, e.g. by the same scaling, then, as one of them is contravariant and the other covariant, the scalar product is invariant.
But to return to programming.
The common use of the terms contravariance and covariance in programming has come to mean something a little more specific than in physics and math, but it is still often related to the inputs and outputs of functions, so let's look at this common usage first.
The whole idea relates to the hierarchy of types.
If type B is derived from type A, that is it "inherits" from A or, put another way, A is the base class for B, then it contains every method and property that type A does and more.
In this sense type B is "bigger" than type A.
You can see that the type system provides a way of ordering classes. (Notice that not all classes are on the same branch of the type hierarchy and so not all classes can be compared in this way i.e. we only have a partial ordering.)
The idea that B is in some way "bigger" than A is an important one. Unfortunately, it has long been the convention to use the less-than symbol to show that a type is derived from another type, which is perhaps the wrong way round.
That is, if we write A>B then A is "higher" in the type hierarchy than B or, in other words, B is derived from A.
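Python has no < operator for classes, but its built-in issubclass(B, A) plays the role of B<A. This minimal sketch uses empty placeholder classes A, B and C, where C is an extra sibling class added only for illustration. It shows both the direction of the ordering and the fact that it is only partial.

```python
class A:
    pass


class B(A):   # B derives from A: A > B in the notation above
    pass


class C(A):   # C also derives from A, but on a different branch from B
    pass


print(issubclass(B, A))   # True: B is lower in the hierarchy than A
print(issubclass(A, B))   # False: the ordering only goes one way
# B and C are on different branches, so neither is below the other.
print(issubclass(B, C), issubclass(C, B))   # False False
```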
The substitution principle
If B derives from A then A>B and B inherits or contains everything that A has to offer.
As B contains everything that A does you can use a B anywhere that you could have used an A. Of course you can't always use an A in place of a B so the relationship is asymmetrical.
This is formally known as the Liskov Substitution Principle.
In fact the full "principle" is a little wider than this, as it holds that anything that is true of an A should be true of a B. In general this is difficult to achieve, but you get the general idea.
In practice it is more a guiding principle than something that is enforced all of the time.
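The one-way nature of substitution is easy to see in code. In this Python sketch the classes A and B and the functions use_an_a and use_a_b are hypothetical, invented only to illustrate the idea: a function written against A accepts a B, but a function written against B cannot be given an A.

```python
class A:
    def describe(self) -> str:
        return "an A"


class B(A):
    # B inherits everything A offers and adds something of its own.
    def describe(self) -> str:
        return "a B"

    def extra(self) -> str:
        return "only a B has this"


def use_an_a(obj: A) -> str:
    # Written against A, so it relies only on what every A provides.
    return obj.describe()


def use_a_b(obj: B) -> str:
    # Written against B, so it may rely on B-only features.
    return obj.extra()


print(use_an_a(A()))
print(use_an_a(B()))   # a B is accepted where an A is expected

try:
    use_a_b(A())       # an A cannot stand in for a B
except AttributeError as err:
    print("an A is not a B:", err)
```

The type hints are not enforced at run time; the failure with an A comes from the missing extra method, which is exactly the "B has more than A" asymmetry.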
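A well-known illustration of why the full principle is hard to keep is a rectangle and square pair. This is a hypothetical Python sketch with classes invented for the purpose: Square derives from Rectangle, so Python accepts a Square anywhere a Rectangle is expected, yet something true of every Rectangle here (changing its width leaves its height alone) is not true of a Square. Nothing in the language stops this.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, w):
        self.width = w   # height is left alone

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)

    def set_width(self, w):
        # To stay square, changing the width must also change the height.
        self.width = w
        self.height = w


def widen(r: Rectangle):
    # Code written for Rectangle assumes the height does not change.
    r.set_width(10)
    return r.area()


print(widen(Rectangle(2, 5)))   # prints 50: width 10, height still 5
print(widen(Square(5)))         # prints 100: the height changed too
```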
Inferred type relationships
Looking at the substitution principle the other way round, we can use it to define the hierarchical relationship between types.
That is, if type B can be used anywhere a type A would be acceptable, then you can say that type A is a base type for type B and B<A.
This is a reasonable idea because if B can always be used in place of A then it has everything that A has and perhaps more.
This is a useful idea when you are working with types that are not explicitly placed within a type hierarchy by being derived from one another, as is the case for delegates, say (see later).
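Python has no delegates, but its structural typing shows the same idea of a relationship inferred from what a type can do rather than from explicit derivation. This is a sketch using typing.Protocol and runtime_checkable from the standard library; Greeter, English, Silent and welcome are hypothetical names for illustration. Note that an isinstance check against a runtime-checkable protocol only checks that the named methods exist, not their signatures; a static type checker does a fuller check.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Greeter(Protocol):
    """Anything with a greet() method that returns a string."""
    def greet(self) -> str: ...


class English:          # note: not derived from Greeter
    def greet(self) -> str:
        return "hello"


class Silent:           # has no greet() method
    pass


def welcome(g: Greeter) -> str:
    return g.greet()


print(Greeter in English.__mro__)       # False: no explicit inheritance
print(isinstance(English(), Greeter))   # True: inferred from what English provides
print(isinstance(Silent(), Greeter))    # False
print(welcome(English()))               # prints hello
```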