Written by Ian Elliot
|Thursday, 15 January 2015
This raises the question of what exactly type is and what problem it is supposed to solve.
Three Types Of Type
Even before we get started it is important to realize that there are a number of meanings to the word "type".
The most common usage of the word refers to primitive data types; after that it refers to the "class" that defines an object.
Exactly what these two meanings of type are all about will be explained in more detail later. For the moment it is assumed that you have a rough idea what a primitive type is, e.g. an int or a string say, and that you have an idea what a class-based type is, i.e. an instance of a class.
For the rest of this article type will be taken to mean either primitive data type or class based type.
What Is Type For?
So what is type for?
Put simply, type tells you the operations that you can perform.
By declaring the type of an object you specify exactly what operations you can use and what methods you can call.
If you know that x is of type integer then you know that it is fine to perform x+3 - and not only do you know it, the compiler knows it as well. This allows the compiler to detect incorrect code and flag type errors at compile time - thus saving you from the embarrassment of a run time error.
In the case of objects, knowing that an object is of a particular type defines precisely what methods and properties it has. This allows the compiler to check at compile time that you aren't calling any methods or accessing any properties that the object doesn't support. Again this saves you from the embarrassment of a run time error.
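A quick sketch of the idea in Python, using a hypothetical Account class - the names and annotations are illustrative only. A static checker such as mypy would flag the unsupported call before the program ever runs, because Account's type does not include it:

```python
class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def deposit(self, amount: int) -> None:
        self.balance += amount


acc = Account(100)
acc.deposit(50)      # fine: deposit is part of Account's type
print(acc.balance)   # 150
# acc.withdraw(20)   # a static checker rejects this line before run time:
#                    # Account's type has no withdraw method
```

The point is that the list of legal operations is fixed by the type, so an illegal call can be detected without running the program.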
This is a worthwhile idea but it also limits what you can do and forces you to introduce other mechanisms to get over the restrictions strong typing brings with it.
There is a trade off.
When you accept strong typing and type checking it becomes possible to find some types of error, but these errors are fairly easy to find in other ways. After all, you simply have to check that every operation on an object is legal - and if this can be done at compile time then it can be done by reading the code. This approach is often referred to as "type inference", but you could just as well call it "property checking".
If you do adopt strong typing, what you lose are type-free ways of working - for example generic algorithms - and most languages have to invent complicated ways of restoring these features, e.g. generics, covariance, contravariance and so on.
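What is meant by a type-free generic algorithm can be sketched in Python (the function name largest is purely illustrative): the same untyped code works on any items that support comparison, which is exactly the behaviour a statically typed language has to recover through generics:

```python
def largest(items):
    # No type declarations needed: anything supporting ">" will do
    result = items[0]
    for item in items[1:]:
        if item > result:
            result = item
    return result


print(largest([3, 1, 4]))          # 4
print(largest(["pear", "apple"]))  # pear
```

In Java or C# the equivalent would need a type parameter and a comparability constraint before it compiled at all.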
In short type checking finds errors that are mostly easy to find and places restrictions on what you can do.
Because of languages such as Java, C#, C++ and so on, most programmers are taught that strong typing is nothing but good and that in fact you can't develop quality software without it.
This point of view is far from proven and to have a balanced view you really need to see both sides of the coin.
So we have two general meanings of the word type - primitive type and class based type.
The first and most basic relates to primitive data type.
In many ways this is the least important meaning of type - but it leads on to the more sophisticated - class-based data type.
Primitive Data Typing
Primitive data typing is so ingrained in most approaches to programming that it is difficult to see it afresh and consider its implications.
The idea of primitive type is deeply embedded in nearly all programming languages and hence in the minds of most programmers. As a result the alternative position outlined below - trying to eradicate all primitive data types - is usually met with a great deal of resistance.
Try to keep an open mind.
Historically this is where the whole idea of type originated because it was necessary to make the distinction between different types of data for reasons of efficiency. As time has passed it has become less and less necessary to worry how data is actually stored and computer languages have become increasingly abstract and removed from the constraints of hardware.
The point is that what we program should never depend on the low level detail of how the bits are stored - and as long as it does we are still in a primitive state of development.
The object of any high level language is to abstract away from the reality of hardware, bits and representations.
In an ideal world we wouldn't worry too much about low level concepts such as primitive data type because the language would take care of everything. Instead of worrying about the format that data is stored in you would concentrate on the operations that you apply to data.
This is a difficult idea to take on board because as programmers we know for a fact that a number is stored in one way and a string, say, is stored in another. In fact we even know that numbers - integers and floats - are stored in different ways.
These differences are so ingrained that it is difficult to see or agree with the assertion that this isn't necessary.
You may say - yes it is - because numbers are different from text and you need integers and reals and so on.
No you don't - you simply need operators that do the right job. Most ideas of primitive type are in fact the result of not defining operators in the correct way.
For example the addition operator would expect to work with two numbers and the concatenation operator would expect to work with two strings. The form that the data is stored in shouldn't matter and in this sense 123 and "123" are numbers - a string that happens to represent a number is a number. In the same way the concatenation operator would treat any number as a string.
What if you try to add a string that doesn't represent a valid number?
You get an error - what else could happen? But throwing an error when the string does represent a valid number is ignoring what is in front of you.
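This coercing behaviour can be sketched in Python. Here add is a hypothetical addition operator, not anything built in: it treats any value that represents a number as a number and only complains when no numeric interpretation exists:

```python
def add(a, b):
    """Add two values, treating anything that represents a number as one.

    Raises ValueError only when a value has no numeric interpretation.
    """
    def to_number(v):
        if isinstance(v, (int, float)):
            return v
        try:
            f = float(v)
        except (TypeError, ValueError):
            raise ValueError(f"{v!r} does not represent a number") from None
        # Keep whole numbers as integers so 123 and "123" behave alike
        return int(f) if f.is_integer() else f

    return to_number(a) + to_number(b)


print(add(123, "4"))   # 127 - "4" represents a number, so it is one
print(add("1.5", 2))   # 3.5
```

Only add("cat", 1), say, raises an error, because "cat" genuinely has no numeric reading.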
The representation of data should be irrelevant to the operation of a program.
This is how humans work with data and we are the pinnacle of sophistication and abstraction. If I ask you to add 1 to 2 you don't worry about the data representation - are they strings or integers or what? Clearly they are valid numbers so you get on and add them.
Similarly, if I ask you to concatenate 123 onto the end of "some string of words" then you just do it. Again you don't worry about the difference between text and numeric values.
It is not the data representation that matters but the operation.
From the point of view of sophisticated abstraction the complete beginner is close to the ideal when they complain that there really is no difference between 123 and "123".
In an ideal world, for example, the addition operator and the concatenation operator would be distinct - + and & say. When you write a+b this would be addition irrespective of what a and b are, and a&b would be text concatenation. Notice that a+b might throw an error if either a or b could not be interpreted or coerced to a number - this is reasonable. Throwing an error in any other case is being too picky, and yet many programmers believe that this is what should happen: "you can't use a string as if it was a number".
In many languages, however, the two operators are both represented by + and what a+b means depends on the primitive types of a and b - this is not good.
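One way to sketch the distinct-operator idea is a hypothetical Value wrapper in Python whose + is always addition and whose & is always concatenation, whatever form the data is stored in. (Python's & normally means bitwise and; it is borrowed here purely for illustration.)

```python
class Value:
    """A value whose operators, not its storage, decide what happens."""

    def __init__(self, raw):
        self.raw = raw

    def __add__(self, other):
        # + is always numeric addition: both operands coerced to numbers
        return Value(float(self.raw) + float(other.raw))

    def __and__(self, other):
        # & is always concatenation: both operands coerced to text
        return Value(str(self.raw) + str(other.raw))


a, b = Value("2"), Value(3)
print((a + b).raw)   # 5.0  - addition, even though "2" is stored as text
print((a & b).raw)   # 23   - concatenation, even though 3 is stored as a number
```

With this design a+b and a&b each have exactly one meaning, and the stored representation of a and b is irrelevant.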
What about the logical distinctions between integers and reals?
Surely this is a data type distinction that is founded in mathematics?
Mathematically there are integers, rationals and irrationals. Some math programming languages have all three types, but in most cases all we need is one type of number that can be either an integer or a decimal rational as the need arises. Integer and rational operations need only be built into the operators, not the data.
You also don't need notions of int32 or int64 to work with integer arithmetic - you simply need an integer division operator. In fact ideas such as int32 and int64 reveal that there are many languages that are far too bound to the hardware to be regarded as modern.
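Python itself is a handy illustration of this point: it has a single arbitrary-precision integer, and integer behaviour comes from an operator rather than from fixed-size storage classes:

```python
# One kind of number, two operators - no int32/int64 in sight
print(7 / 2)     # 3.5 - true division
print(7 // 2)    # 3   - integer division is an operator, not a data type
print(2 ** 100)  # 1267650600228229401496703205376 - no overflow, any precision
```

The hardware's word size never enters into it; whether a computation is "integer arithmetic" is decided by which operator you apply.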
All we really need is the notion of number - any precision necessary and text - any number of characters.
We might also need a Boolean type and here we get into very complicated matters best discussed in another chapter dealing with Truthy and Falsy.
This viewpoint - that primitive data types are irrelevant - isn't particularly popular, mainly due to many years of exposure to low level languages such as C and C++ that have institutionalized the notion of primitive type.
From this viewpoint, in an ideal world there would be no primitive data types - just objects and operators.
Last Updated (Sunday, 10 May 2015)