Fundamental C - Variables
Written by Harry Fairhead
Monday, 08 April 2019
Data is often under-regarded by programmers. It just isn't as exciting as writing the code that does something with that data. In fact, nothing could be further from the truth, and C in particular is a language that was designed to have data at its core, though not for the same reasons that most modern languages do. This extract, from my new book on programming C in an IoT context, looks at some aspects of data in C that make it different.
Fundamental C: Getting Closer To The Machine
One of the big differences between a language like C and more abstract languages like Java or C# is that C was designed to be close to the way that the machine works in terms of data. C creates abstract constructs that make writing code easier than writing in assembler. It gives you for and while loops, if statements and so on, which are much simpler to use than the lower level sequences of assembly language needed to do the same tasks.
When it comes to data, however, C stays very close to the addressing and the organization of RAM that you find in a real machine. This is a big plus point if you are looking to make effective use of memory. It is also essential if you are writing code that interacts with hardware represented as particular areas of memory. The problem is that this flexibility and realism entail some major responsibilities. It is up to you to organize and use memory in sensible ways, and it is all too easy to write programs that stray into areas of memory they were never intended to access. This is why C code has a reputation for being buggy and dangerous. It is undeniably low-level code, and as such the only way to write safe, high-quality code is to understand how C works and to know what it is you are trying to achieve.
Being so close to the hardware means you can write programs that work with it in ways that other languages make difficult. You also learn how the hardware works and this is a valuable education in itself.
Finally, C is low-level in another sense. What is stored in memory is best regarded as a pattern of bits rather than a particular type of data. In C programming you will often treat an area of memory as an int and later treat it as a character or as an int of a different size. What matters in C is the bit pattern, and we will investigate this further in much of the rest of this book.
Computer memory is organized into chunks of storage that are fixed in size, typically 16 or 32 bits. Generally each chunk of storage has a unique address which is used to identify it. It is also usual for parts of a chunk of storage to have addresses of their own. For example, most modern machines assign an address to each of the individual bytes that make up a larger memory unit.
Back in the days when C was being invented, the standard machine of the day, the PDP-11, organized its memory as 16-bit words. This means that if you specified an address, you read or wrote a single 16-bit word. When C was created, its fundamental data type was int, and this was assumed to be a 16-bit or 2-byte storage location.
So what is the size of an int today?
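The answer might surprise you. It is still decided by the compiler implementation for the machine. Many C programmers believe that C data types have a fixed size; they don't, and they vary according to the machine you are using. The standard only sets minimum ranges: an int, for example, must be able to hold at least the values -32767 to 32767, so it is at least 16 bits wide.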
This might seem like madness if you are familiar with other higher-level languages, but C is designed to be close to the machine it is running on. When you say you want to use an int, you are asking for a variable that is the fundamental access unit of the machine. That is, reading or writing the variable involves one memory access. For example, suppose the C standard defined int to be 32 bits in size and you were working on a machine whose memory was organized into 16-bit words. Every time you stored or retrieved an int, 32 bits would have to be transferred, which would mean two memory accesses on a machine with a 16-bit word. This would slow your programs down significantly.
The way C actually operates means that when you ask for an int variable you get a word size that corresponds to the most efficient memory access the machine can offer.
Of course, even this rule is likely to be broken because it is up to the compiler writer to implement whatever makes sense in the circumstances, but this is the intent. The same is also true of the other data types and this is the reason that their sizes are not fixed.
The vagueness about int extends beyond its size. An int has to be able to hold a signed value, i.e. both positive and negative values, but the format used to store it isn't specified. There are two common ways to represent negative numbers, ones' complement and two's complement, and the standard also allows a third, sign-and-magnitude. Most common hardware uses two's complement, and this is usually what you encounter in a C int, but it isn't mandated (C23 finally requires two's complement, but earlier standards do not). Even so, many programmers think that when they declare an int they get a 2- or 4-byte memory location holding a two's-complement value, but this doesn't have to be so. An int should be the natural size for the machine in use, and the numeric format is whatever the machine uses when it does arithmetic. All in all, this is very vague.
So how do C programmers cope with this vagueness? Sometimes it doesn't matter, because the program would work just as well with a 2-byte int as with a 4-byte int, and the numeric representation is irrelevant. Sometimes it does matter, and in these cases you need to use data types that are guaranteed to be particular sizes; more of this later.
The often overlooked fact is that C is a language that targets specific machines and if you want to know what the data types are you have to ask what the target machine is. For nearly all machines with a 16-bit architecture, a C int is a two's-complement 16-bit value; for 32-bit architectures int is a two's-complement 32-bit value. However, for a machine with a 64-bit architecture int is still usually 32 bits because this is more efficient than using 64-bit integers.
Last Updated (Monday, 29 April 2019)