Fundamental C - Compilation & Preprocessor
Written by Harry Fairhead
Monday, 17 February 2020
This extract, from my new book on programming C in an IoT context, explains the way that programs are compiled and looks in detail at the preprocessor stage - perhaps the most misused C facility.
Fundamental C: Getting Closer To The Machine
So far we have mostly ignored the process of turning a C program into an executable. In the case of many languages this is a trivial process: the system simply runs the program. In the case of C things are more complicated for a variety of reasons. The main difference is that C has a preprocessor, which is best thought of as a text processing engine that transforms your source code before it is presented to the compiler. This kind of source-to-source transformation should not be underestimated: the earliest C++ implementations worked by translating C++ into C, so avoiding the need for a native C++ compiler. At the very least you need to understand the preprocessor and the way header files are combined with code files to create the text that is submitted to the compiler.
We also need to look at the process of converting source code into an executable. This is a more involved process than you might think and to fully master the C environment you need to understand it.
Up to this point we have avoided having to consider how everything works because we have relied on NetBeans to get the job done. NetBeans hides much of the complexity from the beginner, but sooner or later you have to find out where your files are and how it all works. This doesn’t mean you have to give up using an IDE like NetBeans, but you will understand what it is doing much better.
Four Stages of Compilation
When you compile a C program there are four distinct stages: preprocessing, compilation, assembly and linking.
These stages all happen when the GCC compiler is invoked – so it isn’t really just the compiler you are using. Other compilers do the job in similar ways, but for the sake of concrete examples let’s focus on GCC.
Your C program takes the form of a text file, usually with a name like myprogram.c, where you write not just the C language but also instructions to the preprocessor about how to modify the file. In particular the #include statements that you have been using are instructions to the preprocessor to merge additional files together with yours to produce the final text file that will be submitted to the compiler. This merged file is usually called a compilation unit, or translation unit. GCC normally discards it once it has been compiled, but if you ask for it to be kept it has the same name as your source file with a different extension, e.g. myprogram.i.
The compiler takes the unit of compilation and produces an intermediate file with the same name as your source code and another extension, e.g. myprogram.s. This contains assembly code. That is, the compiler doesn’t compile to machine code, but to assembler. You can view and edit the assembly language that the compiler produces.
The intermediate file is processed by an assembler to produce machine code in a file with the same name as your source code with yet another extension, e.g. myprogram.o.
You might think that this is the end of the story and now you can run the machine code as an executable program, but this is usually not the case. The program generally is missing code that performs initialization and it may well refer to functions that are not defined in the compilation unit. Your code can refer to functions such as printf, which aren’t part of your program but are defined in libraries. The libraries are precompiled and, on Linux, are usually available as static archives ending in .a, which are collections of .o files, or as shared libraries ending in .so. The linker takes your .o file and either copies the code of the library function into the executable, static linking, or records a reference to external code that is resolved when the program is loaded, dynamic linking.
Static linking is simple, but the code is included in your executable program and having printf included in every executable in a system would make the code much bigger. Dynamic linking allows programs to share the library definition and hence save space, but if the library is missing your program will not run. Your program has a dependency on the library.
At last you have an executable that can be run. The linker outputs the final machine code file that is your executable, and it uses a name with yet another extension. Under Linux executables usually have no extension, so the executable in this example is myprogram, provided that you name it using GCC's -o option. Without -o, GCC calls the output a.out.
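Thus the sequence is:
myprogram.c → preprocessor → myprogram.i → compiler → myprogram.s → assembler → myprogram.o → linker → myprogram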
We now need to look at the details of some of these steps.
The Preprocessor - Include
The preprocessor is a powerful tool but it is also a dangerous one unless you fully realize that it is just a text processor.
The simplest thing that the preprocessor does for you is read in and merge include files.
When the preprocessor encounters #include "myfile" it finds myfile and replaces the line with the contents of the file.
All preprocessor instructions start with a hash symbol and don’t end with a semicolon.
By convention files that are intended to be included have the extension .h and files that are not intended to be included have the extension .c but they are all C source files. You can include files with other extensions. There is more to say about header files and how they are used after we have looked at the linker.
The GCC compiler automatically searches for header files in the same directory as the source file and then in standard system directories, usually including /usr/local/include and /usr/include, but this isn’t fixed. You can also include a relative path for the include file.
For example, #include "mysub/myfile.h" will first look in the folder mysub in the directory of the source file.
If you use the angle bracket form, for example #include <stdio.h>, then only the system folders are searched.
You can add directories to the search path using the -I command line option. If you are using NetBeans then you can set additional include directories using the project properties.
Going beyond #include, the key idea is that of a macro. Indeed, the preprocessor is best described as a macro processor.
Last Updated (Monday, 17 February 2020)