C preprocessor

From OSDev Wiki

The C preprocessor is the first step in translating C/C++ source code into a binary. The usual sequence is preprocessing, compiling, and finally linking. In trivial environments, the preprocessor is used only to #include header files and to provide "header guards" that prevent multiple inclusion. However, the preprocessor can do much more and can be very useful, not only for C/C++ sources but for your Assembly sources as well. Use it with care, since it can also obfuscate your source code and introduce bugs that are very difficult to track down.



The preprocessor handles preprocessor directives, which are lines that begin with '#'. Really old compilers demanded that the '#' be placed in column 1; modern versions of C and C++ allow a directive to begin in any column, as long as '#' is the first non-whitespace character of the line.

Lines with preprocessor directives can be "continued" by placing a backslash ('\') as the last character of the line.
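
As a minimal sketch of line continuation, the following defines one token across two lines. The names GREETING and greeting_length are made up for this illustration; the two adjacent string literals are joined by the compiler after preprocessing.

```c
#include <string.h>

/* The backslash must be the very last character on the line. */
#define GREETING "Hello, " \
                 "world"

/* Hypothetical helper: length of the string the token expands to. */
size_t greeting_length(void)
{
    return strlen(GREETING);
}
```

A space or tab after the backslash breaks the continuation on many compilers, so it is worth letting your editor strip trailing whitespace.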


The most familiar use of the preprocessor is to include header files (containing function declarations, definition of constants etc.):

#include <stdio.h>
#include "myheader.h"

The effect is that the contents of the given header file are pasted into the source file. The technical difference between <> and "" is that the compiler is allowed to satisfy <> includes internally, i.e. without actually accessing any on-disk file of that name. To the author's knowledge, none of the prominent compilers does this, but it has become common practice to use <> for system headers and "" for your own header files. In practice, compilers typically look for a "" header in the directory of the including file first and only then fall back to the include path.

Header files are searched for in a list of preconfigured directories (the include path). The user can prepend further directories to this list (e.g. with the "-I <directory>" option of GCC).

The #include statement can be used in other contexts, too: As a replacement for assembler-specific include directives, for example.

Another possible use is "templating" a piece of code that keeps recurring in more than one source file but cannot be put into a separate function. This way, you can still reduce redundancy by keeping the shared code in a single file and merely #include it where needed. This, however, is a pretty ugly construct and should be avoided if possible.

Preprocessor Macros, pt. 1

The preprocessor can define tokens. It is customary to write these tokens in ALL CAPS (see pt. 2 for the reason).

#define MYTOKEN

Most compilers also allow the definition of preprocessor tokens on the command line, e.g. the "-D MYTOKEN" option for GCC.

Conditional Compilation

The preprocessor can conditionally select which parts of source code to compile, depending on whether a given token is defined or not (see above).

#define MYTOKEN
#ifdef MYTOKEN
/* This source will be compiled */
#endif
#ifndef MYTOKEN
/* This source will be removed */
#else
/* This source will be compiled */
#endif

Note that such #if / #ifdef / #ifndef - #endif sections can be nested.
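
A small sketch of nested conditional sections follows. The tokens DEBUG and VERBOSE and the function log_level are hypothetical; the tokens could just as well come from the compiler command line (e.g. "-D DEBUG" with GCC).

```c
#define DEBUG            /* hypothetical token, could also be passed as -DDEBUG */
/* #define VERBOSE */    /* hypothetical token, left undefined here */

int log_level(void)
{
#ifdef DEBUG
  #ifdef VERBOSE
    return 2;            /* DEBUG and VERBOSE are both defined */
  #else
    return 1;            /* only DEBUG is defined */
  #endif
#else
    return 0;            /* DEBUG is not defined */
#endif
}
```

Indenting nested directives as done here is purely a readability choice; the preprocessor ignores the leading whitespace.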

Header Guards

Non-trivial projects face the problem that a header file includes other header files in turn. Let's say abc.h and def.h both include xyz.h. Should you #include both abc.h and def.h in your source, you will likely end up with warnings and errors about redefinitions etc.

The solution is header guards, a combination of conditional compilation and token definition:

/* abc.h */
#ifndef ABC_H_
#define ABC_H_
/* declarations here */
#endif
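
To see why the guard works, the following sketch simulates what the compiler sees when a guarded header is pulled in twice. The header xyz.h from the text is pasted in by hand here, and the struct and function names are assumptions made for the illustration.

```c
/* --- first inclusion of the hypothetical xyz.h --- */
#ifndef XYZ_H_
#define XYZ_H_
struct xyz_point { int x; int y; };
#endif

/* --- second inclusion: XYZ_H_ is already defined, so the body is skipped --- */
#ifndef XYZ_H_
#define XYZ_H_
struct xyz_point { int x; int y; };   /* would be a redefinition without the guard */
#endif

/* Hypothetical user of the declaration, just to show that it is available. */
int xyz_sum(struct xyz_point p)
{
    return p.x + p.y;
}
```

Removing the two #ifndef / #endif pairs turns the second struct definition into a compile error, which is exactly the situation described above.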

Preprocessor Macros, pt. 2

Preprocessor tokens can also be assigned a value.

The preprocessor will replace any occurrence of a defined token in the source code with the value the token has been defined to. This is also true for tokens that have been defined to nothing (as in pt. 1 above). This is the reason why preprocessor tokens are customarily written in ALL CAPS: to avoid accidental clashes with identifiers used in the source code itself.
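
The replacement is purely textual, which the following sketch tries to make visible. The token and function names are invented for the example.

```c
#define UNSAFE_SUM 2 + 3      /* expands to the bare text: 2 + 3 */
#define SAFE_SUM   (2 + 3)    /* parentheses keep the value together */

int unsafe_times_two(void)
{
    return UNSAFE_SUM * 2;    /* becomes 2 + 3 * 2, i.e. 8 */
}

int safe_times_two(void)
{
    return SAFE_SUM * 2;      /* becomes (2 + 3) * 2, i.e. 10 */
}
```

Wrapping any non-trivial token value in parentheses is a cheap habit that avoids this class of surprise.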

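The #if directive can be used to base conditional compilation on token values. Note that the preprocessor can only work with integer constant expressions that are known at preprocessing time. Compiler-evaluated code like `sizeof()` cannot be used in preprocessor directives. The preprocessor also cannot compare names or strings: any identifier that is still left after macro expansion is treated as 0, so a comparison between two undefined names is silently true. To select between named alternatives, define each alternative as a distinct integer first.

#define FOO 1
#define BAR 2
#define MYTOKEN FOO
#define OTHERTOKEN 42
#if MYTOKEN == FOO
/* This code will be compiled */
#elif MYTOKEN == BAR
/* This code won't */
#endif
#if OTHERTOKEN == 42
/* Will be compiled. */
#endif
#if OTHERTOKEN != 42
/* Won't be compiled. */
#endif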
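
One detail worth knowing when writing #if expressions: an identifier that is not a defined token evaluates to 0. The sketch below shows this together with the defined() operator, which tests for the existence of a token inside an #if expression. KERNEL_VERSION and NOT_DEFINED_ANYWHERE are hypothetical names chosen for the example.

```c
#define KERNEL_VERSION 3      /* hypothetical token */

int version_is_recent(void)
{
#if defined(KERNEL_VERSION) && KERNEL_VERSION >= 2
    return 1;
#else
    return 0;
#endif
}

int undefined_token_is_zero(void)
{
#if NOT_DEFINED_ANYWHERE == 0  /* never defined, so it evaluates to 0 */
    return 1;
#else
    return 0;
#endif
}
```

GCC can warn about the second case if you pass -Wundef, which helps catch misspelled token names.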

The #if directive also allows for a simple construct to disable a region of code without having to worry about nested /* ... */ style comments:

#if 0
/* disabled code */
#endif

Such code can easily be re-enabled temporarily with no more effort than replacing the "0" with a "1". Source comments as to why you disabled code this way are in order.


Using the #undef directive, a preprocessor token can be undefined. This is useful for trickier setups where you might want to redefine a token to a different value: redefining a token to a different value generates a diagnostic, whereas undefining a token that is not defined does not.

This should not be construed as advice to always use #undef before a #define. Those warnings might actually be pointing to a real problem in your logic. Use #undef with care.
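
A short sketch of a deliberate redefinition, with STACK_SIZE and both functions being hypothetical names:

```c
#define STACK_SIZE 4096

int first_stack_size(void)
{
    return STACK_SIZE;        /* sees the first definition */
}

#undef STACK_SIZE             /* without this line, the next one draws a diagnostic */
#define STACK_SIZE 8192

int second_stack_size(void)
{
    return STACK_SIZE;        /* sees the second definition */
}
```

Each use of the token picks up whichever definition is in effect at that point of the file, so the order of definitions and uses matters.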

Predefined Tokens

The preprocessor provides a number of tokens which are automatically defined to the appropriate values, something very useful when constructing error messages or tracing messages. Note that not all tokens are supported or implemented by all compilers. In particular, some obsolete compilers might balk at __func__, which was introduced with C99 and is technically a predefined identifier rather than a preprocessor token.

Preprocessor Token    Explanation
__FILE__              Holds the name of the current source file being compiled (as a string).
__LINE__              Holds the current line being compiled (as an integer).
__DATE__              Holds the date when the compilation process began (a string with the format "Mmm dd yyyy").
__TIME__              Same as the previous, but the time (a string with the format "hh:mm:ss").
__cplusplus           When defined, the value indicates that C++ compilation is active. When the compiler is (fully) compliant to the standards, the value should be >= 199711L.
__STDC__              When defined, the value indicates that the compiler is (fully) compliant with the ANSI C standard.
__func__              Holds the name of the function it is used within (as a string).

Different compilers may define extra preprocessor tokens. Visual C++, for example, may define _MSC_VER and __cplusplus_cli. See the link section below for more information.
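
As a sketch of how these tokens might be used for tracing, the following helper formats the current location into a buffer. All function names are made up for the example, and it assumes a C99 (or later) compiler because of __func__ and snprintf().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<file>:<line> (<function>)" into buf. */
int where_am_i(char *buf, size_t size)
{
    return snprintf(buf, size, "%s:%d (%s)", __FILE__, __LINE__, __func__);
}

/* __func__ holds the name of the enclosing function. */
int func_name_matches(void)
{
    return strcmp(__func__, "func_name_matches") == 0;
}

/* __LINE__ changes from one source line to the next. */
int line_numbers_increase(void)
{
    int first = __LINE__;
    int second = __LINE__;
    return second == first + 1;
}
```

Note that __FILE__ and __LINE__ are substituted where they appear in the source, so a tracing helper that should report its caller's location has to receive them as arguments.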


Assertions are used to catch situations which should never happen, even under error circumstances. If the condition given in the parentheses does not evaluate to "true", a diagnostic message is printed which contains the source file name, the line number, and (since C99) the name of the current function; the program then calls abort().

#include <assert.h>
assert( sizeof(struct free_memory_block) == 8 );
assert( 1 != 2 );
assert( gdt_ptr != NULL );

For production code, assertions may be turned off by defining NDEBUG:

gcc -DNDEBUG ...

Note that <assert.h> does not have (or need) a header guard, i.e. can be included multiple times in a source file, and that whether NDEBUG is defined or not is evaluated anew at every inclusion of <assert.h>. You can thus enable / disable assertions at a very fine-grained level if necessary:

#include <assert.h>
    /* assert() at this point only fails-on-false if NDEBUG is not defined */
    assert( isChecksumCorrect() );
#ifdef NDEBUG
/* Remember that NDEBUG was defined (NDEBUG_WAS_DEFINED is a helper token) */
#define NDEBUG_WAS_DEFINED
/* Hard-enabling of assert() even if NDEBUG is defined */
#undef NDEBUG
#endif
#include <assert.h>
    /* assert() in this block of code should fail-on-false even in production */
    assert( isChecksumCorrect() );
#ifdef NDEBUG_WAS_DEFINED
/* Restoring NDEBUG if it was enabled originally */
#define NDEBUG
#undef NDEBUG_WAS_DEFINED
#endif
#include <assert.h>
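
The effect of NDEBUG can be observed with a check that has a visible side effect. The following is only a sketch: checks_run, counted_check and the two calls_* functions are hypothetical names, and NDEBUG is defined in the source here instead of on the command line.

```c
#define NDEBUG                /* same effect as compiling with -DNDEBUG */
#include <assert.h>

int checks_run = 0;

/* Hypothetical check that counts how often it is actually evaluated. */
int counted_check(void)
{
    ++checks_run;
    return 1;
}

/* With NDEBUG defined, assert() expands to nothing useful:
   counted_check() is never called. */
int calls_with_assert_disabled(void)
{
    checks_run = 0;
    assert(counted_check());
    return checks_run;
}

/* Re-enable assertions for the rest of the file. */
#undef NDEBUG
#include <assert.h>

int calls_with_assert_enabled(void)
{
    checks_run = 0;
    assert(counted_check());
    return checks_run;
}
```

This also shows why an expression passed to assert() should never have side effects the program depends on: they disappear as soon as NDEBUG is defined.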

See also


External Links
