Why function implementations shouldn't be put in header files

From OSDev Wiki

This document is a side-bar to a thread in the General Programming forum, and explains some basic ideas of the history and use of include headers in languages which use them. It also covers material relating to linkers and loaders, specifically the historical origin of them and their relationship to compilers and assemblers. The original text is from a reply post by Schol-R-LEA on the DevShed C++ forum. The wording is lighthearted and somewhat sarcastic, but it should be clear enough, and the subjects are important Required Information for anyone using C or C++, for OS dev or anything else.

A big part of programming, especially in an object-oriented language like C++, is being able to use the same functions or objects in many different programs, so that you don't have to repeat the same code over and over again. While it is possible (and pretty easy in modern systems) to simply copy the code by hand all over the place, it makes all of your program source files very large and repetitious, and tends to be error prone. Also, if you have to fix something that is copied to hell and gone, you need to fix it everywhere you have it - which is really error prone.

One solution is to automate copying the source. This is basically what #include does: it sticks a copy of the included file into the source code at the point where it is included, verbatim. Whatever was in the included file will show up in the program where the #include directive was. In fact, most of the job of the preprocessor is to replace one piece of text with another one, then hand the result on to the compiler proper.
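To make the text-replacement idea concrete, here is a minimal sketch using macros, which are the preprocessor's other big trick. The names BUFFER_SIZE, SQUARE and buffer_area are made up for illustration. #include works in the same spirit, except that the replacement text is the entire contents of the named file.

```cpp
// Hypothetical names throughout; none of these come from a real library.
#define BUFFER_SIZE 16           // every later BUFFER_SIZE becomes the text 16
#define SQUARE(x) ((x) * (x))    // SQUARE(n) becomes the text ((n) * (n))

// By the time the compiler proper sees this function, the body reads:
//     return ((16) * (16));
int buffer_area() {
    return SQUARE(BUFFER_SIZE);
}
```

Most compilers can show you the preprocessed text if you ask them to stop after the preprocessing stage, which is a handy way to see exactly what an #include dragged in.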

This is a bit better, but it still has problems. First off, if you include all of the source code, you end up having to compile all of the code in those source files you've included - again and again and again. This is a slow process even today; in the days of yesteryear, it would have made compiling "Hello, World!" an all-day event. And since you would be including a lot of code that you don't actually end up using, the program size grows like a cancer in your disk space.

What you really want is a way to copy the program functions, objects, etc. without having to recompile them every time, and pick out only the parts you need. Enter the linkage editor, or linker as it's usually called. This can take different pieces of compiled programs ('object code') and tie them all together into a nice little executable package, adding only the parts you need. If you want to get really fancy, it can even wait until you actually run the program to link the really common parts, meaning that if you have five programs using a shared function, you only need one copy in memory (though you only want to do this with very common things, since indiscriminate dynamic linking can slow programs down a lot). Now you can have a library, a file composed of pre-compiled object code that any program can use. A lot of the things which you probably think of as being part of C++ (e.g., the string and iostream classes, and in fact almost all of the common functions programmers use regularly) are actually part of a standard library, not the core language.

(Historical note: Linkers actually came before preprocessors, and in fact before compilers or even assemblers. The same is true of interpreters, curiously enough, though they were special-purpose ones for handling things like floating-point math and 'interpreted' a sort of bytecode. The term 'compiler' originally referred to what we now call a linker, because it compiled a list of routines to add to an executable binary image. This was back in the days when men were men and programs were written with toggle switches, and without such a tool you'd spend a lifetime punching in a simple addition routine.)

Of course, there are still problems with this, the biggest one being that your source file doesn't know what is defined in the libraries and the other parts of your program until it's linked, meaning that it would have no way of knowing if you really meant to call pow(2, "foo") until the linker goes bonkers on you trying to find a function named 'pow' that takes an integer and a character string as arguments - rather than that other one which takes two doubles. So we need to declare all of the functions etc. before we can use them, in every file that uses the functions... what a hassle...
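Here is a rough sketch of what such a declaration buys you. The function names power and cube are hypothetical, chosen so they don't collide with the real pow from the standard library, and the whole thing is squeezed into one file for brevity. The point is that the compiler can check a call against the declaration alone, long before the linker goes looking for the actual code.

```cpp
// Declaration (prototype): gives the name, the argument types and the
// return type, without saying anything about how the function works.
// 'power' is a hypothetical function, assumed to take exponent >= 0.
double power(double base, int exponent);

double cube(double x) {
    // The compiler checks this call against the declaration above,
    // even though it has not seen the definition yet.
    //
    // Writing power(x, "foo") here instead would be rejected at compile
    // time, because a character string cannot be converted to an int.
    return power(x, 3);
}

// Definition: in a real project this could sit in another source file or
// in a pre-compiled library; the linker ties the call above to this code.
double power(double base, int exponent) {
    double result = 1.0;
    for (int i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}
```

With the declaration in place, calling cube(2.0) compiles cleanly, while the "foo" version never gets as far as the linker.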

But wait, we've still got #include! We can create a file of nothing but declarations - a header file - which solves the problem of having to add all of those tedious prototypes without having to repeat ourselves any more than we have to.

There are still some problems, mind you. If you try to get clever and stick some actual program code in a header file, and that header file is included in more than one place in a program, the linker will have a fit trying to figure out which function 'quux(int, int)' is the real McCoy, even if the source code for them is the exact same thing. This means that header files have to be declarations and nothing but (there are a few weird exceptions involving things like macros - which are actually inserted into the source code directly by the preprocessor - and templates - which are a safer and less annoying type of macro, though C++ programmers tend to get sniffy about it when freaky Lisp hackers like me point that out - but that's getting ahead of ourselves). You can even get in trouble if you have the same header included in two places in the same source file, as you'd be defining the same thing (a class or struct, say) twice, which has led to some funny tricks involving the conditional compilation directives in (yet again) the preprocessor to make sure it can't happen.
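The usual form of that trick is the 'include guard'. The sketch below fakes it in a single file: it shows roughly what the compiler would see if a hypothetical header called point.h were included twice in one source file. The guard name POINT_H and the Point struct are assumptions made up for this example; quux is borrowed from the text above.

```cpp
// ---- text pasted in by the first #include "point.h" ----
#ifndef POINT_H              // not defined yet, so the block below is kept
#define POINT_H

struct Point {               // a type definition: seeing it twice in one
    int x;                   // source file would be a compile error
    int y;
};

int quux(int a, int b);      // declaration only, no body in the header

#endif // POINT_H

// ---- text pasted in by the second #include "point.h" ----
#ifndef POINT_H              // already defined, so everything up to the
#define POINT_H              // matching #endif is skipped

struct Point {
    int x;
    int y;
};

int quux(int a, int b);

#endif // POINT_H

// ---- quux.cpp: the one and only definition of the function ----
int quux(int a, int b) {
    return a + b;
}
```

Take the guards out and the second copy of Point stops the compile. Move the body of quux into the header and include that header from two different source files, and it is the linker's turn to complain about duplicate definitions.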

There's also the matter of telling the linker itself where to get the different pieces. Most linkers have options for this, but if you have to write out the whole list of files to link together in a large project every time you try to compile it, your fingers will wear away to nubs, and you're bound to screw it up about a quarter of the time anyway. So this, too, gets automated away: makefiles (or project files in a lot of newer systems) are scripts telling the linker where to go and what to do when it gets there. They can also do a lot of the other scut work, like checking to see whether a file has changed (so that unchanged files aren't compiled again) and cleaning up all of the temporary files after it's done.

But all of this amounts to nothing if a) the program is small enough to fit into a single file without making your eyes glaze over reading it, and b) you aren't going to be re-using the code anywhere else. In that case, you may as well just stick it all in one big file and be done with it - unless, of course, the whole point of the exercise is to show you about all the stuff I just rattled off to you.

For further information, consult your pineal gland. Or don't, see if I care fnord.
