A linkage editor, or linker, is a programming tool that combines one or more partial object files and libraries into a (more) complete executable object file. It often has to modify the executable code from these files, most often to resolve the names of external symbols referenced in one file into addresses (or address markers) matching the code in other files. In many cases, the linker can also resolve the actual addresses of most or even all of the labels within the executable code, though the final resolution is often left to the execution loader.
Historical Note: the original name for a linker was compiler, because the original linkers were intended to compile a list of punch cards which the card reader would select before running a program. Modern compilers got their name because they grew out of this function of connecting disparate subroutines into a working program.
It is good practice to split a large project into several code units that can be compiled or assembled independently. Each such unit will be translated into a single object file (typically named yourstuff.obj or yourstuff.o). Object files are thus intermediate binaries that will be used to create your executable, and they contain:
- raw data (binary numbers and ASCII strings, for instance)
- machine code
- instructions on how raw data and machine code are stored in the file
- instructions on what items are still missing from that object file, what it offers (symbol table) and how to fix the raw data/code if a missing item is found (relocations).
Additional information may also be available, such as debugging data that maps code offsets to file names and line numbers, or a record of which tool produced the file and when.
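The contents listed above can be pictured as a small data structure. The following is only an illustrative sketch: the class and field names (Section, Relocation, ObjectFile, defined, undefined) are made up for this example and do not correspond to any real format such as ELF or COFF.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str        # e.g. ".text" or ".data"
    flags: str       # e.g. "r-x" or "rw-"
    data: bytearray  # raw bytes: machine code or data


@dataclass
class Relocation:
    section: str     # section that contains the bytes to patch
    offset: int      # position of the patch inside that section
    symbol: str      # symbol whose address must be written there


@dataclass
class ObjectFile:
    name: str
    sections: dict = field(default_factory=dict)     # section name -> Section
    defined: dict = field(default_factory=dict)      # symbol -> (section name, offset)
    undefined: set = field(default_factory=set)      # symbols still missing
    relocations: list = field(default_factory=list)  # how to fix things up later


# A hypothetical hello.o: the code is an x86 call with a 4-byte
# placeholder operand, followed by a ret.
hello = ObjectFile(name="hello.o")
hello.sections[".text"] = Section(".text", "r-x", bytearray(b"\xe8\x00\x00\x00\x00\xc3"))
hello.sections[".data"] = Section(".data", "rw-", bytearray(b"Hello\x00"))
hello.defined["print_hello"] = (".text", 0)  # what this file offers
hello.defined["message"] = (".data", 0)
hello.undefined.add("puts")                  # what is still missing
hello.relocations.append(Relocation(".text", 1, "puts"))
```

Real object formats store the same kinds of information in binary tables, together with details this sketch leaves out, such as relocation types and symbol visibility.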
Code and data are grouped in sections that share similar properties. For instance, all your code should be executable but never modified, so it is placed in the .text section, which has r-x flags (read, no write, execute). Arrays of numbers, on the other hand, may be modified, but they certainly shouldn't be jumped to, so they go to the .data section with rw- flags.
At the stage of intermediate object files, you usually don't know where a section will end up in memory, so only references that remain within the same section (e.g. calling a local procedure) can be precomputed. All the rest (e.g. the address of a string used in the print_hello() function, a call to a function from another object file, etc.) is left to the linker.
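One reason a same-section reference can be fixed early is that many architectures encode local calls relative to the current instruction. The sketch below assumes such a PC-relative encoding with a 5-byte call instruction (as with the x86 call rel32 form). The offsets are hypothetical values chosen for illustration.

```python
def relative_displacement(call_site: int, target: int, insn_len: int = 5) -> int:
    """Displacement for a PC-relative call: the target address minus the
    address of the instruction that follows the call."""
    return target - (call_site + insn_len)


# Offsets of a call instruction and of its target inside the same
# .text section (hypothetical values).
CALL_OFFSET = 0x10
HELPER_OFFSET = 0x40

# Wherever the section is finally placed, the displacement is the same,
# so the assembler can already write it into the object file.
for base in (0x0, 0x100000, 0xC0000000):
    disp = relative_displacement(base + CALL_OFFSET, base + HELPER_OFFSET)
    print(hex(base), disp)
```

A reference that needs an absolute address, or that crosses into another section or object file, has no such invariant, which is why it has to wait for the linker.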
The ultimate goal of a linker is to collect bits from all the sections of all the "input" object files to produce an executable file. That usually means, for instance, that all the .text sections will be merged into a single .text output section. Moreover, the linker computes the definitive size and (usually) location of each section according to some rules.
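The merging step can be sketched in a few lines. This is a simplified model rather than the algorithm of any particular linker: the function name merge_sections, the base address 0x100000, and the 4 KiB alignment between output sections are all assumptions made for the example.

```python
def merge_sections(objects, base=0x100000, align=0x1000):
    """objects is a list of (object name, {section name: bytes}) pairs.

    Returns the merged output sections, the offset at which each input
    section landed inside its output section, and the start address
    chosen for each output section.
    """
    merged = {}     # output section name -> bytearray
    placement = {}  # (object name, section name) -> offset in output section
    for obj_name, sections in objects:
        for sec_name, data in sections.items():
            out = merged.setdefault(sec_name, bytearray())
            placement[(obj_name, sec_name)] = len(out)
            out.extend(data)

    addresses = {}  # output section name -> start address
    cursor = base
    for sec_name, out in merged.items():
        addresses[sec_name] = cursor
        cursor += len(out)
        cursor = (cursor + align - 1) // align * align  # round up to alignment
    return merged, placement, addresses


objects = [
    ("main.o", {".text": b"\x90" * 6, ".data": b"\x01\x02"}),
    ("util.o", {".text": b"\x90" * 10, ".data": b"\x03"}),
]
merged, placement, addresses = merge_sections(objects)
for name, addr in addresses.items():
    print(name, hex(addr), len(merged[name]))
```

Here the .text of util.o ends up 6 bytes into the output .text section, right after the code from main.o, and the output .data section starts on the next 4 KiB boundary.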
It also walks the "missing items" list and checks the symbol tables of the other object files to make sure every dependency can be resolved. If even a single symbol, such as printf, cannot be found anywhere, the linker aborts and reports an error. Every time a missing symbol is found, the relocation list for that symbol in the "importing" object file is used to patch the binary (e.g. to write the definitive address of the symbol everywhere it is referenced).
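A minimal sketch of symbol resolution and patching follows. It makes several simplifying assumptions: each object is a plain dictionary with a single block of bytes, every relocation wants a 32-bit little-endian absolute address, and the image is loaded at 0x1000. The names link and LinkError are invented for the example.

```python
import struct


class LinkError(Exception):
    pass


def link(objects, base=0x1000):
    image = bytearray()
    symbols = {}  # global symbol table: name -> absolute address
    pending = []  # (offset in image to patch, symbol name)

    # Pass 1: place every object's bytes and record what it defines.
    for obj in objects:
        start = len(image)
        image.extend(obj["code"])
        for name, offset in obj["defined"].items():
            if name in symbols:
                raise LinkError(f"duplicate symbol: {name}")
            symbols[name] = base + start + offset
        for offset, name in obj["relocations"]:
            pending.append((start + offset, name))

    # Pass 2: every symbol now has a final address, so patch each reference.
    for offset, name in pending:
        if name not in symbols:
            raise LinkError(f"undefined reference to '{name}'")
        struct.pack_into("<I", image, offset, symbols[name])
    return bytes(image), symbols


# main.o refers to print_hello at byte 4; util.o defines it.
main_o = {"code": bytes(8), "defined": {"main": 0}, "relocations": [(4, "print_hello")]}
util_o = {"code": bytes(4), "defined": {"print_hello": 0}, "relocations": []}

image, symbols = link([main_o, util_o])
print(hex(symbols["print_hello"]))  # 0x1008
print(image[4:8].hex())             # 08100000
```

Calling link([main_o]) alone raises LinkError, which mirrors the "undefined reference" failure described above. Real linkers support many relocation types (relative, truncated, split across instructions) instead of the single absolute form used here.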
As a result, you now have an executable image with no "dangling references" and a known address for every bit of data and every opcode, which is then written into an executable file (most likely an .EXE or another ELF file).
In many linkers, the user is allowed to alter the default rules, to force a given section to end up at a given address, or to add padding here and there. Some even let you choose manually which "output section" will be used for each "input section".
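GNU ld, for instance, takes such rules from a linker script. The sketch below models the idea in Python with an invented rule format: the apply_layout function and the keys "output", "inputs", "address" and "align" are hypothetical, and the section sizes are made-up values.

```python
def apply_layout(rules, input_sizes, start=0x100000):
    """Compute (address, size) for each output section.

    Each rule names an output section, the input sections folded into it,
    and optionally a fixed start address or an alignment (padding).
    """
    cursor = start
    layout = {}
    for rule in rules:
        if "address" in rule:
            if rule["address"] < cursor:
                raise ValueError(f"{rule['output']} overlaps the previous section")
            cursor = rule["address"]  # user forces the location
        align = rule.get("align", 1)
        cursor = (cursor + align - 1) // align * align  # insert padding if needed
        size = sum(input_sizes.get(name, 0) for name in rule["inputs"])
        layout[rule["output"]] = (cursor, size)
        cursor += size
    return layout


rules = [
    # Fold read-only data into the output .text section.
    {"output": ".text", "inputs": [".text", ".rodata"], "address": 0x100000},
    # Pad so that .data starts on a 4 KiB boundary.
    {"output": ".data", "inputs": [".data"], "align": 0x1000},
    # Force .bss to a fixed address.
    {"output": ".bss", "inputs": [".bss"], "address": 0x200000},
]
input_sizes = {".text": 0x1234, ".rodata": 0x100, ".data": 0x80, ".bss": 0x400}

layout = apply_layout(rules, input_sizes)
for name, (addr, size) in layout.items():
    print(name, hex(addr), hex(size))
```

With these rules, .text occupies 0x1334 bytes from 0x100000, .data is padded up to 0x102000, and .bss sits at the address the user asked for.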