Object files consist of compiled and assembled code, data, and all the additional information necessary to make their content usable. In the process of building an operating system, you will encounter lots of object files. For common development tasks you do not need to know their exact details, but when you want to create or use one with specific requirements, the details can be very important.
Objects and executables
Although Wikipedia considers executables to be a subset of object files, there are significant differences. On some systems they use a completely different format (COFF vs PE), or they use different fields (ELF program/section headers). The key difference is that in executables the addresses have been resolved, while in object files they have not. This means that non-executable object files do not contain code that can run as-is.
A good part of the object file contains the code and its associated data. In the source, code contains references to other functions and to data storage. In the object file such references are converted to pairs of an instruction and a relocation, as the compiler can't tell in advance where the code will end up. As an example, a function call on x86 looks like this in an object file:
  14:   e8 fc ff ff ff    call   15 <sprintf+0x15>
            15: R_386_PC32    vsnprintf
The disassembly contains the opcode for call (e8) plus the offset -4 (fc ff ff ff). If this were executed as-is, it would make a call to address 15, which is halfway through the instruction. The second line (the relocation entry) states that the field at position 15 (the -4) should be fixed up with a displacement to the address of vsnprintf. That means it should get the address of the called function minus the address of the relocation. However, simply writing that difference would not work, as the call displacement is relative to the next instruction, not to the start of the offset bytes in the middle of the instruction. This is where the -4 comes in: the result of the relocation is added to the field being patched. By subtracting 4 (adding -4) from the address, the displacement becomes relative to the end of the instruction, and the call ends up where it should go. In the executable file:
 804a1d4:   e8 07 00 00 00    call   804a1e0 <vsnprintf>
            804a1d5: R_386_PC32    vsnprintf
 804a1d9:   c9                leave
 (...)
0804a1e0 <vsnprintf>:
The displacement needed for the call is the address of vsnprintf minus the address of the next instruction, i.e. 0x804a1e0 - 0x804a1d9 = 0x7, which is the value seen in the call bytes (07 00 00 00). This is equivalent to the address of the target minus the address of the relocation plus the value stored: 0x804a1e0 - 0x804a1d5 + -4 = 0x7.
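The arithmetic above can be replayed in a few lines of Python. This is only a sketch: the function name apply_pc32 and the idea of holding the section in a bytearray are assumptions made for illustration, and the section base 0x804a1c0 is inferred from the listing (offset 0x14 ended up at 0x804a1d4). It reads the value already stored in the field, adds the target address, and subtracts the address of the field itself.

```python
import struct


def apply_pc32(section: bytearray, reloc_offset: int,
               section_base: int, symbol_addr: int) -> int:
    """Apply one PC-relative 32-bit relocation in place (illustrative helper).

    section       -- bytes of the section as found in the object file
    reloc_offset  -- offset of the 32-bit field inside the section
    section_base  -- address the section is placed at in the executable
    symbol_addr   -- final address of the referenced symbol
    """
    # The value already stored in the field (-4 in the example).
    (stored,) = struct.unpack_from("<i", section, reloc_offset)
    # Address of the field being patched.
    place = section_base + reloc_offset
    # Target minus relocation address plus stored value, wrapped to 32 bits.
    value = (symbol_addr + stored - place) & 0xFFFFFFFF
    struct.pack_into("<I", section, reloc_offset, value)
    return value


# The call from the listing: 20 bytes of filler, then e8 fc ff ff ff at 0x14.
section = bytearray(0x14) + bytearray.fromhex("e8fcffffff")
result = apply_pc32(section, 0x15, 0x804A1C0, 0x804A1E0)
print(hex(result), section[0x14:0x19].hex())  # 0x7 e807000000
```

The patched bytes match the call instruction shown in the executable listing.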
When an executable is created, it is linked to run at a specific address by default. This can be a problem when you need several files in the same address space and they would overlap, or when you want to perform address space randomization. In those cases, relocating the executable is an option.
Since relocations are only needed to build an executable, but not to run it, they normally aren't present in a linked file. Instead you need to tell the linker explicitly to emit relocations when you need them. With the linker of the GCC Cross-Compiler toolchain (ld), this can be done with the -q switch (--emit-relocs). Note that the -i and -r switches have a similar description, but cause the linker to produce an object file rather than an executable.
Relocating is in itself fairly straightforward: it comes down to finding the differences. Start by loading the sections at the location of your choice, then for each relocation entry:
- compute the original address where the relocation was applied
- compute the address where the relocation applies now (it has moved by the same amount as you moved its section from the original location)
- do the same for the destination of the relocation
- compute what the relocation value is - the destination for absolute relocations, and the destination minus the origin for relative relocations.
- compute what the relocation value was using the original location.
- subtract the old value from the new value
- add the result to the original relocation value in memory.
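The steps above can be sketched as follows. Everything named here is hypothetical and only serves the illustration: Reloc is a simplified record rather than a real ELF structure, delta_of is a callback that says how far the section containing a given original address was moved, and fields are assumed to be 32 bits wide and little-endian.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Reloc:
    """Hypothetical, simplified relocation record (not a real ELF structure)."""
    old_origin: int  # address of the patched 32-bit field as originally linked
    old_dest: int    # address the field refers to as originally linked
    relative: bool   # True for PC-relative entries, False for absolute ones


def reloc_value(origin: int, dest: int, relative: bool) -> int:
    # Absolute: the destination. Relative: destination minus origin.
    return dest - origin if relative else dest


def relocate(mem: bytearray, mem_base: int, relocs: Iterable[Reloc],
             delta_of: Callable[[int], int]) -> None:
    """Patch already-loaded sections in place; mem[0] lives at mem_base."""
    for r in relocs:
        # Where the field and its destination are now.
        new_origin = r.old_origin + delta_of(r.old_origin)
        new_dest = r.old_dest + delta_of(r.old_dest)
        # Relocation value now, and what it was at the original location.
        new_value = reloc_value(new_origin, new_dest, r.relative)
        old_value = reloc_value(r.old_origin, r.old_dest, r.relative)
        diff = new_value - old_value
        # Add the difference to what is stored in memory (wraps at 32 bits).
        offset = new_origin - mem_base
        (stored,) = struct.unpack_from("<I", mem, offset)
        struct.pack_into("<I", mem, offset, (stored + diff) & 0xFFFFFFFF)
```

Working with the difference between the new and old value means the loader never has to know the addend that the linker folded into the field, such as the -4 in the call example.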
If the sections are not moved relative to each other, relocating becomes as simple as adding the displacement to the absolute relocations. The relative relocations do not change, as both the source and the target are moved by the same amount.
Common mistakes when loading or relocating these files include:
- Passing -i or -r to ld. There's actually a tutorial that does this. It does not work except in some limited cases, as the output is an object file in which the relocations have not been applied at all.
- Assuming code and data are contiguous. This is a pitfall when trying to make a PE file multiboot-compatible. In memory, a section is generally page-aligned (4 KiB), but within a PE file sections are sector-aligned (512 bytes). So if a section's size is not a multiple of 4 KiB, relative addresses into the data section will be off by a multiple of 512 bytes, as the gap has been removed from the binary. Worse, it is perfectly valid to have metadata sections between the various loadable sections, which can also put addresses off.
- Loading as a flat binary. All executables that aren't flat binaries have a header up front. Blindly loading a file and jumping to its start will execute the header instead of your code. Again, there is a tutorial that tries to get away with this.
- Assuming the entry point is at the start. The linker has a certain amount of freedom in the order in which it places the object files, and so does the compiler when it lays out functions. That means that main doesn't need to be at the start of the code section.
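The last two pitfalls are avoided by reading the header instead of guessing. As a minimal sketch, assuming a 32-bit little-endian ELF file (other formats and ELF classes need different offsets), the entry point can be read from the e_entry field, which sits at byte offset 24 of the ELF header. The function name elf32_entry_point is made up for this example.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def elf32_entry_point(image: bytes) -> int:
    """Return e_entry from a 32-bit little-endian ELF header (sketch only)."""
    if len(image) < 28 or image[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_ident[4] is the class (1 = 32-bit), e_ident[5] the encoding (1 = LSB).
    if image[4] != 1 or image[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    # Layout: e_ident[16], e_type (2), e_machine (2), e_version (4), e_entry (4).
    (entry,) = struct.unpack_from("<I", image, 24)
    return entry
```

A loader would jump to the returned address after loading the segments, rather than to the first byte of the file or of the code section.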