ELF

From OSDev Wiki
Jump to navigation Jump to search
Executable Formats
Microsoft

16 bit:
COM
MZ
NE
Mixed (16/32 bit):
LE
32/64 bit:
PE
COFF

*nix
Apple

ELF (Executable and Linkable Format) was designed by Unix System Laboratories while working with Sun Microsystems on SVR4 (UNIX System V Release 4.0). Consequently, ELF first appeared in Solaris 2.0 (aka SunOS 5.0), which is based on SVR4. The format is specified in the System V ABI.

A very versatile file format, it was later picked up by many other operating systems for use as both executable files and as shared library files. It does distinguish between TEXT, DATA and BSS.

Today, ELF is considered the standard format on Unix-alike systems. While it has some drawbacks (e.g., using up one of the scarce general purpose registers of the IA-32 when using position-independent code), it is well supported and documented.

File Structure

ELF is a format for storing many program types (see ELF Header table) on the disk, created as a result of compiling and linking. An ELF file might indepedenently contain sections or segments. For an executable program, an ELF header and a segment are the bare minimum, while sections are optional, though it's common for an executable to have a ".text" section for the code and ".data" section for initialized data. Libraries don't have segments, but only sections because they are used for linking purposes. Sections and segments are described by their respective headers that contain information about their sizes, required alignment, etc.

Note that depending on whether your file is a linkable or an executable file, the headers in the ELF file won't be the same: process.o, result of gcc -c process.c $SOME_FLAGS

C32/kernel/bin/.process.o
architecture: i386, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000333  00000000  00000000  00000040  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000050  00000000  00000000  00000380  2**5
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  000003d0  2**2
                  ALLOC
  3 .note         00000014  00000000  00000000  000003d0  2**0
                  CONTENTS, READONLY
  4 .stab         000020e8  00000000  00000000  000003e4  2**2
                  CONTENTS, RELOC, READONLY, DEBUGGING
  5 .stabstr      00008f17  00000000  00000000  000024cc  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .rodata       000001e4  00000000  00000000  0000b400  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .comment      00000023  00000000  00000000  0000b5e4  2**0
                  CONTENTS, READONLY

The 'flags' will tell you what's actually available in the ELF file. Here, we have symbol tables and relocation: all that we need to link the file against another, but virtually no information about how to load the file in memory (even if that could be guessed). We don't have the program entry point, for instance, and we have a sections table rather than a program header.

.text where code live, as said above. objdump -drS .process.o will show you that
.data where global tables, variables, etc. live. objdump -s -j .data .process.o will hexdump it.
.bss don't look for bits of .bss in your file: there's none. That's where your uninitialized arrays and variable are, and the loader 'knows' they should be filled with zeroes ... there's no point storing more zeroes on your disk than there already are, is it?
.rodata that's where your strings go, usually the things you forgot when linking and that cause your kernel not to work. objdump -s -j .rodata .process.o will hexdump it. Note that depending on the compiler, you may have more sections like this.
.comment & .note just comments put there by the compiler/linker toolchain
.stab & .stabstr debugging symbols & similar information.

/bin/bash, a real executable file

/bin/bash:     file format elf32-i386
/bin/bash
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08056c40

Program Header:
    PHDR off    0x00000034 vaddr 0x08048034 paddr 0x08048034 align 2**2
         filesz 0x000000e0 memsz 0x000000e0 flags r-x

The program header itself... taking 224 bytes, and starting at offset 0x34 in the file

  INTERP off    0x00000114 vaddr 0x08048114 paddr 0x08048114 align 2**0
         filesz 0x00000013 memsz 0x00000013 flags r--

The program that should be used to 'execute' the binary. Here, it reads as '/lib/ld-linux.so.2', which means some dynamic libraries linking will be required before we run the program.

    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
         filesz 0x0007411c memsz 0x0007411c flags r-x

Now we're requested to read 7411c bytes, starting at file's start (?) and being 7411c bytes large (that's virtually the whole file!), which will be read-only but executable. They'll be to appear starting at virtual address 0x08048000 for the program to work properly.

    LOAD off    0x00074120 vaddr 0x080bd120 paddr 0x080bd120 align 2**12
         filesz 0x000022ac memsz 0x000082d0 flags rw-

More bits to load, (likely to be .data section). Notice that the 'filesize' and 'memsize' differ, which means the .bss section will actually be allocated through this statement, but left as zeroes while 'real' data only occupy first 0x22ac bytes starting at virtual address 0x80bd120.

 DYNAMIC off    0x00075f4c vaddr 0x080bef4c paddr 0x080bef4c align 2**2
         filesz 0x000000e8 memsz 0x000000e8 flags rw-

The dynamic sections are used to store information used in the dynamic linking process, such as required libraries and relocation entries.

    NOTE off    0x00000128 vaddr 0x08048128 paddr 0x08048128 align 2**2
         filesz 0x00000020 memsz 0x00000020 flags r--

NOTE sections contain information left by either the programmer or the linker, for most programs linked using the GNU 'ld' linker it just says 'GNU'

EH_FRAME off    0x000740f0 vaddr 0x080bc0f0 paddr 0x080bc0f0 align 2**2
         filesz 0x0000002c memsz 0x0000002c flags r--

That's for Exception Handler information, in case we should link against some C++ binaries at execution (Needs citing).

/bin/bash, loaded (as in /proc/xxxx/maps)

08048000-080bd000 r-xp 00000000 03:06 30574      /bin/bash
080bd000-080c0000 rw-p 00074000 03:06 30574      /bin/bash
080c0000-08103000 rwxp 00000000 00:00 0
40000000-40014000 r-xp 00000000 03:06 27304      /lib/ld-2.3.2.so
40014000-40015000 rw-p 00013000 03:06 27304      /lib/ld-2.3.2.so

We can recognize our 'code bits' and 'data bits', by stating that the second one should be loaded at 0x080bd*120* and that it starts in file at 0x00074*120*, we actually preserved page-to-disk blocks mapping (e.g. if page 0x80bc000 is missing, just fetch file blocks from 0x75000). That means, however, that a part of the code is mapped twice, but with different permissions. I suggest you do give them different physical pages too if you don't want to end up with modifiable code.

Loading ELF Binaries

Executable image and elf binary can being mapped onto each other

The ELF header contains all of the relevant information required to load an ELF executable. The format of this header is described in the ELF Specification. The most relevant sections for this purpose are 1.1 to 1.4 and 2.1 to 2.7. Instructions on loading an executable are contained within section 2.7.

The following is a rough outline of the steps that an ELF executable loader must perform:

  • Verify that the file starts with the ELF magic number (4 bytes) as described in figure 1-4 (and subsequent table) on page 11 in the ELF specification.
  • Read the ELF Header. The ELF header is always located at the very beginning of an ELF file. The ELF header contains information about how the rest of the file is laid out. An executable loader is only concerned with the program headers.
  • Read the ELF executable's program headers. These specify where in the file the program segments are located, and where they need to be loaded into memory.
  • Parse the program headers to determine the number of program segments that must be loaded. Each program header has an associated type, as described in Figure 2-2 of the ELF specification. Only headers with a type of PT_LOAD describe a loadable segment.
  • Load each of the loadable segments. This is performed as follows:
    • Allocate virtual memory for each segment, at the address specified by the p_vaddr member in the program header. The size of the segment in memory is specified by the p_memsz member.
    • Copy the segment data from the file offset specified by the p_offset member to the virtual memory address specified by the p_vaddr member. The size of the segment in the file is contained in the p_filesz member. This can be zero.
    • The p_memsz member specifies the size the segment occupies in memory. This can be zero. If the p_filesz and p_memsz members differ, this indicates that the segment is padded with zeros. All bytes in memory between the ending offset of the file size, and the segment's virtual memory size are to be cleared with zeros.
  • Read the executable's entry point from the ELF header.
  • Jump to the executable's entry point in the newly loaded memory.

Relocation

Relocation becomes handy when you need to load, for example, modules or drivers. It's possible to use the "-r" option to ld to permit you to have multiple object files linked into one big one, which means easier coding and faster testing.

The basic outline of things you need to do for relocation:

  1. Check the object file header (it has to be ELF, not PE, for example)
  2. Get a load address (eg. all drivers start at 0xA0000000, need some method of keeping track of driver locations)
  3. Allocate enough space for all program sections (ST_PROGBITS)
  4. Copy from the image in RAM to the allocated space
  5. Go through all sections resolving external references against the kernel symbol table
  6. If all succeeded, you can use the "e_entry" field of the header as the offset from the load address to call the entry point (if one was specified), or do a symbol lookup, or just return a success error code.

Once you can relocate ELF objects you'll be able to have drivers loaded when needed instead of at startup - which is always a Good Thing (tm).

Tables

ELF Header

The ELF header is always found at the start of the file.

Position (32 bit) Position (64 bit) Value
0-3 0-3 Magic number - 0x7F, then 'ELF' in ASCII
4 4 1 = 32 bit, 2 = 64 bit
5 5 1 = little endian, 2 = big endian
6 6 ELF header version
7 7 OS ABI - usually 0 for System V
8-15 8-15 Unused/padding
16-17 16-17 Type (1 = relocatable, 2 = executable, 3 = shared, 4 = core)
18-19 18-19 Instruction set - see table below
20-23 20-23 ELF Version (currently 1)
24-27 24-31 Program entry offset
28-31 32-39 Program header table offset
32-35 40-47 Section header table offset
36-39 48-51 Flags - architecture dependent; see note below
40-41 52-53 ELF Header size
42-43 54-55 Size of an entry in the program header table
44-45 56-57 Number of entries in the program header table
46-47 58-59 Size of an entry in the section header table
48-49 60-61 Number of entries in the section header table
50-51 62-63 Section index to the section header string table

The flags entry can probably be ignored for x86 ELFs, as no flags are actually defined.

Instruction Set Architectures:

Architecture Value
No Specific 0x00
Sparc 0x02
x86 0x03
MIPS 0x08
PowerPC 0x14
ARM 0x28
SuperH 0x2A
IA-64 0x32
x86-64 0x3E
AArch64 0xB7
RISC-V 0xF3

The most common architectures are in bold.

Program header

This is an array of N (given in the main header) entries in the following format. Make sure to use the correct version depending on whether the file is 32 bit or 64 bit as the tables are quite different.

32 bit version:

Position Value
0-3 Type of segment (see below)
4-7 The offset in the file that the data for this segment can be found (p_offset)
8-11 Where you should start to put this segment in virtual memory (p_vaddr)
12-15 Reserved for segment's physical address (p_paddr)
16-19 Size of the segment in the file (p_filesz)
20-23 Size of the segment in memory (p_memsz, at least as big as p_filesz)
24-27 Flags (see below)
28-31 The required alignment for this section (usually a power of 2)

64 bit version:

Position Value
0-3 Type of segment (see below)
4-7 Flags (see below)
8-15 The offset in the file that the data for this segment can be found (p_offset)
16-23 Where you should start to put this segment in virtual memory (p_vaddr)
24-31 Reserved for segment's physical address (p_paddr)
32-39 Size of the segment in the file (p_filesz)
40-47 Size of the segment in memory (p_memsz, at least as big as p_filesz)
48-55 The required alignment for this section (usually a power of 2)

Segment types: 0 = null - ignore the entry; 1 = load - clear p_memsz bytes at p_vaddr to 0, then copy p_filesz bytes from p_offset to p_vaddr; 2 = dynamic - requires dynamic linking; 3 = interp - contains a file path to an executable to use as an interpreter for the following segment; 4 = note section. There are more values, but mostly contain architecture/environment specific information, which is probably not required for the majority of ELF files.

Flags: 1 = executable, 2 = writable, 4 = readable.

Dynamic Linking

Main article: Dynamic Linker

Dynamic Linking is when the OS gives a program shared libraries if it needs them. Meaning, the libraries are found in the system and then "bind" to the program that needs them while the program is running, versus static linking, which links the libraries before the program is run. The main advantages are that programs take up less memory, and are smaller in file size. The main disadvantage, however, is that the program becomes less portable because the program depends on many different shared libraries.

In order to implement this, you need to have proper scheduling in place, a library, and a program to use that library. You can create a library with GCC:

myos-gcc -c -fPIC -o oneobject.o oneobject.c
myos-gcc -c -fPIC -o anotherobject.o anotherobject.c
myos-gcc -shared -fPIC -Wl,-soname,nameofmylib oneobject.o anotherobject.o -o mylib.so

This library should be treated as a file, which is loaded when the OS detects its attempted usage. You will need to implement this "Dynamic Linker" into a certain classification of code such as in your memory management or your task management section. When the ELF program is run, the system should attach the shared object data to a malloc() region of memory, where the function calls to the libraries redirect to that malloc() region of memory. Once the program is finished, the region can be given up back to the OS with a call to free().

That should be a good starting point to writing a dynamic linker.

See Also

Articles

External Links