Stack
- Stack can also refer to the TCP/IP stack in Networking. This article discuss the datastructure and stacks used in architectures.
A stack is a datastructure. You can push and pop elements to and from it respectively. Unlike a FIFO (first in, first out) however, the popped elements from a stack are the elements that you pushed last. Because of this, a stack is also termed LIFO (last in, first out) or FILO (first in, last out).
In the X86 architecture, and many others, there is one stack used for code execution. It is used to store return pointers when calling routines, but you can also store temporary data and local variables on it.
Stack theory
Many languages and architectures have a stack at their disposal. When return values are stored on it, the concept of stack frames emerges. A stack is divided into a number of stack frames. Each stack frame contains the local/temporary data for the routine, parameters, and a return value to the former routine (the caller).
Stack example on the X86 architecture
On the X86 architecture the stack grows downwards. The stack frames have a certain structure regarding the calling convention. The CDECL calling convention is the most widely used. It is most likely used by your compiler. Two registers are used:
- ESP: Extended Stack Pointer. 32 bit value containing the top-of-stack address (more accurately the ``bottom-of-stack`` on X86!)
- EBP: Extended Base Pointer. 32 bit value defining the current stack frame, when using CDECL calling convention. It points at the current local data. It can also access the routine parameters.
Take care when implementing your kernel. If you use segmentation, the DS segment should be configured to have its base at the same address as SS does. Otherwise you may run into problems when passing pointers to local variables into functions, because normal GPRs can't access the stack the way you might think.
Here is an example stack. The elements are 4 byte words in protected mode:
Memory address: Stack elements: +----------------------------+ 0x105000 | Parameter 1 for routine 1 | \ +----------------------------+ | 0x104FFC | First callers return addr. | > Stack frame 1 +----------------------------+ | 0x104FF8 | First callers EBP | / +----------------------------+ 0x104FF4 +->| Parameter 2 for routine 2 | \ <-- Routine 1's EBP | +----------------------------+ | 0x104FF0 | | Parameter 1 for routine 2 | | | +----------------------------+ | 0x104FEC | | Return address, routine 1 | | | +----------------------------+ | 0x104FE8 +--| EBP value for routine 1 | > Stack frame 2 +----------------------------+ | 0x104FE4 +->| Local data | | <-- Routine 2's EBP | +----------------------------+ | 0x104FE0 | | Local data | | | +----------------------------+ | 0x104FDC | | Local data | / | +----------------------------+ 0x104FD8 | | Parameter 1 for routine 3 | \ | +----------------------------+ | 0x104FD4 | | Return address, routine 2 | | | +----------------------------+ > Stack frame 3 0x104FD0 +--| EBP value for routine 2 | | +----------------------------+ | 0x104FCC +->| Local data | / <-- Routine 3's EBP | +----------------------------+ 0x104FC8 | | Return address, routine 3 | \ | +----------------------------+ | 0x104FC4 +--| EBP value for routine 3 | | +----------------------------+ > Stack frame 4 0x104FC0 | Local data | | <-- Current EBP +----------------------------+ | 0x104FBC | Local data | / +----------------------------+ 0x104FB8 | | <-- Current ESP \/\/\/\/\/\/\/\/\/\/\/\/\/\/
The CDECL calling convention is described here:
- Caller's responsibilities
- Push parameters in reverse order (last parameter pushed first)
- Perform the call
- Pop the parameters, use them, or simply increment ESP to remove them (stack clearing)
- The return value is stored in EAX
- Callee's responsibilities (callee is the routine being called)
- Store caller's EBP on the stack
- Save current ESP in EBP
- Code, storing local data on the stack
- For a fast exit load the old ESP from EBP, else pop local data elements
- Pop the old EBP and return – store return value in EAX
It looks like this in assembly (NASM):
SECTION .text caller: ; ... ; Caller responsibilities: PUSH 3 ; push the parameters in reverse order PUSH 2 CALL callee ; perform the call ADD ESP, 8 ; stack cleaning (remove the 2 words) ; ... Use the return value in EAX ... callee: ; Callee responsibilities: PUSH EBP ; store caller's EBP MOV EBP, ESP ; save current stack pointer in EBP ; ... Code, store return value in EAX ... ; Callee responsibilities: MOV ESP, EBP ; remove an unknown number of local data elements POP EBP ; restore caller's EBP RET ; return
The GCC compiler does all this automatically, but if you have to call C/C++ methods from assembly or reverse, you have to know the convention. Now look at one stack frame (the callee's):
+-------------------------+ | Parameter 3 | +-------------------------+ | Parameter 2 | +-------------------------+ | Parameter 1 | +-------------------------+ | Caller's return address | +-------------------------+ | Caller's EBP value | +-------------------------+ | Local variable | <-- Current EBP +-------------------------+ | Local variable | +-------------------------+ | Local variable | +-------------------------+ | Temporary data | +-------------------------+ | Temporary data | +-------------------------+ | | <-- Current ESP +-------------------------+
Using EBP the callee can access both parameters and local variables:
MOV EAX, [[EBP + 12]] ; Load parameter 1 into EAX MOV EAX, [[EBP + 16]] ; Load parameter 2 MOV EAX, [[EBP + 4 * EBX + 12]] ; Load parameter EBX (0-indexed) MOV EAX, [[EBP]] ; Load local variable 1 MOV EAX, [[EBP - 4]] ; Load local variable 2
There are other calling conventions for X86. To name a few: Pascal calling convention, fastcall convention, stdcall. Read more on Wikipedia, see the links below.
Set up the stack
When creating a kernel you have to manually set up the stack. It can be done by the bootloader, but it is good to know how anyway.
If you go from real mode to protected mode, you have to set up the stack as well. This is because the SS segment might change, so ESP in protected mode does not point to the same location that SP in real mode did. If you switch a lot between real mode and protected mode, it can be nice for them to share the stack. You have to find a smart solution for that yourself. It can be done.
In protected mode you set up the stack by just moving a new pointer value into the ESP register:
MOV ESP, 0x105000 ; Set the stack pointer
Remember that it grows downwards. You might allocate space for it in the kernel's .BSS section if it contains one:
SECTION .text set_up_stack: MOV ESP, stack_end ; Set the stack pointer SECTION .bss stack_begin: RESB 4096 ; Reserve 4 KiB stack space stack_end:
If your kernel is booted by a Multiboot-compliant bootloader, like GRUB, you are provided a memory map. You can set up the stack by looking for free memory chunks of the appropriate size. You just have to ensure that you don't overwrite any important data or code when setting the stack pointer.
Security
The stack is easy to use, but it has one problem. There is no "end," so it is vulnerable to a variation of the buffer overflow attack. The attacker pushes more elements than the stack is able to contain, so elements are pushed outside the stack memory, overwriting code, which the attacker can then execute.
In X86 protected mode it can be solved by allocating a GDT descriptor solely for the stack, which defines its boundaries.
Stack trace
When debugging, a stack trace is often shown and can be helpful. Stack Trace describes how this can be done and provides sample code for X86 CDECL using the stack layout above.
Unwinding the stack
Unwinding the stack is complex. It is done when using exceptions, like in C++. It is performed when an exception is thrown. The purpose of unwinding the stack is to call the destructor of local objects of the stack frames and to remove stack frames until an appropriate landing pad is found. The landing pad is the try..catch block in C++ or Java. The catch block has to match the exception, i.e. a RuntimeException object can't be caught as a String object.
The unwinding algorithm depends on the architecture. Normally this algorithm is provided in the language runtime library. When using GCC and C++ it is defined in the libsupc++ library linked with your application. However this doesn't happen when creating a kernel. The libsupc++ library is also too bloated to use in kernel space.
See Also
Articles
Stack Trace - Trace the called functions from the stack
Threads
External Links
- x86 calling conventions on Wikipedia.
- Stack (data structure) on Wikipedia.