Task State Segment
The Task State Segment (TSS) is a special data structure for x86 processors which holds information about a task. The TSS is primarily suited for hardware multitasking, where each individual process has its own TSS. In Software multitasking, one or two TSS's are also generally used, as they allow for entering Ring 0 code after an interrupt.
The x86_64 structure is slightly different. The RSP0, RSP1 and RSP2 fields remain. RSPx is loaded in whenever an interrupt causes the CPU to change PL to x. The TSS in long mode also holds the Interrupt Stack Table, which is a table of 7 known good stack pointers that can be used for handling interrupts. You can set an interrupt vector to use an IST entry in the Interrupt Descriptor Table by giving it a number from 0 - 7. If 0 is selected, then the IST mechanism is not used. If any other number is selected then when that interrupt vector is called the CPU will load RSP from the corresponding IST entry. This is useful for handling things like double faults, since you don't have to worry about switching stacks; the CPU will do it for you.
You can also use the IST for NMIs, but if you do so then be aware that you are opening yourself up to the possibility of a race condition when NMIs nest. Imagine a scenario in which a NMI handler is called, then another exception occurs, which in turn leads to a second NMI. The second NMI will have loaded the exact same stack pointer as the first, and it will then proceed to overwrite it, so upon return to the first NMI (if you get that far) your whole stack will be corrupted. If you support the SYSCALL instruction, you are in a bad spot. Imagine a scenario in which the syscall instruction is executed from ring 3. The CPU is then switched to ring 0. Then if an NMI occurs before your syscall handler has had a chance to switch to a kernel stack, your NMI handler will now be executing on a user controlled stack. To fix this you have little choice but to use the IST, which leads to the race condition mentioned above. However, this race condition is not a problem if you just panic in the NMI handler anyway, since the second NMI will never return.
Note that in any case, when an interrupt occurs, SS is forced to NULL and the SS selector's RPL field is set to the new CPL.
TSS in software multitasking
For each CPU which executes processes possibly wanting to do system calls via interrupts, one TSS is required. The only interesting fields are SS0 and ESP0. Whenever a system call occurs, the CPU gets the SS0 and ESP0-value in its TSS and assigns the stack-pointer to it. So one or more kernel-stacks need to be set up for processes doing system calls. Be aware that a thread's/process' time-slice may end during a system call, passing control to another thread/process which may as well perform a system call, ending up in the same stack. Solutions are to create a private kernel-stack for each thread/process and re-assign esp0 at any task-switch or to disable scheduling during a system-call.
Setting up a TSS is straight-forward. An entry in the Global Descriptor Table is needed (see also the GDT Tutorial), specifying the TSS' address as "base", TSS' size as "limit", 0x89 (Present|Executable|Accessed) as "access byte" and 0x40 (Size-bit) as "flags". In the TSS itself, the members "SS0", "ESP0" and "IOPB offset" are to be set:
- SS0 gets the kernel datasegment descriptor (e.g. 0x10 if the third entry in your GDT describes your kernel's data)
- ESP0 gets the value the stack-pointer shall get at a system call
- IOPB may get the value sizeof(TSS) (which is 104) if you don't plan to use this io-bitmap further (according to mystran in http://forum.osdev.org/viewtopic.php?t=13678)
The actual loading of the TSS must take place in protected mode and after the GDT has been loaded. The loading is simple as:
mov ax, 0x?? ;The descriptor of the TSS in the GDT (e.g. 0x28 if the sixths entry in your GDT describes your TSS) ltr ax ;The actual load
- Task State Segment on Wikipedia