User:Neon/Recursive Paging
Background
Let us say that you have set up a simple page table for your kernel. You may even have it identity mapped for now. No biggie, you enable paging and everything appears to be working perfectly. You make a few changes and improve the paging code over time, but then the time comes when you need to create a new page table. Easy right? After all, you already did it right? Just allocate a new page frame, map it into the virtual address space, fill it with data, and now you have to update a page directory entry so it references the page table page frame number. But...lets say we have some virtual address and we need to figure out what needs to be mapped in and created. The virtual address itself is a bit field: it tells us the page directory entry, page table entry, and page offset. So we already know precisely what entries need to be worked with. And no matter what, we always have a virtual address to the root directory already -- all Process Control Blocks should have it. So we use the virtual address to reference the page directory entries. We find the address we are trying to map needs PDE[200]. Lets say we already created this page table...somewhere. PDE[200] stores the page frame number of the page table we need. Except...paging is enabled, we need a virtual address to the page table. Ah shoot, so every page table we create we need to somehow keep track of where all of them are in the virtual address space. And the number of page tables is not static: kernel space page tables are typically shared, however... user mode page tables are not. So any time you create a new user process, you'll end up needing more page tables to map user space. So whatever structure the kernel uses to keep track of the virtual addresses of page tables, it needs to be dynamic. So, how do we keep track of all virtual addresses of the page tables? Keeping in mind paging is enabled, how can we map this dynamic structure when there is no dynamic structure already in place to use? That is, what can we use to manage the virtual addresses of the page tables we create...before being able to have the way to manage it? Its the constant chicken and egg scenario that likes to pop up all the time in memory management. Isn't it fun?
In 64 bit long mode, the virtual address space is so large that you can easily map all of physical memory into kernel space. In 32 bit protected mode however, we introduce a way to get around this chicken and egg scenario: a way so that you can always access, update, and edit paging structures as needed. It is the same method used by mainstream operating systems such as Linux and Windows when targeting 32 bit versions -- recursive paging.
Before getting into details, we need to cover some very important background material related to paging. Recursive paging is not hard to implement, however it can be hard to follow what is happening and why without a strong background. We recommend skimming these sections at least even if you feel strong with the topics.
How page mapping works
Typically we are introduced to paging as a "Page Directory" and "Page Table". However, it does not matter what you would like to call them: in general operating systems, there is only Page Tables. We give different names because they are on different "Layers" of page tables. To keep things simple, let us just look at the lowest layer page tables.
A Page Table consists of 1024 Page Table Entries (PTE)'s. We will focus on 2 fields of the PTE: the Active bit and the Page Frame Number (PFN). When the Active bit is set, the PTE maps to a PFN. Typically you would have a Physical Memory Manager and grab a free PFN to use it. In order to understand how paging works, we need to know what a PTE represents.
A PTE represents...a page. I know, I know...not very surprising. But it doesnt just represent a page...it represents a specific page. Assuming 1 Page Table, PTE[0] is the first page, PTE[1] is the second page ... PTE[1023] is the 1024'th page. If a page is 4096 bytes, this would mean the whole page table represents a 4MB region of the virtual address space. Let us go up one layer... Lets say we have 1024 page directory entries. If each page directory entry has the PFN of a page table, then PDE[0] is the first page table, PDE[1] is the next page table ... PDE[1023] is the 1024'th page table. Or, in other words, PDE[0] represents the first 4MB of the address space, PDE[1] is the next 4MB region of the address space ... PDE[1023] is the last 4MB region of the address space. If a page table represents 4MB of the virtual address space, then the whole page directory represents 1024*4MB = 4GB of the virtual address space. Another way of looking at it is that, we can pick any PTE[K]. This represents the 4MB region of the address space starting at (# of PTE's in a page table * size of a page)*K = (1024*4096)*K.
No matter the virtual address, the CPU always goes through the same translation process: look it up to find the PDE and use its PDE.PFN to find the PTE and use its PTE.PFN to find the page.
Introduction
So, how do we implement recursive paging?
PDE [1023].PageFrameNumber = PageDirectory
That is it. No really, it is that simple. Okay, sure, you have to allocate a PFN for the page directory and be able to access it. As such, this is typically done during early initialization when paging structures are being first created. You'll also of course need to still have a separate page table for mapping the current executing code so when you do enable paging it will work. So, typically you do this when enabling paging for the first time so the issue of updating paging structures doesn't exist yet. The actual recursive paging trick though is just that one line.
Accessing the page directory
So...let's see if we can wrap our head around what that line actually does. To do that, we'll be the CPU for the moment. Since the focus here is for updating and modifying page tables, we'll use addresses for them...for now. The magic of this line is how it effects the address space.
So we need to translate an address...let's say 0xfffff000. Note these address are not arbitrary: we'll explain why we are using them later. Okay, so us being the CPU, we need to find the page directory. This part is easy as we already have it in the Page Directory Base Register (CR3). This has the PFN for the page directory. So we go out and see the Page Directory Entries. 0xfffff000 is PDE[1023]. Okay, no biggie, nothing special so far. We get the PFN from PDE[1023] to find the page table.
This is where things get interesting. We set PDE[1023].PFN to be the page directory itself. But now during the translation process, we don't see it as a page directory: we just see it as another page table. In other words, we are looking at the page directory as a page table.
Let us continue... 0xfffff000 is PTE[1023]. So we go out and look at PTE[1023].PFN to get the Page it is mapped to.
Which is, again, the page directory. Except this time, we are just looking at it as a raw page.
So what does this mean? When we recursively set PDE[1023] it automagically maps 0xfffff000 to the page directory itself. 0xfffff000 maps to a page containing the contents of the page directory. We can edit, update, or modify the page directory by just accessing its PDE's at 0xfffff000.
But...why 0xfffff000?
Let's first consider that we picked PDE[1023]. This represents the last 4MB region of the virtual address space. That is, PDE[1023] only ever maps 0xffc00000 - 0xffffffff. But then...we looked at the page directory as a page table and looked at PDE[1023] again...but this time as PTE[1023]. Recall that PTE[1023] represents the last 4K region of the address space. So we are looking at the last 4K of the last 4MB of the address space... which is 0xfffff000. The page directory is at this address because of that 1023.
Accessing page tables
Let's first reiterate a few important things: we are only considering the last 4MB of the address space from 0xffc00000 - 0xffffffff. This is because we recursively mapped PDE[1023] which represents this region of the address space. We walked through an example of translating an address to find the page directory base address above. Recall that we ended up looking at the Page Directory itself as a Page Table while translating the address. So when we looked at the Page Directory as a Page Table, each PTE referred to some 4K page of the address space. But...keep in mind that when translating an address that uses PDE[1023] we would be looking at just the Page Directory itself as a page table. And each PTE references a 4K region of the address space. But...the PTE's here are PDE's as we are just looking at the Page Directory still... so those PTE references a 4K region of the address space...a page...containing the contents of a Page Table.
0xfffff000 is PDE[1023] which is the last page table. We can work backward to find all of the page table virtual addresses: 0xffffe000 is PDE[1022] which is second to last page table. 0xffffd000 is PDE[1021] ... 0xfff00000 is PDE[768] this is typically the kernel page table for higher half kernels at 3GB ... 0xffc00000 is PDE[0]
Recursive mapping on other entries
All virtual addresses is based on which PDE you decide to recursively map. We mapped the 1023 entry as it is what we use in our operating system -- although it is also what Linux uses. A problem with recursive paging is that it allocates a range of virtual addresses for the page tables and page directory which of course effects the address space layout of your operating system design. So sometimes it may make sense to select a different entry to recursively map. For a real world example of using a different entry we can look at Windows.
Consider that each PDE represents a 4MB region of the address space. Select a 4MB region that you would like to reserve for page tables and that is the entry for you to recursively map. I.e. If you recursively map PDE[768].PageFrameNumber = Page Directory you would have effectively mapped 0xc0000000 - 0xC0400000 for paging structures - this is method used in Windows.
The method works the same, PTE[768] refers to the page directory which would be located at 0xc0000000+(768*4096) = 0xC0300000.
Things to think about
No matter what, 4MB of the address space gets reserved for paging structures since each PDE represents 4MB and you need to recursively map an entry. The benefit however is that this greatly simplifies paging code as all paging structures will be at specific locations in the address space. However...this means if you need to update the paging structures of some process K you must first switch to the process K address space before you can update the paging structures.
So how does this solve that pesky chicken and egg scenario? We no longer need to manage or think about where paging structures are in the virtual address space: we always know where they are and can easily access them to perform updates.