User:Neon


Neptune SDK

The Neptune SDK (NSDK) consists of the Neptune Build Environment (NBE) and toolchain. This includes NASM, NCC, NLINK, NLIB, NDBG, NMAKE, and NBUILD. The entire system is intended to be built with the NSDK.

This is an ongoing project.

Neptune Boot

The loader for Neptune is configurable and extendable. It supports BIOS and EFI firmware and runs in 32 bit protected mode, with a plan to provide a 64 bit build as well, even if only for EFI. It implements the Multiboot 1 specification.

Memory Management

Memory management is done in two layers in order to separate the firmware-specific code from the core program.

Layer 1: Memory Driver Objects

There is a Memory Driver Object for EFI and for BIOS. For EFI builds, we request free pages from the firmware. For BIOS builds, we reserve an area of memory and use a bitmap allocator to manage free frames within it.
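
For illustration only, here is a minimal bitmap frame allocator of the kind the BIOS Memory Driver Object could use over its reserved area; the names, region size, and layout are assumptions, not the actual NBL code:

#include <stdint.h>

#define FRAME_SIZE   4096
#define FRAME_COUNT  1024                          /* frames in the reserved area (assumed) */

static uint8_t  frame_bitmap[FRAME_COUNT / 8];     /* one bit per frame, 1 = in use          */
static uint32_t region_base;                       /* physical base of the reserved area     */

void* bios_alloc_frame (void) {
	for (uint32_t i = 0; i < FRAME_COUNT; i++) {
		if (!(frame_bitmap[i / 8] & (1 << (i % 8)))) {
			frame_bitmap[i / 8] |= (1 << (i % 8));
			return (void*) (region_base + i * FRAME_SIZE);
		}
	}
	return 0;                                  /* no free frames left */
}

void bios_free_frame (void* frame) {
	uint32_t i = ((uint32_t) frame - region_base) / FRAME_SIZE;
	frame_bitmap[i / 8] &= ~(1 << (i % 8));
}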

Layer 2: Zone allocation

Sometimes it is necessary for the BIOS firmware to be able to access allocated memory. For this reason, we use a Zone Allocation scheme: We allocate from a Zone (such as Low Memory or Standard Memory). Standard Memory is > 1MB. A Free List is used to allocate and free blocks of memory.
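
As a sketch of the Zone idea, an interface along these lines could be used; the enum and function names here are illustrative assumptions, not the NBL API:

#include <stddef.h>

typedef enum _MEMORY_ZONE {
	ZONE_LOW_MEMORY,       /* below 1MB, so BIOS services can reach it */
	ZONE_STANDARD_MEMORY   /* above 1MB                                */
} MEMORY_ZONE;

/* Allocate a block from the Free List of the requested Zone. */
void* mem_alloc (MEMORY_ZONE zone, size_t size);
void  mem_free  (void* block);

For example, a disk transfer buffer that the BIOS must be able to address would be allocated with mem_alloc (ZONE_LOW_MEMORY, 512).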

Driver Objects and Device Objects

The Neptune Boot Library (NBL) defines the core framework for Driver Objects and Device Objects. There are standard Driver Objects for file systems, keyboard mapping, and a few others, and firmware-specific Driver Objects for disk, memory, video, and more. Firmware Driver Objects are defined in nbl_bios and nbl_efi so they can be easily linked.

Driver Objects implement a set of standard functions. Driver Objects operate on and create Device Objects. NBOOT either calls a driver service directly through a function pointer or through the Device Object the driver is attached to. There is a special case for partitioning support: partitions are implemented as Disk Driver Objects that create a Disk Device Object for each partition. When NBOOT reads from a device like "disk(1)part(2)", it calls the Disk Driver Object associated with the Device Object "disk(1)part(2)", which passes the command down to the attached Device Object for "disk(1)" to handle. This special case is why we also support Device Objects having an attached secondary Device Object. Almost everything is done through Device Objects for abstraction: it allows NBOOT to support virtually any disk and file system type by just adding the supporting Driver Object to the NBL.
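
A rough sketch of how Driver Objects and Device Objects could relate follows; the structure layouts and field names are illustrative assumptions, not the actual NBL definitions:

#include <stdint.h>

typedef struct _DEVICE_OBJECT DEVICE_OBJECT;

/* A Driver Object implements a standard set of services. */
typedef struct _DRIVER_OBJECT {
	int (*read)  (DEVICE_OBJECT* device, void* buffer, uint32_t sector, uint32_t count);
	int (*write) (DEVICE_OBJECT* device, const void* buffer, uint32_t sector, uint32_t count);
} DRIVER_OBJECT;

/* A Device Object names an instance, points to the driver that services
   it, and may have an attached secondary Device Object -- for example,
   "disk(1)part(2)" is attached to "disk(1)". */
struct _DEVICE_OBJECT {
	const char*    name;
	DRIVER_OBJECT* driver;
	DEVICE_OBJECT* attached;
};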

BIOS Build: How do we call the BIOS?

The Driver Objects in nbl_bios need to drop the system to real mode to call the BIOS. We implemented a general I/O Services function -- io_services -- that is used by the BIOS Driver Objects. This is done by dropping the system to 16 bit protected mode and then to 16 bit real mode. We pass a Register Set to the I/O Services function that provides the input to the BIOS call. io_services is modeled after the dos.h int86 function. Self-modifying code is used to implement the general INT n instruction needed. In summary, the algorithm is the following:

  1. Save the IDTR and ESP
  2. Load the real mode IVT base and limit into the IDTR
  3. Enter 16 bit protected mode
  4. Enter 16 bit real mode
  5. Call the BIOS
  6. Restore the previous IDTR
  7. Enter 32 bit protected mode and restore ESP
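
To make the interface concrete, here is a sketch of what the Register Set and io_services might look like; the structure layout, field names, and the INT 13h example are assumptions for illustration, not the actual NBL definitions:

#include <stdint.h>

typedef struct _REGISTER_SET {
	uint32_t eax, ebx, ecx, edx;
	uint32_t esi, edi;
	uint16_t es, flags;
} REGISTER_SET;

/* Execute INT n in real mode with the given input registers and return
   the resulting register state in 'out'. */
void io_services (uint8_t n, REGISTER_SET* in, REGISTER_SET* out);

/* Example: read one sector with INT 13h, AH=02h. */
void example_read_sector (uint8_t drive, void* buffer) {
	REGISTER_SET in = {0}, out = {0};
	in.eax = 0x0201;                               /* AH=02 (read), AL=1 sector */
	in.ecx = 0x0001;                               /* cylinder 0, sector 1      */
	in.edx = drive;                                /* DH=head 0, DL=drive       */
	in.ebx = (uint32_t) buffer & 0xf;              /* BX = buffer offset        */
	in.es  = (uint16_t) ((uint32_t) buffer >> 4);  /* ES = buffer segment       */
	io_services (0x13, &in, &out);
}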

BIOS Build: How do we boot?

In BIOS builds, we have the boot sector code ("Stage 1"), NSTART ("Stage 1.5"), and NBOOT ("Stage 2"). NSTART is only required for BIOS builds. NSTART.SYS is prepended to NBOOT.EXE to produce the file "NBOOT". When Stage 1 completes, NSTART locates and copies NBOOT to its expected load address and executes it in 32 bit protected mode. NSTART itself operates primarily in 32 bit protected mode. It is expected that "Stage 1" loads the entire NBOOT image into memory, so NSTART does not need to load anything.

Neptune Executive

Memory Management

A lot of careful design has been put into the mechanics of the kernel's memory management. As of this writing, it is the largest part of the kernel. We wanted to make sure everything was covered and that the design is extendable. We hope the information presented here can provide some ideas to others who are working on their own memory management designs and putting everything together.

  • Terms
    • PFN - Page Frame Number
    • VAD - Virtual Address Descriptor
    • LPB - Loader Parameter Block
    • PTE - Page Table Entry
      • Standard PTE - Active x86 PTE mapped to a PFN
      • System PTE - Used for page swapping / demand loading
      • Free PTE - Used for allocating kernel PTE's

VAD Tree: How do we allocate from Low Memory?

Low Memory is treated specially in order to accommodate legacy hardware devices and firmware service requirements. These requirements can include specific physical address alignment criteria, addressing limitations (such as needing to be below 640k), and the need for contiguous pages. We use the same technique that is to be used for User Space: Virtual Address Descriptors (VAD's) manage regions of the address space.

Low Memory regions are mapped either through the System Pool (Kernel Space) or per-process VAD Tree (User Space).

PFN Database: How do we allocate from Standard Memory?

When the system starts up, either paging is not enabled (NBOOT 32 bit) or paging is enabled and all RAM is accessible (NBOOT 64 bit). In both cases, we have access to all of the physical address space. We manage physical frames by creating a Page Frame Number (PFN) free stack within the physical frames themselves. This is what we call the "PFN Database". At any time we can allocate and free PFN's using this Database.

A Header is stored in every free allocatable frame in physical memory, and a global pointer stores the top of the stack. This is done early during the startup process. An "allocatable" frame is a frame of memory that is currently not in use by the Kernel, Modules, or other structures. The Kernel may free some of this memory for use later. The PFN Database is used to allocate from "Standard Memory" above 1MB physical.

The most important services are:

  • MmScanSystemMemoryMap
  • MmFreeFrame
  • MmAllocateFrame
  • MmGetFreeFrame

MmScanSystemMemoryMap scans the memory map and builds the PFN Database within the free frames. The memory map is obtained from the Loader Parameter Block (LPB) provided by NBOOT. MmGetFreeFrame returns the PFN on the top of the stack, which is the next available PFN. MmAllocateFrame allocates the next available PFN. MmFreeFrame returns a PFN back to the free stack.
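
As an illustration of the free stack idea, here is a minimal sketch under the simplifying assumption that frames are directly addressable (paging off or identity mapped); the header layout and exact signatures are assumptions, not the Neptune implementation:

#include <stdint.h>

#define PAGE_SIZE 4096

/* A Header written into every free frame links it to the next free frame. */
typedef struct _PFN_FREE_HEADER {
	uint32_t nextPfn;              /* 0 terminates the stack */
} PFN_FREE_HEADER;

static uint32_t MmPfnStackTop;     /* PFN of the frame on top of the stack */

/* Return a frame to the free stack. */
void MmFreeFrame (uint32_t pfn) {
	PFN_FREE_HEADER* header = (PFN_FREE_HEADER*) (pfn * PAGE_SIZE);
	header->nextPfn = MmPfnStackTop;
	MmPfnStackTop   = pfn;
}

/* Peek at the next available PFN without removing it. */
uint32_t MmGetFreeFrame (void) {
	return MmPfnStackTop;
}

/* Pop the next available PFN. 'va' is the address through which the top
   frame is currently reachable (its physical address in this sketch). */
uint32_t MmAllocateFrame (void* va) {
	uint32_t pfn  = MmPfnStackTop;
	MmPfnStackTop = ((PFN_FREE_HEADER*) va)->nextPfn;
	return pfn;
}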

PFN Database: How do we allocate with paging?

When paging is enabled, we can only update physical frames through a virtual address. This requires careful interaction between the mapping (allocating a Page) and the PFN Database. The process is to get what would be the next PFN without actually allocating it yet, map this PFN to some free Page, and then call the PFN Database to allocate using this virtual address. That is, in this order:

  • Free PFN = MmGetFreeFrame
  • Map PFN to a virtual address. Do NOT write to this address yet
  • Call MmAllocateFrame (address)

In other words, we map a page to the next PFN so that we can access the physical frame and write to it through the mapped page. This setup is hidden behind MmAllocPage.
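
Putting the three steps together, MmAllocPage might look roughly like the following sketch; MiMapPage is a hypothetical helper that writes the PTE for a virtual address:

#include <stdint.h>

extern uint32_t MmGetFreeFrame  (void);
extern uint32_t MmAllocateFrame (void* va);
extern void     MiMapPage       (void* va, uint32_t pfn);   /* hypothetical */

void* MmAllocPage (void* va) {
	uint32_t pfn = MmGetFreeFrame ();   /* peek at the next free PFN          */
	MiMapPage (va, pfn);                /* map it, but do not write to it yet */
	MmAllocateFrame (va);               /* pop it through the new mapping     */
	return va;
}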

Kernel Pool: Where can we safely get free kernel pages from?

Alright, so we can allocate free PFN's to use. So where do we map them? This is where the System Pool comes in. The basic idea is that we need to reserve an area of Kernel Space and manage which regions of that area are in use or free to map. Since we need a way to describe pages, we introduce a new PTE type: the "Free PTE". We select a Page Table to act as the System Pool; initially all of its PTE's are of the "Free PTE" type. Free PTE's have a special format that stores a link to the next Free PTE, effectively creating a PTE linked list. As with the PFN Database, the implementation uses a single global to store the top of the PTE Free Stack.

When a Free PTE is selected to be allocated, it is turned into a Standard PTE mapped to a PFN. When a Standard PTE is to be freed, it is turned back into a Free PTE and reattached to the stack. There are 1024 Free PTE's, creating a 4MB System Pool.

Free PTE's have the following format:

typedef struct _MMPTELIST {
	uint32_t valid     : 2;   /* 0, so the processor treats the entry as not present */
	uint32_t oneEntry  : 1;
	uint32_t filler0   : 8;
	uint32_t nextEntry : 20;  /* link to the next Free PTE on the stack */
	uint32_t prototype : 1;
} MMPTELIST;
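
Using the MMPTELIST format above, popping a Free PTE and turning it into a Standard PTE might look roughly like the sketch below; MiSystemPoolPteBase, MiPteStackTop, and MiAllocateSystemPte are illustrative names, not the actual kernel symbols:

extern MMPTELIST* MiSystemPoolPteBase;   /* first PTE of the System Pool page table    */
extern uint32_t   MiPteStackTop;         /* index of the Free PTE on top of the stack  */

/* Convert the next Free PTE into a Standard PTE mapped to 'pfn' and
   return its index within the System Pool page table. */
uint32_t MiAllocateSystemPte (uint32_t pfn) {
	uint32_t   index   = MiPteStackTop;
	MMPTELIST* freePte = &MiSystemPoolPteBase[index];

	MiPteStackTop = freePte->nextEntry;          /* unlink from the free stack */

	uint32_t* pte = (uint32_t*) freePte;         /* rewrite as a Standard PTE  */
	*pte = (pfn << 12) | 3;                      /* present + writable         */
	return index;
}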

Kernel Heap: How do we allocate and free kernel resources?

The Kernel Heap allocates free pages from the System Pool and maps them with free frames from the PFN Database. The Kernel Heap is a cache-based Slab allocator; this allows us to avoid unnecessary reallocations. All kernel objects have a dedicated Cache from which their allocations take place. There is also a set of special Caches, in powers of 2, used for general allocations, as well as a Cache of Caches from which Cache objects themselves are allocated. If resources run low, the Kernel calls the allocator to actually release freed resources, which frees the page back to the System Pool (if the page is empty) and the frame back to the PFN Database.
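
As a sketch of the interface this implies (the names below are illustrative, not the actual Neptune API):

#include <stddef.h>

typedef struct _EX_CACHE EX_CACHE;

/* Create a dedicated Cache for objects of a fixed size. */
EX_CACHE* ExCreateCache (const char* name, size_t objectSize);

/* Allocate from / free back to a Cache; freed objects stay cached so
   they can be handed out again without touching the System Pool. */
void* ExCacheAlloc (EX_CACHE* cache);
void  ExCacheFree  (EX_CACHE* cache, void* object);

/* Called when resources run low: release empty pages back to the
   System Pool and their frames back to the PFN Database. */
void  ExReapCaches (void);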

Recursive paging: How do we update paging structures?

A well known problem is that all paging structures use PFN's, and the Page Directory Base Register (PDBR) holds the physical address of the root paging structure. Once paging is enabled this becomes an issue: we need to be able to dynamically update, create, and remove paging structures while paging is enabled. What we need is to know what virtual address the paging structures are located at, and to store that somewhere. Yet again we run into the problem of needing memory to manage memory.

A common solution to this is "recursive paging". Because this is a microkernel architecture, we want to reserve as much of the address space as possible for User Space, so we use the highest possible Page Directory Entry and define it recursively: the last entry points back to the Page Directory's own PFN. This reserves the last 4MB of the address space for the paging structures themselves, without needing to allocate any memory. The recursive structure gives us the following addresses:

  • MM_PAGE_DIRECTORY_BASE 0xfffff000
  • MM_PAGE_DIRECTORY_END 0xffffffff
  • MM_PAGE_TABLE_BASE 0xffc00000
  • MM_PAGE_TABLE_END 0xffffffff
  • MM_KERNEL_PAGE_TABLE_BASE 0xfff00000
  • MM_KERNEL_PAGE_TABLE_END 0xfff00fff

In order to map something into the address space of a process K, the kernel must switch to the respective address space and then update or write to the paging structures starting at MM_PAGE_TABLE_BASE.
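
With the recursive entry in place, the PTE and PDE that map any virtual address can themselves be addressed with simple arithmetic. A sketch, where MiGetPteAddress and MiGetPdeAddress are illustrative names:

#include <stdint.h>

#define MM_PAGE_TABLE_BASE     0xffc00000u
#define MM_PAGE_DIRECTORY_BASE 0xfffff000u

/* Virtual address of the PTE that maps 'va'. */
static inline uint32_t* MiGetPteAddress (uint32_t va) {
	return (uint32_t*) (MM_PAGE_TABLE_BASE + (va >> 12) * 4);
}

/* Virtual address of the PDE that maps 'va'. */
static inline uint32_t* MiGetPdeAddress (uint32_t va) {
	return (uint32_t*) (MM_PAGE_DIRECTORY_BASE + (va >> 22) * 4);
}

As a sanity check, MiGetPteAddress (0xC0000000) evaluates to 0xfff00000, which matches MM_KERNEL_PAGE_TABLE_BASE above.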

VAD Tree: How do we manage User Space?

User Space is a little complicated because we don't know in advance what will be where: it can contain different program segments, the user space stack, shared libraries, copy-on-write segments, file mappings, and more. To know where things are, we use a Virtual Address Descriptor (VAD) tree. This is part of each Process Control Block (PCB) and manages the respective Process User Address Space.
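
For illustration, a VAD node might look something like the sketch below; the field names and layout are assumptions, not the actual kernel structure:

#include <stdint.h>

typedef struct _MMVAD {
	uint32_t       startVpn;    /* first virtual page number of the region   */
	uint32_t       endVpn;      /* last virtual page number of the region    */
	uint32_t       protection;  /* read/write/execute, copy-on-write, etc.   */
	struct _MMVAD* left;        /* children in the tree, ordered by startVpn */
	struct _MMVAD* right;
} MMVAD;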

How is page swapping / demand paging implemented?

When the valid bit of a PTE is set to 0, all other bits are ignored by the processor. This allows us to store whatever we want in the remaining bits. In fact, we already do this with the System Pool. We define what we call a "System PTE" that stores information about how to find the page in Swap Space. If a page fault occurs, we check if the PTE is a System PTE and, if it is, it will tell us how to locate the page so we can bring it back in. When the Kernel is told to load a Process, it will allocate the PTE's for the process as System PTE's only without loading any part of the process into memory. When it executes the process and a page fault occurs, the kernel will load the real page into memory on demand.

The System PTE type for page swapping is the following:

typedef struct _MMPTE_SYSTEM {
	uint32_t valid      : 1;   /* 0: not present, so the processor ignores the rest */
	uint32_t pageFileLow: 4;   /* together with pageFileHi, locates the page in Swap Space */
	uint32_t reserved   : 7;
	uint32_t pageFileHi : 20;
} MMPTE_SYSTEM;
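
Using this format, the fault path for a swapped-out page could look roughly like the sketch below; MiReadFromPageFile is a hypothetical helper, MiGetPteAddress is the recursive-paging helper sketched earlier, and a real handler would also distinguish the other invalid-PTE types:

extern uint32_t* MiGetPteAddress    (uint32_t va);
extern void*     MmAllocPage        (void* va);
extern void      MiReadFromPageFile (uint32_t pageFile, uint32_t offset, void* page);  /* hypothetical */

void MiResolveDemandFault (uint32_t faultVa) {
	uint32_t      pageVa = faultVa & ~0xfffu;
	MMPTE_SYSTEM* pte    = (MMPTE_SYSTEM*) MiGetPteAddress (pageVa);

	if (pte->valid)
		return;                           /* not a demand-paging fault      */

	uint32_t pageFile = pte->pageFileLow; /* save the swap location before  */
	uint32_t offset   = pte->pageFileHi;  /* the PTE is rewritten below     */

	MmAllocPage ((void*) pageVa);         /* map a fresh frame at the fault */
	MiReadFromPageFile (pageFile, offset, (void*) pageVa);
}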

What about Copy on Write?

The Standard PTE has 3 bits next to the PFN field that are free for use by the operating system. One of these bits is used to mark Copy on Write pages. Copy on Write pages are mapped read only, so a write attempt triggers a page fault. If the Copy on Write bit is set, the Kernel is free to allocate a new PFN for the page, copy the original page into it, and map the new PFN as a now writable page.
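
The write-fault path for a Copy on Write page could then look roughly like this sketch; the bit assignments, the scratch mapping, and the helper names are illustrative assumptions:

#include <stdint.h>
#include <string.h>

#define PTE_PRESENT        0x001u
#define PTE_WRITABLE       0x002u
#define PTE_COPY_ON_WRITE  0x200u   /* one of the three OS-available bits (assumed) */

extern uint32_t* MiGetPteAddress (uint32_t va);
extern void*     MmAllocPage     (void* va);

void MiResolveCowFault (uint32_t faultVa, void* scratchVa) {
	uint32_t  pageVa = faultVa & ~0xfffu;
	uint32_t* pte    = MiGetPteAddress (pageVa);

	if (!(*pte & PTE_COPY_ON_WRITE))
		return;                                   /* an ordinary protection fault */

	/* Allocate a new frame, map it at a scratch address, and copy the
	   original read only page into it. */
	MmAllocPage (scratchVa);
	memcpy (scratchVa, (void*) pageVa, 4096);

	/* Point the faulting PTE at the copy and make it writable; the scratch
	   mapping is then released, and a real handler would also invalidate
	   the TLB entry for pageVa. */
	uint32_t newPfn = *MiGetPteAddress ((uint32_t) scratchVa) >> 12;
	*pte = (newPfn << 12) | PTE_PRESENT | PTE_WRITABLE;
}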

Initializing: How do we get from 1MB to a higher half kernel?

Setting up paging is a bit complicated: the kernel is loaded at 1MB, after all. Additionally, we are still using the stack set up by NBOOT in low memory. So setup needs to be done very carefully in order to gather what is needed to initialize the core memory systems, map ourselves to two locations at once (identity mapped at 1MB and at the higher half), enable paging, relocate the kernel to 3GB and continue executing from there, continue memory management setup, create stack space for the kernel, and finally remove Identity Space. The steps are listed below; a sketch of the dual mapping follows the list.

  1. MmScanMemoryMap
  2. MmBuildPfnDatabase
  3. MmBuildKernelSpace
  4. HalEnablePaging
  5. ExRelocateSelf
  6. Jump to KERNEL_VIRTUAL
  7. MmBuildSystemPTEPool
  8. MmInitializeGlobalCacheList
  9. MmBuildKernelStack
  10. MmRemoveIdentitySpace
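
The "two locations at once" mapping mentioned above can be sketched as follows; the function and table names are illustrative, and the real MmBuildKernelSpace may differ:

#include <stdint.h>

#define KERNEL_PHYSICAL 0x00100000u   /* kernel load address (1MB) */
#define KERNEL_VIRTUAL  0xC0000000u   /* higher half base (3GB)    */

/* Map the kernel's pages both at their identity (1MB) addresses and at
   the higher half before paging is enabled, so execution can continue
   across the jump to KERNEL_VIRTUAL. */
void MiMapKernelTwice (uint32_t* pageDirectory, uint32_t* identityTable,
                       uint32_t* kernelTable, uint32_t kernelPages) {
	for (uint32_t i = 0; i < kernelPages; i++) {
		uint32_t pte = (KERNEL_PHYSICAL + i * 4096) | 3;         /* present + writable */
		identityTable[((KERNEL_PHYSICAL >> 12) & 0x3ff) + i] = pte;
		kernelTable  [((KERNEL_VIRTUAL  >> 12) & 0x3ff) + i] = pte;
	}
	pageDirectory[KERNEL_PHYSICAL >> 22] = (uint32_t) identityTable | 3;
	pageDirectory[KERNEL_VIRTUAL  >> 22] = (uint32_t) kernelTable   | 3;
}

Once paging is enabled and execution continues at KERNEL_VIRTUAL, the identity entries are what MmRemoveIdentitySpace later tears down.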