Brendan's Memory Management Guide
Overview
Memory management is something that all OS developers need to deal with sooner or later. It is a very broad topic; it has a tendency to appear very easy at first, but grows into something far more complicated than originally anticipated. This makes it likely that people will make design decisions that seem perfectly fine (and work well) initially while the OS is small, and that those same decisions will become significant problems later on.
In the hope of reducing the chance of design decisions becoming significant problems later on, "memory management" should be split into 3 separate concepts:
- "Heap" (Dynamic Memory Management)
- Virtual Memory Management
- Physical Memory Management
Please note that this is only a guide, intended to help beginners become aware of various issues and become prepared to handle the complexity that is (eventually) involved. It is not intended to be something that must be followed to the letter, nor is it intended to cover all the possible ways that memory management might be done.
"Heap" (Dynamic Memory Management)
This is the highest level layer, and the layer that normal software uses.
In general, processes use memory managers designed for whichever language the process is written in (including the simple "malloc/free" used by C programs, the garbage collection systems in Java, etc). For this reason the "heap" typically has nothing to do with the OS itself and is part of a language's run-time. However; in special cases (e.g. where maximum performance/efficiency is needed) a process may bypass the language's generic memory management and implement its own memory management that's custom designed for that application's data structures and usage patterns.
Also; processes tend to allocate and free memory very frequently; and it's a bad idea to call the kernel's API every time an allocation or deallocation is made. For this reason processes allocate a larger amount of virtual memory (from the kernel's virtual memory manager) and then carve it up into smaller pieces as needed.
Essentially; the purpose is to allow processes to use whatever method they like for managing their memory themselves; while avoiding the overhead of excessive use of the kernel's API.
A kernel might also have something for generic dynamic memory management (e.g. "kmalloc()" and "kfree()" that mimic the "malloc/free" that would've been in the C standard library); but might have custom designed allocators for managing specific data structures, or multiple different memory managers for different purposes. However, this is not very different to the memory management done by a process, and the kernel's "heap" would still just allocate larger areas using the virtual memory manager and then carve them up.
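For example, a bare-bones kernel heap built this way might look something like the sketch below. The vmm_alloc_region() call is a placeholder for whatever your virtual memory manager actually provides, and the fixed chunk size is an arbitrary choice for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel virtual memory manager call: makes "bytes" of the
   virtual address space become "usable RAM" and returns its start address.
   The name and signature are assumptions for this sketch. */
extern void *vmm_alloc_region(size_t bytes);

#define HEAP_CHUNK_SIZE (64 * 1024)   /* ask the VMM for 64 KiB at a time */

static uint8_t *chunk_next = NULL;    /* next free byte in the current chunk */
static size_t   chunk_left = 0;       /* bytes left in the current chunk */

/* A deliberately simple "bump" kmalloc(): it only calls the virtual memory
   manager when the current chunk runs out, so most allocations are just a
   pointer increment. Freeing individual allocations is omitted. */
void *kmalloc(size_t size)
{
    size = (size + 15) & ~(size_t)15;          /* keep 16 byte alignment */

    if (size > chunk_left) {
        size_t bytes = (size > HEAP_CHUNK_SIZE) ? size : HEAP_CHUNK_SIZE;
        chunk_next = vmm_alloc_region(bytes);  /* one (rare) kernel VMM call */
        if (chunk_next == NULL)
            return NULL;
        chunk_left = bytes;
    }

    void *result = chunk_next;
    chunk_next += size;
    chunk_left -= size;
    return result;
}
```

A real heap would also keep free lists (so a matching kfree() can recycle memory within a chunk), pools, etc.; the point is only that the virtual memory manager gets called once per chunk rather than once per allocation.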
Things that tend to complicate this memory manager include:
- satisfying the caller's alignment requirements (see the sketch after this list)
- multiple "pools" (e.g. one pool per thread or per CPU, or different pools for different types of data) to improve cache locality/scalability/efficiency
- attributes that affect how memory is allocated (e.g. "allocate fast without caring about fragmentation because it will be freed soon anyway" vs. "take extra time to minimize fragmentation because this memory will be in use for a long time")
- higher level interface to things like memory mapped files and "memory attributes"
- additional information for debugging and detecting common programmer errors (e.g. freeing the same memory twice)
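To illustrate the first item (alignment), one common approach is to over-allocate and keep the original pointer just below the aligned address that's handed out. A minimal sketch on top of a plain malloc/free, purely for illustration:

```c
#include <stdlib.h>
#include <stdint.h>

/* Allocate "size" bytes aligned to "alignment" (a power of two) by
   over-allocating and stashing the original pointer just below the
   address that is returned, so aligned_free() can find it again. */
void *aligned_malloc(size_t size, size_t alignment)
{
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    uintptr_t addr = (uintptr_t)raw + sizeof(void *);
    addr = (addr + alignment - 1) & ~(uintptr_t)(alignment - 1);

    ((void **)addr)[-1] = raw;     /* remember where the real block starts */
    return (void *)addr;
}

void aligned_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);  /* recover and free the real block */
}
```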
Virtual Memory Management
The purpose of virtual memory management is to manage virtual address spaces; including creating and destroying virtual address spaces and mapping/unmapping "things" into them.
Initially a virtual memory manager can be extremely simple. For example, when the "heap" for the process (or kernel) asks to make an area of its virtual address space become "usable RAM", the virtual memory manager could just find any pieces of the requested area that aren't already "usable RAM", allocate physical memory (using the physical memory manager) and map it into the virtual address space; and when the "heap" for the process (or kernel) asks to make an area "not usable RAM", the virtual memory manager could just find any pieces that aren't already "not usable RAM", unmap them, and deallocate them (using the physical memory manager).
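A sketch of that extremely simple approach, one 4 KiB page at a time. The pmm_alloc_page(), map_page(), etc. helpers are placeholders for whatever your paging code and physical memory manager actually provide:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096

/* Hypothetical helpers provided by the paging code and the physical
   memory manager; names and signatures are assumptions for this sketch. */
extern uint64_t pmm_alloc_page(void);            /* returns a physical address, 0 on failure */
extern void     pmm_free_page(uint64_t phys);
extern bool     is_page_mapped(uintptr_t virt);
extern uint64_t get_mapping(uintptr_t virt);     /* physical address a page is mapped to */
extern void     map_page(uintptr_t virt, uint64_t phys);
extern void     unmap_page(uintptr_t virt);

/* Make every page in [start, start+size) "usable RAM".
   Assumes start and size are page aligned. */
bool vmm_make_usable(uintptr_t start, size_t size)
{
    for (uintptr_t virt = start; virt < start + size; virt += PAGE_SIZE) {
        if (is_page_mapped(virt))
            continue;                            /* already "usable RAM" */
        uint64_t phys = pmm_alloc_page();
        if (phys == 0)
            return false;                        /* out of physical memory */
        map_page(virt, phys);
    }
    return true;
}

/* Make every page in [start, start+size) "not usable RAM" again. */
void vmm_make_unusable(uintptr_t start, size_t size)
{
    for (uintptr_t virt = start; virt < start + size; virt += PAGE_SIZE) {
        if (!is_page_mapped(virt))
            continue;                            /* already "not usable RAM" */
        uint64_t phys = get_mapping(virt);
        unmap_page(virt);
        pmm_free_page(phys);
    }
}
```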
However; the virtual memory manager is responsible for a number of "tricks" intended to reduce RAM consumption, improve performance, or provide additional functionality. These tricks may include:
- pretending an area is "usable RAM" when no RAM is allocated initially, and RAM is only allocated if/when the area is actually used (to avoid wasting RAM when the area is allocated by "heap" but not used)
- pretending a file has been loaded from disk into an area, when the file initially wasn't (to improve performance, reduce file IO, and avoid wasting RAM for parts of the file that aren't accessed)
- pretending there's more virtual memory than there is physical memory, by transferring data to/from swap space (and discarding pages from memory mapped files where the data can be obtained from the file if needed again) while trying to ensure the most likely to be needed data is in memory (to increase the amount of memory software can use)
- tricks to improve the efficiency of communication between processes (e.g. shared memory areas, moving entire page tables from one virtual address space to another, etc)
Please note that these "tricks" aren't necessarily just for processes alone; and the kernel itself may benefit from them. For example, I use the "pretend area is usable RAM when no RAM is allocated initially" trick a lot in my kernels.
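As an illustration of that first trick, the page fault handler ends up doing the real allocation. A rough sketch, where the helper names and the "is this area pretending to be usable RAM" bookkeeping are assumptions for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096

/* Hypothetical helpers; names are placeholders for this sketch. */
extern bool     area_is_pretend_ram(uintptr_t virt);  /* did the VMM promise to treat this
                                                         address as "usable RAM"? */
extern uint64_t pmm_alloc_page(void);
extern void     map_page(uintptr_t virt, uint64_t phys);
extern void     zero_page(uintptr_t virt);            /* security: don't leak old contents */
extern void     send_segfault(void);

/* Called by the page fault exception handler with the faulting address.
   If the address lies in an area that is only pretending to be "usable
   RAM", allocate a real page now; otherwise it's a genuine fault. */
void handle_page_fault(uintptr_t fault_addr)
{
    uintptr_t virt = fault_addr & ~(uintptr_t)(PAGE_SIZE - 1);

    if (area_is_pretend_ram(virt)) {
        uint64_t phys = pmm_alloc_page();
        if (phys != 0) {
            map_page(virt, phys);
            zero_page(virt);     /* never hand out another process's old data */
            return;              /* retry the faulting instruction */
        }
        /* out of RAM: a real kernel would try sending something to swap space here */
    }
    send_segfault();             /* not one of our tricks - report the fault */
}
```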
The virtual memory manager is also responsible for maintaining security (ensuring software can't access things it shouldn't be allowed to access, ensuring data that one process freed can't be accessed next time it's allocated, etc).
Things that tend to complicate the virtual memory manager (in addition to the tricks and security issues) include:
- performance and scalability (e.g. lockless/blockless and "O(1)" algorithms)
- lazy TLB invalidation schemes (to reduce overhead on multi-CPU systems)
- support for multiple page sizes
- page/cache coloring and NUMA optimizations (to reduce cache miss costs)
- "compressed data in RAM" as first level swap space (so you can have 5 GiB of data in RAM when there's only 4 GiB of RAM)
- pre-fetching things from disk (memory mapped files and swap space) when RAM becomes free and disks aren't doing much (to reduce overhead later if/when the data is needed)
- tracking statistics (how much memory each process thinks it's using, how much each process is actually using, how many pages fetched from disk per second, etc). This can include how often each page of data is used (used to estimate "most likely to be needed" and decide which page/s to send to swap space)
- notifying other software when RAM is running out (so that things like file system caches, web browser HTML caches, etc. can free their caches) to avoid or reduce the use of swap space
- support for memory mapped IO areas (e.g. mapping video display memory into a virtual address space so a video driver can access it)
- supporting various cacheability types (write-through, write-combining, etc) for both normal processes and device drivers (including using PAT to make up for a lack of MTRRs for memory mapped devices)
- higher level interface for special restrictions device drivers may have (e.g. needing buffers of "physically contiguous" memory and/or memory below a certain address)
There is one "beginner mistake" that is common enough to be worth mentioning specifically. The mistake is mapping the physical address space "as is" into kernel space. This prevents the kernel from benefiting from all of the tricks and optimisations that the virtual memory manager does. It can also waste memory for the additional page tables, etc. needed to map things that have no reason to be mapped into kernel space to begin with; and also typically results in "poor locality" (as the data the kernel does need to access is spread throughout data the kernel doesn't need to access). It also creates a massive security risk - any bug in the kernel (or CPU - e.g. Meltdown) may allow an attacker to access everything in RAM (and not just things that actually needed to be mapped in kernel space). Finally it fails when the physical address space is larger than kernel space and results in ugly work-arounds for those cases. Unfortunately, once upon a time a beginner (who didn't know any better at the time) made this mistake, and their kernel grew until it was too late to fix the problem because too many other pieces of their kernel depended on it, and their "still broken" kernel became a well known open source kernel that people look at to understand memory management.
Physical Memory Management
The physical memory manager is all about managing the physical address space (in the same way that virtual memory management is for managing virtual address spaces). Part of this is managing free physical RAM; and part of it is managing things that aren't RAM at all (e.g. memory mapped PCI device areas, MTRRs, etc).
By far the most frequent case (and the part that has to be optimized the most) is allocating and freeing individual physical pages of RAM where it makes no difference what the physical address is. I strongly recommend implementing an allocator specifically for this use case, that's built on the concept of fast/"O(1)" free page stack/s.
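A minimal sketch of such a free page stack follows. To stay consistent with the note further below (free pages never need to be touched or mapped), this version keeps the stack in a separate array of physical addresses rather than threading a linked list through the free pages themselves; the fixed array size is an arbitrary choice for illustration:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE  4096
#define MAX_PAGES  (1024 * 1024)   /* enough entries for 4 GiB of RAM; sketch only */

/* The stack itself: an array of physical page addresses. The free pages
   themselves are never read, written or mapped. */
static uint64_t free_stack[MAX_PAGES];
static size_t   free_top = 0;      /* number of pages currently on the stack */

/* O(1) free: push the page's physical address onto the stack. */
void pmm_free_page(uint64_t phys)
{
    if (free_top < MAX_PAGES)
        free_stack[free_top++] = phys;
}

/* O(1) allocation: pop a physical address off the stack.
   Returns 0 if there is no free RAM left. */
uint64_t pmm_alloc_page(void)
{
    if (free_top == 0)
        return 0;
    return free_stack[--free_top];
}

/* During boot, every page of "usable RAM" reported by the firmware's
   memory map is pushed onto the stack once. */
void pmm_add_range(uint64_t base, uint64_t length)
{
    for (uint64_t phys = base; phys + PAGE_SIZE <= base + length; phys += PAGE_SIZE)
        pmm_free_page(phys);
}
```

On multi-CPU systems the stack would also need a lock (or per-CPU stacks) to avoid corruption.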
However, the physical memory manager also has to support special requests (typically from device drivers and nothing else) for physical memory that has specific requirements. These requests are for physically contiguous areas, areas that are below a certain physical address, etc. The worst case is for buffers that will be used by legacy/ISA DMA controllers, where the buffer has to be below 0x01000000, must be physically contiguous and must not cross a 64 KiB boundary. Satisfying these requests requires a slower allocator. For this reason it's a good idea to split "usable physical RAM" into 2 or more zones, where some RAM (e.g. RAM below 0x01000000) uses one allocator designed to handle the much less common special requests, and other zones use an allocator designed for the much more frequent "allocate/free one page" use case.
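To show why those special requests need more work, here is a sketch of checking one candidate placement against the legacy/ISA DMA constraints just described. A real allocator for that zone would run a check like this while searching a bitmap or free list for enough adjacent free pages:

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if a physically contiguous buffer of "size" bytes placed at
   physical address "base" satisfies the legacy/ISA DMA constraints:
   entirely below 0x01000000 and not crossing a 64 KiB boundary. */
bool isa_dma_placement_ok(uint64_t base, uint64_t size)
{
    uint64_t end = base + size - 1;                 /* last byte of the buffer */

    if (end >= 0x01000000)
        return false;                               /* not entirely below 16 MiB */

    if ((base & ~(uint64_t)0xFFFF) != (end & ~(uint64_t)0xFFFF))
        return false;                               /* crosses a 64 KiB boundary */

    return true;                                    /* contiguity is the allocator's job:
                                                       it must find "size" adjacent free
                                                       bytes starting at "base" */
}
```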
Note that these allocators have no need to access the data within free pages, and only allocate/free them. There is no reason for free pages of RAM to be mapped into any virtual address space.
For managing things that aren't RAM, you need a map of the physical address space that says which areas are used for what. For example, when you're trying to initialize a new PCI device (to configure its "BARs") you want to search for an area that isn't ROM, isn't used by another device and isn't RAM. This information includes "cacheability attributes" and is used to configure (and manage) the CPU's MTRRs. The physical address space map should also contain information for RAM - e.g. is it volatile or non-volatile, is it hot-pluggable, etc.
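What that map might look like as a data structure is sketched below; the type names, fields and the fixed-size array are illustrative assumptions only:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* What each range of the physical address space is used for. */
enum phys_range_type {
    RANGE_UNUSED,          /* nothing here - a candidate for new PCI BARs */
    RANGE_RAM_VOLATILE,
    RANGE_RAM_NONVOLATILE,
    RANGE_ROM,
    RANGE_MMIO,            /* memory mapped device area */
};

struct phys_range {
    uint64_t              base;
    uint64_t              length;
    enum phys_range_type  type;
    uint8_t               cache_type;   /* e.g. write-back, write-combining, uncached */
    bool                  hot_pluggable;
};

#define MAX_RANGES 128
static struct phys_range phys_map[MAX_RANGES];
static size_t            phys_map_count = 0;

/* Find an unused hole of at least "size" bytes aligned to "align" (a power
   of two), e.g. to place a new PCI device's BAR. Returns 0 if none found. */
uint64_t find_unused_hole(uint64_t size, uint64_t align)
{
    for (size_t i = 0; i < phys_map_count; i++) {
        if (phys_map[i].type != RANGE_UNUSED)
            continue;
        uint64_t base = (phys_map[i].base + align - 1) & ~(align - 1);
        if (base + size <= phys_map[i].base + phys_map[i].length)
            return base;
    }
    return 0;
}
```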
Things that tend to complicate the physical memory manager (beyond what was mentioned above) include:
- performance and scalability (e.g. lockless/blockless and "O(1)" algorithms)
- support for multiple page sizes
- page/cache coloring and NUMA optimizations (to reduce cache miss costs)
- tracking statistics (how much RAM in each "zone" is used/free)
- fault tolerance (e.g. faulty RAM detection and avoidance)
- power management (e.g. keep all frequently accessed data in some RAM chips, so other RAM chips can spend most of their time in lower power states)
- hot-plug RAM (removal and insertion)
- support for virtual memory ballooning (a technique virtual machines/hypervisors use to reclaim memory guests aren't using)