User:Nessphoro

On Memory Allocators

Disclaimer: This is a work in progress.(This is essentially documenting the heap allocator in my OS)

Overview

In a nutshell, the memory allocator described here is a simplification of jemalloc and posted for reasons of creating diversity. The language I will use is C++, however it will look more like C with classes and namespaces. Also, the naming standard will be the same Microsoft uses in C# API's due to its appealing neatness.

Quick Look at jemalloc

jemalloc is described in a paper by Jason Evans and is freely available on the Internet. Its purpose is to provide a concurrent and scalable malloc for general use. The algorithm achieves that by heavily utilizing TLS (Thread Local Storage) so that threads need not use locks to allocate small to mid sized object. If, however, TLS does not have enough caches paged, the algorithm simply asks the global page manager to allocate more space and maps it to the TLS. The global page manager is also responsible to allocate space for large and huge objects which are mapped at page boundaries. Please note, that jemalloc does not use a 4 byte header before each pointer to store size, instead it uses spans which are described further on. Spans are spans of objects with a fixed size. The size is predetermined and simplifies the algorithm in several ways. By utilizing fixed sized pooled object, the algorithm seeks to minimize fragmentation which leads to better performance and stability in long running systems. Each span in 2MiB long with all metadata stored at the beginning. Within the span, bitmaps are utilized to store information about allocated/free objects. Size's for most small objects is determined by the equation 16n where n is a number from 1 to 32.

Note from the author:

Bitmaps are actually quite efficient as inline structures. Their worst case is equivalent to linked list with O(n). Nevertheless, caution must be exercised especially when implement bit-checking. It is often a good idea to avoid such consturcts:

for(int i=0;i<32;i++)
{
    if(intToTest>>i && 1)
        DoSomething();
}

Instead, implementation of a fast-path in assembly is advised, particularly due to x86 instructions such as BSF which allow the execution of the above code in one operation.

As for large objects, the algorithm uses red-black self balancing binary trees to store the metadata which allows very efficient look up in O(log n). As a possible optimization, it might be sound to use hash-tables to achieve constant time speed, this is however out-of-scope of this discussion.

Quick Overview of Hydra's Heap Allocator

Hydra being an OS in its early life naturally doesn't have many features compared to FreeBSD (the platform jemalloc originally targeted) hence I decided to not use things such a TLS as the kernel does not perform many allocations in concurrency and is not expected to compete with user-space programs due to the separation of allocators and space. In addition, it is expected that kernel drivers needing a fast/concurrent malloc will implement it themselves - for now at least. In conclusion, this approach has many perks:

1. If you allocate many objects but don't actually touch them - no physical memory is consumed.

2. No extra metedata is needed, which increases memory utilization.

3. Guaranteed memory alignment on the object's size boundary.

4. Subjectively quick allocations.

5. Reduced fragmentation.

User:Nessphoro

Contents

On Memory Allocators

Overview

Quick Look at jemalloc

Quick Overview of Hydra's Heap Allocator

Navigation menu

User:Nessphoro

On Memory Allocators

Overview

Quick Look at jemalloc

Quick Overview of Hydra's Heap Allocator

Navigation menu

Search