Linux Kernel Primer
Introduction
The Linux kernel source is an excellent case study of a modern, widely used, and well-tested operating system design. It was not, however, created as a teaching tool. The source is often sparsely commented and provides very little hand-holding. That said, there is a great deal to learn from it. A full description of the Linux kernel source could span multiple books.
Obtaining the Source
http://www.kernel.org is the primary place to obtain the Linux kernel source code. http://lxr.free-electrons.com lets you browse the source online and has a very good search engine.
Version Numbers
Basic history of the major functional changes:
- 1.0 and prior - x86 only, a.out binary format
- 1.2 - adds support for a few more CPU types, ELF binary format, modules
- 2.0 - adds SMP support
At 2.2, functionality reached a point where improvements became much more incremental. The 2.2 kernel implements most of the major features required of a modern kernel. Using anything older than 2.2 as a reference or as the basis for a fork could easily result in serious architectural challenges for your derived code or design. Core improvements have been made to the kernel since 2.2, and the kernel developers had good reason for them.
The size of the kernel can be attributed to the plethora of file systems, device drivers, supported architectures, and kernel services and features. While 2.6 is a massive code base, setting aside the non-core parts reduces it considerably: drivers take up over 100MB, non-i386 architectures around 50MB, and file systems around 20MB. As a general rule, you won't need to venture into subsystems and drivers that you are not using.
Browsing and Using The Source
It's better not to try to follow the whole of the Linux kernel source as a single process flow. Different architectures start in different ways, but the generic portions of startup can be found in init/. Keep in mind that an operating system kernel is very different from a user space program: the kernel spends its time responding to and servicing user space requests.
Instead of trying to figure out a process flow for the entire kernel, try to figure out what happens for a single event or system call. System calls can easily be found by searching the code for functions beginning with "sys_". These generally have descriptive names that correspond to the C library call whose kernel-side work they perform.
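To see that correspondence from the user space side, the following minimal sketch invokes a system call by number and compares the result with the usual C library wrapper. It assumes a Linux system with glibc. raw_getpid and check_raw_getpid are hypothetical helper names chosen for illustration. In 2.6-era sources the kernel-side handler for this call is sys_getpid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel for our process ID directly,
 * bypassing the C library's getpid() wrapper. SYS_getpid is the
 * system call number that the kernel maps to its getpid handler.
 */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/*
 * Hypothetical self-check: the raw system call and the C library
 * wrapper should report the same process ID.
 */
static void check_raw_getpid(void)
{
    assert(raw_getpid() == (long)getpid());
}
```

Searching the kernel tree for the handler of the call you just made is a good starting point for tracing a single request. Newer kernel versions declare system calls through SYSCALL_DEFINE macros, so there it may be necessary to search for the macro together with the call name instead.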
Looking at a driver source in Linux can tell you a great deal about a given piece of hardware functionality. Keep in mind that the Linux kernel source isn't always "correct" about how it handles specific hardware. Many drivers were written with little or no documentation or standards information.
Unless you modify large portions of your kernel to use the same names and conventions as the Linux kernel, Linux kernel source is not easily ported away from Linux. Also, the source is licensed under the GPL, so your kernel source will need compatible licensing if you use any of it. There are some exceptions to this rule, mostly for code in the Linux kernel tree that is shared with other projects.
The amount of infrastructure in Linux for basic system tasks - interrupt handlers, virtual memory, swap, block I/O - will make a lot of that code difficult to read if you are just starting. It might be better to read other tutorials here before looking at how a "real" kernel does things, so that you can recognize what is infrastructure setup and what is required to perform a given task. The interesting side effect is that big-picture kernel logic, obscured in small example kernels, becomes very clear in Linux. The scheduler algorithm, for instance, is fairly visible and easy to read in a single source file, with little of the task switching detail obscuring how it works.
As the Linux kernel already provides tested mechanisms for many different tasks, writing a kernel module lets you test your own kernel code on top of them. The downside is that you'll have to figure out how to "play nicely" with the portions of Linux you are trying to use.
Important Source Files
This list is by no means complete. These files provide common functionality likely to be required in any operating system kernel.
- Locking, Synchronization
  - kernel/mutex.c, include/linux/mutex.h - kernel space mutex implementation
  - kernel/futex.c - kernel level support for 'fast' userspace mutual exclusion
  - kernel/spinlock.c, include/linux/spinlock.h - kernel spinlock implementation
- Block devices
  - block/elevator.c - support for the various I/O scheduling algorithms
  - block/noop-iosched.c - the easiest to read and understand I/O scheduler (noop)
  - block/ll_rw_blk.c - where block requests meet a block device queue
- File system
  - fs/read_write.c - read, write system calls
  - fs/file.c - management of file handles
  - fs/inode.c - functionality for inodes (file information structs, sometimes like kernel file handles)
  - fs/open.c - contains several of the major file operations
  - fs/cramfs/inode.c, fs/cramfs/uncompress.c - a very small filesystem with easy to read/find code
- Process Management
  - kernel/sched.c - the Linux scheduler; schedule() is the function to look at here
  - kernel/workqueue.c - workqueues, a means of deferring work to be run later in process context
  - arch/i386/kernel/process.c - large portion of process handling for x86
- Binary formats
  - fs/binfmt_elf.c - ELF loading
- Memory Management
  - mm/slab.c - slab allocator, also home for kmalloc
  - mm/vmalloc.c - virtually contiguous memory allocator
  - mm/page_alloc.c - buddy allocation
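Buddy allocation pairs each block of 2^order pages with a "buddy" block of the same size, so that the two can be merged into one block of the next order when both are free. The following is a minimal sketch of the index arithmetic only, assuming blocks are identified by the page index of their first page. buddy_index, merged_index, and SKETCH_MAX_ORDER are hypothetical names for illustration, not definitions taken from mm/page_alloc.c.

```c
#include <assert.h>

/* Hypothetical upper bound on block order for this sketch only. */
#define SKETCH_MAX_ORDER 11

/*
 * Page index of the buddy of the block that starts at page `idx`
 * and spans 2^order pages. `idx` must be aligned to 2^order.
 * Flipping bit `order` switches between the two halves of the
 * enclosing block of 2^(order+1) pages.
 */
static unsigned long buddy_index(unsigned long idx, unsigned int order)
{
    assert(order < SKETCH_MAX_ORDER);
    assert((idx & ((1UL << order) - 1)) == 0);
    return idx ^ (1UL << order);
}

/*
 * Start index of the block of 2^(order+1) pages formed by merging
 * the block at `idx` with its buddy: the lower of the two indices.
 */
static unsigned long merged_index(unsigned long idx, unsigned int order)
{
    assert(order < SKETCH_MAX_ORDER);
    return idx & ~(1UL << order);
}
```

For example, the order-2 block (4 pages) starting at page 12 has its buddy at page 8, and merging the two yields an order-3 block starting at page 8.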
Finding what you are looking for
The arch/ directory contains architecture dependent files. If you want to learn about x86 development, the best place to look is arch/i386. Almost all hardware specific functionality is contained in drivers/. Keep in mind that things like PCI are used by multiple architectures, and so they are located under drivers/ as well. Sound and networking are treated differently and have their own top-level directories: sound/ holds the sound core with its drivers in subdirectories, and net/ holds the networking protocol code, while network device drivers live under drivers/net/. Core kernel functionality lives in kernel/ and lib/.
There is documentation in the Documentation/ directory, but it is less than the name suggests. It is more a collection of tidbits of knowledge than a systematic guide to the kernel.
Resources
Websites:
- https://kernelnewbies.org/ - guides for new Linux kernel developers
- http://www.tldp.org/HOWTO/KernelAnalysis-HOWTO.html - guide to the Linux kernel organized by subsystems
- https://syscalls.kernelgrok.com/ - a reference list of Linux syscalls
Books:
- Linux Kernel Development, by Robert Love - a very decent 'overview' book of the Linux kernel. It doesn't go into too much detail, but provides enough of a big picture and detail view to really get started on a Linux kernel project.
- Understanding the Linux Kernel, by Daniel P. Bovet and Marco Cesati - more detailed than Linux Kernel Development. If you want to do more than browse the kernel, this has more of the detail required.