IPC Data Copying methods

Prologue

IPC (Inter-Process Communication) is a well-known kernel-provided abstraction, core to almost any type of kernel design: whether multi-address-space, single-address-space, microkernel, monolithic, or otherwise, a kernel will generally have to provide IPC data transfer mechanisms as long as it supports multiple processes.

Since various higher-level IPC abstractions exist, and many APIs and frameworks have been introduced into the body of known "IPC mechanisms" throughout history, it is generally not difficult to obtain information on IPC APIs in contemporary use. However, the distinction between an IPC framework and the underlying copying and data transfer method may not be well understood, even by someone who has read about IPC. This article goes one layer below the IPC API and examines the four general strategies available to a kernel for transferring data across process boundaries.

Caveat lector: not every methodology can be applied to every design. Additionally, this article does not take asynchronous design into consideration; persons wishing to implement asynchronous designs should bear that in mind, substituting the appropriate messaging infrastructure where necessary as they read.

Multi-copy, Single-copy, and Zero-copy

The number of data copies required for any IPC abstraction or implementation depends on the underlying data transfer technique being used. For the techniques described in this article, the classifications are as follows:

  • Multi-copy: Kernel-relayed buffers (hereafter: "Kernel-buffered").
  • Single-copy: Map-shared and copy (hereafter: "map and copy").
  • Zero-copy: Map-shared and read (hereafter: "map and read"); Page-table displacement (hereafter: "pagetable-displacement").

Kernel-buffered data transfer

In this approach, the data sender process initiates an IPC transaction and gives the kernel the address of a buffer which it wishes to have transferred to another process. The kernel copies that buffer into its own address space region, posts a message or signal of some sort to the target process, and returns to the sender.

When the target process "picks up" the signal or message from the kernel, it will generally allocate memory to contain the data from the sending process, and then ask the kernel to fill the data into the allocated memory.
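
As a rough illustration, the following sketch shows the two halves of a kernel-buffered transfer. Every name in it (struct process, kmalloc, kfree, copy_from_user, copy_to_user, post_message) is a hypothetical placeholder for whatever primitives a particular kernel provides; this is a sketch of the technique, not a real API.

  #include <stddef.h>

  /* Hypothetical placeholders standing in for a particular
     kernel's own process type, heap, and user-copy primitives. */
  struct process;
  void *kmalloc(size_t len);
  void  kfree(void *p);
  int   copy_from_user(void *dst, const void *src, size_t len);
  int   copy_to_user(void *dst, const void *src, size_t len);
  void  post_message(struct process *p, void *kbuf, size_t len);

  /* Sender side: copy #1, user buffer -> kernel buffer. */
  int ipc_send(struct process *target, const void *ubuf, size_t len)
  {
      void *kbuf = kmalloc(len);
      if (!kbuf)
          return -1;
      if (copy_from_user(kbuf, ubuf, len)) {
          kfree(kbuf);
          return -1;
      }
      post_message(target, kbuf, len);  /* notify target; sender returns */
      return 0;
  }

  /* Receiver side: copy #2, kernel buffer -> the memory the target
     allocated after picking up the message. */
  int ipc_receive(void *kbuf, void *ubuf, size_t len)
  {
      int err = copy_to_user(ubuf, kbuf, len);
      kfree(kbuf);   /* the intermediate kernel buffer is done with */
      return err;
  }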

This approach does not require the kernel to block the sending process, because changes the sender makes to its copy of the data after the send cannot affect the fidelity of the data that the target process receives. The sending process does not strictly need to be informed when the target process is given its copy of the data, but a particular implementation may choose to provide such a notification as a "handshake" round trip.

As a keen reader can probably see, there are two copies involved in this data transfer, along with potentially two address space switches (a switch into the target process' address space, and then back to the sender's), and any number of privilege-barrier crossings (userspace to kernelspace, and back) mandated by the particular IPC API implementation that uses the technique.

Map and copy

This data transfer method makes use of shared memory mapping to reduce the amount of copying done. It holds the following advantages over kernel-buffered data transfer:

  • Large data transfers do not exhaust the kernel virtual address space.
  • Data transfer is done directly between the sender and the target process, without needing an intermediate copy into a buffer in the kernel.
  • For large data transfers, the reduction from two copies to one copy, and the elimination of the need to allocate a kernel buffer, bring a performance benefit.

In brief, the sender process asks the kernel to send a designated packet of data to a target process. The kernel must at this point put the sender to sleep, and then post a message or signal to the target process.

The sender process must be put to sleep because the kernel does not create a copy of the data to be sent; if the sender were not forced to sleep, it could modify the data before the target process "picks it up". More elaborate synchronization schemes can be constructed around this method to better serialize access to the data buffer. Consider, for example, a multithreaded sender in which another thread modifies the data buffer before the target process is able to create its copy: sleeping the sending thread alone is clearly not adequate, and putting every thread in the sending process to sleep just to transfer one buffer of data is not justifiable for most designs.

Continuing: eventually, the target process receives the message or signal from the kernel that data is available to be "picked up", and it allocates enough memory to hold the data. The target process then asks the kernel to copy the data into the memory area it has allocated. The kernel will, at this point, create a temporary shared mapping between the sender process (which is asleep) and the target process, and use that temporary mapping to perform a fast, single copy into the allocated receipt area from within the target's address space.

After the copy, the kernel will tear the shared memory mapping down and wake up the sending process.
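
A minimal sketch of the kernel's delivery path may make the sequence clearer. The sender is assumed to have been put to sleep when it initiated the transfer, and the helpers (map_user_pages, unmap_pages, copy_to_user, wake_process) are hypothetical placeholders, not any real kernel's API:

  #include <stddef.h>

  struct process;
  /* Hypothetical: map len bytes of the sleeping sender's buffer at src
     so they are readable while running in the target's address space;
     returns the temporary mapping's address. */
  void *map_user_pages(struct process *sender, const void *src, size_t len);
  void  unmap_pages(void *addr, size_t len);
  int   copy_to_user(void *dst, const void *src, size_t len);
  void  wake_process(struct process *p);

  /* Kernel path run on the target's behalf once it has allocated
     a receipt area of len bytes at dst_ubuf. */
  int ipc_deliver(struct process *sender, const void *src,
                  void *dst_ubuf, size_t len)
  {
      void *shared = map_user_pages(sender, src, len);
      if (!shared)
          return -1;

      /* The single copy: straight from the mapped sender pages
         into the target's receipt area. */
      int err = copy_to_user(dst_ubuf, shared, len);

      unmap_pages(shared, len);   /* tear down the temporary mapping */
      wake_process(sender);       /* the sender may run again        */
      return err;
  }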

Map and read

This technique is usable only if the sender can guarantee that it does not need to write to the data it intends to send until the target process has read it and finished using it.

The advantage of this approach over "map and copy" is that this approach does not require copying of data; it simply maps the data from the sender into the target process' address space, and lets the target process use the shared memory directly. The steps are essentially the same as "map and copy":

First, the sender prepares a buffer of read-only data which it intends to send to a target. The sender asks the kernel to transfer this data to its target using some IPC abstraction. The kernel will then put the sender to sleep, and post a message or signal of some kind to the target process.

The target process eventually picks up the message or signal which tells it that data is available from some other process. The target process asks the kernel to set up the shared memory mapping to the data from the sender, and the kernel does so. The target process reads the data and acts on it in one manner or another, preserving it and treating it as read-only.

When the target process is done working with the data, the target process must tell the kernel so; the kernel will then unmap the shared mapping and wake up the sender process.
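
The following sketch outlines the target-side calls, again with hypothetical placeholder primitives (current, map_shared_readonly, unmap_shared, wake_process) standing in for whatever a particular kernel provides:

  #include <stddef.h>

  struct process;
  struct process *current(void);   /* hypothetical: the calling process */
  void *map_shared_readonly(struct process *target, struct process *sender,
                            const void *src, size_t len);
  void  unmap_shared(struct process *target, void *addr, size_t len);
  void  wake_process(struct process *p);

  /* Called by the target after it picks up the "data available"
     message: zero-copy, the target reads the sender's pages directly
     through a read-only shared mapping. */
  void *ipc_map_incoming(struct process *sender, const void *src, size_t len)
  {
      return map_shared_readonly(current(), sender, src, len);
  }

  /* Called by the target when it is finished with the data. */
  void ipc_done(struct process *sender, void *addr, size_t len)
  {
      unmap_shared(current(), addr, len);
      wake_process(sender);   /* sender may write to its buffer again */
  }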

Pagetable displacement

This is an IPC technique which, rather than copying data, moves it from one process to another; or, more precisely, it moves the data from the scope of access of one process to the other, by unmapping the data from the view of the sender process and mapping it into the address space of the target process.

No copying is done, and only the page tables of the two processes are modified, essentially making this method as fast as a few page-table modifications, plus some MMU translation flushing.

The data is not copied but moved: after the transfer, the sender no longer has access to it. Naturally, the implications should be understood by the sender before the method is employed.
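
A sketch of the page-table manipulation involved follows. It assumes a page-aligned buffer whose length is a multiple of the page size, and the helpers (unmap_page, map_page, flush_tlb_page) and types are hypothetical placeholders:

  #include <stddef.h>
  #include <stdint.h>

  #define PAGE_SIZE 4096

  struct process;
  typedef uint64_t paddr_t;   /* physical frame address */

  /* Hypothetical: remove the mapping at vaddr, returning its frame. */
  paddr_t unmap_page(struct process *p, uintptr_t vaddr);
  void    map_page(struct process *p, uintptr_t vaddr, paddr_t frame);
  void    flush_tlb_page(struct process *p, uintptr_t vaddr);

  /* Move, never copy: each frame leaves the sender's view entirely
     and reappears in the target's address space. */
  void ipc_move_pages(struct process *sender, uintptr_t src,
                      struct process *target, uintptr_t dst, size_t len)
  {
      for (size_t off = 0; off < len; off += PAGE_SIZE) {
          paddr_t frame = unmap_page(sender, src + off);
          flush_tlb_page(sender, src + off);   /* MMU translation flush */
          map_page(target, dst + off, frame);
      }
  }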

Epilogue

In synopsis, most IPC data transfer techniques will employ one of the above in some way or another. There are variations, and further synchronization considerations to be made. Each technique can be implemented as a synchronous or an asynchronous operation; the implementation details only serve to add to or detract from the flexibility and throughput characteristics of each method.

Naturally, while this article postulates that, for example, the map-and-copy and map-and-read techniques disallow the sender from modifying the data buffer while it is "in transit", it is not inconceivable that one or more kernel designs may prefer to allow such behaviour, and see it as desirable.
