Memory mapped registers in C/C++
When accessing memory mapped registers from any language compiled by an optimizing compiler, the compiler is liable to make assumptions about the values stored in memory. Based on those assumptions, it may remove, merge, or re-order reads and writes to produce code that makes better use of the CPU's pipeline, write buffering, and caches.
Memory mapped registers, however, are a case where the order of reads and writes is significant. Moreover, even if the compiler preserves the order of accesses to a location, it may still make assumptions about the value of a "variable", since it does not know that the variable is actually a device register whose contents can change at any time. The same applies to values read from a port-mapped I/O device's registers.
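As a minimal sketch of the problem, assume a device with a 32-bit status register whose bit 0 (the hypothetical STATUS_READY) is set by the hardware once the device is ready. Without volatile, the compiler may load the register once and never look at it again; accessing it through a volatile-qualified pointer forces a fresh load on every iteration. The register layout and function names are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical status bit, for illustration only. */
#define STATUS_READY 0x1u

/* BROKEN: nothing in the loop writes *status, so the compiler may load it
 * once and either spin forever on the stale value or assume the loop
 * terminates and delete it. */
static inline void wait_ready_broken(const uint32_t *status)
{
    while ((*status & STATUS_READY) == 0)
        ;
}

/* Correct: the volatile qualifier forces a real load on every iteration. */
static inline void wait_ready(const volatile uint32_t *status)
{
    while ((*status & STATUS_READY) == 0)
        ;
}
```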
See the following:
Serialization and memory barriers
Serialization, memory barriers, and memory fences are closely related terms for an explicit directive to a compiler or CPU that memory accesses must not be reordered across a given point in the instruction stream.
When a memory barrier is issued to a compiler, the compiler must not move any read or write across the barrier: no access that appears in the source before the barrier may be emitted after it, and no access that appears after it may be emitted before it. The compiler also may not assume that values it loaded from memory before the barrier are still valid after it.
When a memory barrier is issued to a CPU, the CPU ensures that all reads and writes before the barrier have completed before any memory access beyond the barrier proceeds. A fully serializing instruction goes further: all earlier instructions must complete and buffered writes must be drained before the next instruction is executed.
Compiler memory barriers
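Explicit memory barrier directives to the compiler are compiler specific. Volatile is sometimes used as a substitute for a memory barrier, but it is both weaker and more costly: it only orders volatile accesses relative to other volatile accesses, so ordinary reads and writes may still be moved across them, and marking data volatile suppresses optimization of every access to it. Where possible, avoid using volatile as a memory barrier, and prefer an explicit memory barrier directive to the compiler.
// Explicit compiler memory barrier for GCC: the compiler may not move memory
// accesses across it, nor reuse values it loaded from memory before it.
asm volatile ("" : : : "memory");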
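The following sketch shows where a compiler barrier matters in a driver. It assumes a hypothetical DMA descriptor in ordinary memory and a hypothetical volatile doorbell register. Because volatile only orders volatile accesses among themselves, the plain stores to the descriptor could be sunk below the doorbell write; the barrier prevents that. `__asm__ __volatile__` is the spelling of GCC's `asm volatile` that also compiles in strict ISO C modes. Note that this only constrains the compiler; on weakly ordered CPUs a CPU barrier is needed as well.

```c
#include <stdint.h>

/* GCC/Clang compiler barrier: emits no instruction, but no memory access
 * may be moved across it by the compiler. */
#define compiler_barrier() __asm__ __volatile__ ("" : : : "memory")

/* Hypothetical DMA descriptor living in ordinary (non-volatile) memory. */
struct descriptor {
    uint64_t buffer_addr;
    uint32_t length;
    uint32_t flags;
};

/* Fill in the descriptor, then notify the device through its doorbell.
 * Without the barrier, the compiler could legally emit the plain stores
 * to *desc after the volatile store to *doorbell. */
static void submit(struct descriptor *desc, volatile uint32_t *doorbell,
                   uint64_t addr, uint32_t len)
{
    desc->buffer_addr = addr;
    desc->length = len;
    desc->flags = 1; /* hypothetical "owned by device" bit */
    compiler_barrier();
    *doorbell = 1;   /* hypothetical "new descriptor ready" value */
}
```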
CPU memory barriers
CPUs generally either provide explicit memory fence (serializing) instructions, or treat certain operations in the instruction stream as implicitly forcing a memory fence.
On x86 family CPUs, executing a serializing instruction forces a memory barrier in hardware. CPUID is an example of a serializing instruction, and a MOV to a control register (other than CR8) is an example of an operation that forces serialization. x86 also provides the dedicated fence instructions LFENCE, SFENCE and MFENCE, which order loads, stores, or both, without being fully serializing.
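A minimal sketch of both kinds of x86 CPU barrier as GCC/Clang inline assembly. The function names are hypothetical, and the fallback for non-x86 builds uses GCC's `__sync_synchronize()` full-barrier builtin, which is not the same as x86 serialization. Each asm statement also carries a "memory" clobber so that it acts as a compiler barrier too.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Full memory fence: all earlier loads and stores become globally visible
 * before any later load or store. MFENCE is not fully serializing. */
static inline void cpu_mfence(void)
{
    __asm__ __volatile__ ("mfence" : : : "memory");
}

/* Fully serializing: CPUID waits for all earlier instructions to complete
 * and drains buffered writes before the next instruction is executed.
 * Leaf 0 is used because it is always available; the results are ignored. */
static inline void cpu_serialize(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__ ("cpuid"
                          : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx)
                          :
                          : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}
#else
/* Fallback for other architectures: GCC's full memory barrier builtin. */
static inline void cpu_mfence(void)    { __sync_synchronize(); }
static inline void cpu_serialize(void) { __sync_synchronize(); }
#endif
```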
The C/C++ volatile keyword
The volatile keyword was intended, among other uses, for MMIO registers. It indicates that a variable may change outside the scope of the current execution stream or function, without the compiler's knowledge. Examples of such variables include an actual hardware register, a shared lock variable, or a shared synchronization memory location that is written to by multiple CPUs or polled for changes.
Volatile denotes that a change may occur without the compiler being able to see the code that causes it. The compiler therefore must not assume anything about the value of such a variable: it must perform every read and write to it that appears in the source, keep those accesses in order relative to other volatile accesses, and never substitute a copy held in a register for the memory access.
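A common way to apply this in a kernel is a pair of small accessor functions that perform every register access through a volatile-qualified pointer, so the volatile qualifier cannot be forgotten at individual call sites. The names mmio_read32 and mmio_write32 are hypothetical; the sketch assumes naturally aligned 32-bit registers at byte offsets from a mapped base address.

```c
#include <stdint.h>

/* Read a 32-bit register at byte offset `off` from an MMIO base. The
 * volatile access forces the compiler to perform the load on every call
 * rather than reusing a previously loaded value. */
static inline uint32_t mmio_read32(volatile void *base, uintptr_t off)
{
    return *(volatile uint32_t *)((volatile uint8_t *)base + off);
}

/* Write a 32-bit register; the compiler may not elide or merge the store. */
static inline void mmio_write32(volatile void *base, uintptr_t off, uint32_t val)
{
    *(volatile uint32_t *)((volatile uint8_t *)base + off) = val;
}
```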
Note that this does not force the CPU to write the variable back to memory; it only means that the compiler will emit instructions telling the CPU to do so. Whether the write actually reaches the device promptly depends on the caching attributes of the memory in question. For example, if a volatile variable is stored in a location that the CPU treats as write-back memory, the CPU may keep the write in its cache and not write it back to physical RAM until nanoseconds or milliseconds later, destroying any ordering the device relies on and introducing timing errors in the writes to the hardware.
Memory caching options
There are three common memory cache configurations:
- Write-back caching: The CPU may keep modified data in its cache and write it back to memory lazily, for example when the cache line is evicted, according to its caching policy.
- Write-through caching: Reads may be served from the cache, but every write must also be propagated to physical memory immediately. Write-through caching carries a performance penalty relative to write-back caching.
- Uncached memory: The CPU is configured not to cache the memory at all, for either reads or writes, so every access goes to the physical memory or device. The performance penalty for uncached memory is higher than that of write-through caching.
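Hardware registers should generally be mapped as uncached. Write-through caching is not sufficient for most device registers: although writes reach the device immediately, reads may still be served from the cache and return stale values for registers whose contents the device changes on its own.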
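On x86, the cache type of a page is selected through the PWT (bit 3) and PCD (bit 4) flags of its page table entry, indexing the PAT. The sketch below assumes 64-bit (PAE or long mode) entries for 4 KiB pages and a PAT left at its power-on default, where PCD=1 and PWT=1 with the PAT bit clear select entry 3, which is uncached (UC). The helper name make_mmio_pte is hypothetical.

```c
#include <stdint.h>

/* x86 page table entry flag bits (4 KiB pages). */
#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_PWT      (1ull << 3)  /* page-level write-through */
#define PTE_PCD      (1ull << 4)  /* page-level cache disable */

/* Build an entry mapping a page of device registers as uncached (UC),
 * assuming the default PAT: PCD=1, PWT=1, PAT=0 selects PAT entry 3 (UC). */
static inline uint64_t make_mmio_pte(uint64_t phys)
{
    return (phys & ~0xFFFull) | PTE_PRESENT | PTE_WRITABLE | PTE_PCD | PTE_PWT;
}
```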
TODO: Give an example of a set of memory mapped registers using indirect addressing to show how to use all of the above together to ensure proper visibility of accesses to devices from C/C++ code.
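A partial sketch in that direction, modelled on the I/O APIC's index/data pair (IOREGSEL at offset 0x00, IOWIN at offset 0x10): the index write must reach the device before the data access. Both accesses are volatile, so the compiler keeps them in order, and the registers are assumed to be mapped uncached so the CPU issues them to the device in that order. The function and macro names are hypothetical, and a lock is assumed to be held by the caller, since another CPU could change the index between the two accesses.

```c
#include <stdint.h>

/* Offsets of an index/data register pair, modelled on the I/O APIC. */
#define REG_INDEX 0x00
#define REG_DATA  0x10

static inline volatile uint32_t *reg_at(volatile void *base, uintptr_t off)
{
    return (volatile uint32_t *)((volatile uint8_t *)base + off);
}

/* Select register `idx`, then read it through the data window.
 * Caller must hold a lock protecting this register pair. */
static uint32_t indirect_read(volatile void *base, uint32_t idx)
{
    *reg_at(base, REG_INDEX) = idx;
    return *reg_at(base, REG_DATA);
}

/* Select register `idx`, then write `val` through the data window. */
static void indirect_write(volatile void *base, uint32_t idx, uint32_t val)
{
    *reg_at(base, REG_INDEX) = idx;
    *reg_at(base, REG_DATA) = val;
}
```

When exercised on plain memory, as in a unit test, the data window simply holds the last value written; a real device would instead return the register selected by the index.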
"Volatile Considered Harmful" (Linux kernel documentation): http://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt