User:Schol-r-lea/Understanding RISC vs CISC

From OSDev Wiki

For more on the background of this topic, see Historical Notes on CISC and RISC.

The RISC vs CISC issue is often badly misunderstood. The main reasons RISC is often seen as an improvement over CISC are:

  • Elimination of rarely-used instructions (which were originally intended to make assembly programming easier, but add complications to compiler optimization) reduces the design cost, saves silicon real estate (meaning it can use fewer transistors, or use the same number of transistors to add things like caches), and reduces or eliminates the need for microcoding in the instruction decoding and execution.
  • Using only simple instructions (a separate issue from the one above) makes for faster instruction throughput (since everything fits into one cycle without having to make a cycle excessively long), reduces energy consumption, and reduces design and production overhead.
  • Fixed-size instruction formats mean that the instruction decoder always knows exactly how large the instruction it needs to pre-fetch is (at the cost of less efficient memory usage for common instructions, hence the later retrofitting of the 16-bit Thumb instructions onto the ARM).
  • Elimination of multiple addressing modes for data operations means that each instruction always accesses data in the same way, with most instructions only operating on registers (with value offsets - often used for indexing - being handled via constant fields in the instructions themselves).
  • Very large register files, with all or almost all registers being general-purpose, simplify compiler implementation and optimization. The instruction pointer in a RISC is usually special-purpose, though exceptions exist even for that. Things like the stack and frame pointers are usually set by convention and assembly-language naming rather than hardwired.
  • The use of register-based conventions (or register windows) for handling procedure call returns, and the passing of procedure arguments and return values, allows procedures to only use the in-memory stack as needed, rather than requiring it in any procedure call. This makes optimizing the calls easier for a compiler writer.
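
As a rough illustration of the fixed-size format point in the list above, the following Python sketch contrasts finding instruction boundaries in a fixed-width stream with finding them in a variable-length one. Both encodings are invented toy formats, not any real ISA: in the variable-length one, the first byte of each instruction is assumed to hold its total length in bytes. The function names are hypothetical as well.

```python
def fixed_boundaries(code, width=4):
    """Start offsets in a fixed-width stream: pure arithmetic, no decoding."""
    return list(range(0, len(code), width))


def variable_boundaries(code):
    """Start offsets in a toy variable-length stream.

    Assumption (toy format, not a real ISA): the first byte of each
    instruction holds its total length, so every boundary depends on
    having decoded the instruction before it.
    """
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        length = code[pc]
        if length == 0:
            raise ValueError("zero-length instruction at offset %d" % pc)
        pc += length
    return offsets


# Three 4-byte instructions; their contents do not matter for the boundaries.
fixed_stream = bytes(12)
# Four instructions with lengths 1, 3, 2 and 5 bytes.
variable_stream = bytes([1, 3, 0, 0, 2, 0, 5, 0, 0, 0, 0])

print(fixed_boundaries(fixed_stream))        # [0, 4, 8]
print(variable_boundaries(variable_stream))  # [0, 1, 4, 6]
```

In the fixed-width case every start offset can be computed independently, which is what lets a pre-fetcher or a multi-issue decoder look at several instructions at once. In the variable-length case the loop is inherently sequential unless extra hardware is spent on guessing or marking boundaries.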


In addition, many CISCy designs - including the 8086 - were just poorer designs, period. They often had cripplingly small register files (or even were accumulator-based systems, like the 6502, with only a single general register that was the implicit argument of most instructions), were often filled with oddball exceptions and special cases, and had various kludges in how the memory was handled (not just segmentation; for example, the 'Page Zero' memory and fixed 'Page 1' stack on the 6502). This was by no means universal, however; the VAX and the Motorola 68000, two designs which epitomized CISC, had very cleanly designed ISAs with consistent addressing formats and large (for the time) register sets.

At first, RISCs also had the advantage that even a heavily optimized design could fit onto a single-chip CPU, whereas many optimizations used in CISC systems - such as out-of-order execution and the use of large, multi-level associative memory caches - wouldn't, given the transistor densities of the 1980s and early to mid 1990s. While the other advantages were independent of the CPU implementation, this was an issue with any CPU designed then, and made RISC a very appealing option in that period.

Finally, a single chip design will always have an advantage over a multi-chip or TTL one simply due to the smaller lightspeed delay. However, early on this had to be balanced against the ability of larger systems to parallelize their instruction pipelines. Also, a multi-chip system using dozens or even hundreds of smaller, special-purpose CPUs such as DSPs - or even general-purpose ones, such as in the Connection Machine and the Hypercube - could take this even further, as seen in pretty much every supercomputer in use today, which might combine dozens or even hundreds of general-purpose processors and thousands of GPGPU and DSP units to crunch a single computation.

The single-chip vs. TTL issue ceased to be a factor in the early 1990s as it became possible to put increasingly complex architectures onto a single chip - today, no one except retrocomputing hobbyists builds individual CPUs out of multiple chips. However, the smaller footprint of the CPU cores gave RISC designs an early advantage in the move to multiple-core CPUs, though this was not really exploited for non-technical reasons, and by the time multicore ARM and MIPS CPUs were common, the CISC designs had caught up.

However, as Moore's Law ground on, the advantages in silicon real estate melted away, as existing CISC optimizations were mated to ones specific to IC designs. Modern superscalar CISC designs get most of the same advantages as RISCs, but do so by throwing hardware and electricity at the problem (through complex pipelining, instruction re-ordering, register renaming over large hidden register files, various forms of instruction merging and splitting, and especially, caching). Furthermore, splitting complex instructions by dynamic binary translation, rather than by the older method of microcoded instructions, gave the system more flexibility in instruction reordering.
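
The instruction splitting mentioned above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any real x86 decoder works: instructions are `(opcode, dest, src)` tuples, an operand written as `"[name]"` denotes a memory location, and `tmp0`/`tmp1` stand in for hidden internal registers. The `crack` function and the micro-op names are hypothetical.

```python
def crack(instr):
    """Split one toy CISC-style instruction into simple load/op/store steps.

    Assumptions (illustrative only): instr is (opcode, dest, src), a
    "[name]" operand is in memory, and tmp0/tmp1 are hidden registers
    that are not visible to the programmer.
    """
    op, dest, src = instr
    uops = []
    src_reg = src
    if src.startswith("["):
        # Memory source: fetch it into a hidden register first.
        uops.append(("load", "tmp1", src))
        src_reg = "tmp1"
    if dest.startswith("["):
        # Memory destination: read, modify in a register, write back.
        uops.append(("load", "tmp0", dest))
        uops.append((op, "tmp0", src_reg))
        uops.append(("store", dest, "tmp0"))
    else:
        # Register-to-register form is already a single simple operation.
        uops.append((op, dest, src_reg))
    return uops


print(crack(("add", "[counter]", "r1")))
# [('load', 'tmp0', '[counter]'), ('add', 'tmp0', 'r1'), ('store', '[counter]', 'tmp0')]
print(crack(("add", "r2", "r1")))
# [('add', 'r2', 'r1')]
```

Once a read-modify-write instruction has been broken into separate load, operate, and store steps like this, each step looks much like a RISC instruction, and the hardware is free to schedule them individually alongside steps from neighbouring instructions.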

All of these optimizations came at significant cost in terms of both silicon real estate and power consumption, but by the mid-1990s, Moore's Law meant that not only could more transistors be fit onto a chip, but overall power consumption was dropping due to the shorter distances between transistors. By the early 2000s, even RISC systems were adopting some of these methods to improve their performance, most notably Dynamic Out-of-Order Execution (DOoOX), though due to the markets they soon found themselves primarily in (mobile devices and other low-power-consumption uses such as single board computers) they limited this in order to maintain their advantages in lower heat generation and power consumption.

Today, RISC vs CISC is less about raw computing power (though RISC advocates, including myself, often argue that it has the potential for it if the same amount of design effort were applied to it) than about the costs of new chip development and improvements to the designs (Intel pours billions into each new chip generation), retail costs per unit, and the drastic differences in energy consumption (DOoOX in particular is immensely energy-intensive, and accounts for around 90% of the wattage needed by current x86 chips, which is why RISC designs which do apply it do so sparingly).