Historical Notes on CISC and RISC
(NB: I originally wrote this for some of the other students in an assembly language class I took in Fall 2006, and posted it shortly afterwards to a thread on DevShed. While it is not about OS-dev per se, it may help clarify some of the puzzling aspects of assembly language for some members. - User:schol-r-lea)
It is somewhat ironic that many assembly language instructors today have found it easier to teach assembly programming in a RISC architecture such as MIPS or ARM, rather than in the ubiquitous but far more complex x86 PC system. Historically, it was the CISC (Complex Instruction Set Computer) designs which were associated with extensive assembly language programming, while the RISC (Reduced Instruction Set Computer) designs were explicitly intended to make compiling code from a high-level language easier, with the expectation that assembly language would rarely be used except in the operating system.
To understand why and how this reversal came about, you need to know a bit of history. When stored-program computers based on the Von Neumann architecture were first being developed in the late 1940s and early 1950s, the size and complexity of the hardware (which was much more diverse than today, using such disparate technologies as mercury delay lines, CRT-phosphor memories, magnetic drums, wire and tape recorders, ferromagnetic cores, and the ever-present vacuum tubes) meant that the hardware capacities were extremely limited; it was not uncommon for even a large machine to have only one or two registers and a main memory of a few thousand words (which varied in size from machine to machine). The instruction sets were equally constrained, and often had special-purpose instructions for connecting to the peripherals, taking up the already small instruction space.
For example, the 18-bit Lincoln Labs TX-0 (one of the first transistor-based systems designed) had a grand total of four primary instructions (though one of these was a kind of escape code that told it to use the following word as an instruction from a different group of special-purpose operations) and two registers, X and Y. Many of these designs had what would today be considered poorly developed instruction sets, full of special cases, lacking useful instructions[1], and including instructions that were of little use.
Generally speaking, these early designs had only a single special-purpose register, called the accumulator, on which they could perform most of the common operations such as addition or subtraction; the other operand, if any, was usually either in main memory or in another special-purpose register called the source, and the result would remain in the accumulator (hence the name).
As the designers began to develop more effective architectural principles based on the success and failures of earlier efforts, hardware became smaller and less expensive, which in turn allowed for greater freedom in the CPU design, so the complexity of the systems grew; by 1966, the PDP-6, a typical design of the time, had sixteen 36-bit general-purpose registers, plus a stack register and frame register. The instruction sets became more complex as well; since a significant percentage of programming even into the 1970s was done in assembly language, it was thought that the instruction sets should be 'rich', that is, they should have a special instruction for nearly every common operation the programmer would want to use.
This culminated with the 32-bit VAX-11 series of minicomputers and mainframes, which had 256 primary instructions, including ones for polynomial evaluation, trigonometric functions, and CRC calculation. Many of these instructions had multiple addressing modes, so that they could operate on registers, on main memory, or some combination thereof; while this increased the convenience of assembly programming, it also meant that there were several special cases the programmer had to be aware of, and it complicated the CPU design substantially.
To provide these instructions, designers began to use 'microcode', which was a set of what could be called firmware-encoded macro instructions that would be handled by the CPU itself. The complex instructions would get broken down into simpler instructions internally, which the programmers wouldn't need to be aware of, and executed as if they were a fixed part of the hardware. Such instructions would often take several system clock cycles to run, and in most processors, there was no pipelining in the modern sense - the system would have to wait until each instruction was finished before it began processing the next one.
About the time the VAX was being designed, four other things were happening that would change this attitude. First, high-level languages were becoming the primary method of programming, meaning that the baroque instruction sets of the then-current CPUs were becoming unnecessary.
Second, assembly language programmers were noting that in the majority of programs, whether in assembly or in a compiled language, only a handful of instructions were being used: few programs needed a CRC instruction, and hardly any programmers were familiar enough with the entire VAX instruction set to know that the instruction existed and how to use it. A side aspect of this is that the more complex instruction sets were increasingly difficult to learn and to teach.
Third, computer hardware design had advanced to the point where graduate courses in CPU design were being offered; such courses naturally enough stuck to very simple and regular designs, which they found actually outperformed comparable complex systems (though professionally-constructed CPUs often used techniques which were patented or trade secrets, skewing the results when comparing the student designs to the commercial ones). They especially noted that the operations that worked directly on memory were generally slower than if one instead loaded the values into registers, performed several operations on those registers, and then stored the results back into memory.
Finally, the advent of single chip microprocessors meant that it would soon be possible to mass-produce inexpensive computers for a fraction of the cost of minis and mainframes. Early on, these were limited by the capabilities of the hardware in much the same ways the first generation of CPUs were (and some of the design mistakes of the first generation were repeated as well), but the technology soon grew beyond anyone's expectations, and in some ways, beyond the abilities of the designers to predict what they would later need to support.
This last part had some major repercussions, especially regarding some of the design decisions as microprocessors became more powerful. Chief amongst these was the decision by Intel to try to extend the 8-bit 8080 design into a new 16-bit design, the 8086; they bent over backwards to make the CPU as familiar as possible to programmers of the older chip, with the result that the design ended up with some unnecessary complications and limitations, especially in how it addresses larger sections of memory.[2] Also, because the designers had given the system only a small number of registers, most of the operations have several complicated addressing modes to avoid running out of registers.
The fact that this processor was selected for the IBM PC, which would soon become the dominant platform, meant that these weaknesses were of critical importance to millions of users. This was further exacerbated when Intel extended the design still further with the 80286, which now needed a separate 'protected mode' to access its full abilities while retaining 'real mode' for backwards compatibility. Some of the design flaws were resolved in the next design, the 80386, but at the cost of greatly increased complexity both of the chip itself and of assembly programming for it.
Meanwhile, other chip designers were going in other directions. One, Motorola, started over with a classic 'big' design, the Motorola 68000, which resembled a scaled-down version of the VAX in many ways, with 16 general-purpose registers and a complicated instruction set. This would become the CPU for several successful workstations, as well as the original Apple Macintosh, Atari ST, and Commodore Amiga home computer lines. Later design extensions would complicate this, though not to the extent that the 80x86 design would be.
Still, it was becoming clear that the complex instruction sets were growing counter-productive. Thus, many chip designers decided to go in the opposite direction: minimal instruction sets, no microcoding, load/store architectures with few if any operations working on memory directly, large register sets which could be used to avoid accessing main memory whenever possible, and an emphasis on supporting high-level languages rather than assembly programming. This new idea, which was called RISC (reduced instruction set computer), would be the basis of several new CPU designs, including the MIPS (and its successor the DLX), the Sun SPARC, the Acorn ARM, the IBM POWER architecture[3], and the DEC Alpha. Of these, all but the Alpha remain in use today for certain specialized areas of use, and the ARM in particular has become the de facto standard for mobile computing, though for the most part the domination by the Windows-x86 system has forced them out of the market for home and business systems.
[1] In principle, only a single operation, 'subtract two memory values and branch if the result is negative' (or one of several variants on this), is sufficient to allow a Random Access Machine to perform all Turing-computable calculations. There are even (simulated) machines which are designed on this principle, such as the OISC. In practice, of course, such a system would be both tedious and wasteful, especially for the more commonly used operations such as integer arithmetic.
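To make the single-instruction idea concrete, here is a minimal sketch of a simulated one-instruction machine in Python. It assumes the common 'subleq' variant (subtract, then branch if the result is less than or equal to zero) rather than the strictly-negative form mentioned above, and it assumes that a negative branch target halts the machine. The function name `run_subleq`, the memory layout, and the `max_steps` guard are illustrative choices, not part of any particular OISC specification.

```python
def run_subleq(memory, pc=0, max_steps=10_000):
    """Run a subleq program in place and return the memory list.

    Each instruction is three cells (a, b, c):
        memory[b] -= memory[a]; if the result <= 0, jump to c.
    Assumed convention: a negative jump target halts the machine.
    """
    steps = 0
    while 0 <= pc and pc + 2 < len(memory) and steps < max_steps:
        a, b, c = memory[pc], memory[pc + 1], memory[pc + 2]
        memory[b] -= memory[a]
        pc = c if memory[b] <= 0 else pc + 3
        steps += 1
    return memory


# Adding two numbers takes three instructions and a scratch cell Z,
# which illustrates how tedious even simple arithmetic becomes.
program = [
    9, 11, 3,    # Z -= a        (Z becomes -a)
    11, 10, 6,   # b -= Z        (b becomes b + a)
    11, 11, -1,  # Z -= Z        (Z becomes 0, so the jump to -1 halts)
    7,           # address 9:  a
    5,           # address 10: b
    0,           # address 11: Z (scratch)
]
result = run_subleq(list(program))
print(result[10])  # 12
```

Note that every arithmetic step goes through memory, which is the opposite of the load/store approach that the RISC designs adopted.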
[2] The specific issue is the use of 'memory segmentation', which was intended to provide a 20-bit address space while still only using 16-bit addressing for most purposes. It worked by having a separate set of segment registers, which pointed to 64K regions overlapping each other at 16-byte intervals. An address is formed by taking the 16-bit segment value, shifting it left by 4 bits (that is, multiplying it by 16), and adding a 16-bit offset, which is equivalent to an 8080 address, to get the 20-bit address. Most instructions could use one of the segment registers as a default value for code or data, and use just the offsets in the instruction stream. The shift was set at 4 bits because using more than 20 address pins was determined to be prohibitively expensive at the time, and would make the Dual-Inline Package design too large. It was assumed that a 1 MiB address space would be sufficient for a CPU meant as a microcontroller with a limited design lifespan rather than a general-purpose system that would be in use 40 years later. This segmentation system persists as 'Real mode' even in current x86 models.
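The address calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic (segment times 16, plus offset, wrapped to 20 bits as on the original 8086); the function name `real_mode_address` is made up for this example.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and wrap to 20 bits, as the 8086's 20 address lines would.
    return ((segment << 4) + offset) & 0xFFFFF


# Because segments overlap at 16-byte intervals, different
# segment:offset pairs can name the same physical byte.
print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_address(0x1235, 0x0000)))  # 0x12350
```

The aliasing shown in the last two lines is one of the complications an assembly programmer has to keep in mind when comparing or normalizing pointers in real mode.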
[3] The POWER architecture was used in modified form as the PowerPC, which was used in Macintoshen from 1994 to 2006 and in some IBM OS/2 systems from 1992 to 1996. The POWER architecture itself is primarily used in larger multiprocessor systems such as Deep Blue and Watson.