User:Virtlink/My dream CPU

From OSDev Wiki

My ideal CPU would be more high-level than most. The advantage of having a high-level CPU is that it can have more knowledge about the software than most, and use this information for run-time optimizations. For example, having a single possible calling convention and providing just one call and one ret instruction allows the CPU to do anything to make the call happen, without caring about how it is done. Today the difference between non-load/store (CISC) and load/store (RISC) processors has gotten a lot smaller. Transistors became smaller and cheaper, instructions are heavily pipelined and parallelized, and non-load/store processors are again viable.

Memory and registers

My CPU would have 64-bit virtual and physical addresses, but would not require the whole virtual or physical address space to be available. For virtual addresses, fewer than 64 bits may actually be used (canonical-form addresses), which splits virtual memory into a higher half and a lower half.
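To illustrate canonical-form addresses, here is a minimal Python sketch. It assumes the x86-64 style of canonical form, where every unused upper bit must be a copy of the highest implemented bit. The function names and the 48-bit default are assumptions for illustration, not part of the design above.

```python
def is_canonical(addr: int, implemented_bits: int = 48) -> bool:
    """Return True if a 64-bit virtual address is in canonical form.

    Assumption: as on x86-64, bits 63 down to implemented_bits - 1 must
    all be equal (the address is sign-extended from the top implemented bit).
    """
    if not 0 <= addr < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    if not 1 <= implemented_bits <= 64:
        raise ValueError("implemented_bits must be between 1 and 64")
    # Keep the top implemented bit together with all unused bits above it.
    upper = addr >> (implemented_bits - 1)
    all_ones = (1 << (64 - implemented_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def address_half(addr: int, implemented_bits: int = 48) -> str:
    """Name the half of virtual memory that a canonical address falls in."""
    if not is_canonical(addr, implemented_bits):
        raise ValueError("address is not canonical")
    top_bit_set = (addr >> (implemented_bits - 1)) & 1
    return "higher" if top_bit_set else "lower"
```

With 48 implemented bits, `0x0000_7FFF_FFFF_FFFF` is the last address of the lower half and `0xFFFF_8000_0000_0000` is the first address of the higher half; everything in between is non-canonical.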

The CPU has 32 64-bit general purpose (integer) registers and 32 256-bit SIMD/floating-point registers.

The CPU also tracks the free pages, and it can do that really fast.

Stack

Stacks are 64-bit. In my ideal CPU they are managed solely by the CPU, and the CPU also manages the stack frames. There are instructions to allocate stacks, destroy them, allocate a chunk of stack memory, call procedures, return from procedures and of course push and pop values.

The CPU is also responsible for moving the arguments on the stack to the callee, and moving the results back to the caller. The CPU can do this extremely fast, and might even use hidden caches or registers for this.
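A toy software model may help to picture a call instruction that owns the stack frames. The Python sketch below is only an analogy: the `ManagedStack` class, its `call` method and the depth limit are hypothetical names invented here, and a real implementation would live in hardware.

```python
class ManagedStack:
    """Toy model of a stack whose frames are created and destroyed by the CPU."""

    def __init__(self, max_depth: int = 1024):
        self.frames = []            # innermost frame is last
        self.max_depth = max_depth  # hypothetical limit standing in for stack size

    def call(self, procedure, *args):
        """Model of the single call instruction: build the frame, hand over
        the arguments, run the callee, and return its result to the caller."""
        if len(self.frames) >= self.max_depth:
            raise MemoryError("stack exhausted")
        frame = {"procedure": procedure.__name__, "args": args}
        self.frames.append(frame)
        try:
            # The callee never touches the frame layout itself.
            return procedure(self, *frame["args"])
        finally:
            # Model of the single ret instruction: the frame is always removed.
            self.frames.pop()


def add(stack, a, b):
    return a + b


def sum_three(stack, a, b, c):
    # Nested calls go through the same managed mechanism.
    return stack.call(add, stack.call(add, a, b), c)


stack = ManagedStack()
result = stack.call(sum_three, 1, 2, 3)
```

Because the caller and callee only ever see arguments and results, the model is free to change how frames are stored without affecting either of them, which is the freedom the single calling convention is meant to give the CPU.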

The CPU-managed stack opens up all kinds of new possibilities. For example, my CPU would also manage exception handling: no more magic numbers or error codes, but a pointer (which presumably points to some exception object, but the CPU does not care) that is propagated up the call chain. As the CPU knows the exact stack frames, it can unwind the stack and run the exception filters, handlers and finalization code it encounters on its way. If the stack runs out, the CPU runs out of memory or there is a division by zero, the CPU throws the appropriate exception (instead of raising an interrupt).
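The unwinding described above could look roughly like the following Python sketch. It assumes a simple single-pass scheme in which each frame is visited from the innermost outwards: a frame whose filter accepts the exception pointer handles it, and any other frame has its finalization code run before it is discarded. The `Frame` fields and the `unwind` function are hypothetical names, and the ordering of filters and finalizers is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    name: str
    exception_filter: Optional[Callable[[int], bool]] = None
    handler: Optional[Callable[[int], None]] = None
    finalizer: Optional[Callable[[], None]] = None


def unwind(frames: List[Frame], exception_pointer: int) -> Optional[Frame]:
    """Propagate an exception pointer up the call chain.

    Returns the frame that handled the exception, or None if the whole
    stack was unwound without finding a handler.
    """
    while frames:
        frame = frames[-1]  # innermost frame
        accepts = (frame.exception_filter is not None
                   and frame.exception_filter(exception_pointer))
        if accepts:
            if frame.handler is not None:
                frame.handler(exception_pointer)
            return frame  # execution would resume in this frame
        if frame.finalizer is not None:
            frame.finalizer()  # run cleanup before discarding the frame
        frames.pop()
    return None


log = []
frames = [
    Frame("main", exception_filter=lambda p: True,
          handler=lambda p: log.append(f"main handled {p:#x}")),
    Frame("parse", finalizer=lambda: log.append("parse cleanup")),
    Frame("read", exception_filter=lambda p: False),
]
handled_by = unwind(frames, 0x1000)
```

Here the exception pointer `0x1000` is treated as an opaque value throughout: only the filters and handlers supplied by software interpret it.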

Also, my CPU would manage the context switches and threads. This follows naturally, as all stacks are managed by the CPU and the CPU just manages the call stack for each thread. The CPU can optimize the saving of the state as it desires.

Instruction set

The instruction set is a non-load/store variable-length instruction set, again with the argument that higher-level instructions may give the CPU more freedom to optimize. Each instruction consists of a 2-byte opcode, a 1-byte addressing mode specifier, and up to 4 operands.
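As a rough sketch of this layout, the Python below packs and unpacks the fixed three-byte prefix of an instruction (two opcode bytes and the addressing mode specifier). The operand encoding is not specified here, so the operand bytes are passed through untouched. The names `Header`, `encode_header` and `decode_header`, and the byte order within the prefix, are assumptions.

```python
import struct
from typing import NamedTuple, Tuple


class Header(NamedTuple):
    opcode_byte0: int      # first opcode byte
    opcode_byte1: int      # second opcode byte
    addressing_mode: int   # 1-byte addressing mode specifier


# Three unsigned bytes, no padding. Assumed order: opcode bytes first.
_HEADER = struct.Struct("<BBB")


def encode_header(opcode_byte0: int, opcode_byte1: int, addressing_mode: int) -> bytes:
    """Pack the fixed-size prefix of an instruction."""
    return _HEADER.pack(opcode_byte0, opcode_byte1, addressing_mode)


def decode_header(code: bytes) -> Tuple[Header, bytes]:
    """Split machine code into the prefix and the remaining (operand) bytes."""
    if len(code) < _HEADER.size:
        raise ValueError("truncated instruction")
    return Header(*_HEADER.unpack_from(code)), code[_HEADER.size:]
```

For example, `decode_header(b"\x01\x2a\x03\xff\xee")` yields a header with opcode bytes 1 and 42, addressing mode 3, and the two trailing bytes left over as undecoded operand data.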

The first opcode byte selects the opcode map to use. There are 128 maps available for general use and extensions (such as SIMD extensions) and 128 for vendor-specific extensions. Each opcode map contains up to 255 instructions (encoded in the second opcode byte). This gives a theoretical maximum of 128 × 255 = 32640 possible general instructions and 32640 vendor-specific instructions.
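The size of the opcode space can be checked with a few lines of Python. Multiplying 128 maps by 255 instructions per map gives 32640 per group; a total of 32768 would require 256 instructions per map. Which first-byte values belong to which group is not stated, so the split on the high bit in `map_kind` below is purely an assumption, as are all the names.

```python
GENERAL_MAPS = 128          # maps for general use and extensions
VENDOR_MAPS = 128           # maps for vendor-specific extensions
INSTRUCTIONS_PER_MAP = 255  # selected by the second opcode byte


def map_kind(first_opcode_byte: int) -> str:
    """Classify an opcode map by its selector byte.

    Assumption: maps 0x00-0x7F are general, maps 0x80-0xFF are vendor-specific.
    """
    if not 0 <= first_opcode_byte <= 0xFF:
        raise ValueError("opcode byte out of range")
    return "vendor" if first_opcode_byte & 0x80 else "general"


general_capacity = GENERAL_MAPS * INSTRUCTIONS_PER_MAP
vendor_capacity = VENDOR_MAPS * INSTRUCTIONS_PER_MAP
```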

More at ISA

CPU functions

Instructions tend to be small, quick things that are executed in the blink of an eye. However, much of the basic functionality in, for example, the C library is required on all computers, and people try to come up with more and more sophisticated methods to squeeze that tiny extra bit of power from memcpy and memset. In my CPU, there are special CPU functions that perform such operations. These usually do not finish within a single CPU cycle, but the CPU has the ability to do them as optimally as possible. When the CPU vendor wants to speed up 99% of programs, all it has to do is spend some more transistors in the memcpy area. On the other hand, an embedded version might include a 1 MiB on-die ROM with machine code representations of the functions.