Symmetric Multiprocessing

From OSDev Wiki

This page or section is a stub. You can help the wiki by accurately contributing to it.

Symmetric Multiprocessing (or SMP) is one method of having multiple processors in one computer system. In an SMP system (as opposed to a NUMA system) all logical cores are able to see the entire memory of the system. Note, however, that SMP and NUMA are not mutually exclusive: as Brendan has pointed out on the forums, Intel's Core i7 implements both SMP and NUMA, as well as hyper-threading.


Initialisation of an old SMP system

The startup sequence is different for different CPUs. Intel's system programmer's manual (section 7.5.4) contains the initialization protocol for Intel Xeon processors, and doesn't cover older CPUs. For the generic "all CPU types" algorithm, see Intel's Multi-processor Specification.

For 80486 (with an external 82489DX local APIC), you must use an INIT IPI followed by an "INIT level de-assert" IPI without any SIPIs. This means you can't tell the APs where to start executing (the vector part of a SIPI) and they always start executing BIOS code. In this case you set the BIOS's CMOS reset value to "warm start with far jump" (i.e. set CMOS location 0x0F to the value 10) so that the BIOS will do a "jmp far [0:0x0467]", and then put the offset and segment of your AP entry point at 0x0467 (offset word at 0x0467, segment word at 0x0469).
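The warm-reset setup described above can be sketched roughly as follows. This is only an illustration: `set_warm_reset_vector`, the `outb_fn` callback and `fake_outb` are hypothetical names, low memory is assumed to be accessible through a pointer to physical address 0, and the entry point is assumed to lie below 1 MiB. The sketch stores the far pointer in the usual real-mode order, with the offset word at 0x467 and the segment word at 0x469.

```c
#include <stdint.h>

/* Hypothetical port-write hook; on real hardware this would wrap the
 * x86 `out` instruction. */
typedef void (*outb_fn)(uint16_t port, uint8_t value);

/* Arrange for the BIOS to far-jump to `entry_phys` after an INIT.
 * Assumptions: `low_mem` points at physical address 0 and
 * `entry_phys` is below 1 MiB. */
static void set_warm_reset_vector(volatile uint8_t *low_mem,
                                  uint32_t entry_phys, outb_fn outb)
{
    uint16_t segment = (uint16_t)(entry_phys >> 4);
    uint16_t offset  = (uint16_t)(entry_phys & 0x0F);

    outb(0x70, 0x0F);   /* select the CMOS shutdown status byte */
    outb(0x71, 0x0A);   /* value 10: warm start with far jump */

    /* Real-mode far pointer: offset word first, then segment word. */
    low_mem[0x467] = (uint8_t)(offset & 0xFF);
    low_mem[0x468] = (uint8_t)(offset >> 8);
    low_mem[0x469] = (uint8_t)(segment & 0xFF);
    low_mem[0x46A] = (uint8_t)(segment >> 8);
}

/* Host-side stand-in that records CMOS port writes, so the logic can be
 * tried without real hardware. */
static uint8_t fake_cmos_ports[2];

static void fake_outb(uint16_t port, uint8_t value)
{
    if (port == 0x70 || port == 0x71)
        fake_cmos_ports[port - 0x70] = value;
}
```

For an entry point at physical address 0x8000 this stores segment 0x0800 and offset 0.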

The "INIT level de-assert" IPI isn't supported on newer CPUs (Pentium 4 and Intel Xeon), and AFAIK it is ignored completely on these CPUs.

For newer CPUs (P6, Pentium 4) one SIPI is enough, but I'm not sure if older Intel CPUs (Pentium) or CPUs from other manufacturers need a second SIPI or not. It's also possible that the second SIPI is there in case there's a delivery failure for the first SIPI (bus noise, etc).

I normally send the first SIPI and then wait to see if the AP CPU increases a "number of started CPUs" counter. If it doesn't increase this counter within a few milliseconds, then I send the second SIPI. This is different to Intel's generic algorithm (which has a 200 microsecond delay between SIPIs), but trying to find a time source capable of accurately measuring a 200 microsecond delay during early boot isn't so easy. I've also found that on real hardware, if the delay between SIPIs is too long (and you don't use my method) an AP CPU can run the OS's early AP startup code twice (which in my case would lead to the OS thinking there are twice as many AP CPUs as there really are).

You can broadcast these signals across the bus to start every processor that is present. However, by doing so you might also enable processors that were disabled on purpose (because they were defective).

Finding information using MP Table

You may want to use the newer ACPI tables instead of the MP Table. If so, please see the next section.

Some information dedicated to multiprocessing is available through the MP tables, although they may not be present on newer machines. First one must find the MP Floating Pointer Structure. It is aligned on a 16 byte boundary and starts with the signature "_MP_" (0x5F504D5F when read as a little-endian 32-bit value). The OS must search in the EBDA, the BIOS ROM space, and the last kilobyte of "base memory"; the size of base memory is specified in a 2 byte value at 0x413 in kilobytes, minus 1K. Here is what the structure looks like:

#include <stdint.h>

struct mp_floating_pointer_structure {
    char signature[4]; // "_MP_"
    uint32_t configuration_table; // Physical address of the configuration table
    uint8_t length; // In 16 bytes (e.g. 1 = 16 bytes, 2 = 32 bytes)
    uint8_t mp_specification_revision;
    uint8_t checksum; // This value should make all bytes in the table equal 0 when added together
    uint8_t default_configuration; // If this is not zero then configuration_table should be 
                                   // ignored and a default configuration should be loaded instead
    uint32_t features; // If bit 7 is set then the IMCR is present and PIC mode is being used, otherwise 
                       // virtual wire mode is; all other bits are reserved
};
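As a rough sketch of the search and checksum rules above, the following scans a memory region on 16 byte boundaries for the "_MP_" signature and accepts a candidate only if its bytes sum to zero. The names `mp_byte_sum` and `mp_find_floating_pointer` are made up for this example, and `base` is assumed to point at a 16 byte aligned region (such as the EBDA or the BIOS ROM area) that the kernel has already mapped.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add up `len` bytes modulo 256; a valid MP structure sums to 0. */
static uint8_t mp_byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum;
}

/* Scan `len` bytes starting at `base` in 16 byte steps. Returns the byte
 * offset of the first valid floating pointer structure, or -1 if none. */
static long mp_find_floating_pointer(const uint8_t *base, size_t len)
{
    for (size_t off = 0; off + 16 <= len; off += 16) {
        const uint8_t *p = base + off;

        if (memcmp(p, "_MP_", 4) != 0)
            continue;

        /* The length field (byte 8) counts 16 byte units. */
        size_t size = (size_t)p[8] * 16;
        if (size == 0 || off + size > len)
            continue;

        if (mp_byte_sum(p, size) == 0)
            return (long)off;
    }
    return -1;
}
```

A caller would run this once per candidate region and stop at the first hit.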

Here is what the configuration table, pointed to by the floating pointer structure, looks like:

struct mp_configuration_table {
    char signature[4]; // "PCMP"
    uint16_t length; // Length of the base table in bytes, including this header
    uint8_t mp_specification_revision;
    uint8_t checksum; // Again, all bytes in the table should add up to 0
    char oem_id[8];
    char product_id[12];
    uint32_t oem_table;
    uint16_t oem_table_size;
    uint16_t entry_count; // This value represents how many entries are following this table
    uint32_t lapic_address; // This is the memory mapped address of the local APICs 
    uint16_t extended_table_length;
    uint8_t extended_table_checksum;
    uint8_t reserved;
};

After the configuration table, there are entry_count entries describing more information about the system, and after those there is an extended table. Each entry is either 20 bytes long (a processor entry) or 8 bytes long (any other entry type). Here is what the processor and IO APIC entries look like.

struct entry_processor {
    uint8_t type; // Always 0
    uint8_t local_apic_id;
    uint8_t local_apic_version;
    uint8_t flags; // If bit 0 is clear then the processor must be ignored
                   // If bit 1 is set then the processor is the bootstrap processor
    uint32_t signature;
    uint32_t feature_flags;
    uint32_t reserved[2]; // Two 32-bit words rather than one uint64_t, so that no
                          // alignment padding is added and the entry stays 20 bytes
};

Here is an IO APIC entry.

struct entry_io_apic {
    uint8_t type; // Always 2
    uint8_t id;
    uint8_t version;
    uint8_t flags; // If bit 0 is clear then the entry should be ignored
    uint32_t address; // The memory mapped address of the IO APIC in memory
};
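Walking the entries can be sketched like this. The function name `mp_count_enabled_cpus` is hypothetical, the table is read byte by byte so that no assumptions about struct packing are needed, and the 44 byte header size is simply the sum of the configuration table fields listed above. The sketch only counts enabled processors; a real kernel would also record the APIC IDs and IO APIC addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define MP_CONFIG_HEADER_SIZE 44  /* sum of the configuration table fields */

/* Count usable processors listed in an MP configuration table.
 * `table` points at the "PCMP" signature. Returns -1 if the entries
 * run past the length recorded in the header. */
static int mp_count_enabled_cpus(const uint8_t *table)
{
    uint16_t length      = (uint16_t)(table[4]  | (table[5]  << 8));
    uint16_t entry_count = (uint16_t)(table[34] | (table[35] << 8));
    const uint8_t *p   = table + MP_CONFIG_HEADER_SIZE;
    const uint8_t *end = table + length;
    int cpus = 0;

    for (uint16_t i = 0; i < entry_count; i++) {
        if (p >= end)
            return -1;

        /* Processor entries (type 0) are 20 bytes, all others 8 bytes. */
        size_t size = (p[0] == 0) ? 20 : 8;
        if ((size_t)(end - p) < size)
            return -1;

        /* Byte 3 of a processor entry holds the flags; bit 0 = enabled. */
        if (p[0] == 0 && (p[3] & 0x01))
            cpus++;

        p += size;
    }
    return cpus;
}
```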

For more information, see, chapter 4.

Finding information using ACPI

You should be able to find a MADT table (signature "APIC") in ACPI. The table has a list of local APICs, the number of which should match the number of logical processors in your system. Details of this table are not listed here, but you can find them easily on this wiki.
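As a hedged illustration of reading that list, the sketch below walks a MADT that has already been located and mapped. It assumes the usual layout: a 36 byte ACPI table header with the total length at offset 4, followed by 8 bytes of MADT fields, then variable-length entries that each begin with a type byte and a length byte. Type 0 is taken to be a processor local APIC entry with the APIC ID at byte 3 and the enabled flag in bit 0 of byte 4. The function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Collect the APIC IDs of enabled processors from a MADT.
 * Stores up to `max_ids` IDs in `ids` and returns how many enabled
 * local APIC entries were found, or -1 if the table looks malformed. */
static int madt_collect_lapic_ids(const uint8_t *madt, uint8_t *ids,
                                  int max_ids)
{
    if (memcmp(madt, "APIC", 4) != 0)
        return -1;

    uint32_t length = read_le32(madt + 4);
    uint32_t off = 44;              /* first entry follows the fixed part */
    int count = 0;

    while (off + 2 <= length) {
        uint8_t type = madt[off];
        uint8_t len  = madt[off + 1];

        if (len < 2 || off + len > length)
            return -1;

        /* Type 0: processor local APIC; flags bit 0 means "enabled". */
        if (type == 0 && len >= 8 && (madt[off + 4] & 0x01)) {
            if (count < max_ids)
                ids[count] = madt[off + 3];
            count++;
        }
        off += len;
    }
    return count;
}
```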

AP startup

After you've gathered the information, you'll need to disable the PIC and prepare the I/O APIC. You also need to set up the BSP's local APIC. Then start up the APs using SIPIs.

Startup Sequence

The MP specification contains a standard method to start an AP; however, it is not recommended, as it relies on very precise timings which, if done incorrectly, can lead to several problems. Brendan offers an alternative method, which should be done for each AP individually. First send an INIT IPI and wait 10 milliseconds. Then send a SIPI, and poll for a flag to be set by the AP's trampoline code with a timeout of 1 millisecond. If the timeout is reached, send another SIPI, and poll for the same flag, but this time with a timeout of 1 second. If the AP managed to set the flag, the BSP should set another flag to allow the AP to continue (probably to wait for the scheduler to have a process it needs executing).
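The sequence above might be sketched as follows. Everything hardware-specific is hidden behind a hypothetical `struct smp_ops` of callbacks (none of these names come from the MP specification or from Brendan's description), and the `fake_*` functions are a host-side stand-in simulating an AP that only reacts to the second SIPI, so the control flow can be tried without real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical platform hooks supplied by the kernel. */
struct smp_ops {
    void (*send_init)(uint8_t apic_id);
    void (*send_sipi)(uint8_t apic_id, uint8_t vector);
    void (*delay_ms)(unsigned ms);
    bool (*ap_flag_set)(void);  /* has the AP trampoline set its flag? */
    void (*release_ap)(void);   /* set the flag that lets the AP continue */
};

/* Poll the AP's flag in 1 ms steps for up to `timeout_ms`. */
static bool poll_flag(const struct smp_ops *ops, unsigned timeout_ms)
{
    for (unsigned waited = 0; ; waited++) {
        if (ops->ap_flag_set())
            return true;
        if (waited == timeout_ms)
            return false;
        ops->delay_ms(1);
    }
}

/* Start one AP: INIT, 10 ms, SIPI, 1 ms poll, then SIPI and 1 s poll. */
static bool start_ap(const struct smp_ops *ops, uint8_t apic_id,
                     uint8_t vector)
{
    ops->send_init(apic_id);
    ops->delay_ms(10);

    ops->send_sipi(apic_id, vector);
    if (!poll_flag(ops, 1)) {
        ops->send_sipi(apic_id, vector);
        if (!poll_flag(ops, 1000))
            return false;       /* AP never responded */
    }
    ops->release_ap();
    return true;
}

/* Simulated hardware: this fake AP only wakes up on the second SIPI. */
static int fake_sipis, fake_released;
static void fake_init(uint8_t id) { (void)id; }
static void fake_sipi(uint8_t id, uint8_t v) { (void)id; (void)v; fake_sipis++; }
static void fake_delay(unsigned ms) { (void)ms; }
static bool fake_flag(void) { return fake_sipis >= 2; }
static void fake_release(void) { fake_released = 1; }

static const struct smp_ops fake_ops = {
    fake_init, fake_sipi, fake_delay, fake_flag, fake_release
};
```

Calling `start_ap(&fake_ops, 1, 0x08)` with the simulated hooks sends two SIPIs before the AP is released.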


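The easiest method for the timings is to use the PIT's mode 0. Write 0x30 to IO port 0x43 (select mode 0 for counter 0, with LSB-then-MSB access), then write your count value to port 0x40, LSB first (e.g. write 0xA9 then 0x04 for a millisecond, since 0x04A9 is 1193 ticks of the roughly 1.193182 MHz PIT clock). To check whether the counter has finished, write 0xE2 to IO port 0x43 (a read-back command that latches the status of counter 0), then read the status byte from port 0x40. If bit 7 (the state of the output pin) is set, the count has finished.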
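A minimal sketch of this PIT usage is shown below. The `outb_fn` and `inb_fn` callbacks are hypothetical stand-ins for the kernel's port I/O routines, and the function names are made up; the port numbers and command bytes are the ones given in the text. The clock rate of 1193182 Hz is the commonly quoted PIT input frequency.

```c
#include <stdint.h>

#define PIT_HZ 1193182u  /* commonly quoted PIT input clock in Hz */

/* Hypothetical port I/O hooks provided by the kernel. */
typedef void (*outb_fn)(uint16_t port, uint8_t value);
typedef uint8_t (*inb_fn)(uint16_t port);

/* Convert a delay in microseconds to a 16-bit PIT count.
 * Delays too long for one 16-bit count are clamped to 0xFFFF. */
static uint16_t pit_count_for_us(uint32_t us)
{
    uint64_t ticks = ((uint64_t)PIT_HZ * us) / 1000000u;

    if (ticks == 0)
        ticks = 1;
    if (ticks > 0xFFFF)
        ticks = 0xFFFF;
    return (uint16_t)ticks;
}

/* Start counter 0 in mode 0 with the given count. */
static void pit_start_oneshot(outb_fn outb, uint16_t count)
{
    outb(0x43, 0x30);                     /* counter 0, LSB/MSB, mode 0 */
    outb(0x40, (uint8_t)(count & 0xFF));  /* LSB first */
    outb(0x40, (uint8_t)(count >> 8));    /* then MSB */
}

/* Return nonzero once counter 0 has reached terminal count. */
static int pit_expired(outb_fn outb, inb_fn inb)
{
    outb(0x43, 0xE2);                     /* read-back: status of counter 0 */
    return (inb(0x40) & 0x80) != 0;       /* bit 7 = output pin state */
}
```

For a 1 millisecond delay, `pit_count_for_us(1000)` gives 1193 (0x04A9), matching the example bytes 0xA9 and 0x04.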

Sending IPIs

IPIs are sent through the BSP's LAPIC. Find the LAPIC base address from the MP tables or ACPI tables, then write 32-bit words to base + 0x300 and base + 0x310 to send IPIs. For an INIT IPI or startup IPI, you must first write the target LAPIC ID into bits 24-27 of base + 0x310 (bits 24-31 on xAPIC systems, which use 8-bit IDs). Then, for an INIT IPI, write 0x00004500 to base + 0x300. For a SIPI, write 0x00004600, ORed with the page number at which you want the AP to start executing, to base + 0x300. For more information, see
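The register writes described above could look roughly like this. The helper names are invented for the example, `lapic_base` is assumed to be a pointer to the memory mapped LAPIC registers, and the APIC ID is shifted into bits 24 and up (IDs wider than 4 bits use bits 24-31 on xAPIC systems). The final wait on bit 12 of the low register (delivery status) is an addition that is not part of the description above.

```c
#include <stdint.h>

#define LAPIC_ICR_LOW   0x300
#define LAPIC_ICR_HIGH  0x310

/* Destination field: the target APIC ID goes in bits 24 and up. */
static uint32_t icr_high_for(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

/* INIT IPI command word, as given in the text. */
static uint32_t icr_low_init(void)
{
    return 0x00004500u;
}

/* SIPI command word: 0x4600 ORed with the 4 KiB page number of the
 * trampoline, which must lie below 1 MiB. */
static uint32_t icr_low_sipi(uint32_t trampoline_phys)
{
    return 0x00004600u | ((trampoline_phys >> 12) & 0xFFu);
}

/* Send one IPI through the memory mapped LAPIC at `lapic_base`. */
static void lapic_send_ipi(volatile uint32_t *lapic_base, uint8_t apic_id,
                           uint32_t icr_low)
{
    lapic_base[LAPIC_ICR_HIGH / 4] = icr_high_for(apic_id);
    lapic_base[LAPIC_ICR_LOW / 4]  = icr_low;  /* this write sends the IPI */

    /* Assumption beyond the text: wait until the delivery status bit
     * (bit 12) clears before returning. */
    while (lapic_base[LAPIC_ICR_LOW / 4] & (1u << 12))
        ;
}
```

For a trampoline at physical address 0x8000, `icr_low_sipi(0x8000)` yields 0x00004608.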
