User:Intx13
Intx13's PCI tutorial
Peripheral Component Interconnect (PCI) provides protocol, signaling, and physical standards that allow third-party components to be connected to a host system. In the x86 architecture, PCI-connected devices are software-accessible via a combination of port-mapped IO, memory-mapped IO, and interrupts.
Basics
Definitions
A bus is a collection of PCI signaling lines. Some of these lines are shared among all components on the bus, such as the address and data lines that carry traffic. Others come in sets with one line available for each possible component on the bus, such as request/grant lines that determine which component gets to use the shared lines and when.
A slot is a connection point at which a single component can be attached to a bus. A slot can be a slotted connector into which an expansion card plugs, such as a sound card. A slot can also be a location on a motherboard where PCI signals terminate into a planar device, such as an "on-board" network card.
A function is an independent hardware capability presented as a unique device to the host system. A planar device or expansion card can provide multiple functions simultaneously, which are treated as separate devices by the host system. For example, some manufacturers sell expansion cards that combine a sound card and a modem. The sound card would appear as one function and the modem as another. While they are attached to the same slot on the same bus, they are typically considered two distinct devices because they are accessed independently by software.
Processors do not have a native PCI interface; there is no "read-from-PCI-card" instruction in a processor's instruction set. Instead, a PCI host controller is integrated into the system at a level where it can interface with the processor's native IO mechanisms. Software communicates with the host controller using these mechanisms and the host controller takes the appropriate action on the PCI bus on behalf of the software. In the x86 architecture, the host controller is integrated into the chipset. Software uses a combination of port-mapped IO (the in and out instructions) and memory-mapped IO (indirect mov instructions) to access the host controller.
Geographical addressing
A device's geographical address is the bus/slot/function triple that identifies a device with which software wishes to communicate. Buses are numbered 0-255. Slots are numbered 0-31. Functions are numbered 0-7. This means there can be a maximum of 256*32 = 8192 planar devices/expansion cards attached to a single system. If each card utilizes all 8 functions, the system could have up to 8192*8 = 65536 distinct PCI devices. That's a lot!
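The ranges and totals above can be checked with a few lines of Python. This is only an illustrative sketch: the GeoAddress class and the MAX_* constants are names made up for this example, not part of any PCI API.

```python
from dataclasses import dataclass

# Limits taken from the text: 256 buses, 32 slots per bus, 8 functions per slot.
MAX_BUSES = 256
MAX_SLOTS = 32
MAX_FUNCTIONS = 8


@dataclass(frozen=True)
class GeoAddress:
    """Hypothetical container for a bus/slot/function triple."""
    bus: int
    slot: int
    function: int

    def __post_init__(self):
        # Reject any triple that falls outside the ranges given in the text.
        if not 0 <= self.bus < MAX_BUSES:
            raise ValueError("bus must be 0-255")
        if not 0 <= self.slot < MAX_SLOTS:
            raise ValueError("slot must be 0-31")
        if not 0 <= self.function < MAX_FUNCTIONS:
            raise ValueError("function must be 0-7")


print(MAX_BUSES * MAX_SLOTS)                  # 8192 cards or planar devices
print(MAX_BUSES * MAX_SLOTS * MAX_FUNCTIONS)  # 65536 distinct functions
print(GeoAddress(bus=2, slot=0, function=0))
```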
When a vendor develops a new expansion card, they decide if and how to use function numbers. Function 0 must be used, but functions 1 through 7 are optional. Optional function numbers need not be used in order. For example, a card could implement functions 0, 3, and 7. The slot number assigned to a specific slot on a bus cannot be changed. If an expansion card is moved from one slot to another, its geographical address will change. The bus number of a bus provided directly by the host controller is fixed and cannot be changed. x86 systems include a single host controller which has at least one bus. One of its buses will always be numbered 0, but the other numbers could be arbitrary.
Bridges
A bridge is a planar device/expansion card that exposes additional PCI buses to the host. For example, a host controller might only implement one bus with a few slots. A bridge (representing a single bus with its own set of slots) could be plugged into one of those slots, allowing for additional cards to be connected. It's not uncommon to find on-board bridges on modern desktop motherboards.
The host controller does not need to be configured by system software in order to know how to communicate with the devices attached to its slots, because host controllers are integrated into the chipset and their bus/slot numbers are fixed at the time of manufacture. However, bridges must be configured at runtime with a bus number that doesn't conflict with any pre-existing bus. Bridges must also be "daisy-chained" to one another, with the host controller itself being daisy-chained to the first bridge. When software tells the host controller to communicate with a device on a bus that the host controller itself does not provide, the host controller must know to which bridge device the traffic must be sent. The traffic "hops" along the chain until the bridge responsible for the bus number accepts it.
(Although it's of little relevance to the operating system developer, the "hopping" of traffic from one device to another doesn't actually require repeated transmission of data, due to the shared data lines that PCI uses.)
The host controller as a PCI device
As described previously, the host controller is a distinct hardware component that resides within the chipset to provide PCI bus access to software. However, the host controller also sits on the PCI bus like a device! This can be thought of as a "management interface", so that the host controller itself can be configured over PCI. Some host controllers might provide multiple buses. In this case, the host controller acts as multiple devices on the PCI bus: one for each bus.
A host controller always appears on slot 0 of bus 0. The function numbers of the host controller represent the bus numbers that the host controller supports. Note that this means a host controller can only provide 8 buses, numbered 0-7. The system can have up to 256 buses, so the other 248 can only be provided by bridges. Software can communicate with the host controller at geographical addresses 0/0/* to configure the individual buses the host controller supports.
Thank goodness for BIOS
Early configuration of the PCI bus includes assigning bus numbers to all bridges and daisy-chaining them together. It also includes BAR mapping, which we will get into later. Luckily, BIOS will do all of this at start-up. Operating systems may choose to reconfigure the PCI bus by reassigning bus numbers, re-daisy-chaining bridges, and remapping BARs, but if BIOS can be trusted this isn't necessary. Linux does not bother to reconfigure PCI bridges or remap BARs. TODO: What about BSD? Windows? Should be easy to check; compare their "lspci" equivalents to what Linux reports on the same system.
The remainder of this document will therefore not cover the configuration of bridges or host controllers or BAR remapping.
Learn to love lspci
The lspci Linux program retrieves information from the kernel about the PCI buses in the system. If executed as lspci -n, the program will display geographical addresses, class codes, and vendor/product/revision IDs for all configured PCI devices. Because Linux sticks with whatever bridge configuration and BAR mapping BIOS arranged, lspci tells you what your code should see. The output of lspci -n is formatted as below.
xx:yy.z ccss: vvvv:pppp (rev rr)
xx    Bus number
yy    Slot number
z     Function number
cc    Class code
ss    Sub-class code
vvvv  Vendor ID
pppp  Product ID
rr    Revision ID
If you haven't read the rest of this document, class codes and vendor/product/revision IDs aren't something you're interested in just yet. However, you can learn some interesting information about the layout of the PCI buses just by looking at the geographical addresses.
$ lspci -n
00:00.0 0600: 8086:7190 (rev 01)
00:01.0 0604: 8086:7191 (rev 01)
...
02:00.0 0200: 8086:100f (rev 01)
In the above example, we see right away that the system's host controller only provides one bus. We know this because the host controller will appear as a PCI device on bus 0 slot 0 with a function for every bus it supports, but this system only has a single device on bus 0 slot 0 (namely, geographical address 00:00.0). At the same time we see a device (02:00.0) whose geographical address indicates it is attached to bus 2. This means there must be a bridge in the system. As it happens, the second device listed (geographical address 00:01.0) is that bridge. Using only the geographical addresses we can put together a map of the host controller, the buses it provides, and any other buses that are provided by bridges, along with the number of slots and functions that are used on each bus. Until we learn a little more, however, we won't know which devices are those bridges and which devices are other useful things, like network cards.
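The kind of inspection described above is easy to automate. The following is a rough Python sketch that splits lines in the xx:yy.z ccss: vvvv:pppp (rev rr) layout into fields and lists the bus numbers in use. It assumes the output looks exactly like the format shown in this document (the colon after the class code is treated as optional); real lspci output can carry extra fields that this pattern does not handle. The names LSPCI_LINE and parse_lspci_line are made up for the example.

```python
import re

# Pattern for one line of "lspci -n" output in the layout shown above.
LSPCI_LINE = re.compile(
    r"^(?P<bus>[0-9a-f]{2}):(?P<slot>[0-9a-f]{2})\.(?P<function>[0-7])\s+"
    r"(?P<class_code>[0-9a-f]{2})(?P<subclass>[0-9a-f]{2}):?\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<product>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<revision>[0-9a-f]{2})\))?"
)


def parse_lspci_line(line):
    """Return a dict of integer fields, or None if the line does not match."""
    match = LSPCI_LINE.match(line.strip())
    if match is None:
        return None
    # Every field is printed in hexadecimal.
    return {k: int(v, 16) for k, v in match.groupdict().items() if v is not None}


sample = """\
00:00.0 0600: 8086:7190 (rev 01)
00:01.0 0604: 8086:7191 (rev 01)
02:00.0 0200: 8086:100f (rev 01)
"""

devices = [parse_lspci_line(line) for line in sample.splitlines()]
buses = sorted({dev["bus"] for dev in devices})
print(buses)  # bus numbers that have at least one device attached
```

Any bus number in that list other than those provided by the host controller implies a bridge somewhere in the system.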
Configuration Space
Every PCI device has a set of internal registers called its configuration space. The host controller provides port-mapped IO that allows software to read and write registers within each device's configuration space. This port-mapped IO is called the Configuration Access Mechanism, or CAM. The contents of a device's configuration space include many useful fields that describe the purpose of the device. The configuration space also includes BARs, which tell us how to interact with the device beyond CAM. In short, CAM retrieves "header" information that can be parsed without knowing the details of the device, while BARs are used by drivers to interact with the guts of the device.
Configuration Access Mechanism
The host controller implements CAM through PIO ports 0x0CF8 and 0x0CFC. Writing a 32-bit value to port 0x0CF8 informs the host controller which device (by geographical address) and which register number within that device we are interested in reading or writing. Reading from port 0x0CFC will cause the host controller to read that register in that device's configuration space and return it to software. Writing to port 0x0CFC will cause the host controller to write to that register in that device's configuration space.
(figure here, illustrating this standard method of accessing "shadowed" registers)
The 32-bit value written to port 0x0CF8 breaks down as follows.
(figure here)
The following pseudocode selects a register within a device by geographical address.
cam_select: bus, slot, function, register
    val32 = 0x80000000 | (bus << 16) | (slot << 11) | (function << 8) | (register << 2)
    out(0x0CF8, val32)
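The bit packing in cam_select can be tried out without touching hardware. The Python sketch below only computes the value that would be written to port 0x0CF8; it performs no port IO. The function name cam_address is made up for this example, and the 0-63 limit on register assumes the 256-byte configuration space of conventional PCI, which this document has not yet covered.

```python
def cam_address(bus, slot, function, register):
    """Build the 32-bit value that cam_select writes to port 0x0CF8.

    `register` is the index of a 32-bit register, as in the pseudocode,
    so it is shifted left by 2 to become a byte offset.
    """
    if not (0 <= bus <= 255 and 0 <= slot <= 31 and 0 <= function <= 7):
        raise ValueError("geographical address out of range")
    if not 0 <= register <= 63:  # assumption: 256-byte configuration space
        raise ValueError("register index out of range")
    return 0x80000000 | (bus << 16) | (slot << 11) | (function << 8) | (register << 2)


print(hex(cam_address(0, 1, 0, 0)))  # 0x80000800
print(hex(cam_address(2, 0, 0, 0)))  # 0x80020000
```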
The following pseudocode reads 32-bit, 16-bit, and 8-bit registers from within a device.
cam_read_32: bus, slot, function, register -> val
    cam_select(bus, slot, function, register)
    val = in(0x0CFC)
cam_read_16_0: bus, slot, function, register -> val
    val32 = cam_read_32(bus, slot, function, register)
    val = val32(15:0)
cam_read_16_1: bus, slot, function, register -> val
    val32 = cam_read_32(bus, slot, function, register)
    val = val32(31:16)
cam_read_8_0: bus, slot, function, register -> val
    val32 = cam_read_32(bus, slot, function, register)
    val = val32(7:0)
cam_read_8_1: bus, slot, function, register -> val
    val32 = cam_read_32(bus, slot, function, register)
    val = val32(15:8)
cam_read_8_2: bus, slot, function, register -> val
    val32 = cam_read_32(bus, slot, function, register)
    val = val32(23:16)
cam_read_8_3: bus, slot, function, register -> val
    val32 = cam_read_32(bus, slot, function, register)
    val = val32(31:24)
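The val32(high:low) notation in the pseudocode is just a shift and a mask. Here is a small Python sketch of that slicing. The helper name field is made up, and the 32-bit value is an arbitrary example rather than something read from a real device.

```python
def field(val32, high, low):
    """Return bits high..low (inclusive) of val32, i.e. val32(high:low)."""
    width = high - low + 1
    return (val32 >> low) & ((1 << width) - 1)


val32 = 0x71908086  # arbitrary example value

print(hex(field(val32, 15, 0)))   # 0x8086, what cam_read_16_0 would return
print(hex(field(val32, 31, 16)))  # 0x7190, what cam_read_16_1 would return
print(hex(field(val32, 15, 8)))   # 0x80,   what cam_read_8_1 would return
```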
A normal device's configuration space
TODO
A bridge's configuration space
TODO
A host controller's configuration space
TODO
Base Address Registers
TODO
Port-mapped IO with PCI devices
TODO
Memory-mapped IO with PCI devices
TODO
Interrupts
TODO
Recursive descent AML parser
The latest ACPI specification defines TODO AML objects. Most of them are just wrappers/containers for other objects. For example, a TermObj is either a NameSpaceModifierObj, a NamedObj, a Type1Opcode, or a Type2Opcode. And a NameSpaceModifierObj is either a DefAlias, a DefName, or a DefScope. And so on.
The goal of an AML parser is to transform a blob of AML bytecode in memory into a parse tree, which can subsequently be walked to look for various types of objects. Pages 814 through 825 of the ACPI specification provide the formal grammar for every AML object. For example, a null-terminated ASCII string is defined as follows.
String := StringPrefix AsciiCharList NullChar
StringPrefix := 0x0D
AsciiCharList := Nothing | <AsciiChar AsciiCharList>
AsciiChar := 0x01 - 0x7F
NullChar := 0x00
You can read this out loud as below:
A String consists of three objects: a StringPrefix followed by an AsciiCharList followed by a NullChar. A StringPrefix is a single byte with value 0x0D. An AsciiCharList is either Nothing or an AsciiChar followed by another AsciiCharList. An AsciiChar is a single byte with value between 0x01 and 0x7F, inclusive. A NullChar is a single byte with value 0x00.
If you wanted to represent the string "cat" in AML, you would build a parse tree like this:
String
    StringPrefix (0x0D)
    AsciiCharList
        AsciiChar (0x63)
        AsciiCharList
            AsciiChar (0x61)
            AsciiCharList
                AsciiChar (0x74)
                AsciiCharList
                    Nothing
    NullChar (0x00)
The resulting bytecode would look like this:
0D 63 61 74 00
The job of a parser (for the String object anyway) is to take that bytecode and produce the corresponding parse tree.
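Going in the other direction is a one-liner, which makes it handy for producing test input. This Python sketch builds the bytecode for a String from the grammar above; encode_string is a made-up helper name.

```python
def encode_string(text):
    """Encode text as an AML String: StringPrefix, AsciiChars, NullChar."""
    data = text.encode("ascii")  # raises UnicodeEncodeError above 0x7F
    if 0x00 in data:
        # An AsciiChar must be in the range 0x01-0x7F.
        raise ValueError("an AsciiChar cannot be 0x00")
    return bytes([0x0D]) + data + bytes([0x00])


print(encode_string("cat").hex(" "))  # 0d 63 61 74 00
```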
Recursive descent parsing
Many AML objects (like String) are defined in terms of other objects. Others (like AsciiChar and NullChar) are defined in terms of bytes consumed from a blob of AML. This suggests that a "recursive descent" parser might be a good choice for parsing AML. When writing a recursive descent parser you write a parsing function for every object. Many of those parsers (like the parser for String) just call other parsers and collect their results, while some parsers (like AsciiChar and NullChar) actually read bytes from a blob of AML.
Below is a pseudocode example of a recursive descent parser for String and related objects. In this example and throughout this document, blob is some global object from which bytes of AML can be read. We'll start by writing the simple parsers: the ones that aren't defined in terms of any other objects.
function ParseStringPrefix(blob):
    if blob.NextByte == 0x0D:
        blob.ConsumeByte()
        return StringPrefix
    else:
        return null
function ParseAsciiChar(blob):
    if blob.NextByte >= 0x01 and blob.NextByte <= 0x7F:
        byte x = blob.ConsumeByte()
        return AsciiChar(x)
    else:
        return null
function ParseNullChar(blob):
    if blob.NextByte == 0x00:
        blob.ConsumeByte()
        return NullChar
    else:
        return null
function ParseNothing(blob):
    return Nothing
Note how ParseStringPrefix and ParseNullChar just check whether the next byte in the blob is the value they're expecting, and if so, consume it and return the appropriate object. ParseAsciiChar does the same, but it also attaches the consumed byte to the object being returned, so that the parse tree will contain the characters from the string. The parser for Nothing is simple: it always returns a Nothing object, without consuming any bytes of AML. Now let's write a parser for AsciiCharList, which is a little trickier, because it's defined recursively in terms of itself.
function ParseAsciiCharList(blob):
    a = ParseAsciiChar(blob)
    if a:
        b = ParseAsciiCharList(blob)
        if b:
            return AsciiCharList(a, b)
        else:
            return null
    else:
        return ParseNothing(blob)
ParseAsciiCharList first calls ParseAsciiChar. If that parser succeeded (meaning that the next byte in the blob was an ASCII character and it's been returned inside of an AsciiChar object) then ParseAsciiCharList calls itself recursively. It then returns an AsciiCharList object consisting of the AsciiChar object and whatever the recursive call returns. If the ParseAsciiChar parser failed (meaning that the next byte in the blob was not an ASCII character) then ParseAsciiCharList calls the ParseNothing parser and returns whatever it returns. Note that the recursive call is not in tail position, because its result is wrapped in a new AsciiCharList before being returned, so the recursion depth grows with the length of the string.
Finally, let's write the parser for the String object.
function ParseString(blob):
    a = ParseStringPrefix(blob)
    if a:
        b = ParseAsciiCharList(blob)
        if b:
            c = ParseNullChar(blob)
            if c:
                return String(a, b, c)
            else:
                return null
        else:
            return null
    else:
        return null
ParseString never reads the AML blob directly. It just calls three other parsers and if they all succeed, it returns a String object containing them.
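For readers who want something they can run, here is one way the String parsers might be rendered in Python. It is a sketch, not a reference implementation: the Blob class, the tuple-based parse tree, and the string_value helper are all choices made for this example. Python's recursion limit also means very long strings would need an iterative AsciiCharList.

```python
class Blob:
    """Hypothetical byte source with one byte of lookahead."""

    def __init__(self, data):
        self.data = bytes(data)
        self.pos = 0

    @property
    def next_byte(self):
        # None at end of input, so every comparison below simply fails.
        return self.data[self.pos] if self.pos < len(self.data) else None

    def consume_byte(self):
        byte = self.data[self.pos]
        self.pos += 1
        return byte


def parse_string_prefix(blob):
    if blob.next_byte == 0x0D:
        blob.consume_byte()
        return ("StringPrefix",)
    return None


def parse_ascii_char(blob):
    byte = blob.next_byte
    if byte is not None and 0x01 <= byte <= 0x7F:
        return ("AsciiChar", blob.consume_byte())
    return None


def parse_null_char(blob):
    if blob.next_byte == 0x00:
        blob.consume_byte()
        return ("NullChar",)
    return None


def parse_ascii_char_list(blob):
    a = parse_ascii_char(blob)
    if a is None:
        return ("Nothing",)  # end of recursion, consumes no bytes
    return ("AsciiCharList", a, parse_ascii_char_list(blob))


def parse_string(blob):
    a = parse_string_prefix(blob)
    if a is None:
        return None
    b = parse_ascii_char_list(blob)
    c = parse_null_char(blob)
    if c is None:
        return None
    return ("String", a, b, c)


def string_value(node):
    """Walk the nested AsciiCharList of a String node and rebuild the text."""
    chars, lst = [], node[2]
    while lst[0] == "AsciiCharList":
        chars.append(chr(lst[1][1]))
        lst = lst[2]
    return "".join(chars)


tree = parse_string(Blob(bytes.fromhex("0D 63 61 74 00")))
print(string_value(tree))  # cat
```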
We could continue along this path, writing recursive descent parsers for every AML object. The syntax of the formal grammar by which AML objects are defined makes this very straightforward, although somewhat tedious. Unfortunately, there are two big problems with writing a recursive descent AML parser.
Fixed, implicit and explicit-length objects
AML objects can be sorted into three types: fixed-length, implicit-length, and explicit-length.
Fixed-length objects
A fixed-length object is an object that always consumes the same number of bytes of AML (if it exists in the AML, of course). AsciiChar and NullChar are examples of fixed-length objects; if present, they each consume a single byte. Fixed-length objects are the easiest to parse, because you simply consume the required number of bytes and make sure they are correct for the type of object being parsed.
Implicit-length objects
Implicit-length objects can take up different numbers of bytes of AML. AsciiCharList and String are examples of implicit-length objects. When parsing a String, you don't know how many bytes of AML you are ultimately going to consume. When parsing an AsciiCharList you don't know how deep the call to ParseAsciiCharList is going to recurse. For some AML objects, like String and AsciiCharList, you don't care how many bytes you'll ultimately consume. You simply call the parsing function and it happily recurses until eventually the recursion terminates (for example, when ParseAsciiChar fails to find any more ASCII characters and so ParseAsciiCharList calls ParseNothing instead). These "don't-care" implicit-length objects are easy to parse using recursion.
For some implicit-length objects, however, you have to know when to stop parsing, because there's no "end point" or "terminating byte" that causes recursion to stop. The AML objects that behave like this are too complex to look at right now, so instead we'll make up an object that contains a list of zeroes. Here is the grammar for this made-up object.
ZeroList := Nothing | <Zero ZeroList>
Zero := 0x00
Now suppose the AML bytecode that you wanted to parse consisted of two bytes:
00 00
Now ask yourself, is this a single ZeroList containing two Zero objects? Or is it two ZeroLists, each containing one Zero object? Does the parse tree look like this:
ZeroList
    Zero
    ZeroList
        Zero
        ZeroList
            Nothing
Or this:
ZeroList
    Zero
    ZeroList
        Nothing
ZeroList
    Zero
    ZeroList
        Nothing
Both of those are perfectly valid ways to parse that AML bytecode. Note that AsciiCharList avoids this confusion because it only ever appears within a String, and it's book-ended by a StringPrefix on one side, and a NullChar on the other. But there are quite a few real AML objects that behave just like our ZeroList example.
When parsing these "must-know" implicit-length objects, the parser needs to know how many bytes it should parse before stopping. The parsers for Zero and ZeroList could look like this.
function ParseZero(blob):
    if blob.NextByte == 0x00:
        blob.ConsumeByte()
        return Zero
    else:
        return null
function ParseZeroList(blob, count):
    if count == 0:
        return Nothing
    else:
        a = ParseZero(blob)
        if a:
            b = ParseZeroList(blob, count - 1)
            if b:
                return ZeroList(a, b)
            else:
                return null
        else:
            return null
Notice how ParseZeroList decrements the "count" variable before recursing, and if count ever reaches zero it knows that it's time to apply the "end-of-recursion" case.
But where does the "count" variable come from? Well, "must-know" implicit-length objects always appear inside of explicit-length objects, which we will discuss next.
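To see how the count resolves the ambiguity, here is a runnable Python sketch of the two ZeroList parsers. Instead of a blob object it passes the byte string and a position around explicitly and returns the new position alongside each node; that is just a convenience for this example.

```python
def parse_zero(data, pos):
    """Return (node, new_pos), or None if the next byte is not 0x00."""
    if pos < len(data) and data[pos] == 0x00:
        return ("Zero",), pos + 1
    return None


def parse_zero_list(data, pos, count):
    """Parse exactly `count` bytes of Zero objects starting at `pos`."""
    if count == 0:
        return ("Nothing",), pos  # end-of-recursion case
    head = parse_zero(data, pos)
    if head is None:
        return None
    zero, pos = head
    tail = parse_zero_list(data, pos, count - 1)
    if tail is None:
        return None
    rest, pos = tail
    return ("ZeroList", zero, rest), pos


data = bytes([0x00, 0x00])

# With count = 2 the two bytes form a single ZeroList.
one_list, end = parse_zero_list(data, 0, 2)

# With count = 1 (twice) the same bytes form two separate ZeroLists.
first, middle = parse_zero_list(data, 0, 1)
second, end2 = parse_zero_list(data, middle, 1)

print(one_list)
print(first, second)
```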
Explicit-length objects
Explicit-length objects are objects whose AML bytecode includes the actual total length of the object (and of any sub-objects it contains). The total length of an object is stored in a PkgLength object. We won't look at the encoding for a PkgLength object right now, but it's usually a single byte that says how many bytes are in the current object, from the PkgLength onwards. We'll make up another object to illustrate this.
Happy := HappyPrefix PkgLength ZeroList
HappyPrefix := 0xFF
Like our made-up Happy object, real AML explicit-length objects always include a PkgLength object, typically as the second object, just after an "opcode" style object and just before a "must-know" implicit-length list. Suppose we were parsing the following AML.
FF 03 00 00 00
We begin executing a parsing function for Happy, which might look something like this:
function ParseHappy(blob):
    a = ParseHappyPrefix(blob)
    if a:
        b = ParsePkgLength(blob)
        if b:
            count = b.Value - 1
            c = ParseZeroList(blob, count)
            if c:
                return Happy(a, b, c)
            else:
                return null
        else:
            return null
    else:
        return null
First we call ParseHappyPrefix, which consumes the first byte of the blob, 0xFF. Then we call ParsePkgLength, which consumes the second byte of the blob, 0x03. We take the value consumed and subtract one to account for the length of the PkgLength object itself. Then we call ParseZeroList, passing along the remaining number of bytes to be consumed.
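The walk-through above can be reproduced with a short Python sketch. Two simplifications are assumed here and should not be mistaken for real AML: PkgLength is treated as always being a single byte, and the ZeroList is kept as a flat Python list rather than a nested tree.

```python
HAPPY_PREFIX = 0xFF


def parse_happy(data, pos=0):
    """Return (node, new_pos) for the made-up Happy object, or None."""
    if pos >= len(data) or data[pos] != HAPPY_PREFIX:
        return None
    pos += 1
    if pos >= len(data):
        return None
    pkg_length = data[pos]  # assumption: PkgLength is exactly one byte
    pos += 1
    count = pkg_length - 1  # PkgLength counts itself, so subtract one
    if count < 0 or pos + count > len(data):
        return None
    body = data[pos:pos + count]
    if any(byte != 0x00 for byte in body):
        return None  # every remaining byte must be a Zero
    zeros = [("Zero",)] * count
    node = ("Happy", ("HappyPrefix",), ("PkgLength", pkg_length), zeros)
    return node, pos + count


data = bytes.fromhex("FF 03 00 00 00")
node, end = parse_happy(data)
print(len(node[3]))  # 2 Zero objects belong to the Happy object
print(data[end:])    # the final 00 byte lies outside it
```

Note that the PkgLength of 0x03 covers itself plus two Zero bytes, so the fifth byte of the example is left for whatever object comes next.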
With one exception, every "must-know" implicit-length AML object appears within an explicit-length object (although it might be nested rather deep within it). That one exception is the other big difficulty in parsing AML, to be addressed next.
Object references
AML objects are often referenced by-name within other AML objects. The MethodInvocation object is used to represent object references and calls to methods defined within the AML. (A simple object reference is the same as a method that doesn't take any arguments.) We're not going to look at the DefMethod object right now, which defines a method to be invoked, but it specifies the name of the method, the number of arguments it takes, and a list of objects - usually math or comparison operations - to be performed on them. A simplified version of the MethodInvocation object is defined below.
MethodInvocation := String TermArgList
TermArgList := Nothing | <TermArg TermArgList>
TermArg := MethodInvocation
...
From the grammar we can see that a MethodInvocation consists of the name of the method to be invoked, followed by a list of arguments passed to the method. Each argument is itself a MethodInvocation. A simple object reference has no arguments, so its TermArgList is Nothing.
Unfortunately, while TermArgList is a "must-know" implicit-length object, MethodInvocation is not an explicit-length object. For example, consider the following AML.
0D 63 61 74 00 0D 64 6F 67 00 0D 66 6F 6F 00 0D 62 61 72 00
This AML consists of four strings in a row: "cat", "dog", "foo", and "bar". So is this an invocation of the "cat" method, taking three arguments, which are references to the "dog", "foo", and "bar" objects...
cat(dog, foo, bar)
...or is it an invocation of the "cat" method, taking two arguments, the first of which is a reference to the "dog" object, and the second is an invocation of the "foo" method, which itself takes a single argument, the "bar" object...
cat(dog, foo(bar))
...or is it something else? There is no way to know how to parse a MethodInvocation without knowing whether an object is a simple object or a method, and if a method, how many arguments it requires. A parser might look like this.
function ParseMethodInvocation(blob):
    a = ParseString(blob)
    if a:
        if IsMethod(a):
            num_args = GetNumArgs(a)
            b = list[num_args]
            for i = 0 to num_args - 1:
                b[i] = ParseMethodInvocation(blob)
            return MethodInvocation(a, b)
        else:
            return MethodInvocation(a)
    else:
        return null
So where do the IsMethod and GetNumArgs functions get their information? The parser must store this information somewhere whenever a named object is parsed, as illustrated below for the DefMethod object.
function ParseDefMethod(blob):
    // Not actually showing how to parse a DefMethod, but it includes
    // the name of the method and the number of arguments the method
    // requires.
    SetIsMethod(name)
    StoreNumArgs(name, num_arguments)
    return DefMethod(...)
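The effect of the stored argument counts is easy to demonstrate. The Python sketch below parses the four-string example from earlier under two different registries and prints the resulting call structure. It uses the simplified MethodInvocation grammar from this document, stands in a plain dictionary (arity) for IsMethod/GetNumArgs, and treats any name missing from the dictionary as a simple object with no arguments. All function names are made up for the example.

```python
def read_string(data, pos):
    """Read one String (0x0D, chars, 0x00); return (text, new_pos) or None."""
    if pos >= len(data) or data[pos] != 0x0D:
        return None
    end = data.find(b"\x00", pos + 1)
    if end == -1:
        return None
    body = data[pos + 1:end]
    if any(byte > 0x7F for byte in body):
        return None
    return body.decode("ascii"), end + 1


def parse_invocation(data, pos, arity):
    """Parse a name followed by as many arguments as `arity` says it takes."""
    head = read_string(data, pos)
    if head is None:
        return None
    name, pos = head
    args = []
    for _ in range(arity.get(name, 0)):  # assumption: unknown name = 0 args
        arg = parse_invocation(data, pos, arity)
        if arg is None:
            return None
        node, pos = arg
        args.append(node)
    return (name, args), pos


def show(node):
    """Render a parsed invocation as name(arg, arg, ...)."""
    name, args = node
    return name + ("(" + ", ".join(show(a) for a in args) + ")" if args else "")


aml = bytes.fromhex("0D 63 61 74 00 0D 64 6F 67 00 0D 66 6F 6F 00 0D 62 61 72 00")

node, _ = parse_invocation(aml, 0, {"cat": 3})
print(show(node))  # cat(dog, foo, bar)

node, _ = parse_invocation(aml, 0, {"cat": 2, "foo": 1})
print(show(node))  # cat(dog, foo(bar))
```

The same twenty bytes produce two different parse trees, and only the registry decides between them.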
But what if the corresponding named object is defined after the MethodInvocation? These kinds of "forward references" mean that you cannot parse AML in a single pass. Whenever you encounter a MethodInvocation but have not yet encountered an object with the name of the method being invoked, the parser must stop and backtrack to the nearest enclosing explicit-length object. All the AML for that object must be stored for later parsing.
An example will make this clearer. Consider the following grammar for some made-up objects.
Funky := FunkyPrefix PkgLength Goofy
FunkyPrefix := 0xAA
Goofy := GoofyPrefix MethodInvocation
GoofyPrefix := 0xBB
Consider the following AML.
AA 07 BB 0D 62 61 72 00 01 02 03 04...
Clearly we have a FunkyPrefix followed by a PkgLength with value 0x07, which tells us that the next 6 bytes are part of the Funky object. The partial parse tree might look like this.
Funky
    FunkyPrefix
    PkgLength (0x07)
    (Unparsed AML: BB 0D 62 61 72 00 01 02 03 04 ...)
Continuing on, we have a GoofyPrefix and we expect there to be a MethodInvocation afterwards. We parse the String and find that the method to be invoked is named "bar". Now our partial parse tree might look like this.
Funky
    FunkyPrefix
    PkgLength (0x07)
    Goofy
        GoofyPrefix
        MethodInvocation
            String "bar"
            (Unparsed AML: 01 02 03 04 ...)
Suppose we look up "bar" in our database of methods, but we don't find anything - clearly we haven't found an object with the name "bar" yet. This means we can't finish parsing the MethodInvocation. If we can't finish parsing the MethodInvocation, we can't finish parsing the Goofy object either. And if we can't finish parsing the Goofy object we can't finish parsing the Funky object. However, the Funky object is an explicit-length object, so we know how much AML is inside it. So we can wrap that AML up in a special object temporarily, and continue parsing. Now our parse tree looks like this.
Funky
    FunkyPrefix
    PkgLength (0x07)
    AML wrapper {BB 0D 62 61 72 00}
(Unparsed AML: 01 02 03 04 ...)
Suppose we continue parsing the unparsed AML and eventually we find a DefMethod object with name "bar", and it says that the "bar" method doesn't take any arguments. Now we can revisit the AML wrapped inside the Funky object and parse it again. We work our way back to the MethodInvocation and this time we know that there's nothing expected after the "bar" method name. Our completely parsed Funky object looks like this.
Funky
    FunkyPrefix
    PkgLength (0x07)
    Goofy
        GoofyPrefix
        MethodInvocation
            String "bar"
This isn't too bad, right? You just have to keep a registry of method and object names as you parse them, and wait until you've parsed all of them before you attempt to parse any MethodInvocations.
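Here is a rough Python sketch of that wrap-and-retry strategy, using the made-up Funky and Goofy objects. To have something to register, it also invents a definition object with prefix 0xCC (a PkgLength, a String name, and one byte giving the argument count). That object is purely hypothetical and is not how DefMethod is encoded. PkgLength is again assumed to be a single byte, and definitions hidden inside a deferred wrapper are not handled.

```python
class UnknownName(Exception):
    """Raised when an invocation names something not yet registered."""


def read_string(data, pos):
    if data[pos] != 0x0D:
        raise ValueError("expected StringPrefix")
    end = data.index(b"\x00", pos)
    return data[pos + 1:end].decode("ascii"), end + 1


def parse_invocation(data, pos, arity):
    name, pos = read_string(data, pos)
    if name not in arity:
        raise UnknownName(name)
    args = []
    for _ in range(arity[name]):
        arg, pos = parse_invocation(data, pos, arity)
        args.append(arg)
    return (name, args), pos


def parse_goofy(body, arity):
    if body[0] != 0xBB:
        raise ValueError("expected GoofyPrefix")
    node, _ = parse_invocation(body, 1, arity)
    return ("Goofy", node)


def parse_all(data):
    arity, results, pos = {}, [], 0
    while pos < len(data):
        prefix, pkg_length = data[pos], data[pos + 1]
        body = data[pos + 2:pos + 1 + pkg_length]  # PkgLength counts itself
        pos += 1 + pkg_length
        if prefix == 0xCC:    # hypothetical definition object
            name, end = read_string(body, 0)
            arity[name] = body[end]
        elif prefix == 0xAA:  # Funky
            try:
                results.append(parse_goofy(body, arity))
            except UnknownName:
                results.append(("Wrapper", body))  # defer: keep the raw AML
        else:
            raise ValueError("unknown prefix")
    # Second pass: every definition is registered, so retry the wrappers.
    return [parse_goofy(r[1], arity) if r[0] == "Wrapper" else r for r in results]


aml = bytes.fromhex("AA 07 BB 0D 62 61 72 00 CC 07 0D 62 61 72 00 00")
print(parse_all(aml))  # [('Goofy', ('bar', []))]
```

If a name is never defined anywhere, the second pass raises UnknownName, which is a reasonable way to report broken AML.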
Unfortunately, in this document so far we've described method/object names as Strings, but that's not the case. Instead, method/object names are NameString objects, which are more like paths in a filesystem. To find the corresponding DefMethod (or other named object definition) for a particular MethodInvocation you have to understand the ACPI namespace and AML scoping. Have faith: this is the last AML concept needed to completely parse arbitrary AML with a recursive descent parser.
ACPI namespace and scoping
TODO
AML errata
The ACPI specification includes a number of mistakes in the AML grammar. TODO: summarize all the mistakes I've found