User:Bellezzasolo/Intel Graphics Family

This page is under construction! This page or section is a work in progress and may thus be incomplete. Its content may be changed in the near future.

Intel have been producing integrated graphics devices in their CPUs or Chipsets for approximately 20 years at the time of writing. Integrated Graphics is generally less glamorous than the dedicated graphics cards that are essentially a requirement for playing modern games.

The advantage for Hobby OS development is that Intel make their programming manuals freely available, covering the entire card. In contrast, AMD publish the ISA for their GPUs, but documentation on basic functions like modesetting is hard to come by - AMD Atombios is probably the place to go for that.

Intel have also recently entered the discrete GPU market with the ARC range.

The general architecture of the family hasn't changed between the i965/G35 chipset of 2008 (the earliest documentation the author can find), and current generations.

Other Sources

On the wiki, very incomplete:

Intel HD Graphics

Native Intel graphics

Documentation

https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/overview.html

https://www.x.org/docs/intel

General Architecture

Figure: General Architecture of Intel Graphics

Intel GPUs consist of several engines, and a couple of subsidiary components.

Display Engine

The display engine is responsible for driving the displays, and is therefore the component that sets video modes. It consists of several display pipes, which can be connected to a number of Display Digital Interfaces (DDIs). The DDIs contain transcoders and connect to a physical display. Some hardware also offers wireless display or capture options (e.g. hardware screen capture support).

A pipe can only drive one DDI. However, DisplayPort Multistream allows several pipes to be transmitted on one PHY (i.e., running multiple monitors off one DisplayPort cable).

Pipes effectively map to a monitor, and can perform several transforms, as well as framebuffer flipping.

Blitter

Variously called the blitter or copy engine, this is effectively the 2D acceleration component of the GPU, and a prime resource for a hobby OS. The command set is reasonably straightforward, but has a wide array of available 2D transformations, utilising source data, pattern data, and the destination data. For instance, this component can be used to render bitmap fonts, in hardware.
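
To make the idea of combining source, pattern and destination data more concrete, here is a software model of a ternary raster operation. It is a sketch only, and it assumes the blitter's 8-bit raster operation code follows the common ROP3 truth-table convention (where 0xCC is a plain source copy and 0xF0 is a pattern fill). The function name rop3 is hypothetical and is not taken from the Intel manuals.

```c
#include <stdint.h>

/* Software model of an 8-bit ternary raster operation (ROP3).
 * Assumption: bit i of 'rop' gives the output for the input combination
 * i = (P << 2) | (S << 1) | D, evaluated independently for every bit
 * of the 32-bit pixel word. */
static uint32_t rop3(uint8_t rop, uint32_t pattern, uint32_t src, uint32_t dst)
{
    uint32_t result = 0;

    for (int i = 0; i < 8; i++) {
        if (rop & (1u << i)) {
            /* Select the bits where (P, S, D) matches combination i. */
            uint32_t mp = (i & 4) ? pattern : ~pattern;
            uint32_t ms = (i & 2) ? src : ~src;
            uint32_t md = (i & 1) ? dst : ~dst;
            result |= mp & ms & md;
        }
    }
    return result;
}
```

Under this convention rop3(0xCC, p, s, d) returns s, rop3(0xF0, p, s, d) returns p, and rop3(0x66, p, s, d) returns s XOR d, which is the kind of operation a bitmap font renderer could use to draw or invert glyphs.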

Video Codec Engine

This provides hardware support for decoding, and even encoding, video formats like HEVC and AVC1.

3D Engine

This is the meat of the GPU: its 3D rendering capabilities. It is split into several subsystems.

Memory Interface

The memory interface controls access to Graphics and System memory.

The GPU uses a virtual address space, implemented using paging. There are two modes:

Global Graphics Translation Table (GGTT) - Shared translations.

Per Process Graphics Translation Table (PPGTT) - Translations tied to a specific process.

Certain objects, like display framebuffers, must use the GGTT for simplicity and performance.

Family

| Name | CPUs | Codename | Year | Description |
|---|---|---|---|---|
| G965, G35 | Pentium Dual Core, Core 2 Duo | Broadwater, Glenwood, Bearlake | Q2 2006 | Initial version that's documented |
| G45 | Core 2 Duo/Quad | Eaglelake | Q2 2008 | |
| Intel HD Graphics | | | | |
| HD | Core i3/i5 1st Gen | Ironlake | Q1 2010 | |
| HD 2000, HD 3000 | Core 2nd Gen | Sandy Bridge | Q1 2011 | |
| HD 2500, HD 4000 | Core 3rd Gen | Ivy Bridge | Q2 2012 | |
| | | Haswell | | |
| Intel UHD Graphics | | | | |
| UHD 6xx | | Kaby Lake Refresh | Q4 2017 | |
| ARC | | | | |
| A3xx, A5xx, A7xx | Graphics Card | Alchemist | Q1 2022 | |
| B570, B580 | Graphics Card | Battlemage | Q4 2024 | |

Infrastructure

This section details the basic principles of the hardware, common to all engines.

Detection and Address Spaces

The graphics adapter will show up on the PCI bus. Typically this is at 0:02:0, but it could vary by chipset.
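
As a minimal sketch of probing that location, the following builds the address word for the legacy PCI configuration mechanism (ports 0xCF8/0xCFC) and checks the result for an Intel display controller. The helper names are hypothetical, and the port I/O itself is omitted because it is OS specific. The vendor ID 0x8086 and display base class 0x03 are standard PCI values.

```c
#include <stdint.h>
#include <stdbool.h>

#define PCI_VENDOR_INTEL   0x8086u
#define PCI_CLASS_DISPLAY  0x03u

/* Value to write to CONFIG_ADDRESS (port 0xCF8) in order to select one
 * dword of configuration space for bus/device/function. */
static uint32_t pci_config_address(uint8_t bus, uint8_t dev,
                                   uint8_t func, uint8_t offset)
{
    return 0x80000000u                       /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1F) << 11)
         | ((uint32_t)(func & 0x07) << 8)
         | (offset & 0xFCu);                 /* dword aligned */
}

/* id_dword is config offset 0x00 (device ID in the high half, vendor ID
 * in the low half); class_dword is config offset 0x08 (base class in
 * bits 31:24). */
static bool is_intel_display(uint32_t id_dword, uint32_t class_dword)
{
    return (id_dword & 0xFFFFu) == PCI_VENDOR_INTEL
        && (class_dword >> 24) == PCI_CLASS_DISPLAY;
}
```

For the usual location, pci_config_address(0, 2, 0, 0x00) selects the ID dword. Since the location could vary by chipset, scanning the bus for a matching vendor and class is the more robust approach.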

The adapter typically has 3 BARs. Again, details could vary by chipset, however, as far as I can tell, the layout is as follows:

BAR0 - GTTMMADR. BAR for the global translation tables and MMIO registers. MMIO comes first.

(64 bit BARs consume two slots - see PCI. I think all devices are 64 bit.)

BAR2 - GMADR. This maps graphics RAM.

BAR4 - IOBAR. For compatibility with legacy VGA registers.
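
The layout above could be decoded along the following lines. This is a sketch under the assumption that the six raw BAR dwords have already been read from configuration offsets 0x10 to 0x24. The function names are hypothetical, and the bit meanings are those of the generic PCI BAR format rather than anything Intel specific.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit 0 set means the BAR describes an I/O port range. */
static bool bar_is_io(uint32_t low)
{
    return (low & 0x1u) != 0;
}

/* For memory BARs, bits 2:1 == 2 mean a 64 bit BAR that also consumes
 * the following slot for the upper half of the address. */
static bool bar_is_64bit(uint32_t low)
{
    return !bar_is_io(low) && ((low >> 1) & 0x3u) == 0x2u;
}

/* Return the base address held in BAR 'index' (0..5). */
static uint64_t bar_base(const uint32_t bars[6], int index)
{
    uint32_t low = bars[index];

    if (bar_is_io(low))
        return low & ~0x3u;                  /* strip I/O flag bits */

    uint64_t base = low & ~0xFu;             /* strip memory flag bits */
    if (bar_is_64bit(low) && index < 5)
        base |= (uint64_t)bars[index + 1] << 32;
    return base;
}
```

With 64 bit BARs in slots 0 and 2, bar_base(bars, 0) would give GTTMMADR, bar_base(bars, 2) would give GMADR, and bar_base(bars, 4) would give the I/O port base.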

Command Streamers

There are a number of command streamers, which issue commands to engines. Typically, each engine has at least one command streamer, and sometimes several. The exact details vary across generations.

Before Sandy Bridge, the Blitter and 3D Render engines shared a command streamer, but they now have separate ones.

Memory Interface

Global GTT

The global GTT is a single level page table that is for privileged objects not tied to a specific process. In particular, display framebuffers use this translation scheme.

Page table entries switched from a 32 bit to a 64 bit format at gen8 graphics (Broadwell), and from describing a 2GB virtual address space to a 4GB address space. Note this address space is 32 bit in all SKUs thus far.

The size of the GTT can vary.
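
Because the table is single level with 4K pages, looking up an entry and sizing the table are simple arithmetic. The sketch below shows that arithmetic only. The function names are hypothetical, and how the actual GTT size is discovered on a given chipset is not covered here.

```c
#include <stdint.h>

#define GTT_PAGE_SHIFT 12   /* 4K pages */

/* Index of the GGTT entry that translates a graphics virtual address. */
static uint32_t ggtt_index(uint32_t gfx_addr)
{
    return gfx_addr >> GTT_PAGE_SHIFT;
}

/* Bytes of page table needed to map 'space_bytes' of graphics address
 * space when every entry is 'pte_bytes' wide (4 before Broadwell,
 * 8 from Broadwell onwards). */
static uint64_t ggtt_table_bytes(uint64_t space_bytes, unsigned pte_bytes)
{
    return (space_bytes >> GTT_PAGE_SHIFT) * pte_bytes;
}
```

By this calculation a full 2GB space with 32 bit entries needs a 2MB table, and a full 4GB space with 64 bit entries needs an 8MB table.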

Haswell and Earlier

| 31:12 | 11:8 | 7:4 | 3 | 2:1 | 0 |
|---|---|---|---|---|---|
| PADDR[31:12] | RSV0 | PADDR[35:32] | RSV0 | Mapping Type | Valid |

| Mapping Type | Description |
|---|---|
| 0 | Targets main memory |
| 1-2 | Reserved |
| 3 | Targets cacheable main memory with snoop |

Note that graphics stolen memory is considered main memory here.
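
The entry layout above could be packed as follows. This is a sketch that follows the bit positions given in the table, and the function and macro names are hypothetical.

```c
#include <stdint.h>

#define GTT_MAP_MAIN        0u  /* main memory */
#define GTT_MAP_CACHED_SNP  3u  /* cacheable main memory with snoop */

/* Build a 32 bit GTT entry (Haswell and earlier) for a 4K aligned
 * physical address of up to 36 bits. Reserved bits are left at zero. */
static uint32_t gtt_pte_encode_legacy(uint64_t paddr, unsigned mapping_type)
{
    uint32_t pte = (uint32_t)(paddr & 0xFFFFF000u);     /* PADDR[31:12] */

    pte |= (uint32_t)((paddr >> 32) & 0xFu) << 4;       /* PADDR[35:32] in bits 7:4 */
    pte |= (uint32_t)(mapping_type & 0x3u) << 1;        /* Mapping Type in bits 2:1 */
    pte |= 1u;                                          /* Valid */
    return pte;
}
```

For example, a page at physical address 0x345678000 mapped with type 3 encodes to 0x45678037.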

Broadwell and Later

| 63:HAW | HAW-1:12 | 11:2 | 1 | 0 |
|---|---|---|---|---|
| Ignored | PADDR[HAW-1:12] | Function Number | Local | Valid |

HAW is 39 for client hardware, and 46 for server hardware.


PADDR - Physical address. 4K aligned, matching x86 paging.

Function Number - Function the page is assigned to if using SRIOV hardware virtualisation. Ignored otherwise.

Local - Targets local graphics memory instead of system memory. Ignored if there isn't dedicated GRAM.

Valid - The page is present.
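
A 64 bit entry with these fields could be assembled as shown below. The sketch follows the bit positions listed above. The function and macro names are hypothetical, and HAW is passed in as a parameter rather than detected.

```c
#include <stdint.h>
#include <stdbool.h>

#define HAW_CLIENT 39
#define HAW_SERVER 46

/* Build a 64 bit GTT entry (Broadwell and later).
 * paddr    - 4K aligned physical address, truncated to HAW bits
 * haw      - host address width (39 on client, 46 on server)
 * function - SRIOV function number, ignored by hardware if unused
 * local    - true to target local graphics memory */
static uint64_t gtt_pte_encode_gen8(uint64_t paddr, unsigned haw,
                                    unsigned function, bool local)
{
    uint64_t addr_mask = ((1ULL << haw) - 1) & ~0xFFFULL;
    uint64_t pte = paddr & addr_mask;               /* bits HAW-1:12 */

    pte |= (uint64_t)(function & 0x3FFu) << 2;      /* bits 11:2 */
    if (local)
        pte |= 1ULL << 1;                           /* Local */
    pte |= 1ULL;                                    /* Valid */
    return pte;
}
```

On integrated client hardware without SRIOV this reduces to the physical address with bit 0 set, e.g. gtt_pte_encode_gen8(0x123456000, HAW_CLIENT, 0, false) gives 0x123456001.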

Per-Process GTT

Address spaces are loaded in several ways, depending on the generation. Earlier models use a mechanism similar to x86's CR3.

i965 - Single Level Paging

The per-process page table is set using PGTBL_CTL2 (MMIO+20C4). It must be a 4K aligned page table, and does not use snoops.

PGTBL_CTL2

| 31:12 | 11:8 | 7:4 | 3:1 | 0 |
|---|---|---|---|---|
| PTAB[31:12] | RESV0 | PTAB[35:32] | SIZE | Enable |

SIZE - The size of the page table, in bytes, is 2^(16+SIZE). Values of size >4 (1MB) are reserved.

The format of page entries is the same as for the GGTT.
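
The register value could be computed like this. It is a sketch based only on the field layout and size formula above. The names are hypothetical, and the actual MMIO write to offset 0x20C4 is left to the caller.

```c
#include <stdint.h>

#define PGTBL_CTL2_OFFSET 0x20C4u   /* offset from the MMIO base */

/* Size in bytes of the page table selected by the SIZE field:
 * 2^(16 + SIZE), valid for SIZE 0..4 (64KB to 1MB). */
static uint32_t pgtbl_ctl2_table_bytes(unsigned size_field)
{
    return 1u << (16 + size_field);
}

/* Build the PGTBL_CTL2 value for a 4K aligned page table at
 * 'table_paddr' (up to 36 bits), with the Enable bit set. */
static uint32_t pgtbl_ctl2_encode(uint64_t table_paddr, unsigned size_field)
{
    uint32_t v = (uint32_t)(table_paddr & 0xFFFFF000u);   /* PTAB[31:12] */

    v |= (uint32_t)((table_paddr >> 32) & 0xFu) << 4;     /* PTAB[35:32] in bits 7:4 */
    v |= (uint32_t)(size_field & 0x7u) << 1;              /* SIZE in bits 3:1 */
    v |= 1u;                                              /* Enable */
    return v;
}
```

Assuming 32 bit entries as in the GGTT of this generation, the smallest table (SIZE 0, 64KB) holds 16384 entries and so maps 64MB of per-process address space.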

G45 - Two Level Paging

The page directory can be set via PP_DIR_BASE (MMIO+2518), but should be set via contexts instead.

Broadwell - IA32e Style

Broadwell cores implement a 4 level paging scheme that closely resembles the format used for IA32e Paging.
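
For reference, IA32e paging splits a 48 bit address into four 9 bit table indices and a 12 bit page offset. The sketch below performs that split, on the assumption that the graphics scheme uses the same boundaries. The struct and function names are hypothetical, and the exact entry format should be checked against the Intel manuals.

```c
#include <stdint.h>

/* Table indices for a 4 level walk, IA32e style (9/9/9/9/12 split). */
struct ppgtt_indices {
    unsigned pml4;    /* bits 47:39 */
    unsigned pdp;     /* bits 38:30 */
    unsigned pd;      /* bits 29:21 */
    unsigned pt;      /* bits 20:12 */
    unsigned offset;  /* bits 11:0  */
};

static struct ppgtt_indices ppgtt_split(uint64_t gfx_addr)
{
    struct ppgtt_indices ix;

    ix.pml4   = (unsigned)((gfx_addr >> 39) & 0x1FFu);
    ix.pdp    = (unsigned)((gfx_addr >> 30) & 0x1FFu);
    ix.pd     = (unsigned)((gfx_addr >> 21) & 0x1FFu);
    ix.pt     = (unsigned)((gfx_addr >> 12) & 0x1FFu);
    ix.offset = (unsigned)(gfx_addr & 0xFFFu);
    return ix;
}
```

Each index selects one of 512 entries in a 4K table at its level, exactly as in a CPU page walk.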

Display Engine

This is the most important component to implement: without it, no video for you. Unless you use the firmware provided framebuffer, but if so, why are you here? TODO.

Blitter

Probably the easiest component to understand.

Command Set

| Command | Opcode |
|---|---|
| XY_SETUP_BLT | 01h |