User:Greasemonkey/Intel GenX


The Intel GenX GPU architecture (probably) covers the Intel HD and the later Intel GMA series of GPUs, starting with the Intel 965. Later revisions tend to add, remove, re-add, and relocate features and instructions, but otherwise remain fairly similar.

This will cover Gen4 (963, 965, G35) and later. Earlier GPUs do not have open documentation and are apparently very different.

As there is an awful lot to document, this is probably a better place for tutorials.

This page or section is a stub. You can help the wiki by accurately contributing to it.


Systems tested

  • Compaq CQ60-210TU: GM45 DevCTG (GMA 4500MHD; Cantiga) with 1366x768 LVDS panel
  • HP Pavilion dv6-6c35tx: i5-2450M DevSNB (HD 3000) with 1366x768 LVDS panel

Anything that doesn't exactly match the specs listed here may differ. Approach at your own risk.

Finding the GPU

Tested on:

  • GM45 1366x768 CQ60-210TU
  • HD3000 1366x768 dv6-6c35tx

Probe the PCI bus. It should be located at Bus 0, Device 2, Function 0.

GMAs tend to be advertised as at least two functions, so you may wish to check Function 1 as well. HDs tend to be advertised as just one.

Either way, there appears to be no difference between using BAR0 of Function 0 and BAR0 of Function 1, and only Function 0 gives a window into stolen memory, so you might be able to get away with just using Function 0.

With that said, make sure you check the IDs to ensure that they match hardware that you've actually tested. Different GPUs have different bugs and thus require different workarounds - even when they're within the same family.
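
As a rough sketch of the probe and ID check, assuming PCI configuration mechanism #1 (address written to port 0xCF8, data read from port 0xCFC): the helper names `pci_cfg_addr` and `genx_check_ids` are made up for this example, and the actual port I/O is left to your own kernel. The check only confirms an Intel vendor ID and a display-controller class code; comparing the returned device ID against hardware you have tested is still up to you.

```c
#include <stdint.h>

// Build a PCI configuration mechanism #1 address (the value written to
// port 0xCF8) for the given bus/device/function and register offset.
uint32_t pci_cfg_addr(uint32_t bus, uint32_t dev, uint32_t func, uint32_t off)
{
	return 0x80000000u | (bus << 16) | (dev << 11) | (func << 8) | (off & 0xFCu);
}

// Given config dwords 0x00 and 0x08, return the device ID if this looks like
// an Intel display controller, or 0 otherwise.
uint16_t genx_check_ids(uint32_t id_dword, uint32_t class_dword)
{
	if((id_dword & 0xFFFFu) != 0x8086u) return 0; // vendor ID is not Intel
	if((class_dword >> 24) != 0x03u) return 0;    // base class is not "display controller"
	return (uint16_t)(id_dword >> 16);            // caller compares against tested IDs
}
```

For Bus 0, Device 2, Function 0 you would read the dwords at `pci_cfg_addr(0, 2, 0, 0x00)` and `pci_cfg_addr(0, 2, 0, 0x08)` and pass them to `genx_check_ids`.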

Here's the information, assuming you are accessing everything through Function 0:

  • uint64_t @ PCI 0x10: Location of MMIO registers. (XXX: for devices with a Function 1, this appears to do more than just MMIO registers - Function 1 should be able to provide a space with "just" MMIO registers.)
  • uint64_t @ PCI 0x18: Location of stolen memory.

The low 4 bits of each 64-bit pointer are flag bits rather than address bits, and read as 0x4 here. Mask them out before you use the address, and preserve them when writing the value back.
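
A minimal sketch of that masking, assuming you have already read the two 32-bit config dwords that make up the 64-bit value (for example 0x10 and 0x14). The function names are hypothetical.

```c
#include <stdint.h>

// Combine the two config dwords of a 64-bit memory BAR into an address.
// The low 4 bits of the low dword are flag bits, not address bits.
uint64_t pci_bar64_addr(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | (lo & ~0xFu);
}

// Build the low dword to write back: new address bits, original flag bits.
uint32_t pci_bar64_lo_writeback(uint32_t old_lo, uint64_t new_addr)
{
	return ((uint32_t)new_addr & ~0xFu) | (old_lo & 0xFu);
}
```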

Getting a display

Tested on:

  • GM45 1366x768 CQ60-210TU
  • HD3000 1366x768 dv6-6c35tx

Proper mode switch methodology

Consult the appropriate section of your manual. It tends to be in Volume 3 at the start of one of the parts.

  • G45: Vol3, pg 28
  • SNB: Vol3.2, pg 8

Simplified version

This assumes that the BIOS configured the LVDS panel properly and/or in a way that we can just take its values and run with them. If it did, you can avoid an awful lot of pain.

This also assumes that your panel is less than 2048 pixels wide, but that can be fixed by adding a few "if" statements. However, the 1366x768 assumption is dropped from this version.

There is a chance that this will work on systems that don't have an LVDS panel, but you will be limited to whatever resolution the BIOS spews out.

If you want a non-native resolution, enable the panelfitter and adjust PIPEnSRC to suit. Note, the GMA panelfitter does love to blur everything. (HD 3000 seems fine.)

Here's how you get from VGA text mode to native 32bpp full-res BGRX mode the easy way, and by easy we mean this is very much empirical, about as risky, and assumes the BIOS doesn't have stupid bugs in it:

	// Set VGA screen off
	outportb(0x3C4, 0x01);
	outportb(0x3C5, inportb(0x3C5)|(1<<5));

	// Wait at least 100us (Gen4.5 only needs 20us, but Gen6 needs 100us)

	// Get correct VGA pipe
	// WARNING: Gen5 changes the location of this register!
	// Pre-Gen5 uses 0x71400, Gen5+ uses 0x41000.
	int real_VGACNTRL = (genx_typ < 0x5000 ? VGACNTRL : VGACNTRL_ILK);
	genx_pipe = (genx_reg32[real_VGACNTRL]>>29)&1;

	// Disable VGA
	genx_reg32[real_VGACNTRL] |= (1<<31);

	// Disable Display n
	genx_reg32[DSPnCNTR(genx_pipe)] &= ~(1<<31);

	// Set PIPEnSRC to screen resolution
	uint32_t vis_w = (genx_reg32[HTOTAL_n(genx_pipe)] & 0xFFFF)+1;
	uint32_t vis_h = (genx_reg32[VTOTAL_n(genx_pipe)] & 0xFFFF)+1;
	genx_reg32[PIPEnSRC(genx_pipe)] = ((vis_w-1)<<16)|(vis_h-1);

	// XXX: Lacking information on DevSNB's panelfitter.
	// Seems to work fine without disabling it.
	// For Gen4.5, however, disabling it is pretty much mandatory,
	// unless you can't stand GPUs that lack antialiasing.
	if(genx_typ < 0x5000) // pre-Gen5 only: PFIT_CONTROL is the Gen4.5 panelfitter
	{
		genx_reg32[PIPEnCONF(genx_pipe)] &= ~(1<<31); // Disable Pipe n
		while((genx_reg32[PIPEnCONF(genx_pipe)] & (1<<30))) {} // Wait for Pipe n to stop

		genx_reg32[PFIT_CONTROL] &= ~(1<<31); // Disable panelfitter

		genx_reg32[PIPEnCONF(genx_pipe)] |= (1<<31); // Enable Pipe n
	}

	// Set up Display n and enable
	genx_reg32[DSPnLINOFF(genx_pipe)] = 0x00000000; // linear offset
	genx_reg32[DSPnSTRIDE(genx_pipe)] = 2048*4; // scanline pitch
	genx_reg32[DSPnSURF(genx_pipe)] = 0x00000000; // surface base
	genx_reg32[DSPnCNTR(genx_pipe)] = (genx_reg32[DSPnCNTR(genx_pipe)] & ~(15<<26)) | (6<<26); // bit depth select, 6 = 32bpp BGRX
	genx_reg32[DSPnCNTR(genx_pipe)] = (genx_reg32[DSPnCNTR(genx_pipe)] & ~(3<<20)) | (0<<20); // pixel multiply
	genx_reg32[DSPnCNTR(genx_pipe)] = (genx_reg32[DSPnCNTR(genx_pipe)] & ~(1<<10)) | (0<<10); // tiling flag
	genx_reg32[DSPnCNTR(genx_pipe)] |= (1<<31); // enable

With the above config, your framebuffer should be located at the start of stolen memory, and be 2048 32bpp BGRX pixels wide internally.
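
Assuming the layout just described, addressing a pixel could look like the sketch below. Here `fb` is a hypothetical pointer to wherever you have mapped the start of stolen memory; the function names are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

#define GENX_FB_PITCH_PIXELS 2048 // matches DSPnSTRIDE = 2048*4 above

// Index of pixel (x, y) in a 32bpp framebuffer with a 2048-pixel pitch.
size_t genx_fb_index(uint32_t x, uint32_t y)
{
	return (size_t)y * GENX_FB_PITCH_PIXELS + x;
}

// Write one BGRX pixel; as a little-endian 32-bit value this is 0xXXRRGGBB.
void genx_fb_plot(volatile uint32_t *fb, uint32_t x, uint32_t y, uint32_t xrgb)
{
	fb[genx_fb_index(x, y)] = xrgb;
}
```

Note that only the first vis_w pixels of each 2048-pixel row are visible; the rest of the row is padding.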

Setting monitor timings

Not tested on the dv6-6c35tx.

If you need to set the monitor timings, this code should be useful for a start. These timings work on the CQ60-210TU's surprisingly tolerant LVDS panel:

	uint32_t vis_w = 1366;
	uint32_t vis_stretch_w = vis_w;
	uint32_t vis_sblank_w = 80;
	uint32_t vis_sync_w = 128;
	uint32_t vis_eblank_w = 200;

	uint32_t vis_h = 768;
	uint32_t vis_stretch_h = vis_h;
	uint32_t vis_sblank_h = 3;
	uint32_t vis_sync_h = 5;
	uint32_t vis_eblank_h = 22;

	uint32_t vis_blank_w = vis_sblank_w + vis_sync_w + vis_eblank_w;
	uint32_t vis_blank_h = vis_sblank_h + vis_sync_h + vis_eblank_h;
	genx_reg32[HTOTAL_n(genx_pipe)] = ((vis_w+vis_blank_w-1)<<16) | (vis_w-1);
	genx_reg32[HBLANK_n(genx_pipe)] = ((vis_w+vis_blank_w-1)<<16) | (vis_w-1);
	genx_reg32[HSYNC_n(genx_pipe)] = ((vis_w+vis_sblank_w+vis_sync_w-1)<<16) | (vis_w+vis_sblank_w-1);
	genx_reg32[VTOTAL_n(genx_pipe)] = ((vis_h+vis_blank_h-1)<<16) | (vis_h-1);
	genx_reg32[VBLANK_n(genx_pipe)] = ((vis_h+vis_blank_h-1)<<16) | (vis_h-1);
	genx_reg32[VSYNC_n(genx_pipe)] = ((vis_h+vis_sblank_h+vis_sync_h-1)<<16) | (vis_h+vis_sblank_h-1);
	genx_reg32[PIPEnSRC(genx_pipe)] = ((vis_stretch_w-1)<<16)|(vis_stretch_h-1);
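
All six timing registers in the listing above follow the same packing: the end (or total) minus one in the high 16 bits, and the start (or active size) minus one in the low 16 bits. A small hypothetical helper, `genx_timing_pack`, makes that pattern explicit:

```c
#include <stdint.h>

// Pack a timing register: both fields hold (value - 1), with the
// end/total in the high half and the start/active size in the low half.
uint32_t genx_timing_pack(uint32_t start, uint32_t end)
{
	return ((end - 1) << 16) | (start - 1);
}
```

With the CQ60-210TU numbers above, the horizontal total is 1366+80+128+200 = 1774, so HTOTAL would be `genx_timing_pack(1366, 1774)`, and HSYNC would be `genx_timing_pack(1366+80, 1366+80+128)`.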

Getting the ring buffer to work

Tested on:

  • GM45 1366x768 CQ60-210TU
  • HD3000 1366x768 dv6-6c35tx

References for RING_BUFFER_* registers

  • G45: Vol1a, pg 238
  • HD3000: Vol1p3, pg 39

References for commands

  • G45: Vol1b
  • HD3000: Vol1p3-5

General notes

The ring buffer is vital for being able to send commands to the GPE (Graphics Processing Engine) and whatnot. Commands are at least 1 DWord long, but the tail index must be aligned to a QWord.

Note, the tail indicates the end of the buffer, and is where you write your commands to. The head indicates the start of the buffer, and is where the GPU reads from. While the head is DWord-aligned, the tail is QWord-aligned, so you may need to pad your instructions by inserting a MI_NOOP (0x00000000 will do).

It's possible to start the ring buffer and then advance the tail when you have a new command or batch of commands.

RING_BUFFER_START holds the address of the ring buffer, relative to the start of the stolen memory space.

RING_BUFFER_HEAD and RING_BUFFER_TAIL need to be given byte offsets, so if you add, say, two DWords, you'd add 8 to RING_BUFFER_TAIL.

Apparently you don't need a GTT to get this working.

If you want to check whether this is working, NOPID is a useful register. The Gen6 docs seem to be missing the location of this register; however, it is in the same location as on Gen4.5 (0x02094).
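
One possible shape for the CPU side of this, as a sketch only: the `struct genx_rb` layout and `genx_rb_commit` are assumptions made for this example, and this `genx_rb_push` takes an explicit state pointer, unlike the one-argument calls used elsewhere on this page. It does not check RING_BUFFER_HEAD for free space, so it will happily overwrite commands the GPU has not read yet.

```c
#include <stdint.h>

#define MI_NOOP 0x00000000u

// Hypothetical software state for one ring buffer.
struct genx_rb {
	volatile uint32_t *base;     // CPU mapping of the ring memory
	volatile uint32_t *tail_reg; // CPU mapping of RING_BUFFER_TAIL
	uint32_t size;               // ring size in bytes (a multiple of 8)
	uint32_t tail;               // software copy of the tail, as a byte offset
};

// Queue one DWord; the GPU does not see it until genx_rb_commit().
void genx_rb_push(struct genx_rb *rb, uint32_t dword)
{
	rb->base[rb->tail / 4] = dword;
	rb->tail = (rb->tail + 4) % rb->size;
}

// Pad to a QWord boundary with MI_NOOP, then publish the new tail.
void genx_rb_commit(struct genx_rb *rb)
{
	if(rb->tail & 4)
		genx_rb_push(rb, MI_NOOP);
	*rb->tail_reg = rb->tail;
}
```

Pushing a whole command and committing once keeps the GPU from ever seeing half a command.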

Using the blitter

Tested on:

  • GM45 1366x768 CQ60-210TU

The HD3000 was tested at some stage but the ring buffer stopped, suggesting an invalid instruction, although there may have also been a GTT issue.

General notes

Firstly you'll want to know your "raster op" modes. They are conceptually the same as the Amiga's modes, although probably with different sources.

Here's a list of useful modes:

  • 0xF0: Set to pattern (useful for COLOR_BLT)
  • 0xCC: Set to source image (useful for SRC_COPY_BLT)
  • 0xAA: Set to destination (useful for not much)
  • 0x55: Set to opposite of destination (useful for XOR effect)

Top bit determines the result when all of {pattern, source, destination} are 1. Bottom bit determines the result when all are 0.
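
To make the bit layout concrete, here is a sketch that evaluates a raster op in software. It assumes the usual convention implied by the list above, where the pattern, source and destination bits form a 3-bit index (pattern is the top bit, destination the bottom bit) into the 8-bit op. `genx_rop_apply` is a made-up name and is only meant for working out op codes, not for actual drawing.

```c
#include <stdint.h>

// Evaluate an 8-bit raster op bitwise over 32-bit pattern/source/destination.
// For each bit position, (P,S,D) form a 3-bit index into the rop byte:
// index = P*4 + S*2 + D, so bit 7 is "all ones" and bit 0 is "all zeros".
uint32_t genx_rop_apply(uint8_t rop, uint32_t pat, uint32_t src, uint32_t dst)
{
	uint32_t out = 0;
	for(int idx = 0; idx < 8; idx++) {
		if(!(rop & (1u << idx)))
			continue; // this input combination yields 0
		uint32_t p = (idx & 4) ? pat : ~pat;
		uint32_t s = (idx & 2) ? src : ~src;
		uint32_t d = (idx & 1) ? dst : ~dst;
		out |= p & s & d; // bits where the inputs match this combination
	}
	return out;
}
```

To derive an op code by hand, write the expression using 0xF0 for pattern, 0xCC for source and 0xAA for destination: for example, pattern XOR destination is 0xF0 ^ 0xAA = 0x5A.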

The rest is All There In The Manual.

WARNING: When the manual says "pitch in dwords", what they really mean is "pitch in bytes, aligned to a dword boundary".

There are two sets of blitter commands: The XY_ commands, and the other commands. The other commands are a bit simpler, but there are only two of them.

  • COLOR_BLT fills a rectangular area with a solid colour. Useful for clearing the screen like a boss.
  • SRC_COPY_BLT copies from one place in memory to another. If you have no GTT, this will be GPU-to-GPU only. Still faster, easier and more powerful than EGA/VGA.

Here's an example of a 32bpp COLOR_BLT used to clear a screen with a nice purple tinge:

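	genx_rb_push((0x2<<29) | (0x40<<22) // blitter client, COLOR_BLT opcode
		| (0x3<<20) // a:rgb mask
		| 0x03); // length: 5 DWords in total, minus 2
	genx_rb_push((3<<24) // bit depth
		| (0xF0<<16) // raster op
		| ((screen_pitch_pixels*4) & 0xFFFF)); // pitch in bytes, dword-aligned
	genx_rb_push(((screen_height)<<16)|((screen_width)<<2)); // height in scanlines, width in bytes
	genx_rb_push(0x00000000); // destination address (framebuffer at start of stolen memory)
	genx_rb_push(0x00330066); // XXRRGGBB - HTML colour #330066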

The XY_ commands allow you to specify ranges using X,Y coordinate pairs and apply clipping based on those pairs.

Here's the XY_COLOR_BLT version of the above:

	genx_rb_push((0x2<<29) | (0x50<<22) // blitter client, XY_COLOR_BLT opcode
		| (3<<20) // a:rgb mask
		| (0<<11) // tiling enable (tile-X only)
		| 0x04); // length: 6 DWords in total, minus 2
	genx_rb_push((0<<30) // clipping enable
		| (3<<24) // bit depth
		| (0xF0<<16) // raster op
		| ((screen_pitch_pixels*4) & 0xFFFF)); // pitch in bytes, dword-aligned
	genx_rb_push((0<<16) | (0)); // Y1:X1 top-left
	genx_rb_push(((screen_height)<<16)|((screen_width))); // Y2:X2 bottom-right
	genx_rb_push(0x00000000); // destination base address (framebuffer at start of stolen memory)
	genx_rb_push(0x00330066); // XXRRGGBB - HTML colour #330066

Note, if you want clipping, you'll want to run an XY_SETUP_CLIP_BLT command, and then enable the "clipping enable" flag.

Making the GTT behave

Tested on:

  • GM45 1366x768 CQ60-210TU

References for GTT page format

  • G45: Vol1a, pg214

Finding GTTADR

Gen4 only (NOT Gen4.5!): 32-bit address "GTTADR" at PCI B0:D2:F0:0x1C. (TODO: confirm)

Gen4.5 and above: 64-bit address "GTTMMADR" at PCI B0:D2:F0:0x10 (the same BAR as the MMIO registers), then add 2MB (0x200000). (TODO: confirm the "above" bit)

Allocating space for the GTT

Early Gen4

Allocate a block of memory in the stolen memory space. 512KB is the largest you can use for the GTT, and allows for a 512MB virtual addressing space. Ensure that the block of memory is aligned with its size.
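
The size arithmetic works out as follows, assuming 4-byte GTT entries that each map one 4KB page (which is what the 512KB to 512MB ratio implies). The helper names are hypothetical.

```c
#include <stdint.h>

// Number of 4-byte entries in a GTT of the given size in bytes.
uint32_t gtt_size_to_entries(uint32_t gtt_bytes)
{
	return gtt_bytes / 4;
}

// Bytes of GPU virtual address space covered, at one 4KB page per entry.
uint64_t gtt_size_to_span(uint32_t gtt_bytes)
{
	return (uint64_t)gtt_size_to_entries(gtt_bytes) * 4096u;
}

// Check that an offset into stolen memory is aligned to the GTT size.
// Assumes gtt_bytes is a power of two.
int gtt_offset_is_aligned(uint32_t gtt_offset, uint32_t gtt_bytes)
{
	return (gtt_offset & (gtt_bytes - 1)) == 0;
}
```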

Once you have it in place, ensure that the graphics pipeline is flushed (if you don't know what this is, it probably already is flushed), then:

	genx_reg32[GFX_FLSH_CNTL] = 0;
	genx_reg32[PGTBL_CTL] = 1 | gtt_offset; // GTT: 512KB, enabled
	genx_reg32[PGTBL_CTL2] = 0; // disables the PPGTT
	genx_reg32[GFX_FLSH_CNTL] = 0;

We will get to modifying it pretty soon.

  • Paging type 0 is for stolen memory.
  • Paging type 3 is for main CPU memory. The GPU will snoop the cache for you.

Note that the actual screen is rendered using physical, unmapped "stolen" memory addresses.

Also note that direct access to the stolen memory via GMADR also uses the physical unmapped addresses.

Gen4 "Bearlake-C" (G35?) and onwards

Don't allocate it. The chip allocates it for you. Just leave bits 31:12 of PGTBL_CTL intact when you mess with it. Once you have identity paging in place, set bit 0 to enable the GTT.

Of course, if you are paranoid, you can always allocate some memory anyway.

Paging types above are as per pre-Gen6. Gen6 has different paging types, apparently.

Identity paging

In this example, genx_gtt32 points to GTTADR as calculated.

The GPU will handle all the caching issues for you if you use GTTADR. To add to this, in Gen6 this is the only way to access the GTT, so instead of learning the older method of writing via system RAM and then flushing the GPU's cache, you should just use this instead.

	for(i = 0; i < 512*256; i++)
		genx_gtt32[i] = (((i)<<12) | (3<<1) | (1<<0));
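
The entry written in the loop above can be factored into a helper. This is only a sketch of the pre-Gen6 layout as used on this page (page address in the upper bits, paging type shifted left by 1, valid flag in bit 0); it assumes 32-bit physical addresses, and `gtt_make_pte` is a made-up name.

```c
#include <stdint.h>

#define GTT_TYPE_STOLEN 0u // paging type 0: stolen memory
#define GTT_TYPE_MAIN   3u // paging type 3: main CPU memory, cache-snooped

// Build a pre-Gen6 GTT entry for a 4KB page at a 32-bit physical address.
uint32_t gtt_make_pte(uint32_t phys, uint32_t type)
{
	return (phys & ~0xFFFu) // page-aligned physical address
		| (type << 1)       // paging type
		| 1u;               // valid
}
```

The identity loop is then equivalent to writing `gtt_make_pte(i << 12, GTT_TYPE_MAIN)` into entry `i`.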

Memory-to-GPU blit (and vice versa)

This is for a 32bpp blit.

blk_(width|height) denote the size of the blit to perform, in pixels. (src|dest)_gtt are addresses that the GPU will feed through the Global GTT or PPGTT. (src|dest)_pitch are the image pitches in DWords.

You must ensure that the GTT has the correct paging type for the given GPU for each page that this will need to use. For Gen4, use 0 for stolen memory, and 3 for system memory.

	genx_rb_push((2<<29) | (0x43<<22) // blitter client, SRC_COPY_BLT opcode
		| (3<<20) // a:rgb mask
		| 0x04); // length: 6 DWords in total, minus 2
	genx_rb_push((0<<30) // reverse X direction
		| (3<<24) // bit depth
		| (0xCC<<16) // raster op
		| (dest_pitch*4)); // dest pitch in bytes
	genx_rb_push((blk_height<<16) | (blk_width*4)); // dest dims
	genx_rb_push(dest_gtt); // dest addr
	genx_rb_push(src_pitch*4); // src pitch in bytes
	genx_rb_push(src_gtt); // src addr

XY_SRC_COPY_BLT and whatnot should also work just fine, including with clipping and the like.
