Using hardware acceleration for graphics

I am one of our QWS developers. QWS, the Qt Window System, is the heart of Qt for Embedded Linux, formerly Qtopia Core, formerly Qt/Embedded. :-) What's great about working on embedded is that you have a view of the system as a whole - the complete stack. So it's my job to have a pretty good idea how QPainter commands you write in a widget's paint event end up as voltage levels rapidly alternating between +3.3 V and 0 V on wires going to your LCD. While QWS handles all the usual window system tasks such as keyboard focus and mouse events, the biggest component is probably graphics. The window system is inherently part of the graphics stack and I actually spend a lot of my time working with the graphics team.

Over the last year our clear focus has been on performance. This performance push has been in all areas on all platforms and architectures. When it comes to graphics, there's a very broad range of hardware Qt can expect to see - from a simple MMU-less ARM with only a frame-buffer all the way to scary gamer PCs with thousands of graphics cards & neon lights installed. Our lives are made complicated because, if we're not careful, we can end up writing code which runs faster on that little ARM than it does on the gamer PC. This post hopes to explain why this is so.

Let's begin by categorising the range of hardware available:

First, we need to differentiate Unified Memory Architecture (UMA) devices from those with dedicated graphics memory. Generally, high-end hardware will have dedicated graphics memory whereas low-end devices will just use system memory (sometimes reserving a memory region, sometimes not). This is pretty straightforward on PCs - you can almost tell from a PC's price tag if it has dedicated graphics memory. Sadly, in the world of embedded devices, this is not the case. High-end devices often have UMA and low-end devices (especially set-top boxes) have dedicated graphics memory.

The next differentiation is the graphics operations supported by the hardware. Generally they are wide ranging but can be loosely categorised as:

1) No acceleration (framebuffer only)
2) Blitter & alpha-blending hardware
3) Path based 2D vector graphics
4) Fixed-function 3D
5) Programmable 3D

Hardware with no acceleration whatsoever or a simple video overlay is the most common we see in embedded devices. This will always be the case until someone figures out how to design and manufacture silicon for free. Blitter and alpha-blending hardware is almost non-existent on desktops these days, but it does seem to still be around in the current generation of embedded hardware. Path based 2D vector graphics is pretty new and looks ready to replace blitter-only style hardware. NOTE: This does not refer to hardware which can draw a 1-pixel wide, non-anti-aliased, non-dashed, solid-colour line without clipping. Fixed-function 3D tends to be the older generation of desktop graphics processors. Generally, fixed function has pretty much been replaced with programmable 3D. This is even the case on mobile hardware.

So, there are five categories of operations and two types of memory architecture, leading to ten different overall types of graphics hardware. I've collected an example of each, just so you know we don't make this stuff up. :-)

Type             UMA                 Non-UMA
None             Marvell PXA270      Various*
Blitter          NXP PNX8935**       Fujitsu Lime MB86276***
2D vector        Freescale i.MX35
Fixed-3D         Freescale i.MX31    nVidia GeForce 2
Programmable-3D  TI OMAP3530         AMD Radeon HD 4600

* Various: Some devices use dedicated framebuffer memory to reduce load on the system memory bus
** NXP PNX8935: http://www.nxp.com/applications/set_top_box/ip_stb/stb225/
*** Fujitsu Lime MB86276: http://www.fujitsu.com/downloads/MICRO/fma/pdf/MB86276.pdf

The next question then becomes: How can Qt off-load graphics operations to these different types of hardware? Well, this is done through Qt's QPaintEngine API. The idea is that Qt applications (& Qt itself of course) always use QPainter, which in turn uses one of the paint engines. To take advantage of graphics acceleration, we write a new paint engine (like the OpenGL ES 2.0 engine we've added in 4.5.0). The advantage is that existing applications can benefit from new rendering back ends and new applications can still work on older or less advanced hardware (albeit with lower performance). There seems to be a misconception in the community that Qt is out-of-date because it has no OpenGL scene graph API. While that statement is technically correct, Qt does have the QGraphicsView scene graph API, which uses QPainter. Because it uses QPainter, if OpenGL (for example) is available, it can be used as the rendering back end.
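To make the idea concrete, here's a minimal sketch in plain C++ of a front end that delegates to an interchangeable back end. None of these classes are Qt's: Painter, PaintEngine, RasterEngine, GlEngine and paintWidget() are hypothetical stand-ins, and the real QPaintEngine API is much larger than a single fillRect().

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical back-end interface standing in for a paint engine.
// Qt's real QPaintEngine has a much larger API than this.
class PaintEngine {
public:
    virtual ~PaintEngine() = default;
    virtual std::string fillRect(int x, int y, int w, int h) = 0;
};

// Describes a rectangle fill as text so the chosen back end is visible.
inline std::string describeFill(const std::string& engine,
                                int x, int y, int w, int h) {
    return engine + ": fillRect " + std::to_string(w) + "x" +
           std::to_string(h) + " at (" + std::to_string(x) + "," +
           std::to_string(y) + ")";
}

// Software back end (hypothetical).
class RasterEngine : public PaintEngine {
public:
    std::string fillRect(int x, int y, int w, int h) override {
        return describeFill("raster", x, y, w, h);
    }
};

// Hardware-accelerated back end (hypothetical).
class GlEngine : public PaintEngine {
public:
    std::string fillRect(int x, int y, int w, int h) override {
        return describeFill("gl", x, y, w, h);
    }
};

// Hypothetical front end playing the role of QPainter: application code
// talks to this class only and never sees which engine is underneath.
class Painter {
public:
    explicit Painter(std::unique_ptr<PaintEngine> engine)
        : engine_(std::move(engine)) {}

    std::string fillRect(int x, int y, int w, int h) {
        return engine_->fillRect(x, y, w, h);
    }

private:
    std::unique_ptr<PaintEngine> engine_;
};

// "Application" code, written once against Painter.
std::string paintWidget(Painter& painter) {
    return painter.fillRect(0, 0, 32, 16);
}
```

The point is that paintWidget() doesn't change when the engine does: construct the Painter with a RasterEngine or a GlEngine and the same call ends up in a different back end.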

So, now that's cleared up, what QPaintEngines are there and do we have all the hardware acceleration types covered by them?

Well, for devices with no acceleration, Qt will use its raster paint engine. The raster engine has seen some very impressive optimizations in Qt 4.5, as Gunnar has previously blogged about. For higher end graphics hardware, there's usually a nice high-level API which is powerful enough to express all of QPainter, i.e. OpenGL & OpenVG. The trouble we've recently hit is the hardware in between, i.e. those with blitters but not much else. Such hardware is not powerful enough to express the whole of QPainter, so we must fall back to the raster paint engine for unsupported operations. The raster paint engine needs a pointer to the memory it renders to (and reads from). On UMA systems, this is not a problem as the buffer is obviously in system memory (that's all there is!). It's on systems with dedicated graphics memory where the fun begins...
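The fall-back logic can be sketched roughly like this. It's an illustration only: the Op enum, the BlitterEngine struct and the choice of which operations count as "supported" are assumptions made for the example, not the behaviour of any particular Qt paint engine or chip.

```cpp
#include <set>
#include <string>

// A handful of painter operations, for illustration only.
enum class Op { DrawPixmap, FillRect, DrawPath, DrawText };

// Hypothetical blitter-only engine. The supported set is an assumption:
// copies and solid fills are accelerated, everything else is not.
struct BlitterEngine {
    std::set<Op> supported{Op::DrawPixmap, Op::FillRect};
    int accelerated = 0;  // operations handled by the hardware
    int fallbacks = 0;    // operations handed to the raster engine

    // Returns the name of the back end that handled the operation.
    std::string draw(Op op) {
        if (supported.count(op) != 0) {
            ++accelerated;
            return "blitter";
        }
        ++fallbacks;
        return "raster";
    }
};
```

Every operation that lands in the "raster" branch needs a pointer to the pixels, which is exactly where dedicated graphics memory starts to hurt.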

First, on many systems, you simply cannot map graphics memory into your process's address space - the architecture simply has no way to do it. On such systems, the buffer must be copied to system memory, rendered to with the raster engine, then copied back. If this happens _every_ time you switch between a fall-back and the hardware, it's going to be _slow_!!
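A toy model shows how quickly those copies add up. The function below is a sketch under stated assumptions: the buffer starts in graphics memory, has to end up there again, and is copied in full on every switch between hardware and fall-back. countBufferCopies() is a made-up name, not a Qt function.

```cpp
#include <vector>

// Counts full-buffer copies between graphics memory and system memory
// for a sequence of paint operations. `true` means the hardware can do
// the operation; `false` means it needs the software fall-back.
// Assumption: the buffer starts in, and must end in, graphics memory.
int countBufferCopies(const std::vector<bool>& accelerated) {
    bool inGraphicsMemory = true;
    int copies = 0;
    for (bool hardwareOp : accelerated) {
        // The buffer has to be wherever the next operation runs.
        if (hardwareOp != inGraphicsMemory) {
            ++copies;
            inGraphicsMemory = hardwareOp;
        }
    }
    // Copy the result back if the last operation was a fall-back.
    if (!inGraphicsMemory) {
        ++copies;
    }
    return copies;
}
```

In this model, five operations that alternate between hardware and fall-back cost four full-buffer copies, while five hardware-only operations cost none. Grouping the fall-back operations together keeps the count down.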

On some systems (particularly PowerPC for some reason?), the graphics controller sits on the SoC's external bus and can be addressed directly by an application. All that needs to happen is for the kernel to configure the process's page table to point to the graphics controller's memory range. It's then up to the graphics controller to access data in its dedicated memory on behalf of the host CPU. Although this kind of set-up does allow the raster paint engine to get a pointer to graphics memory, all accesses go over this external bus - which is usually slow. On PC/x86 architecture, things get even more complicated: the kernel has to fiddle with lots more hardware, such as cache controllers, PCI bus controllers, IOMMUs, etc. However, in all cases, even if you're lucky enough to get a pointer to graphics memory, all accesses must go over a slower external bus.

So now we know what's going on, what conclusion can we draw? Well, reading & writing to external graphics memory is slow. If you're on non-UMA, don't have OpenGL or OpenVG available, but do want to use your blitter, then you'd better make sure you're mostly using QPainter::drawPixmap(). NOTE: Graphics view's cache modes can help you out a lot there - see Andreas' previous posts! ;-) Otherwise falling back to the raster engine is going to be slow. Fortunately, this type of hardware is (finally) on its way out.

NOTE: I should also mention that there's a similar issue with X11. There's no API to get a pointer to an X pixmap and X11 does not provide enough API to implement the whole of QPainter. While the X11 paint engine does not inherit from the raster paint engine, it does make use of software fall-backs which involve copying the pixmap, executing the fall-back and then uploading the result. It's for that reason that we've added the raster graphics system, which uses system memory (via the MIT-SHM extension), in Qt 4.5. On desktop, this is a fairly temporary measure until our OpenGL 2.0 engine & graphics system is in a fit state to take over all of Qt's rendering. No promises, but we hope that can happen for Qt 4.6. For X11 on low-end embedded devices (like the N810), MIT-SHM provides a pretty decent long-term solution.

So, when we look to the future of Qt's graphics architecture and the required paint engines, I think we're well on the way to having all the bases covered:

Type             UMA                 Non-UMA
None             Raster              Raster*
Blitter          DirectFB            DirectFB**
2D vector        OpenVG***           OpenVG***
Fixed-3D         OpenGL (ES) 1.x     OpenGL (ES) 1.x
Programmable-3D  OpenGL (ES) 2.x     OpenGL (ES) 2.x****

* When using raster on non-UMA, rendering is actually done in system memory first, then flushed to VRAM
** This is the one which is going to be slow when doing anything other than QPainter::drawPixmap()
*** It shouldn't be a big surprise we're researching an OpenVG paint engine!
**** Qt 4.5 contains a new paint engine for OpenGL ES 2.x which we're now making work on desktop OpenGL 2.0
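Read as code, the table above is a simple lookup. This is just a sketch of the table and its ** footnote: the Acceleration enum, paintEngineFor() and slowOutsideDrawPixmap() are names invented for the example, not Qt API.

```cpp
#include <string>

// The five acceleration categories from the table above.
enum class Acceleration { None, Blitter, Vector2D, Fixed3D, Programmable3D };

// Paint engine per category. The table lists the same engine for UMA and
// non-UMA devices, so the memory architecture isn't a parameter here.
std::string paintEngineFor(Acceleration category) {
    switch (category) {
        case Acceleration::None:           return "Raster";
        case Acceleration::Blitter:        return "DirectFB";
        case Acceleration::Vector2D:       return "OpenVG";
        case Acceleration::Fixed3D:        return "OpenGL (ES) 1.x";
        case Acceleration::Programmable3D: return "OpenGL (ES) 2.x";
    }
    return "Raster";  // not reached for valid values; keeps compilers quiet
}

// The ** footnote: a blitter with dedicated graphics memory is the
// combination that is slow for anything other than drawPixmap().
bool slowOutsideDrawPixmap(Acceleration category, bool uma) {
    return category == Acceleration::Blitter && !uma;
}
```

Where the memory architecture does matter is the second function: only the non-UMA blitter case gets flagged.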

I just want to finish by asking you to take another look at the above table. Do you notice anything interesting? All of the graphics systems (apart from DirectFB) are cross-platform, which means that when we make something faster in one engine, all platforms will benefit.

