We’ve been doing lots of interesting work on accelerated compositing in QtWebKit over the past year, since my last blog post, and several people have asked for information about it. So here’s a brain-dump; I hope some people find it useful 🙂
If you want to know more about the basics of accelerated compositing, please read Ariya’s blog first (or this one, also from Ariya), as this post is a bit more advanced. Read on if you’re interested in QtWebKit internals related to accelerated graphics…
First of all, TextureMapper has been enabled by default in trunk for a while now. This means that QtWebKit has its own mini-scenegraph, which is optimized for CSS animations and transforms and for nothing else.
Since TextureMapper has no public API, it’s easy to modify it and optimize it further for any strange CSS ideas that come along in the future, such as the new CSS shaders proposal, and to incorporate all the odd CSS 3D-transform rules such as preserve-3d.
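To make “mini-scenegraph” a bit more concrete, here is a minimal C++ sketch of the core idea: a tree of layers, each with a local transform and opacity, walked in paint order to produce one draw command per layer. None of these names (Affine, Layer, DrawCommand, collectDrawCommands) come from TextureMapper; they are hypothetical, and the sketch only handles 2D affine transforms, whereas the real thing also has to deal with 3D transforms and preserve-3d.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical 2D affine transform, same convention as CSS matrix(a, b, c, d, tx, ty):
// (x, y) maps to (a*x + c*y + tx, b*x + d*y + ty).
struct Affine {
    double a = 1, b = 0, c = 0, d = 1, tx = 0, ty = 0;

    // Composition: (*this * o) applies o first, then *this.
    Affine operator*(const Affine& o) const {
        return {a * o.a + c * o.b,        b * o.a + d * o.b,
                a * o.c + c * o.d,        b * o.c + d * o.d,
                a * o.tx + c * o.ty + tx, b * o.tx + d * o.ty + ty};
    }
    static Affine translate(double x, double y) { return {1, 0, 0, 1, x, y}; }
    static Affine scale(double s) { return {s, 0, 0, s, 0, 0}; }
};

// Hypothetical layer node: transform and opacity are relative to the parent.
struct Layer {
    int id = 0;
    Affine transform;
    double opacity = 1.0;
    std::vector<std::unique_ptr<Layer>> children;
};

// What the compositor would draw for one layer, in paint order.
struct DrawCommand {
    int layerId;
    Affine worldTransform;
    double worldOpacity;
};

// Depth-first walk that accumulates transforms and opacities from the root down.
void collectDrawCommands(const Layer& layer, const Affine& parentTransform,
                         double parentOpacity, std::vector<DrawCommand>& out) {
    const Affine world = parentTransform * layer.transform;
    const double opacity = parentOpacity * layer.opacity;
    if (opacity <= 0.0) return;  // an invisible subtree needs no drawing at all
    out.push_back({layer.id, world, opacity});
    for (const auto& child : layer.children)
        collectDrawCommands(*child, world, opacity, out);
}
```

Multiplying opacities down the tree is a simplification that is only correct when sibling layers don’t overlap; correct group opacity for overlapping content generally means rendering the subtree into an offscreen surface first. That is exactly the kind of CSS-specific detail a dedicated scenegraph can be tuned for.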
Second, a lot of work went into making standard web content work well with the tiled backing store. In WebKit2, WebKit’s multi-process model, this becomes even more important, because the CPU-based rendering is done in a background process and then passed to the UI process as shared memory, which gets uploaded to textures. The UI process can remain responsive even if the web process takes a long time rendering text, paths or other non-trivial content. With a good cross-process tiled backing store, we can apply interesting user-experience optimizations for touch manipulation, such as rendering tiles that are likely to appear on the screen soon, based on the user’s scrolling trajectory.
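Trajectory-based prefetching can be illustrated with a small sketch: grow the visible rectangle in the direction the user is scrolling, by the distance the viewport would cover within a short look-ahead window, and render every tile that intersects the result. The function names, the square tile grid and the look-ahead parameter are assumptions for illustration, not QtWebKit’s actual heuristics.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <utility>

struct Rect { double x, y, width, height; };
using TileCoord = std::pair<int, int>;  // (column, row) in a grid of square tiles

// All tiles of side `tileSize` that intersect `r`.
std::set<TileCoord> tilesCovering(const Rect& r, double tileSize) {
    std::set<TileCoord> tiles;
    if (r.width <= 0 || r.height <= 0) return tiles;
    const int firstCol = static_cast<int>(std::floor(r.x / tileSize));
    const int lastCol = static_cast<int>(std::ceil((r.x + r.width) / tileSize)) - 1;
    const int firstRow = static_cast<int>(std::floor(r.y / tileSize));
    const int lastRow = static_cast<int>(std::ceil((r.y + r.height) / tileSize)) - 1;
    for (int row = firstRow; row <= lastRow; ++row)
        for (int col = firstCol; col <= lastCol; ++col)
            tiles.insert({col, row});
    return tiles;
}

// Hypothetical prefetch policy: grow the viewport only on the side the user is
// scrolling towards, by how far it would move within `lookaheadSeconds`.
std::set<TileCoord> tilesToRender(const Rect& viewport, double velocityX, double velocityY,
                                  double lookaheadSeconds, double tileSize) {
    const double dx = velocityX * lookaheadSeconds;  // pixels
    const double dy = velocityY * lookaheadSeconds;
    Rect area = viewport;
    if (dx < 0) { area.x += dx; area.width -= dx; } else { area.width += dx; }
    if (dy < 0) { area.y += dy; area.height -= dy; } else { area.height += dy; }
    return tilesCovering(area, tileSize);
}
```

A real implementation would also clip the area to the content bounds and render the visible tiles before the speculative ones.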
Speaking of WebKit2…
Getting accelerated compositing working in a cross-process environment was an interesting challenge. Different ports have solved it in different ways. For Apple, accelerated compositing works on top of Core Animation, which is a cross-process architecture to begin with. Chromium uses an additional “GPU process”, which is not feasible for the mobile platforms we were trying to support. The main question we had to answer was where to perform the actual compositing. WebKit gives us a layer tree, image updates for each layer, and animation and rendering info for each layer. All of that happens in the web process. In our browsers and in the QML web views, we also use the GPU extensively in the UI process, for example for zooming with a two-finger gesture. How do we integrate GPU operations driven by the web page in the web process with GPU operations driven by the user in the UI process?
There were two options:
1. Perform the compositing of layers in the web process with OpenGL, and composite the result with the application’s scene in the UI process, using cross-process image sharing such as X11 or EGL.
2. Pass all the layer-tree and image information to the UI process, and do the actual OpenGL compositing there.
We originally started with option (1), as it was straightforward. The problem was that on mobile platforms it resulted in repeated context switching, difficulties with synchronization, and generally a low frame rate that we couldn’t overcome. So we moved to option (2). This took a lot of boring boilerplate code to serialize the layer-tree and animation information to the UI process, but in the end it paid off. There is a bit of overhead in setting up the layer tree and starting an animation, but once that is done, the actual compositing of frames happens entirely in the UI process, in one place, using OpenGL together with the user-driven compositing such as zoom/scroll gestures.
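Option (2) boils down to the web process encoding layer-tree changes into a message and the UI process applying them to its own copy of the tree, which it then composites. The sketch below shows that round trip with a hand-rolled byte encoder; LayerState, Encoder, Decoder and the two functions are hypothetical and do not reflect WebKit2’s real IPC classes or message format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical per-layer state sent from the web process to the UI process.
struct LayerState {
    std::uint32_t id = 0;
    std::uint32_t parentId = 0;          // 0 means "attached to the root"
    float x = 0, y = 0;                  // position relative to the parent
    float opacity = 1;
    std::uint32_t backingTextureId = 0;  // hypothetical handle to an uploaded image update
};

// Minimal encoder: appends the raw bytes of trivially copyable values.
struct Encoder {
    std::vector<std::uint8_t> bytes;
    template <typename T> void write(const T& value) {
        const auto* p = reinterpret_cast<const std::uint8_t*>(&value);
        bytes.insert(bytes.end(), p, p + sizeof(T));
    }
};

// Matching decoder; read() fails instead of running past the end of the message.
struct Decoder {
    const std::vector<std::uint8_t>& bytes;
    std::size_t offset = 0;
    template <typename T> bool read(T& value) {
        if (offset + sizeof(T) > bytes.size()) return false;
        std::memcpy(&value, bytes.data() + offset, sizeof(T));
        offset += sizeof(T);
        return true;
    }
};

// Web process: encode only the layers that changed since the last commit.
std::vector<std::uint8_t> encodeLayerTreeUpdate(const std::vector<LayerState>& changed) {
    Encoder e;
    e.write(static_cast<std::uint32_t>(changed.size()));
    for (const LayerState& s : changed) {
        // Field by field, so struct padding never ends up in the message.
        e.write(s.id); e.write(s.parentId); e.write(s.x);
        e.write(s.y); e.write(s.opacity); e.write(s.backingTextureId);
    }
    return e.bytes;
}

// UI process: apply the update to its own copy of the layer tree; a compositor
// (not shown) would draw this copy with OpenGL along with zoom/scroll state.
bool applyLayerTreeUpdate(const std::vector<std::uint8_t>& message,
                          std::map<std::uint32_t, LayerState>& uiLayers) {
    Decoder d{message};
    std::uint32_t count = 0;
    if (!d.read(count)) return false;
    for (std::uint32_t i = 0; i < count; ++i) {
        LayerState s;
        if (!d.read(s.id) || !d.read(s.parentId) || !d.read(s.x) ||
            !d.read(s.y) || !d.read(s.opacity) || !d.read(s.backingTextureId))
            return false;  // truncated message
        uiLayers[s.id] = s;
    }
    return true;
}
```

Copying raw bytes is only reasonable here because both processes run on the same machine with the same ABI; anything crossing machines would need an explicit byte order.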
Once we had that in place for WebKit2, a few problems arose because the accelerated-compositing path and the tiled-backing-store path were different. We had used the tiled backing store only for the “non-composited” content on the screen, and painted the composited layers on top. While this let us optimize heavily for non-composited content, which is the bulk of content on the internet, it created some annoying bugs. They showed up on sites with big chunks of content that had effects, such as animated rotations, that usually get composited. A lot of our intelligence, such as applying a contents scale to rendered content so that it looks crisp when you zoom in or out, lived in the tiled-backing-store code path, so it was never applied on the accelerated-compositing code path. This meant that some content was crisp and other content was blurry, depending on which code path happened to paint it. That is, of course, iffy… not to mention the tearing we got when animations were driven both in the web process and in the UI process (for example, a “left” transition on one element and a “-webkit-transform” transition on another).
In the version currently being worked on for trunk, we’ve taken one more step towards fixing this. Instead of having the composited layers “on top” of the non-composited content, we run everything through the accelerated-compositing code path, with the non-composited content as one of the layers, just at a lower z-index. This lets us synchronize all the layers and their tiles together, and apply the crisp content scaling to every layer, whether it’s the bottom one or not.
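A rough sketch of that change: the page’s non-composited content becomes an ordinary layer placed below the composited ones, and every layer in the frame is rasterized at the same contents scale. The CompositedLayer type and buildFrameLayers function are hypothetical, and for simplicity the sketch assumes all composited layers belong above the page content, ignoring how negative z-index stacking interacts with it.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical layer record for one composited frame.
struct CompositedLayer {
    std::string name;
    int zIndex = 0;
    double contentsScale = 1.0;  // scale at which this layer's tiles are rasterized
};

// Builds the layer list for one frame. The page's non-composited content is
// added as an ordinary layer below all composited layers (simplifying
// assumption), so every layer goes through the same code path and gets the
// same contents scale.
std::vector<CompositedLayer> buildFrameLayers(std::vector<CompositedLayer> layers,
                                              double pageScale) {
    int lowestZ = 0;
    for (const auto& layer : layers) lowestZ = std::min(lowestZ, layer.zIndex);
    layers.push_back({"non-composited content", lowestZ - 1});
    for (auto& layer : layers) layer.contentsScale = pageScale;  // equally crisp everywhere
    std::stable_sort(layers.begin(), layers.end(),
                     [](const CompositedLayer& a, const CompositedLayer& b) {
                         return a.zIndex < b.zIndex;
                     });
    return layers;
}
```

Because the bottom layer is no longer special, zooming updates the contents scale of all layers in the same frame, which is what keeps text in rotated and non-rotated content equally sharp.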
Why not use the QtQuick Scenegraph?
This question came up several times, so I’ll try to answer it here. One issue we’ve always faced with QGraphicsView is that it was used for too many things, and in the end it was hard to optimize it for a single use case without hindering the others. The Qt Quick Scenegraph is built from the ground up to make QML applications run fast, and it’s very good at that. Accelerating the CSS 3D/animation scene requires applying a lot of rules and details to make it not just fast, but also correct and compliant with the WebKit CSS tests. In our case, the public API is CSS3, and routing the code through another public API (such as QGraphicsView or the Qt Quick Scenegraph) adds complexity we don’t need. We want to let the Qt Quick Scenegraph do what it does best, and provide the optimal code path for rendering CSS. Those two don’t necessarily have to match. Anyway, it was a tough choice that we didn’t take lightly, and we’ll revisit it if circumstances change.
There are tons of problems to solve, and a few people are already hacking away at them. I hope this “TODO” list, off the top of my head, helps show where we are.
- Serializing transform/opacity animations – that part hasn’t been upstreamed yet, so right now we send IPC messages with layer-tree information on every animation frame. It is in the works, however.
- Video – we don’t currently connect video with accelerated compositing. This is necessary for doing full-screen video “the right way”, and also if we want to accelerate in-page video without reading back the pixels for every frame.
- WebGL – this is an interesting problem to solve in WebKit2, especially on Linux. We’ll have to figure out how to efficiently composite the results of WebGL, drawn into X11 in the web process, with the GL context running in the UI process. Chromium has already done some interesting things in that area that we can look at.
- Reflections – this is an underused CSS3 feature that requires heaps of code to support it efficiently… we currently don’t pass all the tests for reflections with compositing, and it’s a subject in itself.
- CSS shaders – right now they’re not implemented at all in WebKit, but once they are, we’ll need to hook them into the WebKit2 serialization and into TextureMapper.
- Exploring direct compositing of other elements – for example, can we use shaders for text or gradients in places where it makes sense?
- Tidbits of testing, fixing and optimizations are always needed and always welcome.
- Eight, eight, I forgot what eight was for.
- Accelerate 4-dimensional transforms for CSS4.
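The first item on that list, serializing animations, would replace per-frame IPC with a single message describing the animation, which the UI process then evaluates on every frame it composites. Here is a minimal sketch for an opacity transition with linear timing, assuming (hypothetically) that both processes share a monotonic clock; OpacityAnimation and opacityAt are illustrative names, and real CSS timing functions are left out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical message describing a CSS opacity transition. It is sent to the
// UI process once, when the animation starts, instead of once per frame.
struct OpacityAnimation {
    double startTime;  // seconds, on a clock both processes share (assumption)
    double duration;   // seconds
    double from;
    double to;
};

// Evaluated by the UI process for every frame it composites. Linear timing
// only; CSS timing functions such as ease or cubic-bezier() are left out.
double opacityAt(const OpacityAnimation& anim, double now) {
    if (anim.duration <= 0) return anim.to;
    const double t = (now - anim.startTime) / anim.duration;
    const double progress = std::min(1.0, std::max(0.0, t));
    return anim.from + (anim.to - anim.from) * progress;
}
```

In a scheme like this, the web process only needs to send another message if the animation is changed or cancelled; every frame in between is computed locally in the UI process.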
LayoutTests/compositing is a good place to start if you want to see what’s possible with compositing and what works/doesn’t work with our implementation.