Eskil Abrahamsen Blomfeldt

How to shoot yourself in the foot using only a scene graph (neat optimization trick inside)

Published Thursday January 19th, 2017
9 Comments on How to shoot yourself in the foot using only a scene graph (neat optimization trick inside)
Posted in Declarative UI, Dev Loop, Graphics, OpenGL, Performance, Qt, Qt Quick 2, Text and font handling, UI

I am trying to get into the habit of blogging more often, also about topics that may not warrant a white paper’s worth of text but that may still be interesting to some of you. For those of you who don’t know me, I am the maintainer of the text and font code in Qt, and I recently came across a curious customer case where the optimization mechanisms in the Qt Quick scene graph ended up doing more harm than good. I thought I would share the case with you, along with the workaround I ended up giving to the customer.

Consider an application of considerable complexity: Lots of dials and lists and buttons and functionality crammed into a single screen. On this screen there are obviously also labels. Thousands of static labels, just to describe what all the complex dials and buttons do. All the labels share the same font and style.

Narrowing the example down to just the labels, here is an illustration:

import QtQuick 2.5
import QtQuick.Window 2.2

Window {
    id: window
    visible: true
    title: qsTr("Hello World")
    visibility: Window.Maximized

    Flow {
        anchors.fill: parent
        Repeater {
            model: 1000
            Text {
                text: "Hello World"
            }
        }
    }
}

Now, the way the scene graph was designed, it will make an effort to bundle together as many primitives of the same kind as possible, in order to minimize the number of draw calls and state changes needed to render a scene (batching). Next, it will try to keep as much of the data as possible in graphics memory between frames to avoid unnecessary uploads (retention). So if you have a set of text labels that never change, are always visible and use the same font, Qt will essentially merge them into a single list of vertices, upload this to the GPU in one go and retain the data in graphics memory for the duration of the application.
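To make the batching idea a little more concrete, here is a minimal C++ sketch of what "merging into a single list of vertices" means. It is a simplified model under stated assumptions, not code from Qt's renderer: `Vertex`, `Geometry`, `makeQuad` and `mergeBatch` are hypothetical names, and each node is assumed to own its own small vertex and index lists. Merging concatenates them into one buffer, offsetting each node's indices by the number of vertices already merged, so the result can be uploaded once and drawn with a single call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a merged batch; not taken from Qt's renderer.
struct Vertex {
    float x, y;  // position on screen
    float u, v;  // texture coordinate (e.g. into a glyph cache)
};

struct Geometry {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // triangles, three indices each
};

// One textured quad: 4 vertices drawn as 2 triangles (6 indices).
Geometry makeQuad(float x, float y, float w, float h) {
    Geometry g;
    g.vertices = {{x, y, 0, 0}, {x + w, y, 1, 0}, {x + w, y + h, 1, 1}, {x, y + h, 0, 1}};
    g.indices = {0, 1, 2, 0, 2, 3};
    return g;
}

// Concatenates the geometry of several nodes into one vertex/index buffer.
// Each node's indices are offset by the number of vertices merged so far,
// so they keep pointing at that node's own vertices.
Geometry mergeBatch(const std::vector<Geometry>& nodes) {
    Geometry merged;
    for (const Geometry& node : nodes) {
        const auto base = static_cast<std::uint32_t>(merged.vertices.size());
        for (std::uint32_t i : node.indices) {
            assert(i < node.vertices.size());  // must refer to this node's vertices
            merged.indices.push_back(base + i);
        }
        merged.vertices.insert(merged.vertices.end(), node.vertices.begin(), node.vertices.end());
    }
    return merged;
}
```

Merging two quads this way gives 8 vertices and 12 indices, with the second quad's indices running from 4 to 7. In this model the merged buffer is also the unit of retention: as long as no node in it changes, it can stay in graphics memory untouched.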

[Screenshot: Artist’s impression of a complex Qt application]

We can see this in action by setting the environment variable QSG_RENDERER_DEBUG=render before running the application above. In the first frame, all the data will be uploaded, but if we cause the scene graph to re-render (for instance by changing the window size), we see output like this:

Renderer::render() QSGAbstractRenderer(0x2a392d67640) "rebuild: none"
Rendering:
 -> Opaque: 0 nodes in 0 batches...
 -> Alpha: 1000 nodes in 1 batches...
 - 0x2a39351fcb0 [retained] [noclip] [ alpha] [  merged]  Nodes: 1000  Vertices: 40000  Indices: 60000  root: 0x0 opacity: 1
 -> times: build: 0, prepare(opaque/alpha): 0/0, sorting: 0, upload(opaque/alpha): 0/0, render: 0

From this we can read the following: The application has one batch of alpha-blended material, containing 1000 nodes. The full 40000 vertices are retained in graphics memory between the frames, so we can repaint everything without uploading the data again.
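The vertex and index counts can be reproduced with simple arithmetic. The sketch below assumes, as the numbers suggest but without checking Qt's source, that every visible glyph is drawn as one textured quad of 4 vertices and 6 indices, that the space produces no quad, and that each character maps to one glyph (no shaping effects). "Hello World" then has 10 quads, or 40 vertices and 60 indices per label. `Counts`, `countsForLabel` and `countsForLabels` are hypothetical helpers.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Assumption (inferred from the debug output, not from Qt's source):
// every non-whitespace character becomes one quad of 4 vertices and 6 indices.
struct Counts {
    long vertices;
    long indices;
};

// Hypothetical helper: geometry size of a single text label.
Counts countsForLabel(const std::string& text) {
    long quads = 0;
    for (unsigned char c : text) {
        if (!std::isspace(c))
            ++quads;  // one quad per visible glyph; spaces draw nothing
    }
    return {4 * quads, 6 * quads};
}

// Hypothetical helper: geometry size of labelCount identical labels in one batch.
Counts countsForLabels(const std::string& text, long labelCount) {
    assert(labelCount >= 0);
    const Counts one = countsForLabel(text);
    return {one.vertices * labelCount, one.indices * labelCount};
}
```

For 1000 labels this gives 40000 vertices and 60000 indices. The same arithmetic also explains the output for the counter example further down: 999 static labels plus one single-digit counter give 999 × 40 + 4 = 39964 vertices and 999 × 60 + 6 = 59946 indices.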

So far so good.

But then this happens: Someone adds another label to our UI, somewhere in the depths of this complex graph of buttons, dials and labels. This label is not static, however, but shows a millisecond counter which is updated for every single frame.

While the scene graph does the correct and performant thing for most common use cases, the introduction of this single counter item in our scene breaks its assumptions. Since the counter label is batched together with the static text, every change to it invalidates the geometry of the whole batch, which in this case is all the text in the graph.

To see what I mean, let’s change our example and run it again.

import QtQuick 2.5
import QtQuick.Window 2.2

Window {
    id: window
    visible: true
    title: qsTr("Hello World")
    visibility: Window.Maximized
    property int number: 0

    Flow {
        anchors.fill: parent
        Repeater {
            model: 1000
            Text {
                // Index 500 is the 501st label, which shows the animated counter
                text: index === 500 ? number : "Hello World"
            }
        }
    }

    NumberAnimation on number {
        duration: 200
        from: 0
        to: 9
        loops: Animation.Infinite
    }
}

The example looks the same, except that the 501st Text item is now a counter, looping from 0 to 9 continuously. For every render pass, we now get output like this:

Renderer::render() QSGAbstractRenderer(0x1e671914460) "rebuild: full"
Rendering:
 -> Opaque: 0 nodes in 0 batches...
 -> Alpha: 1000 nodes in 1 batches...
 - 0x1e672111f60 [  upload] [noclip] [ alpha] [  merged]  Nodes: 1000  Vertices: 39964  Indices: 59946  root: 0x0 opacity: 1
 -> times: build: 0, prepare(opaque/alpha): 0/0, sorting: 0, upload(opaque/alpha): 0/1, render: 0

As we can see, we still have a single batch with 1000 nodes, but the data is not retained, causing us to upload almost 40000 vertices per frame. In a more complex application, this may also invalidate other parts of the graph. We could even end up redoing everything for every frame if we are especially unlucky.

So, presented with this case and after analyzing what was actually going on, my first goal was to find a workaround for the customer. I needed to come up with a way to separate the counter label out into its own batch, without changing how anything looked on screen. There may be other ways of doing this, but what I ended up suggesting to the customer was to set clip to true for all the counter labels. Giving the counters a clip node parent in the graph forces them out of the main batch, so the updates are isolated to the clipped part of the scene graph.
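One rough way to picture why clipping helps is to treat batching as grouping consecutive nodes that share a batching key. The C++ sketch below is a hypothetical simplification, not the actual rules of Qt's batch renderer, which are considerably more involved: the key is just a material plus the clip node an item belongs to, and only neighbours in rendering order are merged. `Node`, `Batch`, `clipRoot` and `buildBatches` are illustrative names. Giving the counter at index 500 its own clip root reproduces the 500/1/499 split that the debug output shows for the clipped version.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified batching rule (not Qt's actual logic):
// walk the nodes in rendering order and merge a node into the previous
// batch only if it has the same material and the same clip root.
struct Node {
    std::string material;  // stands in for shader + glyph texture
    int clipRoot;          // 0 = not clipped; otherwise an id of the clip node
};

struct Batch {
    std::size_t first;  // index of the first node in the batch
    std::size_t count;  // number of consecutive nodes in the batch
};

std::vector<Batch> buildBatches(const std::vector<Node>& nodes) {
    std::vector<Batch> batches;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        // batches is non-empty only when i > 0, so nodes[i - 1] is valid here.
        const bool sameKey = !batches.empty() &&
                             nodes[i - 1].material == nodes[i].material &&
                             nodes[i - 1].clipRoot == nodes[i].clipRoot;
        if (sameKey)
            ++batches.back().count;
        else
            batches.push_back({i, 1});
    }
    // Sanity check: every node ends up in exactly one batch.
    std::size_t covered = 0;
    for (const Batch& b : batches)
        covered += b.count;
    assert(covered == nodes.size());
    return batches;
}
```

With no clipped node, all 1000 labels share one key and form a single batch; setting `clipRoot` on the counter alone splits the run into three.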

import QtQuick 2.5
import QtQuick.Window 2.2

Window {
    id: window
    visible: true
    title: qsTr("Hello World")
    visibility: Window.Maximized
    property int number: 0

    Flow {
        anchors.fill: parent
        Repeater {
            model: 1000
            Text {
                text: index === 500 ? number : "Hello World"
                // Clip only the counter: its clip node keeps it out of the static batch
                clip: index === 500
            }
        }
    }

    NumberAnimation on number {
        duration: 200
        from: 0
        to: 9
        loops: Animation.Infinite
    }
}

Still the same code, except that the clip property of the 501st Text item is now set to true. If we run this updated form of the application with the same debug output, we get the following:

Renderer::render() QSGAbstractRenderer(0x143890afa10) "rebuild: partial"
Rendering:
 -> Opaque: 0 nodes in 0 batches...
 -> Alpha: 1000 nodes in 3 batches...
 - 0x143898ee6d0 [retained] [noclip] [ alpha] [  merged]  Nodes:  500  Vertices: 20000  Indices: 30000  root: 0x0 opacity: 1
 - 0x143898ec7e0 [  upload] [  clip] [ alpha] [  merged]  Nodes:    1  Vertices:     4  Indices:     6  root: 0x14389a3a840 opacity: 1
 - 0x143898edb90 [retained] [noclip] [ alpha] [  merged]  Nodes:  499  Vertices: 19960  Indices: 29940  root: 0x0 opacity: 1
 -> times: build: 0, prepare(opaque/alpha): 0/0, sorting: 0, upload(opaque/alpha): 0/0, render: 0

As you can see from the output, the text is now divided into three batches instead of one. The first and last are retained between frames, so the only upload per frame is an insignificant 4 vertices. Since the clip rect contains the bounding rect of the text, no pixels are actually clipped away, and the application looks exactly the same as before.
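The difference in upload cost between the two debug outputs can be modelled in a few lines of C++. This is a simplified sketch with hypothetical names (`RetainedBatch`, `uploadedVertices`), assuming that a changed node forces its whole batch to be re-uploaded while untouched batches stay retained, and using the per-node vertex counts from the output: 40 per "Hello World" label and 4 for the single-digit counter.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical retention model (not Qt code): a batch stays in graphics
// memory until one of its nodes changes; then the whole batch is uploaded.
struct RetainedBatch {
    std::vector<long> nodeVertexCounts;  // vertices contributed by each node
};

// Vertices uploaded in a frame where only the node at position
// changedNode (in rendering order, across all batches) has changed.
long uploadedVertices(const std::vector<RetainedBatch>& batches, std::size_t changedNode) {
    std::size_t firstNode = 0;
    for (const RetainedBatch& batch : batches) {
        const std::size_t n = batch.nodeVertexCounts.size();
        if (changedNode < firstNode + n) {
            // The changed node is in this batch: re-upload all of it.
            return std::accumulate(batch.nodeVertexCounts.begin(),
                                   batch.nodeVertexCounts.end(), 0L);
        }
        firstNode += n;  // untouched batch stays retained and costs nothing
    }
    assert(false && "changedNode is out of range");
    return 0;
}
```

With all 1000 nodes in one batch, changing the counter costs 39964 uploaded vertices per frame; with the 500/1/499 split, the same change costs 4, matching the two outputs above.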

So all’s well that ends well: The customer was happy with the solution, and their performance problems were fixed.

Also, I gained some ideas on how we can improve the Qt Quick API to make it easier for users to avoid these problems in the future. Since we want performance to be stable from the first frame, I don’t think there is any way to get around the need for users to manually identify which parts of the graph should be isolated from the rest, but I would like to have a more obvious way of doing so than clipping. My current idea is to introduce a set of optimization flags to the Qt Quick Text element, one of which is Text.StaticText, sister to the QStaticText class we have for QPainter-based applications.

In the first iteration of this, the only effect of the flag will be to ensure that no label marked as StaticText will ever be batched together with non-static text. But down the road, maybe there are other optimizations we can do when we know a text label never (or rarely) changes. And this is just one of a few optimization APIs I want to add to Qt Quick Text in the near future, so stay tuned! 🙂

9 comments

Sven says:

Thanks for your article.
I expect these optimizations are also done for other elements (e.g. Rectangles, custom components). Wouldn’t it make sense to implement the static property on Item instead of Text, so that every component can profit from this?

By the way, are you going to put all non-static elements into one batch and all static ones into another? It might make sense to go a step further and let the user determine the batch to put elements into (“property int renderBatch: 0”?), so they can group together those elements that “never change”, “change often”, and “change permanently” (or even more fine-grained: maybe there are 1000 items changing very often, but 500 of them at every even millisecond and the other 500 at every odd one).

On another note: Is it possible to break apart such a batch afterwards? If it is (and without a performance penalty), did you consider putting everything into the same batch as usual, and breaking it up as and when there’s an update that suggests better batches? E.g. if 10 of these texts change in the same frame, split the batch of 1000 texts into 500, 10 and 490 as soon as the first property change comes up, and use these in future.
I realize this goes against the “performance at the first frame” requirement, as the frame that causes the split-up takes a bit longer (it needs to upload everything again). But doing this automatically, especially without the risk that the user forgets to remove the static property when a static element becomes dynamic, might be worth the compromise. And it inherently provides the feature to batch together those elements that change >at the same time<.

Eskil Abrahamsen Blomfeldt says:

Having optimization flags and the static hint in particular at a higher level (in the Item) definitely sounds like a very good idea to me. I have been so wrapped up in the text use case that I think I lost the view of the full picture.

At first glance, I am not entirely convinced about giving complete control of the batching to users, however.

First of all, it is an implementation detail of the renderer. In principle, the scene graph supports replacing the renderer with something written specifically for the target hardware, although most people will currently be best served by the batched renderer. Having such a property would limit what we could do in the future, though. My gut feeling is that it is more powerful to give people a way to identify the nature of the UI elements, so the renderer can use this information to provide the best optimization on the target machine.

The other thing is that I think it would add a lot to the complexity of using Qt Quick if the feeling was that you had to understand the renderer on this level in order to use it correctly. The hope is that the default is sufficient for most use cases, and that we can provide some extra tools for the cases where it isn’t. My hope is that separating out the (mostly) static contents of the graph would be sufficient, as most of your graph is most likely static content from frame to frame, and if it isn’t, you will have big uploads anyway.

But thanks a lot for the ideas! We will definitely take them into consideration.

Jocelyn Turcotte says:

CSS now has a will-change property with a similar purpose: to keep browsers from having to rely on heuristics that cannot know the future when deciding which element to promote to a layer (or to keep web pages from having to force an element into a layer through transform:translateZ(0), where the memory cost could hurt):

https://developer.mozilla.org/en/docs/Web/CSS/will-change

That kind of contract could maybe also be used for batching (or even automatic item layer promotion) in QtQuick.

S.M.Mousavi says:

Excellent!
I faced the same issue with a tiny animated Text element such as a counter. Does this trick work for any element?
Making batching controllable by the developer seems like a good idea.
Also, I noticed that a tiny animated progress bar causes high CPU usage (about 0.1% when idle and about 4.2% when animated on my MBP with an i7 CPU). This causes high energy consumption on mobile devices such as Android smartphones. To work around this problem, I currently use a Timer element to update dynamic elements less frequently.

Eskil Abrahamsen Blomfeldt says:

Yes, the trick should also work for other primitives.

As for the CPU usage of the progress bar: In order to match the native look on a particular device, the Android style uses a lot of animated images, which will cause both GPU and CPU activity. You could try choosing a different style or using Qt Quick Controls 2 to see if that helps.

StefanB says:

Very nice post, thanks!

Can you give some more information about the debug output (or provide a pointer)?

Nodes:/Vertices:/Indices: seem to be just counts of QML nodes and resulting GPU data elements respectively.
The first column seems to be addresses of scene graph nodes, is it possible to use e.g. gdb to get some more info about it?
The root: property seems to be an address, but seemingly not related to any address in the first column. Is it only set if [clip] is set?
opacity: exists if [merged] is set? What’s the meaning of merged/unmerged?

What are the values in the “-> times: …” row?

I would welcome a timestamp in the output.

Is there some possibility to relate the QSG nodes to screen coordinates (or some other coordinate in a common coordinate system)?

I have seen a scene graph where all nodes are (re)uploaded, although most nodes are likely unchanged. Would it be possible to generate and print a hash of the uploaded data to see which nodes have really changed?

Eskil Abrahamsen Blomfeldt says:

The pointers in the output refer to the graph of batch nodes, i.e. the graph of the geometry that the scene graph has generated and that the renderer has attempted to structure in the optimal way.

With gdb, it might be possible to get some useful information out of it, as you suggest. For instance you should be able to inspect the vertex buffer of the batch fairly easily, I think.

Take a look at the code in qsgbatchrenderer.cpp. You could also add some additional debug info there to find out how your batches are made up. QSG_VISUALIZE is also a handy tool: http://doc.qt.io/qt-5/qtquick-visualcanvas-scenegraph-renderer.html#visualizing-batches

Also note the section about batch roots here: http://doc.qt.io/qt-5/qtquick-visualcanvas-scenegraph-renderer.html#batch-roots

Usually, in a less flat application than the one in my example, the scene graph will split into separate batches based on the size of the subtree (the default limit is 1024 vertices). The problem in my example arises when there is a large number of items sharing the same parent, so it becomes a single batch regardless of its size. So maybe that is something to go on as well. Good luck 🙂

Alejandro Exojo says:

Will there be public C++ text APIs in Qt Quick at some point? I think those private classes have not changed much (e.g. https://github.com/jorgen/yat seems to compile with different Qt versions even though it uses a lot of private headers). It’s kind of silly that the hello world of the scene graph classes is some rectangle, and not actual “Hello world” text. Text is the most important part of a UI, after all.

Eskil Abrahamsen Blomfeldt says:

I agree that making those APIs public is overdue. I would like to go through and clean it up a bit first, and I can’t give you any estimate of when that will be exactly, but I can say that it is on my mind and I am planning to do it.