Qt Graphics and Performance – The Cost of Convenience

Published Monday January 11th, 2010
Posted in Graphics Dojo, Graphics View, OpenGL, Painting

Previous posts in this topic:

So, it’s time for my next post. Today’s topic is how convenience relates to performance, specifically in the context of QGraphicsView. My goal is to illustrate that the way to achieve fast graphics is to pack your QPainter draw calls as tightly together as possible. The more stuff that happens in between, the slower it gets.

To illustrate this, I’ve implemented a virtual keyboard. Granted, it’s not a very common layout, nor is it usable, but the rendering is the point here, not the functionality. The full source code is here and it looks like this:

Virtual Keyboard Image

I’ve implemented the keyboard using three different approaches: one using proxy widgets, one using graphics items and one where the entire view is one graphics item. In addition to that, I added a number of options to tweak various properties, such as whether or not the text is drawn. I measured this on an N900 rather than a desktop because the difference becomes more profound on a small device. On the desktop it is easy to be fooled because most things complete in a matter of microseconds anyway. It is only when the entire application comes together that one notices that things are not as smooth as in the prototype. By then, too much work has been invested in the current design to change it, and one loses out on the super-slick feeling application.

QGraphicsProxyWidget

Since we’re implementing a series of clickable buttons, a natural and convenient starting point is to use an existing button class, such as QPushButton. It already implements the logic for mouse/keyboard interaction and has signals for clicking and all sorts of other useful functionality. To get widgets into QGraphicsView, we use a QGraphicsProxyWidget. To make the test “fair”, I actually use a plain QWidget which just paints a pixmap and draws a text. Had I gone through the styling API, these numbers would have been even worse.

ProxyWidget Results
Milliseconds spent per frame including blit to screen when using QGraphicsProxyWidgets. Low is better!

If we look at the plain “-proxywidgets” run, the fastest engine was the raster engine, running at 26ms per frame. If I wanted to slide this keyboard onto screen, I have 16ms available if I want it running at 60 FPS and 33ms available if I want to do it at 30 FPS. When each frame takes 26ms, I can just about do 30 FPS, but with only a little slack, so if another process is soaking up CPU time, even that number is difficult to reach. So, not very good. (BTW, the exact numbers in the graphs are listed as a comment at the top of the .cpp file I linked above).
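The budget arithmetic above can be written down as a small sketch. The function names are made up for illustration and are not part of the benchmark; the only inputs are the frame times quoted in the text.

```cpp
// Time available per frame, in milliseconds, at a given target frame rate.
constexpr double frameBudgetMs(double targetFps) {
    return 1000.0 / targetFps;
}

// Highest frame rate that a given per-frame render time can sustain,
// assuming nothing else competes for the CPU.
constexpr double achievableFps(double frameTimeMs) {
    return 1000.0 / frameTimeMs;
}

// True if a frame taking frameTimeMs fits inside the budget for targetFps.
constexpr bool fitsBudget(double frameTimeMs, double targetFps) {
    return frameTimeMs <= frameBudgetMs(targetFps);
}
```

With these, fitsBudget(26.0, 30.0) holds while fitsBudget(26.0, 60.0) does not, which is the "can barely do 30" situation described above. The 16ms and 33ms figures in the text are the rounded-down values of 1000/60 and 1000/30.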

The first thing I noticed with this approach was that each button now had a gray background. This is of course the widget background. A QWidget embedded in QGraphicsView will be treated as a top-level widget and will therefore draw its background. I added an option “-no-widget-background” which sets the Qt::WA_NoBackground attribute on the widget. This brings the rendering time with raster down to 22ms. 4ms saved per frame, just by setting a flag, is not too bad, but still pretty far from being awesome.

I’ve mentioned before that text drawing is not as fast as we would like it, so just to compare how it looks without text, I added a “-no-text” option to the test. This brings the raster results down to 13ms. That is pretty nice and below the 16ms threshold required to achieve 60 FPS, but only with a small margin. And I’m not drawing any text! Before I give up on this approach, I’ll enable item caching. By setting ItemCoordinateCache on each button, I cache both the background pixmap and the text in one single pixmap. This brings the raster results down to 8.5ms, and it’s starting to look acceptable. But at a very high memory cost… In my original use case I had one shared pixmap for all the button backgrounds, but now I have one per button.
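To get a feel for the memory trade-off of per-item caching, here is a back-of-the-envelope sketch. The inputs used in the note below (about 40 buttons, 32×32 pixels, 4 bytes per pixel) are assumptions for illustration: the button count and size are figures mentioned elsewhere in this post, the pixel format is a guess, and the real cache pixmaps depend on each item's bounding rectangle and on the graphics system in use.

```cpp
#include <cstddef>

// Bytes needed for a single pixmap of the given dimensions.
constexpr std::size_t pixmapBytes(std::size_t width, std::size_t height,
                                  std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Total bytes when every item keeps its own cache pixmap instead of
// all items sharing one background pixmap.
constexpr std::size_t perItemCacheBytes(std::size_t itemCount,
                                        std::size_t width, std::size_t height,
                                        std::size_t bytesPerPixel) {
    return itemCount * pixmapBytes(width, height, bytesPerPixel);
}
```

Under those assumptions, one shared pixmap is pixmapBytes(32, 32, 4) = 4096 bytes, while perItemCacheBytes(40, 32, 32, 4) = 163840 bytes. The cost grows linearly with the number of cached items and with their area, so larger buttons make the gap much wider.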

You may notice that there is a vast difference between item caching and the proxy widget drawing the pixmap. One thing that adds to the proxy widget cost is that the QPainter is recreated and initialized for each button in the button’s paint event. Also, as I mentioned in my previous post, An Overview, each widget has a system clip and there is an overhead involved with calling the paintEvent. For items in QGraphicsView, there is already a painter, and I don’t need a clip, nor do I need any of the other stuff that goes on behind the scenes there. When we enable item coordinate caching, we don’t leave graphics view world and we don’t enter the widget world. This crossing is expensive, so by not going into the widget world, we save a lot.

So, if there is a lesson to be learned it is that QGraphicsProxyWidget should be used with extreme caution. If you really need it, use very few of them.

QGraphicsWidget

If proxy widgets are too slow to be usable in this scenario, then the next best thing is to use a QGraphicsWidget. This is a subclass of both QObject and QGraphicsItem, which gives me signals, slots and properties, but it’s not a QWidget and therefore still fairly lightweight. The numbers are as follows:

GraphicsWidgets Results
Milliseconds spent per frame including blit to screen when using QGraphicsWidgets. Lower is better!

Compared to the proxy widgets approach we’re starting out quite a bit better, with raster at 13ms per frame, OpenGL at 20ms and X11 at 22ms. Below this line is a new line: “-no-indexing -optimize-flags”. QGraphicsView will by default put all the items in a view into a BSP tree for fast lookup. This is beneficial when the scene contains many items and you often need to find items that intersect with a small portion of the scene. In the testcase we’re always doing a full update, so there is no benefit from the index, and it can be disabled by calling scene->setItemIndexMethod(QGraphicsScene::NoIndex). Having a BSP is the default behaviour because graphics view was initially intended to be a static scene for many items. The most common use case today is a few (a few hundred at most) items which tend to move a lot. For this reason, it is always a good idea to try to disable the BSP and see if it makes a difference in performance. If it helps, then leave it off.

I also know that the items play nice, meaning that they don’t change the clip, translate the painter, change the composition mode or modify any other state that would propagate to other items. This means I can safely set the DontSavePainterState optimization flag. Actually, based on an old habit, I set all possible optimization flags. I only consider unsetting them if my drawing code starts to look weird, at which point I would rather fix the drawing code and keep the flags set. Disabling indexing and enabling the optimization flags shaves off 2ms per frame for all rendering backends, so that is definitely worth it.
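The index and optimization settings described above could be applied roughly as follows. This is a sketch rather than the test's actual setup code: the function name configureForFullRedraw is made up for illustration, and the choice of flags is an assumption limited to two flags that exist in QGraphicsView in Qt 4.6. The benchmark source is the authority on what was really set. It needs Qt to build.

```cpp
#include <QGraphicsScene>
#include <QGraphicsView>

// Hypothetical helper: tune a scene/view pair for a small number of
// well-behaved items that are fully repainted every frame.
void configureForFullRedraw(QGraphicsScene *scene, QGraphicsView *view)
{
    // Skip the BSP tree; every item is drawn each frame anyway.
    scene->setItemIndexMethod(QGraphicsScene::NoIndex);

    // Only safe when items leave the painter state as they found it
    // and stay within their bounding rectangles.
    view->setOptimizationFlags(QGraphicsView::DontSavePainterState
                               | QGraphicsView::DontAdjustForAntialiasing);
}
```

If rendering starts to look wrong after this, the usual culprit is an item that changes painter state without restoring it, which is exactly what DontSavePainterState stops protecting against.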

If I don’t do text, rendering is about twice as fast. Again we see that text drawing is a huge cost. We’re working on an API to fix this and we’ll have more information for you when we do. You may notice that enabling item caching drops the performance a bit compared to the “-no-text” case. There isn’t much overhead inside QGraphicsView for this path. A likely reason for the decrease is that reading from multiple memory sources (multiple pixmaps) results in a lot of cache misses, compared to the straight approach which draws the same pixmap over and over.

ButtonView Item

In my previous post I briefly mentioned that there is a slight overhead involved with the use of a QGraphicsItem too. Prior to calling the paint function, the painter is transformed to the coordinate system of the item and the painter state is saved. If the item draws a big polygon, this setup cost can be ignored, but when drawing just a pixmap and a few pixels of text, then it may be worth considering. In the spirit of “The more direct the painting code is, the faster it gets”, I implemented the keyboard as a single item. The numbers are as follows:

ButtonView Results
Milliseconds per frame including blit to screen when using a single item. Lower is better!

Raster is now down to 10ms, which is 1ms better than the QGraphicsWidget approach when all optimizations were enabled, so even though graphics items are cheaper than widgets, they still cost a bit. The keyboard is now rendered in a tight loop, and the major difference in performance here is caused by the fact that items in the scene have a transform associated with them. Prior to calling paint(), a transform is set to match the painter to the item’s local coordinate system. This causes a state change in the paint engine. For each button we’re drawing a 32×32 pixmap, which means alpha blending 1024 pixels, followed by doing text layout and drawing a single character. Even then, we save about 10% of the time by not having a QPainter::translate() in the midst, so bear that in mind. By enabling the optimization flags and disabling the index, raster drops a bit more, so having those is still a good idea.

You may have noticed that there is one dataset that is named “cheat” for OpenGL. I was reluctant to include this, because it’s using a private API that is not, and I really mean NOT, subject to binary compatibility rules. You cannot call this from your application. We’re going to add a public API for this in the future, hopefully 4.7, so until it’s there, wait. In the interest of showing what we are thinking internally, I thought I would show it.

OpenGL is really great for accelerating graphics, but its way of working does not map optimally to how Qt works. GL is really good at taking a few large datasets of triangles and rendering them, but it’s not so good at drawing loads of small things. Small things like button backgrounds, icons, single text items, etc. However, all the button backgrounds are the same pixmap, so what if I could tell QPainter to draw the same pixmap in multiple places at once? In GL this would correspond to setting up a texture and one vertex and texture coordinate array and drawing some 40 pixmaps in one go. This fits much better with how GL is made to work. The result is that drawing the buttons drops from 5.2ms to 3.9ms, so another piece of juice squeezed out. Naturally, the more times the pixmap is drawn and the smaller the pixmap gets, the more benefit you get from batching commands like this.
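To make the "one vertex and texture coordinate array" idea concrete, here is a CPU-side sketch of how such a batch could be assembled. It is not the private Qt API used for the “cheat” numbers; Rect, Vertex and buildQuadBatch are hypothetical names, and the layout (two triangles per button, full texture on each) is just one reasonable assumption.

```cpp
#include <vector>

struct Rect { float x, y, w, h; };      // hypothetical stand-in for a button rectangle
struct Vertex { float x, y, u, v; };    // position plus texture coordinate

// Expands each rectangle into two triangles (six vertices) that all sample
// the full texture, so the whole set can be submitted in a single draw call.
std::vector<Vertex> buildQuadBatch(const std::vector<Rect> &rects)
{
    std::vector<Vertex> vertices;
    vertices.reserve(rects.size() * 6);
    for (const Rect &r : rects) {
        const Vertex topLeft     = { r.x,       r.y,       0.0f, 0.0f };
        const Vertex topRight    = { r.x + r.w, r.y,       1.0f, 0.0f };
        const Vertex bottomLeft  = { r.x,       r.y + r.h, 0.0f, 1.0f };
        const Vertex bottomRight = { r.x + r.w, r.y + r.h, 1.0f, 1.0f };

        // First triangle of the quad.
        vertices.push_back(topLeft);
        vertices.push_back(bottomLeft);
        vertices.push_back(topRight);
        // Second triangle of the quad.
        vertices.push_back(topRight);
        vertices.push_back(bottomLeft);
        vertices.push_back(bottomRight);
    }
    return vertices;
}
```

For 40 buttons this yields 240 vertices. With the button texture bound once, the array could then be drawn with a single call such as glDrawArrays(GL_TRIANGLES, 0, 240) instead of 40 separate pixmap draws.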

There is a second option to OpenGL for the button view case, which is the “-ordered”. This was done after Tom brought to my attention that the testcase would do a shader program update for each painter call. In the default buttonview implementation we do:


                    for (int i=0; i < m_rects.size(); ++i) {
                        p->drawPixmap(m_rects.at(i), *theButtonPixmap);
                        p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));
                    }

Because pixmaps use one shader pipeline and text drawing uses another, the pipeline needs to be switched and reset all the time, which renders at 16ms per frame. To see if it makes a difference, I added a second alternative rendering, “-ordered”, where I do all the pixmaps first, then all the text:


                    for (int i=0; i < m_rects.size(); ++i)
                        p->drawPixmap(m_rects.at(i), *theButtonPixmap);
                    for (int i=0; i < m_rects.size(); ++i)
                        p->drawText(m_rects.at(i), Qt::AlignCenter, m_texts.at(i));

This prevents the shader pipeline updates and brings the rendering time per frame down to 13ms, so definitely worth it.
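A toy model shows why the ordering matters. It simply counts how often two consecutive draw calls are of a different kind, on the assumption that each such change forces a pipeline switch. The names below are made up for illustration, and the real GL paint engine is more involved than this.

```cpp
#include <cstddef>
#include <vector>

enum class DrawKind { Pixmap, Text };

// Counts how often consecutive draw calls need a different pipeline.
std::size_t countPipelineSwitches(const std::vector<DrawKind> &calls)
{
    std::size_t switches = 0;
    for (std::size_t i = 1; i < calls.size(); ++i) {
        if (calls[i] != calls[i - 1])
            ++switches;
    }
    return switches;
}

// Pixmap, text, pixmap, text, ... as in the default loop.
std::vector<DrawKind> interleavedOrder(std::size_t buttonCount)
{
    std::vector<DrawKind> calls;
    calls.reserve(buttonCount * 2);
    for (std::size_t i = 0; i < buttonCount; ++i) {
        calls.push_back(DrawKind::Pixmap);
        calls.push_back(DrawKind::Text);
    }
    return calls;
}

// All pixmaps first, then all text, as in the "-ordered" loop.
std::vector<DrawKind> groupedOrder(std::size_t buttonCount)
{
    std::vector<DrawKind> calls(buttonCount, DrawKind::Pixmap);
    calls.insert(calls.end(), buttonCount, DrawKind::Text);
    return calls;
}
```

For 40 buttons the interleaved order changes kind 79 times within a frame, while the grouped order changes once. The measured gain (16ms to 13ms) is smaller than that ratio suggests because the draw calls themselves still dominate the frame time.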

Summing Up

Virtual Keyboard Combined Results
Milliseconds per frame including blit to screen for proxy widgets, graphics widgets and a single item. Lower is better!

OpenGL comes out rather badly in this testcase, which I was a bit disappointed to see, but it did send Tom into an optimization frenzy, so we’re hoping to remove some of the constant overhead. It should also be said that when using the OpenGL graphics system, we enable multisampling by default, which increases rendering time on the N900 by around 30%. A plain QGLWidget would thus perform slightly better. Another aspect of OpenGL is that it uses a dedicated low-power chip, so even though it runs at half the speed for this particular use case, it also uses a lot less battery, so it may still be the right choice. OpenGL will also scale significantly better than raster and X11 as the pixmaps get bigger or if the content of the button is slightly more advanced, say a horizontal gradient.

The best numbers are definitely in the button view case, where all the content is rendered as one item, which is what I wanted to highlight with this blog. The button view item also opens up for other optimizations such as batching. We don’t have that many batching functions in QPainter today, it’s only drawRects(), drawLines() and drawPoints(), but we’re considering adding more. We are just not sure how the APIs would look yet.

The bottom line is still that how Qt is used defines how well it performs. On one hand there may be an easy and convenient way to get the job done which performs quite sub-optimally. On the other hand there may be a more involved implementation which performs very well. I’m not trying to suggest that you do one or the other; there are a lot of good reasons for picking either one. But I hope that I’ve illustrated that some features come at a cost, and that this is kept in mind along with what the target is when designs are evaluated and chosen.

I’ll round off with a question. If you were to implement a particle effect when you press a button, which approach would you choose, having seen the numbers above?



25 comments

amolokoedov says:

GL isn’t as good as i thought. Seems that using the same drawing code raster engine will be faster (at least for desktop). So i would stick with several QGraphicsWidget’s.

Theo says:

I noticed that since Qt 4.3 the QGraphicsView is becoming more and more a general purpose draw-it-all render-it-all kind of tool including the kitchen sink. All nice these 3D rendering stuff, but our company is more interested in a fast (2D) graphics engine. Are there any plans to split of the QGraphicsView as currently the 4.6 implementation has made our applications unacceptably slow. Note: using valgrind we found that most time was spent in internal bookkeeping and not in drawing.

alexis.menard says:

@Theo : Can you elaborate? 4.6 is improving performance in many areas. Perhaps you forget to read the changelog and you are in one of the area we changed the behavior? Just saying “it’s slow” with no test-case, nothing about what you are using doesn’t help us. And we don’t want to split QGraphicsView.

gunnar says:

amolokoedov: This usecase is very favorable for raster as explained in the text and in the previous raster post. If the sizes of the buttons had been in the 100×100 range, GL would most likely beat both raster and X11 with quite a lot.

Theo: What 3D stuff are you referring to? All the posts in this series have been about Qt performance and the 2D graphics we have. Do you mean OpenGL? We use GL primarily for 2D rendering in Qt. As for splitting, I think the split is pretty good already. There is the scenegraph which is QGraphicsView and there is a fast 2D graphics engine which is QPainter. Going straight to QPainter gives you quite a bit, as you point out and which also the testcase above illustrates.

Theo says:

Well, we were caught by various behaviour changes. The levelOfDetail that no longer functions and the exposedRect no longer functioning as expected.
However, we are now stuck with QGraphicsScenePrivate::_q_polishItems(). It takes an unholy long time (and with unholy I mean several hours) that previous took 30 seconds at most.

Theo says:

Gunnar: We have been using QGraphicsView since its release with Qt 4.2, as we needed a fast drawing canvas that supports drawing of thousands of graphics items and had facilities for zoom and 2D rotation. So no 3D drawing at all. We were lured because of the promise of drawing thousands of items fast.
Since then QGraphicsView is moving more and more into 3D, and we had to change our code time and time again to get it working in the first place.
We are getting frustrated by all this as we started using Qt since we don’t have the man power to create a full fledged fast 2D drawing canvas. As we are multi-platform we tried various things. Up-to-now OpenGL blew up in our face because of the bad support by Solaris Sparc processors.
The raster engine gives the best results and a reasonable performance on Windows/Linux with simple graphics cards.
Text in OpenGL looks frankly terrible, but maybe again because we are using not very sophisticated graphics cards.
The items we are drawing are bitmaps, polylines, filled polygons, ellipses and text.

DrOctavius says:

Gunnar: I love your articles, Keep doing this great job.

alexis.menard says:

@Theo : In order to get fast speed we did some behaviors changes for default usage, the changelog explains how to work around those behaviors changes. But at least now if you don’t use levelOfDetail, it doesn’t slow down the entire app. For exposedRect again we went for the normal usage where people doesn’t do heavy computation according to the exposedRect, and again it’s easy to get it (why spending time of calculating it if nobody is using it?).

And for the polishItem please sync with the colleague next to you it’s fixed for 4.6.1, you can already grab the patch.

Hmm. Just ported that code to cairo out of interest. Getting similar numbers. Seems the graphics driver is the limiting factor. Otherwise it’s hard to explain, why the code runs 30 times slower than on my notebook.

http://gitorious.org/openismus-playground/vkbd-bench-cairo/blobs/master/src/vkbd-bench-cairo.c

Benjamin says:

@Theo
I don’t get it, what 3D are you talking about in Graphics View?

Brandon says:

Another great post. Keep them coming!

For comparison I did a quick mock up of this in WPF. I basically put the same # of Aero themed Buttons in a Grid & animated a translucent rectangle over them to disable dirty rects. I don’t know the exact FPS because it was limited at my monitor refresh rate of 60 FPS.

It’s obviously not an apples-to-apples comparison, but considering Aero buttons require several vector shapes with gradients, I think it’s actually being generous in Qt’s favor. (i.e. I think it’s most comparable to using the styling api with regular buttons & no proxies)

The bad news is that it looks like WPF is faster than Qt/OpenGL even without optimizations.

The good news is that it shows it’s possible to have convenience (this was implemented in XAML except the FPS counter) and performance.

I think it would be cool to have a follow up post that shows a breakdown of where the time is spent for the various paths. I assume the OpenGL path has too many state changes & maybe fillrate issues…is this true?

Theo says:

Alexis: We already decided to stick with 4.5.x. It is too much work to keep track with all the changes that break our software every time there is a new Qt.

Benjamin: I was talking about the transform support by Graphics View. Although considered cool it slowed down our application again by several percentage points.

Enrico Ros says:

Great post! Maybe it’s missing the last bit of comparison: a pure (single) QWidget based implementation, to show the last bits of overhead introduced by graphicsview.

About the particle effect: a qgraphicsitem as big as the particles max boundingrect with a single pixmap and a tight for loop for painting?

Martin P. says:

> The most common usecase today is a few (a few hundred at max) items which tend to move a lot.

I think this is not true. At least when you do not look primarily on Nokia or S60. We, for example, are using scenes with more than 100000 items (It’s a textile-simulation mainly for scarfs with an item for every stitch).

The polishItems-fix is fine. However, i’d like to see an item-flag “QGraphicsItem::ItemNoPolishNeeded”, to avoid the polish-call completely.

Andre says:

I find the changelog a weird place to document these kind of issues. They belong in the normal documentation! If a method is slow to use, document that method as slow in the documentation. Why hide that into a changelog? Why should I care what changed between versions of Qt, if I write new code against the new version? I expect to find all information that is currently relevant in the documentation for the current version.

Dragan says:

Excellent post, saves us significant time to characterize all the available options.

gunnar says:

Brandon: You are comparing a Desktop box with windows against an N900? I’d say that is a little bit unfair 🙂 On my desktop machine, the frames render in 2-3ms for the worstcase scenarios and 0.2ms for the bestcase ones. Hence my comment: I measured this on an N900 rather than a desktop because the difference becomes more profound on a small device. On the desktop it is easy to be fooled because most things complete in a matter of micro seconds anyway. It is only when the entire application comes together one notices that things are not as smooth as in the prototype

Enrico: I didn’t include the pure widget case as it will be pretty much the exact same result as the single-item case. Good choice on particle effect implementation, btw.

Alistair says:

Nice post. Would be interesting to see the comparisons of raster vs OpenVG (obviously on a device with a real OpenVG engine). Another interesting figure would be the CPU utilization for raster vs OpenGL vs X11 as a reason to use OpenGL is to offload the main CPU.

zchydem says:

Thank you for this post. Very useful to see this kind of statistics.

alexis.menard says:

@Martin P. : “The polishItems-fix is fine. However, i’d like to see an item-flag “QGraphicsItem::ItemNoPolishNeeded”, to avoid the polish-call completely.” Then we ends up with 10000 flags that you have to tweak…horrible…Perhaps this one is relevant or not but we don’t want to make everything an option.

@Andre : The changelog is used when moving to 4.6.0 from old code right? So that’s where we put the those behaviors changes. You don’t want to see in the 4.6.0 doc : “Btw if you are coming from 4.3 do that, from 4.4 do that”. As you said you don’t care when writing new code. So i think it was on the right place. However i agree with you that we can mark some functions as slow, i think we are starting to do that in some places. In that case, some attributes were too costly to compute, we by-pass them by default in 4.6.0 so in theory you won’t see them and you don’t need to deal with them.

Kronen says:

Good work!

Tero says:

There is not much said in the article about the ‘-buttonview -item-cache’ case. I guess that is because although its numbers are the lowest, it comes with worse applicability: It applies only to a static case where the keyboard item does not change (i.e. ‘sliding into view’ case). When a button is pressed, ‘-buttonview -item-cache’ case behaves like ‘-buttonview’ case + the extra work from updating the item cache.

But how about ‘-graphicswidgets -item-cache’? For me that is the winner since

1) its numbers apply roughly as well as those of ‘-buttonview’
2) it is much faster than ‘-buttonview’ with GL backend
3) and at least for me, ‘one button = one item’ feels a better abstraction than ‘a keyboard = one item’.

A downside of course is the memory consumption of item caches which are not needed in ‘-buttonview’.

Christian Brandoni says:

Yes,please I would also like to know how the openvg perform compared to raster!
Openvg should map almost directly to most 2d functions,so it should be the best way to accelerate 2d in embedded platforms.

PS
The security code to post is incredibly long and hard to read 🙁

Anonymous says:

> GL isn’t as good as i thought.

On N900, the graphics chip runs at 110Mhz, the CPU max speed is 600Mhz (scaled according to usage). Using SGX to do dumb memcpy ops on textures is going to be slower than on CPU, but all this depends on what else one does (bouncing data between CPU & GPU is also slow etc).

MEL says:

Gunnar, I would need some performance advice:
I want to zoom into video input (via Firewire, NTSC DV Cam) and draw text and icons on top of it. Performance is really important, it has to run with 30fps. I am using 4.6.1 on Mac OSX 10.6.
What is the best approach, openGL or just Qt?
Are there examples for this?
