Thoughts about graphics performance

Published Wednesday September 5th, 2007
Posted in Graphics View, KDE, Qt

Ah, performance. Today we had yet another discussion about how to make Qt even faster, preferably all over the place. Generally, we feel that resources spent on hunting down performance problems are resources well spent. But (surprisingly?) there aren’t a huge number of us working on Qt’s 1,000,000 lines of code, and many of us feel that we’re spread thin trying to juggle major and minor feature development, maintenance, innovation, and so on. So how do you spend your effort? I personally feel strongest about graphics.

Qt is, of course, a kick-ass toolkit written by a bunch of hard-core (yet wonderfully pleasant!) developers. Sometimes we try to make everything as fast as possible; other times we’re forced to trade API quality against performance. Before he left us, former Troll Zack(r) held a seminar where he stressed how truly high-performance graphics (and we’re talking tens of thousands of polygons, full-screen, with tons of impressive effects at 75fps) needs tailoring, and is typically written very close to the actual hardware in order to run as fast as possible. Convenient APIs can still perform well if they are sufficiently high-level (“I know exactly what you want to do!”), or sufficiently low-level (“I don’t know squat about why you want to draw these trigons but I swear I’ll do it real fast!”). Qt is in the middle. You can’t ask Qt to mock together Black & White III (you would naturally use OpenGL for that!), but you can’t ask Qt to upload a subroutine to the graphics card’s pixel shaders either. It does that for you in the background, but it doesn’t provide that API. Qt can render vector graphics for you. It does it incredibly fast, and beautifully, but pixel-perfect vector graphics are not always what you want. Otoh, Qt doesn’t know that. 🙂 Gah!

Now Arthur’s rendering model does give you many options to write beautiful and fast graphics. The default paint engines and QPainter provide a mid-level intuitive API for drawing vector graphics, implemented with a best-effort approach (speed + quality). Depending on the platform and the complexity of your shapes, Arthur picks the best approach, giving you high-quality output at high speed. Through QImage, you can also access pixels directly and work all your magic “by hand”. With Qtopia Core you also have QDirectPainter, which lets you touch pixels directly on the frame buffer. QGLWidget provides two things. First, the QPainter API, which I consider truly unique: basically the exact same operations you use with QWidget are translated to OpenGL calls. Second, of course, you can use OpenGL directly (the context is set up by default in QGLWidget::paintEvent(), so just fire away those GL calls!).

But QPainter, the heart of our graphics API, works on a non-compositional model. QPainter doesn’t know enough about how you want to blend your stuff together; it has to rely on you to do the smart stuff. For example, if you ask QPainter to draw a complex path, say a QPainterPath representation of a text document, and then ask it to do it again, and again, and again, QPainter has a pretty tough time figuring out that it would be nice to cache that path. Even if it could, it doesn’t necessarily know how to make it also look good in your case! It cannot know what’s best for you; to a certain degree it must rely on you knowing what you’re doing. You, on the other hand, do know what you’re doing. You know perfectly well what could make it faster, but maybe you don’t know how to make Qt do what you want. Somewhere between QPainter and you, there’s some smart stuff flying around, and I’ve been feeling that there’s got to be an API in there somewhere ;-). The trick is to pull it out of the hat somehow.

People tend to prefer quality high-level APIs over low-level ones. By that I mean APIs that are easy to understand and use, empowering you and allowing you to quickly transform your brain vibes into shapes on the screen. Now how can you do that with a really addictive, efficient and intuitive API, while still keeping it blindingly fast? The hard problem lies in finding the right level of abstraction. Our closest thing so far to an abstraction over Arthur is the Graphics View API. I’ve spent some time with QGV to bring my ideas into the API somehow. Does QGV “know” enough about what you’re doing, to do it more efficiently?

QGraphicsItem knows that everything going on inside of paint() is basically drawn on one surface using a single homogeneous transform, and it can render the item off-screen into a texture in logical coordinates to avoid asking QPainter to redraw and redraw and redraw and redraw. That texture could then be stored in graphics memory, like QPixmap already works with QGLWidget, and you could transform and translate the item without ever “redrawing” it. Your paint() functions wouldn’t even get called at all. I think it could do miracles for lots of graphics apps that spend a lot of time redrawing. So when the item is exposed, or even transformed, instead of retessellating, rescaling and rerendering the thing, we just blit the texture. Whenever the item’s contents actually change, it could call QGraphicsItem::invalidate(QRectF) to have that part of the cache rerendered, as opposed to update(QRectF), which just reblits the texture. The following screenshot shows an app I wrote to measure just how fast or slow a straightforward application for Qtopia Core would run. It’s a phone keypad navigator:

[Screenshot: the Pad Navigator example]

Here’s the source code, download and unpack:

The Pad Navigator Source Code

And my patch to Qt 4.3.x, download and unpack:

My Patch to Qt 4.3.x

Now, it goes like this: Download your favorite open source edition of Qt (I prefer the all-package for simplicity), preferably 4.3.1. Unpack, apply the above patch to src/gui/graphicsview – it should apply cleanly with no conflicts. Build Qt, and build the padnavigator example. Now run the example. Play around with the key pad, press enter, bla bla. Resize it, it’s resolution independent. Now, if you hit space, the whole example enables logical caching for all its items. Notice how the quality level goes down, but speed just goes sky-rocketing. OK – I think I’m onto something here, now back to the drawing board.

PS: Try without OpenGL and compare the performance by removing the setViewport() call in main.cpp.

Disclaimer: If you don’t really notice any other difference than image quality degradation, you probably have state-of-the-art hardware and a modern graphics card. Don’t blame me for Qt being fast without any tricks! 😉 Try running padnavigator over a remote X connection on Linux, or run it through a heavy profiler like valgrind, just to “emulate” slow hardware.


17 comments

Harunobu Oyama says:

I agree with you and am quite sure that a built-in cache is a MUST HAVE feature in QGV. I like the API you propose, update vs. invalidate, for cached and non-cached repainting. A cache-scaling mode like QPixmap has would also be useful. Please put it in as an official feature in Qt 4.3.2 or Qt 4.4.0.

p.s. Your patch also showed me that we can use QVariant as a value in a flexible bag. It seems quite useful to add extra fields without sub-classing.

Marco Bubke says:

What about (vertex) buffer objects? Textures have filter problems, so they look blurry if you transform them, which is tolerable in games but not in text applications. Maybe geometry shaders are a way. This would mean that the drawing in Qt would have to be changed so that there would be big batches. You don’t save the images but the geometry, which could be parametrized.

Ralf says:

Sounds good, this is a thing we definitely need in our program, but will it work for overlapping items as well, even with respect to z-order?

Harunobu Oyama says:

Another thought.

Suppose we have some graphics item whose rendering complexity depends entirely on the data it holds. The rendering complexity of the text example depends entirely on the length of the text, for example. For a short text, it should be faster to just render the text, and we can save the memory for the pixmap cache. To do this, QGraphicsItem should have a virtual function to return its internal complexity and another virtual function to return the threshold complexity value, or should have a virtual function to return whether or not to use the cache.

Gopala Krishna says:

Interesting patch 🙂 Looks like this is an absolutely needed feature.
In fact I tried a program with performance problems with caching enabled, and indeed there was a 200% improvement (with some other modifications to that program)!!
I encountered that problem in qtcentre here – http://www.qtcentre.org/forum/f-qt-programming-2/t-what-is-the-fastest-way-to-draw-a-circle–8889-post47546.html

Gopala Krishna says:

BTW I guess I exaggerated it. Though the speed improved using the cache, there were jitters while items were moving. Anyway there was improvement 🙂

Andreas says:

Thanks guys, it looks like everybody loves the feature and wants it in Qt. For those interested, the requirement has been there for a while, and customers can vote on http://trolltech.com/developer/task-tracker/index_html?method=entry&id=127051 to show their interest.

Marco: QPainter can render to a QPicture, or a QSvgGenerator, and we (and you!) can also extend this and have it dump whatever vector graphics format you’d like. That opens up the opportunity of supporting truly vectorized server-side rendering, maybe even OpenVG hardware acceleration, I’m very excited just thinking about what the future might bring. If you’re basing your canvas on QGV it seems the future has a lot of good stuff in the line for you.

Ralf: The cache works with overlapping items, Z order works seamlessly, and it supports alpha-composition 100%, you can mix cached items with uncached items, no problem.

Harunobu Oyama-san, sadly we cannot add such functionality to QGraphicsItem, but the good news is you can add it in your own items. You are free to add virtual functions in your own base class that check for complexity (like the length of the text, and so on). However, it’s very hard for Qt to guess whether caching is worth it timewise, and it’s also very hard for Qt to make the judgement over speed versus quality degradation, but hey, you’re the boss there. 🙂 We bring you the tools and you can do your own magic.

Harunobu Oyama says:

Hi Andreas, thank you for your response. No problem at all. It is not a big job to add the complexity checking for our own items.
BTW, how can I vote on case 127051? The page does not seem to have a button like “Vote” or “I like this idea” or “digg this page.”

Adam Higerd says:

I’m not arguing, but… on my machine (Duron 1.3GHz with GeForce2 video — not exactly recent, y’know?) I actually see a quality IMPROVEMENT in logical caching mode, as well as the performance boost. Some lines that were looking aliased in the default mode end up looking better — the lines don’t look like constant thickness, but they at least look solid instead of having gaps between sections.

You know what might be a good idea? Making the default configurable by the end user instead of by the developer.

Andreas says:

Oyama-san and any others, the vote option is there only for customers, you can vote if you log onto the task tracker with your customer ID and password. I hope we can open voting for the public at some point, but for now that’s how it works.

Adam: I’ll make sure we also make quality bad in your case in time for the 4.4 release. ;-D

Ralf says:

Hi Andreas, will this obsolete bug 158969? I hope not, because that would improve performance in certain cases even more, don’t you think?

Harunobu Oyama says:

Andreas,

Sorry to bother you again. I cannot find a way to log onto the task tracker system. Is there a special url for commercial version customers??

Andreas says:

Ralf, no that task still remains.

Oyama-san, you should probably email support and they’ll help you vote. 🙂

David Johnson says:

The only caveat I have is to avoid reliance on OpenGL and XRender. The near total lack of Open Source drivers for modern video cards makes these problematic for many users.

Andreas says:

Oh, but there is no dependency nor reliance between QtGui, where Graphics View lives, and OpenGL. You can tie those together yourself, using QGraphicsView::setViewport(), but there’s no dependency. Even without OpenGL, caching works very well. On both X11 and Windows you’ll get speed-ups if your item spends enough time painting itself. With device caching, a type of cache that we’ll also support that only invalidates when you move or translate an item (directly or indirectly), speed-ups are 100% paint engine agnostic.

Marius says:

For customers wanting to vote on this task, go to this URL instead of the one Andreas quoted above. (The above is a Read Only URL):
http://trolltech.com/customer/task-tracker/index_html?method=entry&id=127051

Antonio says:

Hi Andreas, this is great, but what about transformations?
If I move a transformed (e.g. rotated) QGraphicsItem, is the transformation continuously computed even if the item uses logical caching?

Commenting closed.
