Qt Graphics and Performance - Fast Text

Previously in this topic:

In my previous post, The Cost of Convenience, we saw quite clearly that text drawing was a major bottleneck. Text drawing is quite common in GUI applications though, so we need a solution for that. If we break down what happens behind QPainter::drawText(), it is split into two distinct parts. Converting the string into a set of positioned glyphs, often refer to as "text layout" because it positions the glyphs, does text wrapping and adjustments for alignment. The second part is passing the glyphs to the paint engines to be rendered. When the text is the same all the time, the first part could be done once and the glyphs/positions just reused.

We have a class in Qt which allows you to cache the first part and only do the drawing for each frame. The class is QTextLayout. This is a low-level class, throwing asserts at you for the most trivial of mistakes. It also comes with a really inconvenient API, but it does reduce the most costly step of text drawing, which is the layout part. It is also only fair to mention that QTextLayout uses a lot more memory than just the glyph-array and positions array, as one could expect, so in a memory constrained setup, it should be used with caution. In 4.7, we plan to introduce an API for static text, which takes care of all the layout and stores only the required parts, reducing the overall memory footprint, but for now, QTextLayout is how you do it.

Going back to my virtual keyboard, updated Source Code, I've changed the "-buttonview" example to make use of QTextLayout. In the constructor, I build the layout:


ButtonView() {
QString content;
for (int i='0'; i< ='Z'; ++i) {
content += QLatin1Char(i);
content += QChar(QChar::LineSeparator);
}
m_layout = new QTextLayout(content, font());
QFontMetricsF fm(font());
m_layout->beginLayout();
for (int i=0; i QTextLine line = m_layout->createLine();
line.setNumColumns(1);
int x = (i) % 10;
int y = (i) / 10;
QSizeF s = fm.boundingRect(content.at(i*2)).size();
line.setPosition(QPointF(x * 32, y * 32) + QPointF(16 + s.width() / 2, 16 + s.height() / 2));
}
m_layout->endLayout();
m_layout->setCacheEnabled(true);
}

If you look at the source code, there is more stuff going on in the constructor than I show above. This is because I extracted the text layout relevant parts only. So what we do is to build a string of the characters. Between each character I insert a LineSeparator. Without this, I wouldn't be able to split the text into multiple QTextLine objects. From the content string, I construct the layout. For each character, I find its position in the grid and construct a QTextLine and move the line to its position. Each line is one column/character big. Finally I enabled caching on the layout. This is the step where we start caching the laid out text.

When it comes to the paint method, the code is rather straightforward. All the text is contained inside a single layout object so I can just call its draw function.


void paint(QPainter *p, const QStyleOptionGraphicsItem *, QWidget *) {

// Draw background pixmaps...

m_layout->draw(p, QPointF(0, 0));
}

Now, lets have a look at what this gains us:

Text Layouts

The graph shows the number of milliseconds per frame including the blit. Measured on an N900 with composition disabled. Smaller is better!

If we compare the "-no-indexing -optimize-flags" to the one with "-no-indexing -optimize-flags -text-layout", we see that there is a significant reduction per frame. It brings raster from 9.3 ms per frame down to 5.5, OpenGL drops from 16 ms per frame to 9.1 ms when using a text layout. A drop of about 4 ms is also visible in the X11 paint engine.

Needless to say, using the QTextLayout class introduces a huge benefit, but it requires a bit more setup to get there. In this implementation I merged all the text into a single object which also makes it impossible for me to move one item relative to the others, such as adding an offset when a button is pressed. I could have one QTextLayout for each item, which would have been roughly the same performance, but at a higher memory cost.

Until next time, take care!

PS: A small comment on the item cache / X11 numbers. The connection is asynchronous and Qt completes its job at about 2.7 ms pr frame. With "-sync" on the command line, which makes all X calls synchronous, raises the time to about 10 ms per frame. If I had put a QApp::syncX() into each frame, synchronizing once per frame which is essentially what GL and VG are doing, I would probably get a number that is in between these two. What this means is that the numbers for X11 in this test are actually quite a bit worse than the graphs show.


Blog Topics:

Comments