Daniel Smith

Qt 5.12 LTS – The road to faster QML application startup

Published Friday January 11th, 2019
10 Comments on Qt 5.12 LTS – The road to faster QML application startup
Posted in Dev Loop, Embedded, Performance, Qt Quick 2, Qt Quick 2.0

The Qt Company has long run benchmarks such as QMLBench that help us detect when a change introduces a performance regression. It's also important to see how Qt performs at a higher level, where components interact in ways that granular tests like QMLBench can't reveal. In this blog post, we'll walk through new application startup results from a more realistic QML benchmark application.


The benchmark

For these tests, we developed a relatively simple QML application that uses many areas of QtDeclarative and QtGraphicalEffects. The application code was written the way a casual developer might write it, and it has not been optimized for startup time, memory consumption, or performance. Because this is a benchmark, the application does not use interactive elements or user input. It is of low complexity and has no divergent logic, so results are as consistent as possible between test runs. No benchmark can truly simulate real-world performance with user interaction. Even so, the test discussed here aims to represent a real-world QML workload more accurately than QMLBench or the QtQuickControls “Gallery” example.

[Image: QML Benchmark]

The benchmark application. It combines textures, animations, QML shapes, repeaters, complex text, particle effects, and GL shaders to simulate a heavier, more real-world application than other QML benchmarks like QMLBench.

Download the benchmark source code here.

Lars has previously written about The Qt Company’s commitment to improving Qt's performance. With the recent release of Qt 5.12 LTS, that effort is really showing, especially in QML. A good number of the improvements target startup performance. Of the platforms tested, the largest startup improvement was on the lowest-powered device, a Toradex Apalis i.MX6. Let’s explore that.

Startup Performance

[Chart: time to first frame on the Toradex Apalis i.MX6 across Qt versions and startup features]

The chart above shows how the features in Qt 5.12 LTS cut startup time. Time-to-first-frame drops from 5912 ms in Qt 5.6 to only 1258 ms in Qt 5.12.0, a 79% reduction! This is thanks to a number of new features that can be stacked to improve startup performance. Let’s walk through each.
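
As a quick arithmetic check, using only the two times quoted above, the reduction and the equivalent speedup work out as follows:

```latex
\frac{5912\,\mathrm{ms} - 1258\,\mathrm{ms}}{5912\,\mathrm{ms}} = \frac{4654}{5912} \approx 0.787 \approx 79\%,
\qquad
\frac{5912\,\mathrm{ms}}{1258\,\mathrm{ms}} \approx 4.7
```

In other words, the Qt 5.12.0 configuration reaches its first frame roughly 4.7 times faster than Qt 5.6 on the same device.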

  1. The Shader Cache – Introduced in Qt 5.9 LTS

    The shader cache saves compiled OpenGL shaders to disk where possible to avoid recompiling GL shaders on each execution.

    Pros: Lowers startup time. For shaders already in the cache, it also avoids the lag that would otherwise occur when the application first encounters a new shader.
    Cons: Systems with limited storage may occasionally clear shader caches. Your application may use very complex shaders on a low-power device, where compiling them could make startup undesirably slow. In that case, consider using pre-compiled shaders so that you don't depend on the cache. Cached and pre-compiled shaders perform identically.
    Difficulty to adopt: None! This process is automatic and does not need to be manually implemented.

  2. Compiled QML

    Without the Qt Quick Compiler detailed below, the QML in applications built on Qt versions prior to 5.9 LTS was always compiled at runtime, on each and every run of the application. Depending on the application’s size and the host’s processing capabilities, this could lead to undesirably long load times. Two advancements in Qt now make it possible to greatly speed up the startup of complex QML applications, and both provide the same startup performance boost. They are:

    Qt Quick Cache – Introduced in Qt 5.9 LTS

    The Qt Quick Cache saves runtime-compiled QML to disk in a temporary location. The QML is compiled on the first run, and subsequent runs load the compiled result directly instead of running a costly compile every time.

    Pros: Can greatly speed up complex applications with many QML files.
    Cons: If your device has very little storage, the operating system may clear caches automatically, leading to occasional, unexpectedly long startup times.
    Difficulty to adopt: None! This process is automatic and does not need to be manually implemented.

    Pre-generated QML (Qt Quick Compiler) – Introduced in Qt 5.3 for commercial licensees; available to both commercial and open-source users since Qt 5.11

    The Quick Compiler allows a QML application to be packaged and shipped with pre-compiled QML. Initially available under commercial license from Qt 5.3 onwards, it is available for both commercial and open-source users from Qt 5.11 onwards.

    Pros: The Quick Compiler removes any reliance on the runtime-generated QML cache. You never need to worry about an unexpectedly long startup after the host clears its temporary files.
    Cons: None!
    Difficulty to adopt: Low. See the linked documentation. It’s often as simple as adding “qtquickcompiler” to CONFIG in your project’s .pro file!

  3. Distance Fields – Introduced in Qt 5.12 LTS

    Qt has long used distance fields in font rendering to get cleaner, crisper, animatable text, and Qt 5.12 introduces a way to pre-compute the distance fields. Learn more about distance fields and their implementation in this blog post by Eskil.

    Pros: Pre-generated distance field fonts can drastically reduce startup time when using complex fonts such as decorative Latin fonts, Chinese, Japanese, or Sanskrit. If your application uses a lot of text, multiple fonts, or complex fonts, pre-generating your distance fields can cut a huge chunk off startup time.
    Cons: Generated distance field font files will be marginally larger on disk than standard fonts. This can be optimized by selecting only the glyphs that will appear in your application when using the Distance Field Generator tool. Non-selected glyphs will be calculated as-needed at runtime.
    Difficulty to adopt: Low. See the linked documentation. No additional code is necessary, and generating the distance fields for your font takes seconds.

  4. Compressed textures – Introduced in Qt 5.11

    Compressed textures are ready to be uploaded to video memory right away. Providing them to OpenGL saves Qt from having to prepare other file types (JPEG, PNG, etc.) for upload.

    Pros: Compressed textures give faster startup and lower memory usage. They may even give a modest runtime performance boost, depending on how heavily you use textures and how strong a compression you choose.
    Cons: Texture compression algorithms inherently trade away some visual fidelity, but with all but the most extreme compression schemes the loss is usually not visible. Choosing the right compression scheme for your application’s use case is an important consideration.
    Difficulty to adopt: Low+. See this blog post by Eirik for implementation details. Almost no coding is required; you usually only need to change the texture file extensions in your Qt code. Easy-to-use tools for texture compression are available, such as the “texture-compressor” package for Node.
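
The two routes to compiled QML in item 2 differ mainly in where the compiled result comes from at startup. The Python sketch below is a deliberately simplified, hypothetical model of that lookup order, not Qt's implementation. CompiledUnit, QmlLoaderModel, digest, and the source-hash staleness check are illustrative names and assumptions. The model prefers a unit shipped with the application (the Qt Quick Compiler's role). Failing that, it uses a unit cached on disk by an earlier run (the Qt Quick Cache's role). It compiles at runtime only when neither is usable.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class CompiledUnit:
    """Hypothetical stand-in for a compiled QML file; not a Qt type."""
    source_hash: str
    payload: str


def digest(source: str) -> str:
    """Fingerprint of the QML source, used here to detect stale compiled units."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass
class QmlLoaderModel:
    # Units compiled ahead of time and shipped with the app (Qt Quick Compiler's role).
    bundled: dict[str, CompiledUnit] = field(default_factory=dict)
    # Units written to disk by an earlier run (Qt Quick Cache's role).
    disk_cache: dict[str, CompiledUnit] = field(default_factory=dict)
    # Number of costly compiles performed at startup.
    runtime_compiles: int = 0

    def load(self, path: str, source: str) -> CompiledUnit:
        h = digest(source)
        # 1. Prefer a pre-compiled unit that matches the current source.
        unit = self.bundled.get(path)
        if unit is not None and unit.source_hash == h:
            return unit
        # 2. Otherwise reuse the disk cache, unless the entry is missing or stale.
        unit = self.disk_cache.get(path)
        if unit is not None and unit.source_hash == h:
            return unit
        # 3. Fall back to compiling at runtime, then cache the result for next time.
        self.runtime_compiles += 1
        unit = CompiledUnit(h, f"compiled {path}")
        self.disk_cache[path] = unit
        return unit

    def os_clears_cache(self) -> None:
        """Simulates the OS purging cached files on a device with little storage."""
        self.disk_cache.clear()


if __name__ == "__main__":
    src = "import QtQuick 2.12\nRectangle { width: 100 }"

    cache_only = QmlLoaderModel()
    cache_only.load("main.qml", src)  # first run: compiles
    cache_only.load("main.qml", src)  # next run: disk cache hit
    cache_only.os_clears_cache()
    cache_only.load("main.qml", src)  # compiles again after the purge
    print(cache_only.runtime_compiles)  # 2

    ahead_of_time = QmlLoaderModel(bundled={"main.qml": CompiledUnit(digest(src), "aot")})
    ahead_of_time.load("main.qml", src)
    ahead_of_time.os_clears_cache()
    ahead_of_time.load("main.qml", src)
    print(ahead_of_time.runtime_compiles)  # 0
```

The simulated cache purge reproduces the Cons listed for the Qt Quick Cache. Once the cache is gone, the next startup pays for a full compile again. In this sketch, the ahead-of-time model never compiles at runtime.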

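Item 4's memory claim can be put into rough numbers. The Python sketch below compares the size of a decoded image with two block-compressed formats. It assumes the decoded JPEG or PNG is uploaded at 4 bytes per pixel and ignores mipmaps. ETC1 (8 bytes per 4×4 block) and ETC2 RGBA8 (16 bytes per 4×4 block) serve only as examples; which formats are available depends on the GPU and the compression tool. The function names are hypothetical.

```python
# Bytes per 4x4 pixel block for two common block-compressed formats.
BLOCK_BYTES = {
    "ETC1_RGB8": 8,    # 4 bits per pixel, no alpha
    "ETC2_RGBA8": 16,  # 8 bits per pixel, with alpha
}


def rgba8_bytes(width: int, height: int) -> int:
    """Upload size of a decoded image, assuming 4 bytes per pixel."""
    return width * height * 4


def block_compressed_bytes(width: int, height: int, fmt: str) -> int:
    """Size of a block-compressed texture; partial edge blocks still cost a full block."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BLOCK_BYTES[fmt]


if __name__ == "__main__":
    w, h = 1024, 1024
    print(rgba8_bytes(w, h))                           # 4194304 (4 MiB)
    print(block_compressed_bytes(w, h, "ETC1_RGB8"))   # 524288 (0.5 MiB, 8x smaller)
    print(block_compressed_bytes(w, h, "ETC2_RGBA8"))  # 1048576 (1 MiB, 4x smaller)
```

These figures cover memory only. The startup saving described in item 4 comes from skipping the step of preparing JPEG or PNG data for upload, which this sketch does not model.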

Conclusions

The i.MX6 is a good representative of mid-tier embedded hardware, and the performance improvements in Qt 5.12 LTS really shine in this realm. Stack all the improvements together and you can substantially cut the startup time of applications on low-power devices.

With these latest results for low-power hardware, Qt 5.12 can help your development by greatly decreasing startup times, particularly on low- and mid-tier embedded devices. These performance improvements are easy to adopt and require only minor changes to your codebase. There’s very little reason not to start using Qt 5.12 right away, especially if your project is cramming heavy QML applications into a fingernail-sized SoC. The chart below is a reminder of what’s possible with Qt 5.12 LTS, and faster startup makes for happier customers.

[Chart: summary of startup time improvements with Qt 5.12 LTS]


10 comments

Julien Bordes says:

The startup performance improvement is impressive.

Can we expect the same results with an i.MX6 on QNX 7.0?

Daniel Smith says:

Hi Julien,

As far as I’m aware, the enhancements discussed in the article are fully cross-platform and I would expect the same speedups. That said, the OS architecture differs and direct testing would be necessary. If you are already developing for QNX 7.0, I would strongly suggest creating a copy of your project and re-targeting to 5.12.0 to try it out. All of the features discussed take little to no work to implement and I’d love to hear your results.

Louis says:

Hi,

This is great news, thanks for summarizing all of this!

A small question about the chart itself: do you also have the times for Qt 5.9 and 5.12 without any optimization, to compare with 5.6? Just to see the cost of “not optimizing”.

As for QtQuickCompiler, I’ll try it on my own apps soon. The last time I tried it, several things that worked in “normal” mode or with the on-the-fly generated QML cache were not supported. The last case I remember that failed only with QtQuickCompiler was the following:

import QtQuick 2.0
Item {
    property point position: Qt.point(12, 35)
    Behavior on position.x { NumberAnimation { duration: 1000 } }
    Behavior on position.y { NumberAnimation { duration: 3000 } }
}

Thanks,
Louis

Daniel Smith says:

Hi Louis,

For 5.9 and 5.12, you can consider the first column of each the “baseline”, that is, the out-of-the-box behavior. You can disable the QML cache by setting the environment variable “QML_DISABLE_DISK_CACHE=1”. However, that is typically only used for debugging, so it wasn’t included in testing. Starting in 5.9, the shader cache is baked into Qt at a pretty low level (it just tells the system graphics driver to use caching), and removing it would require some hacking.

In this benchmark, 5.12 is simply faster out-of-the-box, with or without the added features discussed. 🙂

Regarding the Quick Compiler, it may be possible to force support for some older versions when compiling Qt yourself. Note that in 5.12 LTS, the Qt Quick Compiler is baked into all applicable submodules as far as I’ve found. In the past, I’ve done a little testing on 5.6 with the Qt Quick Compiler. To get the greatest effect, you may also need to add “CONFIG += qtquickcompiler” to sub-projects in Qt (for example by modifying qtquickcontrols\src\layouts\layouts.pro) before running qmake and building the Qt modules from source.

Gunnar Roth says:

Is there also support for projects using cmake for all of this?

Nuno Mendes says:

Hi Daniel,

I’ve noticed that my QML application which relies heavily on the ChartView component got much slower with the 5.12 update.
I checked the respective github repository on https://github.com/qt/qtcharts and found (correct me if I am wrong) that there were no improvements, just minor bug fixes. Should we expect better performance on the upcoming commits and if so, when? If not, which other alternative do you suggest?

Thanks

Daniel Smith says:

That’s very interesting, Nuno. Unfortunately, I don’t have any numbers tracking the performance of ChartView. I would recommend opening a bug report on https://bugreports.qt.io. If you have been able to measure the difference, please include those measurements, along with any other details you have, in the report.

Bernhard Zens says:

Unfortunately this doesn’t seem to be limited to, or related to, ChartView itself.
Just building with 5.12 instead of 5.11.3 roughly halved the FPS (default OpenGL, Windows 10) of our QML application in large scenes. What’s even worse is that frame times are significantly worse now, which is much more noticeable than a simple decrease in average FPS.
A simple before/after example of just navigating laterally through one of the main navigation panels (using SwipeView)…

5.11.3
Avg. time: 7.2 ms (139 FPS)
1% time: 16.2 ms (62 FPS)
0.1% time: 38.3 ms (26 FPS)

5.12.1
Avg. time: 7.83 ms (128 FPS)
1% time: 20.9 ms (48 FPS)
0.1% time: 181 ms (6 FPS)

Moving to OpenGLES with D3D11 is even worse (~half FPS again, but that applies to 5.11 as well as 5.12).
I’ve looked into this issue with the QML Profiler as well as with MSVC2017 for overall profiling, and I don’t see any significant time being spent on our end specifically.
Until now we used to always update to the latest version as soon as possible, but this serious performance regression makes 5.12.x a complete no-go at the moment. What good does a slightly faster startup do if the application no longer feels smooth but like a stuttering, lagging mess?

Daniel Smith says:

Hi Bernhard,

I’m sorry that 5.12 seems to have regressed in the areas your applications use. You noted that you’re on Windows. I understand that Qt may still be implicated in the performance drop, but may I ask whether you are running your applications on Intel graphics? In my testing, I saw much greater frame time variance on Intel graphics hardware (Intel HD 4000), particularly when vSync was forcibly disabled. Unfortunately, I wasn’t recording results for this type of measurement, so the evidence is purely anecdotal.

yuyikai says:

FPS is low when scrolling a ChartView, especially on Android devices.

Commenting closed.
