Risto Avila

Fast-Booting Qt Devices, Part 2: Optimizing Qt Application

Published Wednesday April 27th, 2016
Posted in Automotive, Biz Circuit & Dev Loop, Boot time, Embedded

Welcome back to the fast-boot blog post series! Last week we covered the fast-boot demo that showed you a video of an i.MX6 board booting in under 2 seconds. In this blog post we will cover how the Qt QML cluster application was optimized.

The original demo shown at Qt World Summit 2015 was designed in a PC environment, and startup time was not addressed in the initial design. The design already used Loaders so that some parts of the UI were loaded asynchronously, but the startup sequence had not been thought through at all. So, to begin optimizing the startup, we first needed to think about the use case: what is it that we want the user to see first? We decided that the first thing the user must see is the frame of the cluster, after which we load and animate the rest of the objects onto the screen.

In the cluster image below, the red overlay marks the parts we decided the user must see when the application starts:

[Image: mask_layer_red]

When looking at the application code, we noticed that our dashboard was actually a combination of multiple different mask images, some of which were full screen. So we combined all the visible parts into one single full-screen image that gets loaded on top of the UI:

[Image: DashboardFrameSport-mask]

To make the startup of the application as fast as possible, we adjusted the internal design of the application as well. We separated the dashboard frame into its own QML file that gets loaded as the very first item. After the cluster frame is loaded and drawn, we enable the loader underneath to load the rest of the UI.

[Image: frame_qml2]

We also used the QML profiler in Qt Creator to find out what was taking time. Initially the demo used the original Qt Quick Controls, which were designed for desktop. This made the creation of the gauges take some extra time (note that Qt Quick Controls are being redesigned for embedded use cases for Qt 5.7!). To solve this for now, we replaced the gauges with images and created a fragment shader to color the parts of the gauges that need animation.

As a final touch we added a flip animation to the gauges and a fade-in to the car to make the startup feel more natural.

After these optimizations we got the Qt application to show the first frame in under 300 milliseconds on the target device (measured from the time the operating system kernel is loaded).

Optimizing Your Application: Do’s and Do not’s!

Finally, based on our experience, here is a summary of tips’n’tricks for optimizing your Qt Quick applications. In case you feel like you could use additional help, feel free to contact us or any of our Qt Partners and we’re happy to help you with your application!

Do:

  • Design your application to start fast from the beginning. Think about what you want the user to see first.
  • Use startup animations to allow parallel loading.
  • Use chain loading. Run only as many loaders as you have cores in your CPU (e.g. two cores: two loaders running at the same time).
  • The first loader should not be asynchronous. Use it to trigger the rest of the loaders.
  • Create QML plugins that are loaded when required.
  • Connect to back-end services only when required.
  • Let the QML plugins start up non-critical services and close those down when not required anymore.
  • Optimize your PNG / JPG images.
  • Optimize your 3D models by reducing the number of vertices and removing parts that are not visible.
  • Optimize the 3D model loading by using glTF.
  • Use Qt Quick Controls 2.0. These are designed for embedded use and the creation times are a lot better than in Quick Controls 1.0 for embedded use cases.
  • Limit the usage of clip & opacity.
  • Measure GPU limitations and take those into account when designing the UI.
  • Use Qt Quick Compiler to pre-compile the QML files.
  • Investigate if static linking is possible for your architecture.
  • Strive for declarative bindings instead of imperative signal handlers.
  • Keep property bindings simple. In general, keep QML code simple, fun and readable. Good performance follows.
  • When targeting multiple platforms and form factors, use file selectors instead of loaders and dynamic component instantiation. Don’t be shy to “duplicate” simple QML code and use file selectors to load tailored versions.
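The chain-loading tips above can be modeled in a few lines of plain C++. This is only a sketch of the scheduling idea, not Qt code and not the code used in the demo: `ChainLoader`, `StartLoader`, `DoneCallback` and `coreCountOrOne` are hypothetical names invented for illustration. In a real Qt Quick application the same pattern would typically be built from Loader items whose completion triggers the next one. The sketch assumes each loader reports completion exactly once and that the `ChainLoader` outlives all loaders it started.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical model of one asynchronous loader: it begins loading when
// called and must invoke `done` exactly once when loading has finished.
using DoneCallback = std::function<void()>;
using StartLoader = std::function<void(DoneCallback done)>;

// One loader per CPU core, but at least one: hardware_concurrency()
// may return 0 when the core count cannot be determined.
inline unsigned coreCountOrOne()
{
    return std::max(1u, std::thread::hardware_concurrency());
}

class ChainLoader
{
public:
    explicit ChainLoader(unsigned maxParallel = coreCountOrOne())
        : m_maxParallel(maxParallel)
    {
        assert(m_maxParallel > 0);
    }

    // Queue a loader. Add all loaders before calling start().
    void add(StartLoader loader) { m_pending.push_back(std::move(loader)); }

    // Call this once the first, synchronously loaded part of the UI is
    // on screen. Starts loaders until the parallel limit is reached.
    void start()
    {
        while (m_running < m_maxParallel && m_next < m_pending.size())
            startNext();
    }

    unsigned running() const { return m_running; }
    bool finished() const { return m_running == 0 && m_next == m_pending.size(); }

private:
    void startNext()
    {
        ++m_running;
        StartLoader loader = m_pending[m_next++];
        loader([this] {
            --m_running;
            start(); // a slot became free: chain the next pending loader
        });
    }

    unsigned m_maxParallel;
    unsigned m_running = 0;
    std::size_t m_next = 0;
    std::vector<StartLoader> m_pending;
};
```

In this model the application would show the first item synchronously, queue the remaining parts with `add()`, and then call `start()`. Every finished loader frees a slot and starts the next one, so no more than `maxParallel` loaders are ever running.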

Do not:

  • Go overboard with QML. Even if you use QML, you don’t need to do absolutely everything in QML.
  • Initialize everything in your main.cpp.
  • Create big singletons that contain all the required interfaces.
  • Create complex delegates for ListViews.
  • Use Qt Quick Controls 1.0 for embedded.
  • Use clip at all if you can avoid it. (In 98% of use cases this should be possible.)
  • Fall into the common trap of overusing Loaders. Loader is great for lazy-loading larger things like application pages, but introduces too much overhead for loading simple things. It’s not black magic that speeds up anything and everything. It’s an extra item with an extra QML context.
  • Overdo re-use. Maximum code re-use often leads to more bindings, more complexity, and less performance.
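As an illustration of connecting to back-end services only when required, instead of initializing everything in main.cpp, here is a minimal C++ sketch of a lazily created service. `LazyService` and its members are hypothetical names, not a Qt API. The sketch is not thread-safe and assumes it is only used from a single thread, such as the GUI thread.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Creates the wrapped service on first use instead of at program start.
// `Service` stands for any back-end interface the UI needs (hypothetical).
// Not thread-safe: assumed to be used from a single thread only.
template <typename Service>
class LazyService
{
public:
    using Factory = std::function<std::unique_ptr<Service>()>;

    explicit LazyService(Factory factory) : m_factory(std::move(factory))
    {
        assert(m_factory); // a factory is required to build the service later
    }

    // Returns the service, constructing it only if it does not exist yet.
    Service &get()
    {
        if (!m_instance)
            m_instance = m_factory();
        return *m_instance;
    }

    // Shuts the service down when it is not required anymore.
    // A later get() creates it again.
    void release() { m_instance.reset(); }

    bool isStarted() const { return m_instance != nullptr; }

private:
    Factory m_factory;
    std::unique_ptr<Service> m_instance;
};
```

With a wrapper like this, main.cpp only registers cheap factories. The cost of starting a non-critical service is paid when the UI first asks for it, after the first frame is already on screen.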

In the next part of the series, we will look more closely into the operating system side of the startup optimization. Stay tuned!

13 comments

Sune says:

Is the demo code available? And what changesets did you make to optimize it?

Thanks in advance

@Sune: The source code of the cluster application used as an example is not currently available. But the Do’s and Do not’s are valid for pretty much any Qt application.

Sune says:

I’m hoping you will publish the code (or something similar), especially the changes needed for such improvements.

While the do’s and don’t’s are valid as such, I just learn better by reading the diffs than by just looking at such a list.

John says:

“Clip should be avoided altogether if possible. (98% of the use cases this should be possible).”

What is the basis for determining that concrete number?

Alexander Lanin says:

Could you elaborate on not overusing qml? Currently all my GUI code is in qml which was the point or so I thought.

@Alexander: The main point was that application engine logic and “heavy lifting” is good to do in C++.

Marco Piccolino says:

Does having several smaller QML files incur great penalty as opposed to just a big one (excluding necessary extra signals / property aliases)? Does the file reading overhead impact much? Is there anything happening behind the scenes?

Lioric says:

I have never been a fan of showing UI in stages as an optimization; IMNHO it clearly looks like a “trick” (or worse, it suggests underpowered hardware underneath). Obviously you guys have done a great job with that demo cluster and the trick factor is really reduced.

What we do in our cluster/infotainment/onboard computer is to write a (compressed) fb image early in the boot sequence (in our case we use the car’s branding image that transitions to the UI, but it can be a screen capture of the complete cluster):

> lzopcat /boot/uiBaseLayer.fb.lzo > /dev/fb0

Writing this full-screen image to the fb takes only about 30 ms to be displayed on an Omap4660 (a 5-year-old platform). This happens within the roughly 1.2 secs that the whole cold boot takes. So on a 5-year-old system we have a 1.2 s boot time to UI (and this is using a class 8 SD card). Obviously extensive optimizations need to be done to the whole boot chain: u-boot, kernel, custom init daemon, and a very optimized Qt app, similar to what you have done with this demo.

Lioric says:

On platforms where the bottleneck is IO performance, reducing size provides the best gains. Especially Qt static linking with the -fdata-sections, -ffunction-sections and -Wl,--gc-sections flags gives tremendous gains.

Similarly, on u-boot a fixed static env improves boot time big time:
/lioric/pandaboard/uboot/u-boot-main/include/configs/omap3_panda.h
undef CONFIG_USB_TTY
undef CONFIG_ENV_IS_IN_NAND
define CONFIG_ENV_IS_NOWHERE
/home/lioric/pandaboard/uboot/u-boot-main/board/ti/panda/panda.c
comment out the switch statement for get_expansion_id() in the misc_init_r method

Set a fixed boot environment (added defines to select between NFS and MMC root FS at compile time)
/lioric/pandaboard/uboot/u-boot-main/include/configs/omap3_panda.h
#define FS_IN_MMC       "root=/dev/mmcblk0p2 ro " \
                        "rootfstype=ext4 rootwait "
#define FS_IN_NFS       "root=/dev/nfs " \
                        "rw " \
                        "nfsroot=192.168.0.100:/home/lioric/pandaboard/rootFS,nolock,rsize=1024,wsize=1024 "
// Set this define for NFS
#define USE_NFS_ROOT 1
// Set this define for MMC
//#define USE_MMC_ROOT
#ifdef USE_MMC_ROOT
    #define FS_ROOT FS_IN_MMC
#endif
#ifdef USE_NFS_ROOT
    #define FS_ROOT FS_IN_NFS
#endif

Lioric says:

#define CONFIG_EXTRA_ENV_SETTINGS \
     "loadaddr=0x82000000\0" \
     "usbtty=cdc_acm\0" \
        "mmcdev=0\0" \
        "verify=no\0" \
        "bootargs=console=ttyO2,115200n8 " \
                "noinitrd " \
                "mem=64M " \
                "ip=192.168.0.50 " \
                FS_ROOT \
                "console=tty0 " \
                "mpurate=800 " \
                "vram=12M " \
                "omapfb.mode=dvi:800x600MR-16@60\0"

#define CONFIG_BOOTCOMMAND \
"mmc rescan 0; fatload mmc 0:1 0x80300000 uImage; bootm 0x80300000"

The env var “verify=no” disables the kernel CRC check.

I haven’t had the time to read the first part of this “Qt fast booting” series, but will do soon. We have worked on our project for 5 years now, so we have collected large amounts of data on this particular fast-boot area. It is really nice to see Digia/Qt finally blogging about this.

Alex says:

I have no idea how to create complex UIs like that in *any* language and UI framework, let alone Qt. What would be the quickest way to start learning this stuff?

Scorp1us says:

While not talked about here, there are many more tips for writing embedded code in general. It should be mentioned that since you’re using C++, you can’t be garbage collected and compacted. Therefore any long-running system without a VM and the ability to compact memory should only use dynamic memory at initialization, and never allocate/deallocate at runtime, as memory fragmentation will eventually start preventing memory allocations even though there is enough “free”.

There’s a list of 10, used by NASA, here’s a PDF.

http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf

Lioric says:

What does “being C++” have to do with “can’t be garbage collected”?

Please don’t perpetuate the myth that C++ can’t be garbage collected. The C++ standard library in fact doesn’t include any pre-made garbage collector implementation, but you can implement any memory management policy you need, including garbage collecting (mark-region, copying, mark-sweep, or any other type you can come up with).

We are using a custom inhouse IMMIX based garbage collector, in our embedded projects. Still if you need a ready made GC library that is just a drop in lib without any other modifications to an existing C++ code base you can use Boehm GC, but performance should be profiled first to see if it is acceptable in each specific case

The list in your link is about “Safety critical code” and has nothing to do with embedded.

*As a matter of fact, if you are using qml, you are already using a garbage collector (from the v4 engine)

Commenting closed.
