Significant Performance Improvements with Qt 3D Studio 2.4

Rendering speed is essential for a 3D engine, as is efficient use of system resources. The upcoming Qt 3D Studio 2.4 release brings a significant boost to rendering performance and further reduces CPU and RAM utilization. With our high-end embedded 3D example application, rendering speed improves by a whopping 565%, while RAM use and CPU load drop by 20% and 51% respectively.

Performance is a key driver for Qt and is especially important for running complex 3D applications on embedded devices. We have been constantly improving resource efficiency with earlier releases of Qt 3D Studio, and the upcoming Qt 3D Studio 2.4 takes a major step forward in rendering performance. The exact performance increase depends heavily on the application and the hardware used, so in this blog post we take a closer look at two example applications running on two embedded boards. Both examples are automotive instrument clusters, but similar improvements can be seen in any application using the Qt 3D Studio runtime.

Entry-level embedded example with Renesas R-Car D3

The entry-level embedded device used in the measurement is the Renesas R-Car D3, which has an Imagination PowerVR GE8300 entry-class GPU (https://www.imgtec.com/powervr-gpu/ge8300/) and one ARM Cortex-A53 CPU core. The operating system is Linux.

The example application used is the low-end cluster, available at https://git.qt.io/public-demos/qt3dstudio/tree/master/LowEndCluster. The low-end cluster example is well optimized, as described in a detailed blog post about optimizing 3D applications.

[Image: the low-end cluster example]

To make the application as lightweight as possible, only the ADAS view is created as a real-time 3D user interface; the other parts of the instrument cluster are created with Qt Quick. This makes a real-time 3D user interface possible even on entry-class hardware like the Renesas R-Car D3.

High-end embedded example with NVIDIA Tegra X2

The high-end embedded device used in the measurement is the NVIDIA Jetson TX2 development board, equipped with the Tegra X2 SoC, which has a 256-core NVIDIA Pascal™ GPU as well as a dual-core NVIDIA Denver 2 64-bit CPU and a quad-core ARM Cortex-A57 MPCore CPU. The operating system is Linux.

The example application used is the Kria cluster, available at https://git.qt.io/public-demos/qt3dstudio/tree/master/kria-cluster-3d-demo. The Kria cluster example is intentionally heavy, with large, not fully optimized textures, a high resolution, etc.

[Image: the Kria cluster example]

In the high-end example, all the gauges and other elements are real-time 3D, rendered with the Qt 3D Studio runtime. There are very few Qt Quick parts, and these are brought into the 3D user interface through texture sharing via QML streams.

Rendering performance improvement

The biggest improvement in the new Qt 3D Studio 2.4 release is in rendering performance: the same application renders more Frames Per Second (FPS) on the same hardware. As always with Qt, we aim for a steady 60 FPS, but on embedded devices raw performance alone is not enough. Because of factors such as thermal management and varying usage scenarios, it typically pays off not to run at the very edge of the SoC's graphics capabilities. For an application such as an instrument cluster, performance needs to be smooth in all operating conditions, including under maximum system load. For measurement purposes, we disabled vsync with the high-end example, allowing the system to draw as many frames as it can. In a typical real-life application vsync is always enabled, so any capacity beyond 60 FPS translates into saved processing resources.
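To make the headroom argument concrete, here is a minimal Python sketch of the frame-time budget. The function names are hypothetical, and the estimate rests on a simplifying assumption, labeled in the code: a single frame costs the same time with and without vsync. Real frame costs vary with scene content, clock speeds and thermal state, so treat the result as a rough estimate only.

```python
def frame_time_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def idle_fraction(uncapped_fps: float, target_fps: float = 60.0) -> float:
    """Rough share of each vsync interval left idle when a workload that can
    render `uncapped_fps` with vsync off is capped at `target_fps`.

    ASSUMPTION: the cost of a single frame is the same with and without vsync.
    """
    if uncapped_fps <= target_fps:
        return 0.0  # no headroom: the target frame rate is not even reached
    busy = frame_time_ms(uncapped_fps) / frame_time_ms(target_fps)
    return 1.0 - busy


# At 60 FPS every frame must be finished within about 16.7 ms.
print(f"Budget at 60 FPS: {frame_time_ms(60):.1f} ms")  # Budget at 60 FPS: 16.7 ms
# 133 FPS is the uncapped high-end figure reported below; under the assumption
# above, capping it at 60 FPS would leave roughly half of each interval idle.
print(f"Idle share at 60 FPS: {idle_fraction(133):.0%}")  # Idle share at 60 FPS: 55%
```

The CPU measurements reported later in this post (81% uncapped versus 74% at 60 FPS) suggest that a large part of the CPU load does not scale with frame rate, so real savings can be smaller than such a linear estimate implies.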

The graphs below show the measured Frames Per Second with the high-end example on NVIDIA TX2 (vsync off) and with the low-end example on Renesas R-Car D3 (vsync on):

[Graph: measured FPS, Qt 3D Studio 2.3 vs 2.4]

High-end example: With the new Qt 3D Studio 2.4 we see a whopping 565% improvement in rendering performance. With Qt 3D Studio 2.3 the application ran at only 20 FPS, while with Qt 3D Studio 2.4 it runs at 133 FPS. This was measured with vsync turned off, purely to gauge the capability of the new runtime. In practice 60 FPS is enough, and the additional capacity of the processor can be used for a larger screen (or another screen) or a more complex application, or the SoC can simply run below its maximum capacity to save power.

Low-end example: The improvement is limited to 46% because Qt Quick caps the frame rate at 60 FPS. With Qt 3D Studio 2.3 the application achieved 41 FPS, and with the new 2.4 runtime it easily reaches 60 FPS. Just as with the more powerful high-end hardware, the excess capacity of the SoC can be used for a more complex 3D user interface, or simply left unused.
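Both percentages are relative changes in frame rate between the two runtimes. A minimal Python sketch of that arithmetic, using only the figures from this post (the helper name is hypothetical):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Frame rates reported in this post (Qt 3D Studio 2.3 -> 2.4).
high_end = percent_change(20, 133)  # NVIDIA TX2, vsync off
low_end = percent_change(41, 60)    # R-Car D3, capped at 60 FPS by Qt Quick

print(f"High-end: +{high_end:.0f}% ({133 / 20:.2f}x the frame rate)")  # High-end: +565% (6.65x the frame rate)
print(f"Low-end:  +{low_end:.0f}%")                                   # Low-end:  +46%
```

Because the 2.4 runtime hits the 60 FPS cap on the low-end board, 46% is effectively a lower bound there: it shows that the target is now reached, not how far beyond it the runtime could go.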

CPU load improvement

The overall CPU load of an application is the sum of multiple things, one of them being the load caused by the 3D engine. In embedded applications it is important that using 3D does not put excessive load on the CPU. If the application needs more CPU time than is available, it cannot render at the target FPS, and stuttering or other artefacts may appear on the screen.

The graphs below show the measured CPU load with the high-end example on NVIDIA TX2 and with the low-end example on Renesas R-Car D3:

[Graph: measured CPU load, Qt 3D Studio 2.3 vs 2.4]

High-end example: With the new Qt 3D Studio 2.4 we see a hefty 51% reduction in CPU load compared to Qt 3D Studio 2.3, while at the same time the frame rate goes from 20 FPS to 133 FPS. The overall load with the 2.3 runtime is 167% (of a total of 400%), and with the 2.4 runtime it drops to 81%. Note that the higher rendering speed also adds to the CPU load: with vsync on and the frame rate capped at 60 FPS, the CPU load is 74%.
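The 51% figure is the relative drop in total CPU load. Because the frame rate rose at the same time, dividing load by frame rate gives a rough CPU cost per rendered frame. A minimal Python sketch; the helper names are hypothetical, and it assumes the top-style convention implied by "of a total of 400%", where 100% means one fully busy core:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def cpu_ms_per_frame(load_pct: float, fps: float) -> float:
    """CPU time spent per rendered frame, in milliseconds.

    ASSUMPTION: 100% load = one fully busy core, so 1% sustained for one
    second equals 10 ms of CPU time.
    """
    return load_pct * 10.0 / fps


# High-end example on NVIDIA TX2, vsync off (figures from this post).
load_23, fps_23 = 167.0, 20.0
load_24, fps_24 = 81.0, 133.0

print(f"Load reduction: {-percent_change(load_23, load_24):.0f}%")  # Load reduction: 51%
print(f"Share of 400%: {load_23 / 4:.2f}% -> {load_24 / 4:.2f}%")   # Share of 400%: 41.75% -> 20.25%
print(f"CPU per frame: {cpu_ms_per_frame(load_23, fps_23):.1f} ms -> "
      f"{cpu_ms_per_frame(load_24, fps_24):.1f} ms")                # CPU per frame: 83.5 ms -> 6.1 ms
```

The per-frame figure is only indicative, since part of the measured load is not tied to rendering at all.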

Low-end example: We see only a modest 5% improvement in CPU load, mainly because the application is mostly Qt Quick. However, this is while the frame rate rises from 41 FPS to 60 FPS. It should also be noted that the CPU of the R-Car D3 is not very powerful, so the higher frame rate of the overall application affects the overall CPU load.

Memory usage improvement

For any graphics, and especially 3D, it is typically the assets that take most of the RAM. There are ways to optimize this, most notably avoiding unnecessary levels of detail and leveraging texture compression. For the purposes of this blog post, we do not use any specific optimization methods. The measurements use exactly the same application; the only change is the version of the Qt 3D Studio runtime.
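To illustrate why texture compression matters for RAM, here is a minimal Python sketch estimating the memory of a single texture. The 2048x2048 size and the chosen formats are hypothetical examples, not taken from the example applications, and whether a given compressed format is available depends on the GPU and driver.

```python
MIB = 1024 * 1024

# Bits per pixel for a few common texture formats (illustrative choices).
FORMATS = {
    "RGBA8 (uncompressed)": 32,
    "ETC2 RGBA8": 8,  # 16 bytes per 4x4 block
    "ETC2 RGB8": 4,   # 8 bytes per 4x4 block
}


def texture_mib(width: int, height: int, bits_per_pixel: int, mipmaps: bool = True) -> float:
    """Approximate GPU memory for one 2D texture, in MiB.

    A full mipmap chain adds roughly one third on top of the base level.
    Driver padding and alignment are ignored.
    """
    base_bytes = width * height * bits_per_pixel / 8
    return base_bytes * (4 / 3 if mipmaps else 1) / MIB


for name, bpp in FORMATS.items():
    print(f"2048x2048 {name}: {texture_mib(2048, 2048, bpp):.1f} MiB")
# Prints 21.3, 5.3 and 2.7 MiB for the three formats.
```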

The graphs below show the measured RAM use with the high-end example on NVIDIA TX2 and with the low-end example on Renesas R-Car D3:

[Graph: measured RAM use, Qt 3D Studio 2.3 vs 2.4]

High-end example: With the new Qt 3D Studio 2.4 we see a reduction of 48 MB compared to Qt 3D Studio 2.3. This is a 20% reduction in the overall RAM usage of the application.

Low-end example: In the simpler example, RAM use drops by 9 MB with the new 2.4 runtime. Percentage-wise, this is a 15% reduction in the overall RAM usage of the application.
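The absolute and relative savings together imply the approximate total RAM use, assuming each percentage is relative to the Qt 3D Studio 2.3 figure. A minimal Python sketch with a hypothetical helper; the graph holds the measured values, and these are only back-of-the-envelope estimates since the published percentages are rounded:

```python
def implied_baseline_mb(saved_mb: float, saved_pct: float) -> float:
    """RAM use before the change, implied by an absolute saving and its
    percentage of the total. Approximate: the percentages are rounded."""
    return saved_mb * 100.0 / saved_pct


for name, saved_mb, saved_pct in [("High-end", 48, 20), ("Low-end", 9, 15)]:
    before = implied_baseline_mb(saved_mb, saved_pct)
    print(f"{name}: ~{before:.0f} MB with 2.3 -> ~{before - saved_mb:.0f} MB with 2.4")
# High-end: ~240 MB with 2.3 -> ~192 MB with 2.4
# Low-end: ~60 MB with 2.3 -> ~51 MB with 2.4
```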

How was this achieved?

The improvements are substantial, especially on embedded hardware, so one may wonder what changed in the new version. We now use the same runtime architecture as the Qt 3D Studio 1.x releases instead of running on top of Qt 3D. The core logic of the 3D engine is the same as before, but it now runs directly on top of OpenGL instead of using Qt 3D. This significantly improves performance, especially on embedded devices but also on more powerful desktop systems. Running Studio's 3D engine directly on top of OpenGL avoids rendering overhead and simplifies the architecture. The simpler architecture translates to less internal signalling, fewer objects in memory and reduced synchronization needs between multiple rendering threads. All this has allowed us to make further optimizations over Qt 3D Studio 1.x, and of course to bring the new features developed in the Qt 3D Studio 2.x releases on top of the OpenGL based runtime.

For most projects, the change in 3D runtime requires no changes beyond the import statement: using import QtStudio3D.OpenGL 2.4 instead of import QtStudio3D 2.3 and recompiling with Qt 3D Studio 2.4 is enough. As the API and the parts of the 3D engine relevant to the application are the same as before, all existing materials, shaders, etc. work just like before. In the rare cases where some changes are needed, e.g. for a custom material, they are rather small.
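For projects with many QML files, the import switch can be scripted. The following is a minimal Python sketch of a hypothetical helper, not part of the Qt 3D Studio tooling. It only rewrites the import line, so recompiling and checking any custom materials is still needed, and it assumes LF line endings (Python's text-mode file reading normalizes these).

```python
import re

# Matches the Qt 3D based runtime import, e.g. "import QtStudio3D 2.3" or
# "import QtStudio3D 2.3 as S3D". "import QtStudio3D.OpenGL 2.4" does not match,
# because the module name must be followed by whitespace.
OLD_IMPORT = re.compile(
    r"^([ \t]*)import[ \t]+QtStudio3D[ \t]+2\.\d+([ \t]+as[ \t]+\w+)?[ \t]*$",
    re.MULTILINE,
)


def migrate_qml(source: str) -> str:
    """Rewrite QtStudio3D 2.x imports to the OpenGL based 2.4 runtime import,
    keeping indentation and any 'as' qualifier."""
    return OLD_IMPORT.sub(r"\1import QtStudio3D.OpenGL 2.4\2", source)


before = "import QtQuick 2.12\nimport QtStudio3D 2.3\n\nItem {}\n"
print(migrate_qml(before))
```

Running it twice is harmless: already migrated imports are left untouched.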

Get Qt 3D Studio 2.4

If you have not yet tried the Qt 3D Studio 2.4 pre-releases, you should take them for a spin. They are available with the online installer under the preview node. The third beta release is currently out, and the Release Candidate will follow soon. The final release is targeted for before the end of June. Qt 3D Studio is available under both commercial and open-source licenses.

