Performance regression testing of Qt Quick

We recently added a new toy to The Qt Project, and I wanted to give an overview of what it is and how it can be used.

[Screenshot: Grafana showing results of running benchmarks]

qmlbench testing

The new toy is automated performance testing. The background, long story short, is that avoiding performance regressions requires automated testing, just as avoiding regressions in functionality does. That may sound obvious, but while we have many benchmarks in Qt, we have unfortunately only been running them manually, and, as a result, regressions have snuck into the released product from time to time.

This year, The Qt Company decided to invest in fixing the problem, and I was given the task of setting up performance testing of Qt Quick and the scene graph. The plan is to add benchmarks for all areas of Qt to the same system, but in order to get the infrastructure in place first, we decided to use Qt Quick as the pilot project.

I decided to go with the tool that Robin Burchell and Gunnar Sletta wrote for this and recently contributed to The Qt Project: qmlbench.

The tool is designed to measure the performance of Qt Quick primitives and functionality, and it has a suite of QML files that are intended for automated testing.

To be able to see how performance evolves over time, and to detect even small regressions, we decided to go with a graph approach that we monitor manually for changes. There will always be a bit of noise and fluctuation in the benchmarks, since the hardware and operating systems themselves are not 100% predictable, but we are working to reduce this as much as possible.
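
As an illustration of what monitoring for changes means in practice, here is a minimal sketch in Python of how a persistent shift could be separated from run-to-run noise. This is not the actual monitoring code (we watch the graphs by eye in Grafana); the function, window size and threshold are all hypothetical:

```python
import statistics

def persistent_shift(results, window=10, threshold=0.05):
    """Compare the medians of the two most recent windows of results
    and report a shift only if it exceeds the noise threshold.

    results: frame counts, oldest first.
    threshold: relative change considered meaningful (5% here).
    """
    if len(results) < 2 * window:
        return None  # not enough data yet
    old = statistics.median(results[-2 * window:-window])
    new = statistics.median(results[-window:])
    change = (new - old) / old
    return change if abs(change) >= threshold else None

# A noisy series that drops by about 10% halfway through:
series = [1000, 1010, 990, 1005, 995, 1002, 998, 1008, 992, 1000,
          900, 910, 895, 905, 898, 902, 899, 907, 893, 901]
print(persistent_shift(series))  # -> roughly -0.10, a 10% regression
```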

The resulting graphs can be seen at testresults.qt.io.

And the news is good: Qt 5.9 is a great improvement over Qt 5.6 (I expect around 15% on average by the time it is released), and the tool has already detected performance regressions which we have subsequently fixed. Let's go into some more detail.

How to use the UI

When you first look at the page linked above, there will be a lot of information at once, but luckily it is easy to change the view to only show what you are interested in.

Say for instance that I only want to see the results of a single benchmark, e.g. "changing_over_isolated_with_clip_rotated". I can do that by using the "benchmark" selector in the top left corner.

[Screenshot: the benchmark selector]

In the same row, you can also select which machine you want to see results for. The currently available ones are "eskil_linux_foucault", an Ubuntu desktop machine with an NVIDIA Quadro graphics card, and "eskil_linux_tx1", an ARM-based NVIDIA Jetson TX1, also running Ubuntu.

In the graph views themselves, you can turn data sources on and off by clicking on the labels (hold Shift to select multiple). By default, you will see the benchmark results and the coefficient of variation for each branch. On the desktop machine, we are currently running the 5.6, 5.8, 5.9, dev and v5.8.0 branches. The last one is there as a control, so we can rule out changes to qmlbench itself as well as OS and hardware related issues. On the TX1, we are only running 5.6 and 5.9 at the moment, given the time a single test run takes on that hardware.

[Screenshot: selecting data sources]

In the screenshot above, I have disabled everything except the 5.6 and 5.9 benchmark results.

Finally, in the top-right corner you can select the time resolution of the current view. You can also zoom in on an area of a graph by clicking and dragging over it. All the tables and graphs adapt automatically to the currently selected time interval.

This was just a very quick overview of Grafana. Consult the full documentation for more information.

How to use the graphs

The main way to use the graphs is to monitor them and look for changes that persist over time. Note that higher is better in these benchmarks, as they count how many frames can be drawn during a predefined time period (20 seconds) for a predefined test size. There is also a table showing the median over the current interval, which gives an easy way to compare branches and detect regressions and improvements.

Also note that rendering is capped at 60 fps, so the maximum result we can get is 1200 frames per 20 seconds. Graphs for tests that hit this ceiling will flatline at that level, and we will not get much useful information from them until we increase the complexity of the scenes they render.
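
To make the numbers concrete, here is a small sketch of the comparison the median table enables, including a check against the 60 fps ceiling. The frame counts below are made up for illustration:

```python
import statistics

FPS_CAP = 60         # rendering is vsync-capped at 60 fps
RUN_SECONDS = 20     # each benchmark counts frames over 20 seconds
MAX_FRAMES = FPS_CAP * RUN_SECONDS  # = 1200, the ceiling for any result

# Hypothetical frame counts for one benchmark over the selected interval:
results = {
    "5.6": [640, 655, 648, 660, 652],
    "5.9": [738, 745, 741, 750, 744],
}

medians = {branch: statistics.median(runs) for branch, runs in results.items()}
improvement = (medians["5.9"] - medians["5.6"]) / medians["5.6"]
print(f"5.6 median: {medians['5.6']}, 5.9 median: {medians['5.9']}")
print(f"improvement: {improvement:.1%}")  # about 14% for these numbers

for branch, median in medians.items():
    if median >= MAX_FRAMES:
        print(f"{branch} hits the 1200-frame cap; the scene is too simple")
```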

In addition to the performance data, you also have access to the "coefficient of variation" (CoV) for each test run. The scale of the CoV is on the right side of the graph view. When running the benchmark, we do five runs in sequence and report their average to the database. This avoids always running on cold data, and it decreases the fluctuations in the graphs. The CoV is based on the standard deviation of these sets of five results. It can give you an idea of the stability of the test results, though there are some very stable tests that still report a high CoV, such as the "sum10k.qml" test. For some reason we have yet to figure out, the first run on that particular machine will always be around 230 frames, and all the subsequent runs will be 440. So the average is stable, but the CoV is through the roof. Since this is not reproducible anywhere else, Robin suggested the machine is haunted, which is so far our best guess.
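
Here is a minimal sketch, again with made-up numbers, of what gets reported for each test run: the average of five back-to-back runs plus the CoV of those runs. The second example mimics the odd sum10k.qml pattern described above:

```python
import statistics

def report(runs):
    """One data point for the database: the average of five back-to-back
    runs and the coefficient of variation (standard deviation / mean)."""
    mean = statistics.mean(runs)
    cov = statistics.stdev(runs) / mean
    return mean, cov

# A stable benchmark: five runs close together -> CoV well under 1%.
print(report([650, 655, 648, 652, 651]))

# The "haunted" sum10k pattern: one cold run at ~230 frames, then ~440.
# The average barely moves between test runs, but the CoV is huge (~24%).
print(report([230, 440, 440, 440, 440]))
```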

If you look at each of the graphs, you will see that for most of them the "5.9" line is above the "5.6" line, which is good, since it means performance is improving. On average, I expect that Qt 5.9 will give approximately 15% better performance than Qt 5.6. If you look specifically at any of the layout tests, you will see that you can get around 50% better performance on your Qt Quick layouts just by upgrading from Qt 5.6 to Qt 5.9.

[Graph: 53% performance improvement on creation of column layouts in Qt 5.9 compared to Qt 5.6]

The main objective of having this tool, however, is to detect and fix regressions, and there was one test that worried us: the "changing_over_isolated_with_clip_rotated" test case.

This test verifies that we can isolate a subtree in the scene graph by using a clip node, even when the clip is rotated, so that the subtree is put into its own batch in the renderer and changing it does not affect the other parts of the scene graph. If you look at the graph in Grafana, you will see that until very recently we had a significant regression in that area. The regression was fixed on April 25th, so it will never reach users in a release.

[Graph: the changing_over_isolated_with_clip_rotated benchmark]

Now, if I saw this graph and wanted to find out what happened, there is another helpful tool which I haven't talked about yet. You can enable it by checking the "annotations" checkbox at the top of the page. This shows, as events in the graphs, the SHA1s of the different modules we are testing at any given time. Note that this is still a bit of a work in progress: to avoid cluttering the graphs too much, we are only reporting the SHA1s for the eskil_linux_foucault machine, and if there are several events at the same time, you will have to zoom in a bit to separate them. So the UI isn't perfect, but it works when you need to dig for information.

If I zoom in on the first result where performance improved, I can see that one of the updates was to Qt Declarative. Hovering over the event shows the SHA1 of the update.

[Screenshot: an annotation showing the SHA1 of the Qt Declarative update]

By going to the git repository, I can easily find the diff between the two commits listed. If I do, I will see that one of them is "Fix a performance regression noted by qmlbench" by Robin Burchell.

So the system is working. If regressions pop up while the tool is running, we will be able to find the offending commit just as easily.

The future

What's next, you ask? We will of course continue to improve the qmlbench test set as well as the tool itself, and we may add more platforms to the farm as the need arises.

In addition, several other people in The Qt Company are working on getting benchmarks for other parts of Qt into the same system: unit benchmarks for different modules, startup time measurements, Qt Creator performance testing, QML engine testing, and more. Look for updates here and on the main benchmark home page.

