Reducing Application Size using Link Time Optimization

January 02, 2019 by Simon Hausmann | Comments

We need to talk about calories! Not the calories from your Christmas cookies -- those don’t count. But, calories in your Qt application. We’re going to take a look at a technique that is easy to enable and helps you save precious bytes around your application’s waistline.

The Old vs The New

Traditionally, you would build your application by letting the compiler translate your .cpp source files to machine code. The result is stored in .o object files, which we then pass over to the linker, to resolve references between the files. At this point, the linker does not change the machine code that was generated. This division of work between the compiler and the linker allows for quick development cycles. If you modify one source file, only that file gets recompiled and then the linker quickly re-assembles the application’s binary. Unfortunately, this also means that we are missing out on an opportunity to optimize.

Imagine that your application has two functions: main() in main.cpp and render() in graphics.cpp. As an experienced developer, you keep all your graphics code encapsulated in the render() function -- anyone can call it from anywhere! In reality, it is only the application’s main() that calls render(). Theoretically, we could just copy and paste the code in render() into main() -- inlining it. This would save the machine code instructions in main() to call render(). Once that’s done, we may even see opportunities to reuse some variables and save even more space and code. Now, if we tried to do this by hand, it would quickly escalate into Spaghetti code with lots of sauce.

Luckily, most compilers these days offer a technique that allows you apply such optimizations (and deal with the spaghetti mess) while retaining the modularity and cleanliness of your code. This is commonly called “Link Time Optimizations” or “Link Time Code Generation”. The latter describes best what really happens: Instead of compiling each source file to machine code one-by-one, we delay the code generation step until the very end -- linking time. Code generation at linking time not only enables smart inlining of code, but it also allows for optimizations such as de-virtualizing methods and improved elimination of unused code.

Link Time Optimization in Qt

To enable this technique in Qt, you have to build from source. At the configure step, add -ltcg to the command line options. We thought hard, and this is the most cryptic and vowel-free name we could come up with ;-)

To demonstrate the effectiveness of Link Time Code Generation, let’s look at a fresh build of the Qt 5.12 branch, compiled with GCC 7.3.0 for ARMv7 against an imx6 Boot2Qt sysroot. For analysis, we’re going to use Bloaty McBloatface (https://github.com/google/bloaty), which is a lovely size profiler for binaries. The Qt Quick Controls 2 Gallery, statically linked, serves as a sample executable. When running bloaty on it, with a regular Qt build, you’ll see output like this:

    VM SIZE                      FILE SIZE

 --------------                --------------

   0.0%       0 .debug_info      529Mi  83.2%

   0.0%       0 .debug_loc      30.4Mi   4.8%

   0.0%       0 .debug_str      18.6Mi   2.9%

   0.0%       0 .debug_line     14.2Mi   2.2%

  68.1%  13.9Mi .text           13.9Mi   2.2%

   0.0%       0 .debug_ranges   9.60Mi   1.5%

   0.0%       0 .debug_abbrev   6.29Mi   1.0%

  29.5%  6.01Mi .rodata         6.01Mi   0.9%

   0.0%       0 .strtab         3.17Mi   0.5%

   0.0%       0 .symtab         2.35Mi   0.4%

   0.0%       0 .debug_frame    1.80Mi   0.3%

   0.0%       0 .debug_aranges   485Ki   0.1%

   1.2%   249Ki .data.rel.ro     249Ki   0.0%

   0.3%  68.2Ki .ARM.extab      68.2Ki   0.0%

   0.2%  38.2Ki .bss                 0   0.0%

   0.1%  30.3Ki [25 Others]     35.4Ki   0.0%

   0.1%  30.3Ki .got            30.3Ki   0.0%

   0.1%  24.1Ki .ARM.exidx      24.1Ki   0.0%

   0.1%  15.1Ki .dynstr         15.1Ki   0.0%

   0.1%  13.6Ki .data           13.6Ki   0.0%

   0.1%  13.2Ki .dynsym         13.2Ki   0.0%

 100.0%  20.4Mi TOTAL            637Mi 100.0%

The “VM SIZE” column is what’s particularly interesting to us -- it tells us how much space the different sections of the program consume when loaded into memory. Here, we see that the total cost will be ~20 MB.

Now, let’s compare that to a build with -ltcg enabled.

Comparison between regular and LTCG build The new VM size is at 17.3 MiB -- that’s nearly a 15% reduction in cost, just by passing a parameter to configure.

This drastic gain here is because we chose a static build. However, even when you use a dynamic build, this optimization is worth it. In this case, LTCG is applied at the boundary of shared libraries.

Bloaty can show this by comparing a regular build against an LTCG-enabled build of libQt5Core.so.5.12.0:

    VM SIZE                      FILE SIZE

 --------------                --------------

...

 -53.8%     -28 [LOAD [RW]]          0  [ = ]

...

 -11.9% -1.78Ki .got           -1.78Ki -11.9%

  -0.2% -3.05Ki .rodata        -3.05Ki  -0.2%

 -10.0% -3.54Ki .rel.dyn       -3.54Ki -10.0%

 -17.2% -7.52Ki .ARM.exidx     -7.52Ki -17.2%

 -16.9% -18.4Ki .ARM.extab     -18.4Ki -16.9%

...

 -21.2%  -691Ki .text           -691Ki -21.2%

 -13.9%  -727Ki TOTAL           -838Ki -13.8%

The linker produced a smaller library with less code, less relocations, and a smaller read/write data section.

Conclusion

At this point, this seems like a win-win situation, and you may wonder: Why isn’t this enabled by default? No, it’s not because we’re stingy ;-)

One issue is that in the Qt build system, currently, this is a global option. So if we were to enable this with the Qt binaries, everyone using them will be slowed down and it requires them to opt-out explicitly, in the build system. We’re working on fixing that, so that eventually, we can ship Qt with LTCG enabled, and then you can enable this at application level.

Another issue is that by delaying the code generation to link time, we are increasing the time it takes from modifying a single source file to creating a new program or library. It’s almost as if you touch every single source file every time, making it less practical for day-to-day use. But, this optimization is definitely something that fits well into the release process, when creating your final build. So, your Release Manager can use it.

Development Framework & Tools

Qt Framework

Qt Development Tools

Qt Design Studio

Qt Quality Assurance

Qt Digital Ads

Qt Insight

Quality Assurance Tools

Squish

Coco

Test Center

Axivion Static Code Analysis

Axivion Architecture Verification

More

Qt 6

Licensing

Qt Features

Qt for Python

Industry & Platform Solutions

Industry

Automotive

Micro-Mobility Interfaces

Consumer Electronics

Industrial Automation

Medical Devices

Platform

Desktop, Mobile & Web

Embedded Devices

MCU (Microcontrollers)

Cloud Solutions

More

Next-Gen UX

Limitless Scalability

Productivity

Our Ultimate Collection of Resources

Development Framework & Tools

Qt Resource Center

Qt Blog

Qt Success Stories

Qt Demos

Quality Assurance Tools

QA Resources

QA Blog

QA Success Stories

More

Live Events & Webinars

Documentation

Take Learning Qt to the Next Level

Learn with us

Qt Academy

Qt Educational License

Qt Documentation

Qt Forum

We're Here for You—Support and Services

Helpful Links

Contact Us

Qt Partners

Qt Support

Qt Customer Portal

Qt Customer Success

Qt Professional Services

Reducing Application Size using Link Time Optimization

The Old vs The New

Link Time Optimization in Qt

Conclusion

Blog Topics:

Comments

LEARN MORE

Embedded Devices with Qt

Subscribe to our newsletter

Subscribe Newsletter

Try Qt 6.7 Now!

We're Hiring

Read Next

Qt for MCUs 2.5.3 LTS Released

Qt for Python release: 6.7 is now available! 🐍

Qt Creator 13 - CMake Update