Simon Hausmann

Reducing Application Size using Link Time Optimization

Published Wednesday January 2nd, 2019
Posted in Compilers, Dev Loop, Embedded, Performance

We need to talk about calories! Not the calories from your Christmas cookies — those don’t count. But, calories in your Qt application. We’re going to take a look at a technique that is easy to enable and helps you save precious bytes around your application’s waistline.

The Old vs The New

Traditionally, you would build your application by letting the compiler translate your .cpp source files to machine code. The result is stored in .o object files, which we then pass over to the linker, to resolve references between the files. At this point, the linker does not change the machine code that was generated. This division of work between the compiler and the linker allows for quick development cycles. If you modify one source file, only that file gets recompiled and then the linker quickly re-assembles the application’s binary. Unfortunately, this also means that we are missing out on an opportunity to optimize.

Imagine that your application has two functions: main() in main.cpp and render() in graphics.cpp. As an experienced developer, you keep all your graphics code encapsulated in the render() function — anyone can call it from anywhere! In reality, it is only the application’s main() that calls render(). Theoretically, we could just copy and paste the code in render() into main() — inlining it. This would save the machine code instructions in main() to call render(). Once that’s done, we may even see opportunities to reuse some variables and save even more space and code. Now, if we tried to do this by hand, it would quickly escalate into Spaghetti code with lots of sauce.
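As a rough illustration of the render() example above, here is a minimal C++ sketch. The two translation units are shown in one listing for brevity. The frame-buffer type, the colour value, and the run_application() helper are assumptions made for illustration, not code from a real application. Compiled separately without LTO, the compiler only sees a declaration of render() while building main.cpp, so it has to emit a call. With link-time optimization (for GCC and Clang, the -flto flag), the optimizer sees both bodies and may inline render() into its only caller. Whether it actually does so depends on the compiler's heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- graphics.cpp (imagine this as its own translation unit) ---

// Fills the frame buffer with one colour and returns the number of
// pixels written. Hypothetical stand-in for "all the graphics code".
std::size_t render(std::vector<unsigned>& frame, unsigned colour)
{
    assert(!frame.empty()); // precondition: there is something to draw into
    for (unsigned& pixel : frame)
        pixel = colour;
    return frame.size();
}

// --- main.cpp (the only caller of render()) ---

// Stand-in for the body of main(). Without LTO this is a real call into
// graphics.o. With LTO the optimizer may merge the loop into this function.
std::size_t run_application()
{
    std::vector<unsigned> frame(640 * 480);
    return render(frame, 0xff00ff00u);
}
```

The source stays modular either way. The only thing that changes is when the compiler gets to look at both functions together.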

Luckily, most compilers these days offer a technique that allows you to apply such optimizations (and deal with the spaghetti mess) while retaining the modularity and cleanliness of your code. This is commonly called “Link Time Optimization” or “Link Time Code Generation”. The latter describes best what really happens: instead of compiling each source file to machine code one-by-one, we delay the code generation step until the very end, at link time. Code generation at link time not only enables smart inlining of code, but it also allows for optimizations such as de-virtualizing methods and improved elimination of unused code.
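To make de-virtualization and unused-code elimination a little more concrete, here is a small sketch. All names in it (Shape, Square, legacy_perimeter, doubled_area, compute) are hypothetical, and the file boundaries are only marked by comments. When scene.cpp is compiled in isolation, the compiler cannot know which classes override area(), so it normally emits an indirect call through the vtable. With the whole program visible at link time, the optimizer can see that Square is the only implementation and may turn the call into a direct (and possibly inlined) one. It may also drop legacy_perimeter(), which nothing calls. None of this is guaranteed; it depends on the compiler and its settings.

```cpp
#include <cassert>

// --- shape.h: interface visible to every translation unit ---
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

// --- square.cpp: the only implementation in the whole program ---
struct Square : Shape {
    explicit Square(int side) : side_(side) { assert(side >= 0); }
    int area() const override { return side_ * side_; }
private:
    int side_;
};

// Never called from anywhere. Compiled on its own, square.cpp still has to
// emit this function; a whole-program view can show that it is unused.
int legacy_perimeter(int side) { return 4 * side; }

// --- scene.cpp: knows only the Shape interface ---
int doubled_area(const Shape& shape)
{
    return 2 * shape.area(); // virtual call; a de-virtualization candidate
}

// --- main.cpp ---
int compute()
{
    Square square(3);
    return doubled_area(square); // 2 * (3 * 3)
}
```

The behaviour of the program is the same with and without LTO. Only the generated machine code differs.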

Link Time Optimization in Qt

To enable this technique in Qt, you have to build from source. At the configure step, add -ltcg to the command line options. We thought hard, and this is the most cryptic and vowel-free name we could come up with 😉

To demonstrate the effectiveness of Link Time Code Generation, let’s look at a fresh build of the Qt 5.12 branch, compiled with GCC 7.3.0 for ARMv7 against an imx6 Boot2Qt sysroot. For analysis, we’re going to use Bloaty McBloatface (https://github.com/google/bloaty), which is a lovely size profiler for binaries. The Qt Quick Controls 2 Gallery, statically linked, serves as a sample executable. When running bloaty on it, with a regular Qt build, you’ll see output like this:

    VM SIZE                      FILE SIZE
 --------------                --------------
   0.0%       0 .debug_info      529Mi  83.2%
   0.0%       0 .debug_loc      30.4Mi   4.8%
   0.0%       0 .debug_str      18.6Mi   2.9%
   0.0%       0 .debug_line     14.2Mi   2.2%
  68.1%  13.9Mi .text           13.9Mi   2.2%
   0.0%       0 .debug_ranges   9.60Mi   1.5%
   0.0%       0 .debug_abbrev   6.29Mi   1.0%
  29.5%  6.01Mi .rodata         6.01Mi   0.9%
   0.0%       0 .strtab         3.17Mi   0.5%
   0.0%       0 .symtab         2.35Mi   0.4%
   0.0%       0 .debug_frame    1.80Mi   0.3%
   0.0%       0 .debug_aranges   485Ki   0.1%
   1.2%   249Ki .data.rel.ro     249Ki   0.0%
   0.3%  68.2Ki .ARM.extab      68.2Ki   0.0%
   0.2%  38.2Ki .bss                 0   0.0%
   0.1%  30.3Ki [25 Others]     35.4Ki   0.0%
   0.1%  30.3Ki .got            30.3Ki   0.0%
   0.1%  24.1Ki .ARM.exidx      24.1Ki   0.0%
   0.1%  15.1Ki .dynstr         15.1Ki   0.0%
   0.1%  13.6Ki .data           13.6Ki   0.0%
   0.1%  13.2Ki .dynsym         13.2Ki   0.0%
 100.0%  20.4Mi TOTAL            637Mi 100.0%

 

The “VM SIZE” column is what’s particularly interesting to us: it tells us how much space the different sections of the program consume when loaded into memory. Here, we see that the total cost will be about 20.4 MiB.

Now, let’s compare that to a build with -ltcg enabled.

[Figure: Comparison between regular and LTCG build]

The new VM size is 17.3 MiB. That is a reduction of about 15% from 20.4 MiB, just by passing a parameter to configure.

This drastic gain here is because we chose a static build. However, even when you use a dynamic build, this optimization is worth it. In that case, LTCG is applied within each shared library, up to the library boundary.

Bloaty can show this by comparing a regular build against an LTCG-enabled build of libQt5Core.so.5.12.0:

    VM SIZE                      FILE SIZE
 --------------                --------------
...
 -53.8%     -28 [LOAD [RW]]          0  [ = ]
...
 -11.9% -1.78Ki .got           -1.78Ki -11.9%
  -0.2% -3.05Ki .rodata        -3.05Ki  -0.2%
 -10.0% -3.54Ki .rel.dyn       -3.54Ki -10.0%
 -17.2% -7.52Ki .ARM.exidx     -7.52Ki -17.2%
 -16.9% -18.4Ki .ARM.extab     -18.4Ki -16.9%
...
 -21.2%  -691Ki .text           -691Ki -21.2%
 -13.9%  -727Ki TOTAL           -838Ki -13.8%

 

The linker produced a smaller library with less code, fewer relocations, and a smaller read/write data section.

Conclusion

At this point, this seems like a win-win situation, and you may wonder: Why isn’t this enabled by default? No, it’s not because we’re stingy 😉

One issue is that in the Qt build system, currently, this is a global option. So if we were to enable this with the Qt binaries, everyone using them will be slowed down and it requires them to opt-out explicitly, in the build system. We’re working on fixing that, so that eventually, we can ship Qt with LTCG enabled, and then you can enable this at application level.

Another issue is that by delaying the code generation to link time, we are increasing the time it takes from modifying a single source file to creating a new program or library. It’s almost as if you touch every single source file every time, making it less practical for day-to-day use. But, this optimization is definitely something that fits well into the release process, when creating your final build. So, your Release Manager can use it.


39 comments

John says:

>At the configure step, add -ltcg to the command line options.
>…
>One issue is that in the Qt build system, currently, this is a global option.

Is it about qmake?

Simon Hausmann says:

Yes

tim blechmann says:

when using other build-systems, can i just enable lto for qt and in my application build-system compile/link with lto enabled or disabled? i’m mainly concerned about using statically linked qt

Simon Hausmann says:

If you build Qt with LTO enabled, then you face three options:

(1) If your Qt build is dynamically linked, then you can freely choose LTO/non-LTO in your application.
(2) If your Qt build is statically linked, then by default your application needs to use LTO as well.
(3) If you’re willing to hack the build system in Qt, then you can edit the mkspecs and change the compiler flags to instruct the compiler to create “fat” object files, which contain the IR for use with LTO as well as normally compiled code. Then you can freely choose on the application level again.

tim blechmann says:

`(3)` is indeed the use case i’m talking about: in the compile/run workflow i want to avoid lto, to reduce turnaround times but for release builds i’d like to have LTO enabled. so i’m wondering: any thoughts about defaulting to fat object files?

Cochise says:

So, every Linux distribution can update their packages to use -ltcg on Qt libs without breaking all apps using Qt or forcing users to use -ltcg if they want to develop against the system libraries?
This is really nice.

Vladimir says:

As far as I know, we can theoretically use LGPL license with static builds as long as we publish the object files so that the users can link these with Qt themselves. Please correct me if I am wrong. How would this option go with this link time optimization?

Simon Hausmann says:

I can’t correct you because I’m not a lawyer 🙂

In my opinion it does not matter what the object files contain (native code or some compiler-specific representation) – what matters is that the user of the software has the freedom to change the LGPL licensed parts of the work against a modified version. With “change” I mean whatever necessary steps that allow running the application afterwards, with the modifications included.

grecko says:

Kinda deceptive chart

Andre Somers says:

Indeed. Make graphs start at 0 please.

Mike says:

I was going to mention the same thing. The y-axis should start at zero to allow readers to easily see the 15% difference.

Brendon says:

I was going to say that. It’s a canonical example of its type, really – there’s no justifiable reason to offset the zero on the y-axis like that. It’s an odd distraction from an otherwise informative article.

Nyall says:

s/kinda/very/

There’s no justification for the misleading vertical axis here.

Simon Hausmann says:

I honestly have no intention of deceiving or misleading anybody. That would imply an ulterior motive, which I don’t really have here. I mean, you’re free to use LTO or not use it, I won’t judge anybody :-).

That said, I merely entered the numbers into Excel and the axis formatting defaulted to this. I think it does nicely emphasize the benefit, while still showing the absolute numbers – which are also mentioned in the text.

Mike says:

Sometimes the defaults do not yield the best way to convey the information. When the y-axis starts at zero, the relative heights of the bars have real meaning as opposed to when an arbitrary minimum value for the y-axis is chosen.

To me this graph is easier to read than a same-size graph starting from zero. As said in the text, the gain is 15%, and surely everyone knows what a 15% gain looks like visually. The interesting items in the chart are the numbers, in my opinion.

Mike says:

Then just use a table if it’s the numbers that you want to highlight and you don’t feel that the bar heights are important.

Jean-Michaël Celerier says:

Was this bug fixed ? https://bugreports.qt.io/browse/QTBUG-61710

Simon Hausmann says:

Unfortunately I don’t know if this was fixed in newer versions of clang (version 4 was released almost two years ago). Static builds of Qt are not affected by this.

Thiago Macieira says:

Just use GCC. I’ve been running GCC LTO build of Qt for the past 4 years with no trouble. It’s the only configuration guaranteed to work because I test it.

Anything else, YMMV and you may need to send patches.

David Grayson says:

The debug information sections are distracting and confusing so I’d definitely recommend running `strip` on binaries before generating the size profiles.

I personally care about file size a bit more than memory footprint because I want fast downloads. I suppose the file size will go down by about the same number of bytes as the memory footprint, but it would still be good to double-check that and mention it in the article.

Also, snippets of code like `-ltcg` should be in a monospace font, and not have linewrapping (I’m getting line wrapping after the dash in my browser).

Simon Hausmann says:

Yes, the file size shrinks as well.

Thanks for the formatting feedback, I’ll fix that 🙂

Thiago Macieira says:

Please note the side-effect to static libraries in the Qt build with LTCG. Even when building a shared Qt (dynamic libraries), there are a handful of static libraries produced. All but one of them are private, so if you use any of those, you ought to know what you’re doing and we won’t care if we break your build.

But then there’s libQt5UiTools.a. For some legacy reasons, it’s always a static library. And if you built Qt with LTCG, then that library will also contain intermediate representation code (GCC Gimple, Clang LLVM, etc.). That means you MUST use the exact same compiler that was used to build Qt or your build will fail. Read: same OS, same compiler version and release.

Nick says:

Do you think it’s safe (or a good idea) to enable ltcg in a distro packaged Qt? (For a KDE desktop, for example.)

Simon Hausmann says:

Sounds like a good idea to me, yes. After all, other parts of your Linux desktop are built with ltcg as well, such as Firefox.

Sandro F says:

Hmm..unfortunately this will not work for Android builds because it is not possible to build Qt statically for Android.

So for the most important platform all the improvements are not useable at all. This is very bad :-(. On desktop platform the user does not really care if the application is 15MB or 25MB big.

Simon Hausmann says:

I think in theory a static build should be possible for Android, no? From the Android runtime perspective, we have a Java program that starts up and that dynamically opens one shared object. The runtime doesn’t care if that shared object was created from a bunch of static libraries with link time code generation (as long as the final code is position independent).

I understand that this may be a fair amount of work to implement though, on the build system side in particular.

But even with a build using shared libraries, I think link time code generation is worth it and should give you benefits. I wouldn’t call it “not useable at all”.

Sandro F says:

Well, in theory it is all possible – it is just software, right ;-)?

But it seems that the Qt Company will not put any effort into fixing this:

https://bugreports.qt.io/browse/QTBUG-32618 (out of scope)

“We have decided not to support static builds on Android due to the technical challenges involved.”

Of course I can build my application statically against a dynamic build of the Qt shared libs. But that brings only minor improvements, because the shared Qt libraries make up most of the size of an Android application.

Simon Hausmann says:

Right, and the shared Qt libraries still become smaller (and faster) if you build them with LTCG – even if they are shared. What do you lose if you enable it?

Sandro F says:

Okay, you are right. If the size of the shared libraries become smaller just with enabled LTCG option I will give it a try!

Sandro F says:

Hm. Building Qt with -ltcg enabled for Android ends with following error (Qt 5.9.7):

/opt/Android/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-g++ --sysroot=/opt/Android/android-ndk/platforms/android-16/arch-arm/ -D__ANDROID_API__=16 -isystem /opt/Android/android-ndk/sources/cxx-stl/gnu-libstdc++/4.9/include /opt/Android/android-ndk/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/include -fstack-protector-strong -DANDROID -march=armv7-a -mfloat-abi=softfp -mfpu=vfp -fno-builtin-memmove -Os -mthumb -std=gnu++11 -fno-exceptions -flto=8 -fno-fat-lto-objects -fuse-linker-plugin -Wl,-soname,libjava.so -Wl,--no-undefined -Wl,-z,noexecstack -shared -fPIC -o libjava.so -L/home/s.frenzel/Projects/technihome_app/contrib/openssl/android_arm -L/opt/Android/android-ndk/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a -L/opt/Android/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/../lib/gcc/arm-linux-androideabi/4.9 -lgnustl_shared -lgcc -llog -lz -lm -ldl -lc
/opt/Android/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/../lib/gcc/arm-linux-androideabi/4.9/../../../../arm-linux-androideabi/bin/ld: fatal error: /opt/Android/android-ndk/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/include: pread failed: is a directory
collect2: error: ld returned 1 exit status
make[4]: *** [libjava.so] error 1

Konstantin says:

Please tell me how to add the Link Time Optimization option for the gcc compiler in my .pro file?

Dave says:

I do this:
QMAKE_CFLAGS_RELEASE += -flto
QMAKE_CXXFLAGS_RELEASE += -flto
QMAKE_LFLAGS_RELEASE += -flto

Nick says:

CONFIG += ltcg

dj-jatt says:

thanks for the article keep writing..

Dave says:

Everyone talks about building Qt yourself to enable the link time options. Why doesn’t the official Qt build just do it for shared libraries? Then licensed customers as well as distro customers will just have it. I have found that using -flto with Qt, depending on the project, doesn’t always work. Especially when using plugins, I get linker failures to find objects. Remove the LTO and it is fine. So, again, let’s just get Qt distributed with this optimization and distribute a leaner, meaner Qt.

Vivi says:

>At this point, this seems like a win-win situation
It seems like only one compiler on one platform was tested to say that it is win-win everywhere … or did I miss something?

>everyone using them will be slowed down and it requires them to opt-out explicitly, in the build system.
>We’re working on fixing that, so that eventually, we can ship Qt with LTCG enabled,
>and then you can enable this at application level.
Before spending time on making Qt LTCG-enabled by default, please test that it is worth it. I tried ltcg on Windows with the msvc compilers from 2010 to 2017 and with Qt 5.4 and 5.12. Interestingly, enabling ltcg gives a 1-3% INCREASE in executable size (the compiler was asked to optimise for executable size (-O1) … NOT SPEED as it is by default). Plus the folder size of the ltcg-enabled Qt library became 3.5 times larger. Plus a FULL rebuild of the application took 3 times longer.
Probably ltcg gives some benefits for MinGW (because mingw’s executable size is 1.8-2 times larger than msvc’s … so there is “more space” to be better). I did not tested ltcg on mingw yet.
But please make real testing on platforms before putting resources in making ltcg the default for Qt on these platforms.
P.S.
It would be nice to have some open-source examples recognized as test cases for different Qt-program types … like Widget-based, Qml-based and so on as typical application of that type so that community members can compare results in more precise manner

Nick says:

> I did not tested ltcg on mingw yet.

I just tried a static build of 5.12.0 on MXE (mingw on Linux). The Qt build process fails when linking libQt5Core.a with “undefined reference to” errors for qUnregisterResourceData, qRegisterStaticPluginFunction, and others.

I also tried a static iOS build of Qt, and that works fine.

Nick says:

Nope, strike that about mingw. It’s one of the Qt tests of MXE that fail. Need to dig some more.
