LLVM 11.0.0 Released - Here are some highlights for C/C++ developers

On Monday, 12 October 2020, LLVM 11.0.0 was released. You can read the release announcement from release manager Hans Wennborg here.

The release notes are, as usual, very comprehensive, and it can be hard to find the things you really want to know. So I thought I would go over some of the highlights of this release for C/C++ developers and toolchain aficionados. This post won't cover much of LLVM's internals; it is mostly about using Clang/LLVM as a C/C++ developer.

Core changes

Compile-time Performance

If you have been building larger C++ code bases with Clang, you might have noticed that LLVM 10 had a pretty big compile-time regression compared to LLVM 9. This has been rectified in LLVM 11 thanks to the really great work by the Rust community. The details of that speed-up can be found here.

To test this I recompiled the latest OpenCV master in Release configuration on my Threadripper 1950X running the latest version of Clear Linux.

I ran ninja twice per compiler and averaged the two runs, since the filesystem cache can skew a single measurement.

  • LLVM 9: 221 seconds
  • LLVM 10: 231 seconds
  • LLVM 11: 210 seconds

As you can see, LLVM 11 is the fastest recent release out of the box: the build is about 10% faster than with LLVM 10 and about 5% faster than with LLVM 9.
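
The exact percentage depends on whether you read the numbers as a speed-up or as build time saved. Here is a minimal sketch of that arithmetic, using only the three timings listed above; the function names are made up for illustration.

```cpp
// Percentage of wall-clock time saved when a build drops from
// `before` seconds to `after` seconds.
constexpr double time_saved_percent(double before, double after) {
    return (before - after) / before * 100.0;
}

// Percentage speed-up (more work done per second) for the same two timings.
constexpr double speedup_percent(double before, double after) {
    return (before / after - 1.0) * 100.0;
}

// LLVM 10 -> LLVM 11: 231 s -> 210 s
static_assert(speedup_percent(231.0, 210.0) > 9.9 &&
              speedup_percent(231.0, 210.0) < 10.1, "about 10% speed-up");
static_assert(time_saved_percent(231.0, 210.0) > 9.0 &&
              time_saved_percent(231.0, 210.0) < 9.2, "about 9.1% less time");

// LLVM 9 -> LLVM 11: 221 s -> 210 s
static_assert(speedup_percent(221.0, 210.0) > 5.2 &&
              speedup_percent(221.0, 210.0) < 5.3, "about 5.2% speed-up");
static_assert(time_saved_percent(221.0, 210.0) > 4.9 &&
              time_saved_percent(221.0, 210.0) < 5.0, "about 5.0% less time");
```

So the 10% figure is a speed-up; expressed as time saved, the same measurements give roughly 9% against LLVM 10 and 5% against LLVM 9.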

There are also other options that can make LLVM 11 even faster. We will talk about that in the Clang section below.

Better debugging for optimized binaries

LLVM can now emit the DW_OP_entry_value operation in debug information. This lets the debugger recover more information about variables, such as function parameters, that have been removed by the optimizer, which should make debugging optimized binaries better.

Take this example program:

#include <iostream>
#include <string>

void test_function(const std::string &foo, bool bar)
{
  std::cout << foo << std::endl;
}

int main(int argc, char** argv)
{
  test_function("hello world", false);
}

Compile it with Clang like this:

clang++ -o dwp_op_entry_value dwp_op_entry_value.cpp -O3 -g

If you then run it under lldb and set a breakpoint in test_function, you can't print the value of bar, because it was optimized out:

error: Couldn't materialize: couldn't get the value of variable bar: no location, value may have been optimized out
error: errored out in DoExecute, couldn't PrepareToExecuteJITExpression

The promise with LLVM 11 is that this will no longer be the case. Unfortunately I was not able to get this to work in either gdb or lldb. I wonder if the debuggers themselves haven't added support for this yet. I am sure someone will enlighten me about what's missing here.

Python 3 migration has started

If you are building LLVM/Clang yourself, you might want to start migrating your build machines to Python 3. Python 2 is not deprecated yet (that will happen in LLVM 12), but the build system now prefers python3 over python2 in your path.

Clang changes

-fno-common is now the default

When compiling C code with Clang 11 you might get linker errors like this:

other.o:(.bss+0x0): multiple definition of `hello_global'; main.o:(.bss+0x0): first defined here

This is because Clang now passes -fno-common by default (the former default was -fcommon). What does this do? I plugged this into Compiler Explorer, and you can see that with -fno-common the global variable hello is emitted as an ordinary defined symbol in the assembly output instead of a common symbol that the linker is allowed to merge.

So if two translation units (TUs) include a header that defines a global variable, you need to either declare it extern in the header and define it in exactly one TU, or mark it static if each TU should get its own copy.

This change follows a similar change in GCC; the GCC bug tracker has some good notes on it.

Here is a small example:

globals.h

#pragma once
int hello_global;

other.c

#include "globals.h"
#include <stdio.h>

void test() {
        hello_global = 1;
        fprintf(stderr, "%d\n", hello_global);
}

main.c

#include "globals.h"

int main(int argc, char** argv)
{
  return 0;
}

With -fno-common this leads to the duplicate symbol linker error above. Either pass -fcommon to restore the old behavior, or change the files as follows:

globals.h

#pragma once
extern int hello_global;

other.c

#include "globals.h"
#include <stdio.h>

int hello_global;

void test() {
        hello_global = 1;
        fprintf(stderr, "%d\n", hello_global);
}
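
Note that the error above is specific to C: C++ has no tentative definitions, so a header like the first globals.h already produced a multiple definition error there. For headers that are only ever included from C++17 or newer, an inline variable is one more way out. Here is a minimal sketch under that assumption (set_and_get_hello is a made-up helper, not part of the example above).

```cpp
// Hypothetical C++17-only header contents. `inline` allows every translation
// unit to include this definition while the program still ends up with a
// single hello_global object.
inline int hello_global = 0;

// Any translation unit that includes the header can read or write the
// shared variable.
inline int set_and_get_hello(int value) {
    hello_global = value;
    return hello_global;
}
```

This does not help for headers shared with C code; there the extern declaration plus a single definition shown above remains the portable fix.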

Recovery AST

Clang's AST can now include more metadata about errors, which should lead to better error handling in the future. Here is the test case used in LLVM to test this feature:

int call(int);

void test1(int s) {
  s = call();
  (float)call();
}

If we dump the AST with Clang 10 (by passing -Xclang -ast-dump):

TranslationUnitDecl 0x11ac778 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x11ad050 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x11acd10 '__int128'
|-TypedefDecl 0x11ad0c0 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x11acd30 'unsigned __int128'
|-TypedefDecl 0x11ad438 <<invalid sloc>> <invalid sloc> implicit __NSConstantString '__NSConstantString_tag'
| `-RecordType 0x11ad1b0 '__NSConstantString_tag'
|   `-CXXRecord 0x11ad118 '__NSConstantString_tag'
|-TypedefDecl 0x11ad4d0 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x11ad490 'char *'
|   `-BuiltinType 0x11ac810 'char'
|-TypedefDecl 0x11ea4f8 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list '__va_list_tag [1]'
| `-ConstantArrayType 0x11ea4a0 '__va_list_tag [1]' 1
|   `-RecordType 0x11ad5c0 '__va_list_tag'
|     `-CXXRecord 0x11ad528 '__va_list_tag'
|-FunctionDecl 0x11ea630 <recovery-ast.cpp:1:1, col:13> col:5 call 'int (int)'
| `-ParmVarDecl 0x11ea568 <col:10> col:13 'int'
`-FunctionDecl 0x11ea7f8 <line:3:1, line:6:1> line:3:6 test1 'void (int)'
  |-ParmVarDecl 0x11ea738 <col:12, col:16> col:16 used s 'int'
  `-CompoundStmt 0x11ea950 <col:19, line:6:1>

And then compare it to the AST from Clang 11:

TranslationUnitDecl 0x88f5c98 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x88f6598 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x88f6230 '__int128'
|-TypedefDecl 0x88f6608 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x88f6250 'unsigned __int128'
|-TypedefDecl 0x88f6990 <<invalid sloc>> <invalid sloc> implicit __NSConstantString '__NSConstantString_tag'
| `-RecordType 0x88f66f0 '__NSConstantString_tag'
|   `-CXXRecord 0x88f6660 '__NSConstantString_tag'
|-TypedefDecl 0x88f6a38 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x88f69f0 'char *'
|   `-BuiltinType 0x88f5d30 'char'
|-TypedefDecl 0x8935540 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list '__va_list_tag [1]'
| `-ConstantArrayType 0x89354e0 '__va_list_tag [1]' 1
|   `-RecordType 0x88f6b20 '__va_list_tag'
|     `-CXXRecord 0x88f6a90 '__va_list_tag'
|-FunctionDecl 0x8935688 <recovery-ast.cpp:1:1, col:13> col:5 call 'int (int)'
| `-ParmVarDecl 0x89355b0 <col:10> col:13 'int'
`-FunctionDecl 0x8935860 <line:3:1, line:6:1> line:3:6 test1 'void (int)'
  |-ParmVarDecl 0x8935790 <col:12, col:16> col:16 used s 'int'
  `-CompoundStmt 0x8935a68 <col:19, line:6:1>
    |-BinaryOperator 0x8935998 <line:4:3, col:12> '<dependent type>' contains-errors '='
    | |-DeclRefExpr 0x8935908 <col:3> 'int' lvalue ParmVar 0x8935790 's' 'int'
    | `-RecoveryExpr 0x8935970 <col:7, col:12> '<dependent type>' contains-errors lvalue
    |   `-UnresolvedLookupExpr 0x8935928 <col:7> '<overloaded function type>' lvalue (ADL) = 'call' 0x8935688
    `-CStyleCastExpr 0x8935a40 <line:5:3, col:15> 'float' contains-errors <Dependent>
      `-RecoveryExpr 0x8935a00 <col:10, col:15> '<dependent type>' contains-errors lvalue
        `-UnresolvedLookupExpr 0x89359b8 <col:10> '<overloaded function type>' lvalue (ADL) = 'call' 0x8935688

As you can see, the AST now keeps nodes for the two erroneous statements: each RecoveryExpr is marked contains-errors and still refers back to the declaration of call. This doesn't make much practical difference yet, because the Clang front end doesn't seem to use this data in the diagnostics it prints. But in the future it will hopefully lead to better diagnostics for complicated errors, especially in C++.

New warnings

As with every release, Clang has added more diagnostics. This time two flags stand out:

-Wpointer-to-int-cast

This warning fires when you cast a pointer to an integer type that is too small to hold it. For example, int is usually 32 bits wide, while a pointer on x86_64 is 64 bits. So when casting a pointer to a 32-bit int you can lose information due to truncation.

Here is a dumb example that now warns with Clang 11:

#include <stdio.h>

int main(int argc, char** argv){
        void* hello = NULL;
        int bar = (int)hello;
}

pointer_to_int_cast.c:5:12: warning: cast to smaller integer type 'int' from 'void *' [-Wvoid-pointer-to-int-cast]
        int bar = (int)hello;
                  ^~~~~~~~~~

The solution is to cast your pointer to an int64_t instead (or, you know, don't cast your pointer at all?).

Update: Arno Lepisk pointed out that intptr_t is the best type to use if you need to cast your pointer to an int.
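
Here is a minimal sketch of that advice in C++, assuming a platform that provides std::intptr_t (the type is formally optional, though mainstream toolchains have it). The names to_integer, to_pointer and round_trips are illustrative only.

```cpp
#include <cstdint>

// std::intptr_t is specified as a signed integer type that can hold any
// valid void*, so this size check is expected to pass.
static_assert(sizeof(std::intptr_t) >= sizeof(void*),
              "intptr_t must be wide enough for a pointer");

// Pointer -> integer without truncation.
inline std::intptr_t to_integer(const void* p) {
    return reinterpret_cast<std::intptr_t>(p);
}

// Integer -> pointer; converting back yields the original pointer value.
inline const void* to_pointer(std::intptr_t value) {
    return reinterpret_cast<const void*>(value);
}

// Convenience check that a pointer survives the round trip.
inline bool round_trips(const void* p) {
    return to_pointer(to_integer(p)) == p;
}
```

Unlike int64_t, intptr_t follows the pointer width of the target, so the same code stays correct on 32-bit platforms.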

-Wuninitialized-const-reference

A new warning that makes sure you don't pass an uninitialized variable to a function by const reference. Before, we could already get warnings from -Wuninitialized like this:

int foobar(int i) { }

int i;
foobar(i);

uninitialized-const-reference.cpp:14:11: warning: variable 'i' is uninitialized when used here [-Wuninitialized]
  foobar(i);
          ^
uninitialized-const-reference.cpp:13:8: note: initialize the variable 'i' to silence this warning
  int i;
       ^
        = 0

Which is great, because passing uninitialized variables around can be really bad: reading them is undefined behavior.

But this didn't work if foobar took a const int& instead:

int foobar(const int& i) {}

int i;
foobar(i);

No warning!

But with clang 11 it will say this instead:

uninitialized-const-reference.cpp:15:10: warning: variable 'i' is uninitialized when passed as a const reference argument here [-Wuninitialized-const-reference]
  foobar(i);
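
For reference, a self-contained version of the fixed code might look like the sketch below. The body of foobar is an assumption (the snippets above leave it empty), and fixed_caller is a hypothetical name.

```cpp
// Reads its argument through a const reference, like the foobar above.
// The "+ 1" is only there so the function does something observable.
constexpr int foobar(const int& i) { return i + 1; }

// The pattern from the snippet above, kept as a comment because it reads an
// uninitialized value:
//   int i;
//   foobar(i);

// Fix: give the variable a value before binding it to the reference.
constexpr int fixed_caller() {
    int i = 0;
    return foobar(i);
}
```

The const reference does not make the call any safer than passing by value: foobar still reads i, so i needs a value either way.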

Support for -Oz, -Os and -Og in LTO

Before Clang 11, combining -flto=thin or -flto=full with -Os, -Oz or -Og did not work; it just gave an error. This is fortunately fixed now, and you can pass these optimization flags to the linker as well.

Instantiate templates in PCH files

This is a nice one! If you use PCH files and a lot of templates in your codebase, this flag can really speed things up. But before we can talk about this flag we need to understand how C++ templates are instantiated.

Let's consider this code:

template<class T> T add (T a, T b)
{
  return a + b;
}

int main()
{
  int i = add<int>(2, 3);
  long l = add<long>(5, 5);
}

When the compiler compiles this code it will take the template definition and expand it into all the different versions needed. This can be seen by going back to the AST again:

clang++ -Xclang -ast-print -fsyntax-only templates.cpp

template <class T> T add(T a, T b) {
    return a + b;
}
template<> int add<int>(int a, int b) {
    return a + b;
}
template<> long add<long>(long a, long b) {
    return a + b;
}
int main() {
    int i = add<int>(2, 3);
    long l = add<long>(5, 5);
}

Now you see that the compiler added versions of the function with long and int arguments.
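
One way to convince yourself that these really are separate functions is to give the template a static local, since every instantiation gets its own copy. Here is a small sketch (call_count is a made-up helper added for illustration, and add is extended to use it).

```cpp
// Each instantiation of this template is a distinct function with its own
// static local, so call_count<int>() and call_count<long>() share no state.
template <class T>
int& call_count() {
    static int count = 0;
    return count;
}

// Same add as above, but it also counts calls per instantiated type.
template <class T>
T add(T a, T b) {
    ++call_count<T>();
    return a + b;
}
```

Calling add<int> twice and add<long> once leaves call_count<int>() at 2 and call_count<long>() at 1, which is only possible because the compiler generated two independent copies of both templates.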

Templates in C++ are a great way to generate a lot of similar functions, but they are also expensive for the compiler, since it has to compile them once for each instantiation. The standard library uses a lot of templates, so just including some simple headers will easily add a lot of template instantiations.

In Clang 11 we get a new option that reduces the time spent creating all these instances, with the help of the precompiled header (PCH). If you pass -fpch-instantiate-templates to Clang when you create the precompiled header, the template instantiation is done at that point and stored in the PCH instead of happening once per translation unit.

If your project uses CMake, it's really simple to turn this on, and it has no real downsides. You can upgrade to CMake 3.19, which turns this on by default if you are using Clang 11 or newer; see the documentation on PCH_INSTANTIATE_TEMPLATES.

If you are using an older version of CMake, you can do this instead:

list(APPEND CMAKE_CXX_COMPILE_OPTIONS_CREATE_PCH -fpch-instantiate-templates)

In our code base this gave us around a 15% compile time boost. So I highly recommend you look into enabling this.

Template codegen in the PCH

There is yet another option that does codegen in the PCH as well. That means the template instances and other things are compiled into an object file in the PCH phase, which you then have to link against in the link phase.

This is a bit more complicated to integrate into your build system, but can once again give a boost to compile speed.

There are discussions on the CMake bug tracker on how to integrate this. That's a good place to find information on how to enable this until we get upstream CMake support.

Whole program de-virtualization

Added in Clang 11 is a feature that is meant to be used together with LTO. The flag is called -fwhole-program-vtables and will try to de-virtualize calls across your whole program instead of on a per-translation-unit basis.

This is important because of how much more efficient de-virtualized code is. You can see an example of de-virtualization on compiler explorer, just compare the bottom assembler output (de-virtualized) with the top one. And read more about de-virtualization in this article by Marco Foco.

What this flag does in conjunction with -flto=thin/full is allow the compiler to de-virtualize even more calls across the whole application instead of just within a single TU.
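
To make the difference concrete, here is a small sketch of the two situations. Shape, Square and the two free functions are hypothetical names for illustration; they are not taken from the Compiler Explorer example or the linked article.

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

// `final` tells the compiler that nothing can override Square::sides().
struct Square final : Shape {
    int sides() const override { return 4; }
};

// The dynamic type is known to be Square, so the compiler can turn this into
// a direct (and inlinable) call while looking at a single translation unit.
inline int sides_direct(const Square& s) { return s.sides(); }

// The dynamic type is unknown here. Without more information this stays an
// indirect call through the vtable; it can only be de-virtualized if the
// optimizer knows every class that derives from Shape, which is the
// whole-program knowledge that LTO can provide.
inline int sides_virtual(const Shape& s) { return s.sides(); }
```

Both functions return the same result; the difference is only in the code the optimizer is allowed to generate for the call.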

As far as I know there are no downsides to using this but only upsides.

Update: Hans pointed out on Twitter that this flag isn't actually new. What is new is how it interacts with LTO visibility.

Many many other changes

There are so many more changes here that I can't go through them all, but I wanted to highlight a few very quickly:

  • The LLVM linker LLD saw huge updates during the development cycle and I think it can be considered to be production safe at this point. LLD Release Notes
  • Clangd also saw a lot of changes and it seems a lot more usable and robust now. clang-tools-extra Release Notes
  • According to the release notes for libc++, not much happened during this cycle: basically only the addition of <numbers> from C++20. But looking at the commit log, there seem to be a lot of changes that were not documented.

Hope you enjoyed this deeper dive into LLVM 11. Let me know if this is something I should continue to do for future releases, and if there is something you would like to see an even deeper dive into.

Thanks to Tamàs Szelei, Fredrik Björeman and Björn Fahller for proof-reading this article!

Thanks to Alex Stevenson-Price for helping me with pixels!