Tag Archives: c++

From a CI bug to systemd, to GCC

I was tracing a bug in sdbusplus found by OpenBMC CI, it leads me into the code in systemd, and eventually get me into GCC.
To get a short introduction, refer to https://lists.ozlabs.org/pipermail/openbmc/2019-December/019884.html
Here is the full story of the investigation.

The sdbusplus CI issue

A CI issue is found in sdbusplus, that the valgrind reports the below error.

==5290== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==5290== at 0x4F2FB08: epoll_ctl (syscall-template.S:79)
==5290== by 0x493A8F7: UnknownInlinedFun (sd-event.c:961)
==5290== by 0x493A8F7: sd_event_add_time (sd-event.c:1019)
==5290== by 0x190BB3: phosphor::Timer::Timer(sd_event*, std::function) (timer.hpp:62)
==5290== by 0x192B93: TimerTest::TimerTest() (timer.cpp:25)
==5290== by 0x193A13: TimerTest_timerExpiresAfter2seconds_Test::TimerTest_timerExpiresAfter2seconds_Test() (timer.cpp:85)
==5290== by 0x4A90917: main (gmock_main.cc:69)
==5290== Address 0x1fff00eafc is on thread 1's stack
==5290== in frame #0, created by epoll_ctl (syscall-template.S:78)

Clearly, valgrind detects that some uninitialized data is used.
However, this issue is not 100% reproduced, it only occurs sometimes, how could that be?

At a first glance, it’s sdbusplus‘s Timer class that invokes sd_event_add_time() from libsystemd, and eventually invokes epoll_ctl().
So I would suspect something may be wrong in Timer that it may pass uninitialized data to sd_event_add_time().

Investigation in sdbusplus

Let’s see the related code.

void initialize()
    auto r = sd_event_add_time(
        event, &eventSource,
        CLOCK_MONOTONIC, // Time base
        UINT64_MAX,      // Expire time - way long time
        0,               // Use default event accuracy
        [](sd_event_source* eventSource, uint64_t usec, void* userData) {
            auto timer = static_cast<Timer*>(userData);
            return timer->timeoutHandler();
        },     // Callback handler on timeout
        this); // User data

The event is a pointer that is already initialized;
The eventSource is the out-parameter.
Others are just simple data or a lambda, nothing suspicious.

Investigation in libsystemd

So let’s dive into libsystemd to see what exactly happens.
The related code is in sd_event_add_time().

_public_ int sd_event_add_time(sd_event *e, ...) {
    if (d->fd < 0) {
        r = event_setup_timer_fd(e, d, clock);
        if (r < 0)
            return r;


  • e is sd_event*, and clock is clockid_t, both are passed into this function
  • d is struct clock_data* initialized in this function So nothing is wrong.

Let’s see event_setup_timer_fd() then.

static int event_setup_timer_fd(...) {
    struct epoll_event ev;
    ev = (struct epoll_event) {
        .events = EPOLLIN,
        .data.ptr = d,
    r = epoll_ctl(e->epoll_fd, EPOLL_CTL_ADD, fd, &ev);

The epoll_fd, fd, and ev, are all initialized, are they?

Let’s see how epoll_ctl is implemented in kernel source.

SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
        struct epoll_event __user *, event)
    if (ep_op_has_event(op) &&
        copy_from_user(&epds, event, sizeof(struct epoll_event)))
        goto error_return;

Be noted that valgrind says Syscall param epoll_ctl(event) points to uninitialised byte(s), and here we do see that it’s copying the whole event struct from userspace, which means the event contains uninitialized bytes.
Let’s go back to see how the event struct is initialized:

    ev = (struct epoll_event) {
        .events = EPOLLIN,
        .data.ptr = d,

And let’s see how the struct is defined in glibc.

typedef union epoll_data
  void *ptr;
  int fd;
  uint32_t u32;
  uint64_t u64;
} epoll_data_t;

struct epoll_event
  uint32_t events;  /* Epoll events */
  epoll_data_t data;    /* User data variable */

Hmm, the events is a uint32_t and is initialized, and the data is initialized as well, it looks fine…
Is it really fine?
data is a union and should be at least 64 bits, while events is uint32_t, which is 32 bits, there could be padding inside the epoll_event padding if it’s not packed.
Hey, there is __EPOLL_PACKED… let’s grep it in glibc:

ChangeLog.old/ChangeLog.18:     (__EPOLL_PACKED): Define to empty if not defined by
ChangeLog.old/ChangeLog.18:     (struct epoll_event): Use __EPOLL_PACKED to make possibly packed.
sysdeps/unix/sysv/linux/sys/epoll.h:#ifndef __EPOLL_PACKED
sysdeps/unix/sysv/linux/sys/epoll.h:# define __EPOLL_PACKED
sysdeps/unix/sysv/linux/sys/epoll.h:} __EPOLL_PACKED;
sysdeps/unix/sysv/linux/x86/bits/epoll.h:#define __EPOLL_PACKED __attribute__ ((__packed__))

It is defined to __attribute__ ((__packed__)) for x86, and not defined for other architectures.
Remember that the issue is not 100% reproduced, right?
The OpenBMC CI backend has both x86-64 and ppc64le servers, so we could guess that the padding causes the valgrind error, and it only happens on ppc64le but not on x86-64, because on x86-64 there is no padding at all.
From the CI log, it confirms that the guess is correct: the issue only occurs on the ppc64le CI server!

So let’s go back to the question code:

    ev = (struct epoll_event) {
        .events = EPOLLIN,
        .data.ptr = d,

It’s using GCC’s Designated Initializers extension
I tried to google how GCC initialize the padding, there are discussions in StackOverlows and blogs, e.g. according to https://stackoverflow.com/questions/37642026/does-c-initialize-structure-padding-to-zero, it looks like the case:

padding for the remaining objects are guaranteed to be 0, but not for the members which has received the initializers.

But I do not see an official GCC doc talking about this.
Let’s do some experiments.

Testing in GCC

Giving below demo code to test how the padding is initialized:

#include <string.h>
#include <stdint.h>
#include <stdio.h>

struct struct_with_padding {
        uint32_t a;
        uint64_t b;
        uint32_t c;
int main()
        struct struct_with_padding s;
        memset(&s, 0xff, sizeof(s));
        s = (struct struct_with_padding) {
                .a = 0xaaaaaaaa,
                .b = 0xbbbbbbbbbbbbbbbb,
                .c = 0xdddddddd,
        uint8_t* p8 = (uint8_t*)(&s);
        printf("data: ");
        for (size_t i = 0; i < sizeof(s); ++i)
                printf("0x%02x ", p8[i]);
        return 0;

Compile with or without SHOW_GCC_BUG, the result is different:

$ gcc -o test_padding test_padding.c
$ ./test_padding
data: 0xaa 0xaa 0xaa 0xaa 0x00 0x00 0x00 0x00 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

$ gcc -DSHOW_GCC_BUG -o test_padding test_padding.c
$ ./test_padding
data: 0xaa 0xaa 0xaa 0xaa 0xff 0xff 0xff 0xff 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xdd 0xdd 0xdd 0xdd 0xff 0xff 0xff 0xff

GCC behaves like:

  • If a struct is partial initialized, the all the padding are initialized to zero;
  • If a struct is fully initialized, the padding remains the old data. This is exactly what happens in my case!

How about clang?

$ clang -o test_padding test_padding.c
$ ./test_padding
data: 0xaa 0xaa 0xaa 0xaa 0x00 0x00 0x00 0x00 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

$ clang -DSHOW_GCC_BUG -o test_padding test_padding.c
$ ./test_padding
data: 0xaa 0xaa 0xaa 0xaa 0x00 0x00 0x00 0x00 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xdd 0xdd 0xdd 0xdd 0x00 0x00 0x00 0x00

OK, clang initializes the padding in both cases, good!

While I tried to file a bug to GCC, I found an exact same bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992, which is reported in 2016, and it looks like GCC is not going to fix it…


  1. I tried to send a PR to systemd as a workaround that manually zero initialize the struct epoll_event.
    It’s under discussion, but likely the maintainer of systemd (@poettering) will not accept the PR, because it’s not really a systemd bug, instead, @poettering treats it a Valgrind bug (if not a GCC bug).
  2. Although I do not think it’s a Valgrind bug, a bug is filed to https://bugs.kde.org/show_bug.cgi?id=415621, there is no further feedback yet.
  3. Without GCC or systemd or valgrind’s fix, I have to add a valgrind suppression to OpenBMC CI https://gerrit.openbmc-project.xyz/c/openbmc/sdbusplus/+/25548, problem solved.


  • GCC has a bug of not initializing the padding.
  • systemd hits the bug on initializing struct epoll_event on non-x86 systems.
  • OpenBMC CI has both x86-64 and ppc64le systems. If a CI is run on ppc64le, the issue occurs.
  • Adding a valgrind suppression file fixes (or workarounds) the issue.




这个文件只是定义了一个嵌套了几层的std::map,而且,如果用GCC 4.8来编译,很快就编译完了,占的内存也不多。
然而,如果用GCC 6.2 (或者最新的7.1)来编译,会发现占用的时间非常长,而且占用内存特别多,最终(如果内存不够)会报internal compiler error: Killed的错。


在一个动态库里,遇到一个std::unique_ptr相关的segment fault。这个变量在constructor里明明创建过了,但是运行时发现它还是nullptr

std::unique_ptr<MyStruct> s = nullptr;
void onLoad() __attribute__((constructor));
void onLoad()
  s = std::make_unique<MyStruct>();

当这个动态库被dlopen的时候,我期望s会被onLoad()初始化好,然而在别的函数里用这个s的时候,会segment fault,因为s里面是个空指针!

简单的说,动态库里 constructor 和 global/static 变量的初始化顺序是unspecified,所以有可能onLoad()先执行,然后再初始化global变量s, 把它初始化成nullptr。注意这个时候只是初始化,原来的变量并不会被析构,其实也是内存泄露。

std::unique_ptr<MyStruct> s __attribute__((init_priority(101))) = nullptr;

注意init_priority的值设为101~65534都可以。 0~100是reserved,用了会有warning; 最大值是65535,也是默认值,要是设成65535,就和__attribute__((constructor))一样,顺序就变成unspecified了。


关于动态链接库(dynamic library)里static变量和函数的问题


  1. link(dlopen)时报错,变量重定义
  2. link(dlopen)时没错,执行时用用同一个变量
  3. link(dlopen)时没错,执行时分别有不同的变量
  4. link(dlopen)时报错,未定义的变量



    common.cpp  // 生成libcommon.a
    dynami1.cpp  // 生成libdynamic1.so
    dynamic2.cpp // 生成libdynamic2.so
    main.cpp  // 生成main


    g++  -g -Wall -Wextra  -c -o common.o common.cpp
    ar rcs libcommon.a common.o
  • Case 0: 动态链接库不link .a,main直接link .so,生成main的时候也不link .a
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic1.so dynamic1.cpp
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic2.so dynamic2.cpp
        g++ -DDIRECT_CALL_SO -o main main.cpp -L. -ldynamic1 -ldynamic2


  • Case 1: 动态链接库不link .a,main直接link .so,生成main的时候link .a
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic1.so dynamic1.cpp
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic2.so dynamic2.cpp
        g++ -DDIRECT_CALL_SO -o main main.cpp -L. -ldynamic1 -ldynamic2 -lcommon
        LD_LIBRARY_PATH=. ./main


  • Case 2: 动态链接库link .a,main直接link .so,生成main的时候无所谓要不要link .a
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic1.so dynamic1.cpp -L. -lcommon
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic2.so dynamic2.cpp -L. -lcommon
        g++ -DDIRECT_CALL_SO -o main main.cpp -L. -ldynamic1 -ldynamic2
        LD_LIBRARY_PATH=. ./main


  • Case 3: 动态链接库不link .a,main不链接.so,通过dlopen()调用so,无所谓链接.a
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic1.so dynamic1.cpp
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic2.so dynamic2.cpp
        g++ -o main main.cpp -ldl [-L. -lcommon]
        LD_LIBRARY_PATH=. ./main

    执行结果是4,dlopen的时候找不到 GetGlobalStatic()

  • Case 4: 动态链接库link .a,main不链接.so,通过dlopen()调用so(不用RTLD_GLOBAL),无所谓链接.a
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic1.so dynamic1.cpp -L. -lcommon
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic2.so dynamic2.cpp -L. -lcommon
        g++ -o main main.cpp -ldl
        LD_LIBRARY_PATH=. ./main


  • Case 5: 动态链接库link .a,main不链接.so,通过dlopen()调用so(使用RTLD_GLOBAL),无所谓链接.a
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic1.so dynamic1.cpp -L. -lcommon
        g++ -g -Wall -Wextra -fPIC -shared -o libdynamic2.so dynamic2.cpp -L. -lcommon
        g++ -DUSE_RTLD_GLOBAL -o main main.cpp -ldl -L. -lcommon
        LD_LIBRARY_PATH=. ./main



    g++ -DDIRECT_CALL_SO -o main main.cpp dynamic1.cpp dynamic2.cpp common.cpp



  • 对于不同so link的.a里的变量:
    1. 如果是直接link的,总是用同一个变量。仔细想想肯定是这样,否则一定会出现multiple definition
    2. 如果是dlopen,它依赖dlopen()的flag:
      • 如果是RTLD_LOCAL(默认),各个so会使用各自的.a里的变量。
      • 如果用RTLD_GLOBAL,就跟直接link一样,用同一个变量了。
  • 对于不同so里的同名函数:
    1. 通过dynamic link或者dlopen时不会出现变量重定义;
    2. 直接link时的顺序决定了main使用哪个so的实现
    3. dlopen的话,它们有各自的函数


今天有同事问起来关于boost的smart pointer的事情,原因是别人有一段code用了boost::scoped_ptr<>*,review的时候被揪出来说这不符合常理,讨论应该用啥容器。



boost::scoped_ptr someResource[number];
for (int i = 0; i < number; ++i) {
  someResource[i].reset(new T);

很显然,这样的code能编译,能正常工作(只要OtherFunction的实现没问题);只是,真的太不符合common sense了。怎么整?


不过看到瑞典同事有人用boost::ptr_vector,这个新鲜的玩意儿不常见,研究一下,原来是Boost.Pointer Container的一部分,用来保存heap-allocated objects,有放进去的指针会在出了作用域之后自动删除,所以有”own”的语义。

相比起shared_ptr的容器,有各种优点(见上面link里的advantages 1~8)。

  • boost::ptr_vector保存的是“own”的对象;
  • std::vector<boost::shared_ptr<>>保存的对象可以被别人own




#include <boost/ptr_container/ptr_vector.hpp>
#include <boost/shared_ptr.hpp>
#include <vector>
#include <cstdio>
#include <ctime>
#include <stdint.h>

class TSomeData
  int data;
  TSomeData(int d)
    : data(d)
    // Empty

const int TEST_ITERATIONS = 10000000;

typedef std::vector<boost::shared_ptr > TVectorOfShared;
typedef boost::ptr_vector TPtrVector;

int main()
  clock_t start;
  clock_t end;

  start = ::clock();
  TVectorOfShared vectorOfShared;
  for (int i = 0; i < TEST_ITERATIONS; ++i) {
  // Test vector of shared_ptr
    boost::shared_ptr data(new TSomeData(i));
  end = ::clock();
  printf("Vector of shared:\n  Time executed: %u\n",
         static_cast<uint32_t>((end - start) / (CLOCKS_PER_SEC/1000)));

  start = ::clock();
  TPtrVector ptrVector;
  for (int i = 0; i < TEST_ITERATIONS; ++i) {
  // Test ptr_vector
    TSomeData* data = new TSomeData(i);
  end = ::clock();
  printf("PtrVector:\n  Time executed: %u\n",
         static_cast<uint32_t>((end - start) / (CLOCKS_PER_SEC/1000)));
  return 0;

跑一下结果如下(老式T400,跑着Ubuntu 14.04).

# g++ -O0
Vector of shared:
  Time executed: 7227
  Time executed: 1507

# g++ -O2
Vector of shared:
  Time executed: 5090
  Time executed: 731




  for (int i = 0; i < TEST_ITERATIONS; ++i) {
    std::unique_ptr data(new TSomeData(i));


# g++ -O0 -std=c++0x
Vector of Unique:
  Time executed: 5057
Vector of shared:
  Time executed: 4080
  Time executed: 1681

# g++ -O2 -std=c++0x
Vector of Unique:
  Time executed: 835
Vector of shared:
  Time executed: 1794
  Time executed: 743


  • 在-O0下,unique_ptr反而是最慢的,不明白(如果有人知道为什么,请告诉我)
  • 在-O2下,unique_ptr和ptr_vector有着差不多的效率;但是vector<shared_ptr>相比没有c++0x的时候效率也提升了很多!


  • 在合适的语义下,ptr_vector最好(好吧,我更习惯用boost…)
  • 能体会到,c++0x在标准库下有着更好的效率,至于是不是适合项目使用,看项目情况吧
  • 在不用考虑效率的时候(个人觉得,开发中不需要在执行效率上太抠,把优化留到实际使用发现性能之后),vector<shared_ptr>最万能。