I was tracing a bug in sdbusplus found by OpenBMC CI, it leads me into the code in systemd, and eventually get me into GCC.
To get a short introduction, refer to https://lists.ozlabs.org/pipermail/openbmc/2019-December/019884.html
Here is the full story of the investigation.
The sdbusplus CI issue
A CI issue is found in sdbusplus, that the valgrind reports the below error.
==5290== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==5290== at 0x4F2FB08: epoll_ctl (syscall-template.S:79)
==5290== by 0x493A8F7: UnknownInlinedFun (sd-event.c:961)
==5290== by 0x493A8F7: sd_event_add_time (sd-event.c:1019)
==5290== by 0x190BB3: phosphor::Timer::Timer(sd_event*, std::function) (timer.hpp:62)
==5290== by 0x192B93: TimerTest::TimerTest() (timer.cpp:25)
==5290== by 0x193A13: TimerTest_timerExpiresAfter2seconds_Test::TimerTest_timerExpiresAfter2seconds_Test() (timer.cpp:85)
...
==5290== by 0x4A90917: main (gmock_main.cc:69)
==5290== Address 0x1fff00eafc is on thread 1's stack
==5290== in frame #0, created by epoll_ctl (syscall-template.S:78)
==5290==
Clearly, valgrind detects that some uninitialized data is used.
However, this issue is not 100% reproduced, it only occurs sometimes, how could that be?
At a first glance, it’s sdbusplus
‘s Timer
class that invokes sd_event_add_time()
from libsystemd
, and eventually invokes epoll_ctl()
.
So I would suspect something may be wrong in Timer
that it may pass uninitialized data to sd_event_add_time()
.
Investigation in sdbusplus
Let’s see the related code.
void initialize()
{
...
auto r = sd_event_add_time(
event, &eventSource,
CLOCK_MONOTONIC, // Time base
UINT64_MAX, // Expire time - way long time
0, // Use default event accuracy
[](sd_event_source* eventSource, uint64_t usec, void* userData) {
auto timer = static_cast<Timer*>(userData);
return timer->timeoutHandler();
}, // Callback handler on timeout
this); // User data
...
}
The event
is a pointer that is already initialized;
The eventSource
is the out-parameter.
Others are just simple data or a lambda, nothing suspicious.
Investigation in libsystemd
So let’s dive into libsystemd to see what exactly happens.
The related code is in sd_event_add_time().
_public_ int sd_event_add_time(sd_event *e, ...) {
...
if (d->fd < 0) {
r = event_setup_timer_fd(e, d, clock);
if (r < 0)
return r;
}
...
}
Where:
e
is sd_event*
, and clock
is clockid_t
, both are passed into this function
d
is struct clock_data*
initialized in this function So nothing is wrong.
Let’s see event_setup_timer_fd() then.
static int event_setup_timer_fd(...) {
struct epoll_event ev;
...
ev = (struct epoll_event) {
.events = EPOLLIN,
.data.ptr = d,
};
r = epoll_ctl(e->epoll_fd, EPOLL_CTL_ADD, fd, &ev);
...
}
The epoll_fd
, fd
, and ev
, are all initialized, are they?
Let’s see how epoll_ctl
is implemented in kernel source.
SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
struct epoll_event __user *, event)
{
...
if (ep_op_has_event(op) &&
copy_from_user(&epds, event, sizeof(struct epoll_event)))
goto error_return;
...
}
Be noted that valgrind says Syscall param epoll_ctl(event) points to uninitialised byte(s)
, and here we do see that it’s copying the whole event
struct from userspace, which means the event
contains uninitialized bytes.
Let’s go back to see how the event
struct is initialized:
ev = (struct epoll_event) {
.events = EPOLLIN,
.data.ptr = d,
};
And let’s see how the struct is defined in glibc.
typedef union epoll_data
{
void *ptr;
int fd;
uint32_t u32;
uint64_t u64;
} epoll_data_t;
struct epoll_event
{
uint32_t events; /* Epoll events */
epoll_data_t data; /* User data variable */
} __EPOLL_PACKED;
Hmm, the events
is a uint32_t
and is initialized, and the data
is initialized as well, it looks fine…
Is it really fine?
data
is a union and should be at least 64 bits, while events
is uint32_t
, which is 32 bits, there could be padding inside the epoll_event
padding if it’s not packed.
Hey, there is __EPOLL_PACKED
… let’s grep it in glibc:
$ gg __EPOLL_PACKED
ChangeLog.old/ChangeLog.18: (__EPOLL_PACKED): Define to empty if not defined by
ChangeLog.old/ChangeLog.18: (struct epoll_event): Use __EPOLL_PACKED to make possibly packed.
sysdeps/unix/sysv/linux/sys/epoll.h:#ifndef __EPOLL_PACKED
sysdeps/unix/sysv/linux/sys/epoll.h:# define __EPOLL_PACKED
sysdeps/unix/sysv/linux/sys/epoll.h:} __EPOLL_PACKED;
sysdeps/unix/sysv/linux/x86/bits/epoll.h:#define __EPOLL_PACKED __attribute__ ((__packed__))
It is defined to __attribute__ ((__packed__))
for x86, and not defined for other architectures.
Remember that the issue is not 100% reproduced, right?
The OpenBMC CI backend has both x86-64 and ppc64le servers, so we could guess that the padding causes the valgrind error, and it only happens on ppc64le but not on x86-64, because on x86-64 there is no padding at all.
From the CI log, it confirms that the guess is correct: the issue only occurs on the ppc64le CI server!
So let’s go back to the question code:
ev = (struct epoll_event) {
.events = EPOLLIN,
.data.ptr = d,
};
It’s using GCC’s Designated Initializers extension
I tried to google how GCC initialize the padding, there are discussions in StackOverlows and blogs, e.g. according to https://stackoverflow.com/questions/37642026/does-c-initialize-structure-padding-to-zero, it looks like the case:
padding for the remaining objects are guaranteed to be 0, but not for the members which has received the initializers.
But I do not see an official GCC doc talking about this.
Let’s do some experiments.
Testing in GCC
Giving below demo code to test how the padding is initialized:
#include <string.h>
#include <stdint.h>
#include <stdio.h>
struct struct_with_padding {
uint32_t a;
uint64_t b;
uint32_t c;
};
int main()
{
struct struct_with_padding s;
memset(&s, 0xff, sizeof(s));
s = (struct struct_with_padding) {
.a = 0xaaaaaaaa,
.b = 0xbbbbbbbbbbbbbbbb,
#ifdef SHOW_GCC_BUG
.c = 0xdddddddd,
#endif
};
uint8_t* p8 = (uint8_t*)(&s);
printf("data: ");
for (size_t i = 0; i < sizeof(s); ++i)
{
printf("0x%02x ", p8[i]);
}
printf("\n");
return 0;
}
Compile with or without SHOW_GCC_BUG
, the result is different:
$ gcc -o test_padding test_padding.c
$ ./test_padding
data: 0xaa 0xaa 0xaa 0xaa 0x00 0x00 0x00 0x00 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$ gcc -DSHOW_GCC_BUG -o test_padding test_padding.c
$ ./test_padding
data: 0xaa 0xaa 0xaa 0xaa 0xff 0xff 0xff 0xff 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xdd 0xdd 0xdd 0xdd 0xff 0xff 0xff 0xff
GCC behaves like:
- If a struct is partial initialized, the all the padding are initialized to zero;
- If a struct is fully initialized, the padding remains the old data. This is exactly what happens in my case!
How about clang?
$ clang -o test_padding test_padding.c
$ ./test_padding
data: 0xaa 0xaa 0xaa 0xaa 0x00 0x00 0x00 0x00 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$ clang -DSHOW_GCC_BUG -o test_padding test_padding.c
$ ./test_padding
data: 0xaa 0xaa 0xaa 0xaa 0x00 0x00 0x00 0x00 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xbb 0xdd 0xdd 0xdd 0xdd 0x00 0x00 0x00 0x00
OK, clang initializes the padding in both cases, good!
While I tried to file a bug to GCC, I found an exact same bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992, which is reported in 2016, and it looks like GCC is not going to fix it…
Follow-ups
- I tried to send a PR to systemd as a workaround that manually zero initialize the
struct epoll_event
.
It’s under discussion, but likely the maintainer of systemd (@poettering) will not accept the PR, because it’s not really a systemd bug, instead, @poettering treats it a Valgrind bug (if not a GCC bug).
- Although I do not think it’s a Valgrind bug, a bug is filed to https://bugs.kde.org/show_bug.cgi?id=415621, there is no further feedback yet.
- Without GCC or systemd or valgrind’s fix, I have to add a valgrind suppression to OpenBMC CI https://gerrit.openbmc-project.xyz/c/openbmc/sdbusplus/+/25548, problem solved.
Summary
- GCC has a bug of not initializing the padding.
- systemd hits the bug on initializing
struct epoll_event
on non-x86 systems.
- OpenBMC CI has both x86-64 and ppc64le systems. If a CI is run on ppc64le, the issue occurs.
- Adding a valgrind suppression file fixes (or workarounds) the issue.