Linux Mystery: linux-vdso.so.1
25 Apr 2021
Every time I compile and link a Linux binary, I see a dynamically linked library called linux-vdso.so.1.
[austin@localhost]$ cat hello.c
#include <stdio.h>
int main(int argc, char *argv[]) {
    printf("Hello, World!\n");
    return 0;
}
[austin@localhost]$ gcc -o hello hello.c
[austin@localhost]$ ldd hello
linux-vdso.so.1 (0x00007ffee39dd000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f91baaec000)
/lib64/ld-linux-x86-64.so.2 (0x00007f91bad03000)
The libc and ld-linux I know about, but who is this linux-vdso? There is a man page, vdso(7), that describes it.
The "vDSO" (virtual dynamic shared object) is a small shared library that the
kernel automatically maps into the address space of all user-space applications.
Applications usually do not need to concern themselves with these details as the
vDSO is most commonly called by the C library.
...
Why does the vDSO exist at all? There are some system calls the kernel provides
that user-space code ends up using frequently, to the point that such calls can
dominate overall performance. This is due both to the frequency of the call as
well as the context-switch overhead that results from exiting user space and
entering the kernel.
The rest of this documentation is geared toward the curious and/or C library writers
rather than general developers. If you're trying to call the vDSO in your own
application rather than using the C library, you're most likely doing it wrong.
Huh. Neat. We are bringing kernel functionality into user space in the form of a shared object.
The man page goes on to explain that system calls are expensive because each one needs a context switch to the kernel and back. Some system calls could really just be implemented as user space functions, and doing so saves a lot of time.
This seemed suspicious to me. Isn’t the point of a system call to clearly distinguish between user space code and kernel code? What system calls am I now bringing into my user space?
Well, it’s actually just four syscalls (on x86-64):
clock_gettime
getcpu
gettimeofday
time
To verify, I dumped the region of the process’s memory where the vDSO is mapped.
[austin@localhost]$ gdb -q ./hello
Reading symbols from ./hello...
(No debugging symbols found in ./hello)
(gdb) b main
Breakpoint 1 at 0x1149
(gdb) r
Starting program: /home/austin/projects/elf-collection/hello
Breakpoint 1, 0x0000555555555149 in main ()
(gdb) info proc map
process 132181
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x555555554000 0x555555555000 0x1000 0x0 /home/austin/projects/elf-collection/hello
0x555555555000 0x555555556000 0x1000 0x1000 /home/austin/projects/elf-collection/hello
0x555555556000 0x555555557000 0x1000 0x2000 /home/austin/projects/elf-collection/hello
0x555555557000 0x555555558000 0x1000 0x2000 /home/austin/projects/elf-collection/hello
0x555555558000 0x555555559000 0x1000 0x3000 /home/austin/projects/elf-collection/hello
0x7ffff7db4000 0x7ffff7db6000 0x2000 0x0
0x7ffff7db6000 0x7ffff7ddc000 0x26000 0x0 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7ffff7ddc000 0x7ffff7f49000 0x16d000 0x26000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7ffff7f49000 0x7ffff7f95000 0x4c000 0x193000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7ffff7f95000 0x7ffff7f96000 0x1000 0x1df000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7ffff7f96000 0x7ffff7f99000 0x3000 0x1df000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7ffff7f99000 0x7ffff7f9c000 0x3000 0x1e2000 /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x7ffff7f9c000 0x7ffff7fa2000 0x6000 0x0
0x7ffff7fc8000 0x7ffff7fcc000 0x4000 0x0 [vvar]
0x7ffff7fcc000 0x7ffff7fce000 0x2000 0x0 [vdso]
0x7ffff7fce000 0x7ffff7fcf000 0x1000 0x0 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7ffff7fcf000 0x7ffff7ff3000 0x24000 0x1000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7ffff7ff3000 0x7ffff7ffc000 0x9000 0x25000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7ffff7ffc000 0x7ffff7ffd000 0x1000 0x2d000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7ffff7ffd000 0x7ffff7fff000 0x2000 0x2e000 /usr/lib/x86_64-linux-gnu/ld-2.32.so
0x7ffffffde000 0x7ffffffff000 0x21000 0x0 [stack]
0xffffffffff600000 0xffffffffff601000 0x1000 0x0 [vsyscall]
(gdb) dump binary memory vdso.so 0x7ffff7fcc000 0x7ffff7fce000
(gdb) q
It’s really just an ELF .so file!
[austin@localhost]$ file vdso.so
vdso.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=112feb4b14e301806a8eafdcdd804c88bfa191d8, stripped
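And here are the functions, as expected, each exported under both its plain name and a __vdso_ name. This kernel also exports clock_getres, which was not in my list of four: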
[austin@localhost]$ objdump -T vdso.so
vdso.so: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000bc0 w DF .text 0000000000000005 LINUX_2.6 clock_gettime
0000000000000b80 g DF .text 0000000000000005 LINUX_2.6 __vdso_gettimeofday
0000000000000bd0 w DF .text 0000000000000060 LINUX_2.6 clock_getres
0000000000000bd0 g DF .text 0000000000000060 LINUX_2.6 __vdso_clock_getres
0000000000000b80 w DF .text 0000000000000005 LINUX_2.6 gettimeofday
0000000000000b90 g DF .text 0000000000000029 LINUX_2.6 __vdso_time
0000000000000b90 w DF .text 0000000000000029 LINUX_2.6 time
0000000000000bc0 g DF .text 0000000000000005 LINUX_2.6 __vdso_clock_gettime
0000000000000000 g DO *ABS* 0000000000000000 LINUX_2.6 LINUX_2.6
0000000000000c30 g DF .text 0000000000000025 LINUX_2.6 __vdso_getcpu
0000000000000c30 w DF .text 0000000000000025 LINUX_2.6 getcpu
It seems strange that the output of ldd doesn’t show an .so file on disk to dynamically load, but this one comes from the kernel, and files on disk are more of a user land thing.
Let’s make sure we aren’t making the syscall by actually using one of those vdso functions.
[austin@localhost]$ cat hello-vdso.c
#include <stdio.h>
#include <sys/time.h>
int main(int argc, char *argv[]) {
    struct timeval t;
    gettimeofday(&t, NULL);
    printf("Seconds: %ld\n", (long)t.tv_sec);
    return 0;
}
We can use strace to see what syscalls are being made. We should not see the gettimeofday syscall in this case.
[austin@localhost]$ strace ./hello-vdso 2>&1 | grep "gettimeofday\|write"
write(1, "Seconds: 1619316657\n", 20Seconds: 1619316657
Only the write shows up, so we must be using the vDSO version. My next question was whether I could force the syscall to happen. I could not find a GNU linker option to turn the vDSO off. You can turn it off system-wide using various kernel options, but not “per application” at link time. I also thought linking statically would do it (since the vDSO shows up in ldd), but even that didn’t work. The kernel and glibc really want to make sure I’m using the optimized version!
I was able to do a junk hack to make it happen by statically linking and then using a hex editor to mangle the string __vdso_gettimeofday, so that the binary thinks the vDSO version was never loaded.
[austin@localhost]$ strace ./hello-vdso 2>&1 | grep "gettimeofday\|write"
gettimeofday({tv_sec=1619316899, tv_usec=828106}, NULL) = 0
write(1, "Seconds: 1619316899\n", 20Seconds: 1619316899
This was kind of a dumb experiment, but it was a good way to learn about how the kernel and user land interact in ways that most people don’t think about too hard.