
BTF (BPF Type Format): A Practical Guide

June 24, 2022

Getting to know a new technology, like BTF, is always difficult, but knowing where the right resources are helps a lot. In this article, we explore BTF, or BPF Type Format, with practical tips and examples.

Aniket Bhattacharyea
Software Engineer

BPF is a register-based virtual machine in the Linux kernel that can execute bytecode in a secure, efficient, event-driven manner. Unlike kernel modules, BPF programs are verified before they run to ensure that they terminate and don’t contain any loop that could lock up the kernel. The kernel functions that a program is allowed to call are also restricted, which guards against unrestricted access to the kernel.

Even though BPF offers an efficient solution for writing event-driven kernel space code, the developer experience is not yet comparable to other programming languages or frameworks. Two of the most significant concerns with BPF development are its lack of easy debugging and portability.

To mitigate these issues, we turn to BPF Type Format (BTF). It’s a metadata format that encodes the type information of a BPF program and provides better introspection and visibility into the program. So let’s take a look at the typical limitations of BPF and how BTF can be used to overcome them.

Note that this article uses the term BPF to mean eBPF (extended Berkeley Packet Filter), which extends the “classic” BPF.

Common Limitations of BPF

During the development and execution of BPF programs, you can often face debugging limitations and portability issues, as mentioned earlier.

Debugging Limitations

Almost all modern programming languages come with debuggers that can help you gain visibility into a running program. For example, GDB is a commonly used debugger for C and C++ that can, among many other things, dump the values of variables in a running program.

GDB pretty-printing a variable


There are no such tools available for BPF programs. Even though inspecting data is only a tiny part of debugging, achieving a similar result for BPF can open up the doors to a wide range of debugging tools in the future. To enable this, BPF needs to know some metadata about the program.

One such piece of metadata is the type information, and that’s precisely what BTF encapsulates.

Portability

BPF programs operate in the kernel space and can access internal kernel states and data structures. However, the kernel data structures and types are not guaranteed to be the same across different kernel versions or even different machines with the same kernel version. This means that BPF programs compiled on one machine aren’t guaranteed to run correctly on another machine.

Imagine your program is reading a field from a kernel struct and the field is at offset 8 from the start of the struct. In a later version of the kernel, more fields are added before that one, and suddenly it’s at offset 24, but your program is still reading (possibly garbage) data from offset 8. Fields can also be renamed between versions. For example, the <terminal inline>fs<terminal inline> field of <terminal inline>thread_struct<terminal inline> was renamed to <terminal inline>fsbase<terminal inline> between kernel versions 4.6 and 4.7. It’s also possible that your program is running on a different kernel configuration, which has disabled some features and compiled out parts of the struct.

All this means that you cannot compile your BPF program on your machine and distribute the binary to other systems.
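To make the offset problem concrete, here is a small user-space sketch. The two struct layouts are hypothetical stand-ins for the same kernel struct in two kernel versions (they are not real kernel types), and the helper names are made up for illustration. It shows how a program that hard-codes the old offset ends up reading the wrong member.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a struct in an "old" kernel. */
struct task_v1 {
        uint64_t flags;  /* offset 0 */
        uint64_t pid;    /* offset 8 */
};

/* Hypothetical layout in a "new" kernel: two members were added in front. */
struct task_v2 {
        uint64_t flags;  /* offset 0 */
        uint64_t new_a;  /* offset 8 */
        uint64_t new_b;  /* offset 16 */
        uint64_t pid;    /* offset 24 */
};

/* Offset baked into a program that was compiled against the old layout. */
size_t pid_offset_old(void)
{
        return offsetof(struct task_v1, pid);
}

/* Offset the same field actually has in the new layout. */
size_t pid_offset_new(void)
{
        return offsetof(struct task_v2, pid);
}

/* Read a 64-bit value at a fixed byte offset, as a precompiled program would. */
uint64_t read_u64_at(const void *base, size_t off)
{
        uint64_t v;

        memcpy(&v, (const char *)base + off, sizeof(v));
        return v;
}
```

Calling <terminal inline>read_u64_at<terminal inline> on a <terminal inline>struct task_v2<terminal inline> with the old offset returns the contents of <terminal inline>new_a<terminal inline> rather than <terminal inline>pid<terminal inline>, which is exactly the kind of silent misread described above.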

A standard solution has been to use BPF Compiler Collection (BCC). With BCC, you usually embed your BPF programs as a plain string into the user space program (for example, a Python program). At run time on the target machine, BCC uses its embedded Clang/LLVM combo to compile the program on the fly against the locally installed kernel headers.

However, this approach introduces more problems. First of all, the Clang/LLVM combo is huge, and embedding it with the application results in a large binary size. Compilation is also resource-heavy and can consume a significant amount of CPU and memory on the target machine. Finally, this approach requires the kernel headers to be installed on the target machine, which might not always be the case.

The solution is BPF CO-RE (Compile Once - Run Everywhere). Using BTF, you can eliminate the need to install kernel headers on the target machine, to embed Clang/LLVM with the application, and to compile on the target machine.

What Is BTF?

As mentioned before, BTF is the metadata format that encodes the debug info related to BPF programs and maps. BTF can encode data types, function info, and line info into a compact format.

In a non-BPF program, this metadata is usually stored in the DWARF format. However, DWARF is quite complicated and verbose, making it unsuitable to include in the kernel due to size overhead. BTF, on the other hand, is a compact and simple format that can be included in the kernel image.

BTF represents each data type using one of a few type descriptors:

  • <terminal inline>BTF_KIND_INT<terminal inline>
  • <terminal inline>BTF_KIND_PTR<terminal inline>
  • <terminal inline>BTF_KIND_ARRAY<terminal inline>
  • <terminal inline>BTF_KIND_STRUCT<terminal inline>
  • etc.
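Each type descriptor carries a 32-bit <terminal inline>info<terminal inline> word that packs the kind together with a member count. The sketch below decodes that word using the bit layout given in the kernel's BTF documentation (bits 0-15 for <terminal inline>vlen<terminal inline>, bits 24-28 for the kind, bit 31 for <terminal inline>kind_flag<terminal inline>). The function and enum names are hypothetical and chosen for this example; only the four kinds listed above are named.

```c
#include <stdint.h>

/* Kind numbers for the four descriptors listed above. */
enum {
        KIND_INT    = 1,
        KIND_PTR    = 2,
        KIND_ARRAY  = 3,
        KIND_STRUCT = 4,
};

/* Bits 24-28 of the info word hold the kind. */
uint32_t info_kind(uint32_t info)
{
        return (info >> 24) & 0x1f;
}

/* Bits 0-15 hold vlen, e.g. the number of members of a struct. */
uint32_t info_vlen(uint32_t info)
{
        return info & 0xffff;
}

/* Bit 31 is the kind_flag. */
uint32_t info_kflag(uint32_t info)
{
        return info >> 31;
}

/* Map a kind number to its name; kinds not covered here yield "other". */
const char *kind_name(uint32_t kind)
{
        switch (kind) {
        case KIND_INT:    return "BTF_KIND_INT";
        case KIND_PTR:    return "BTF_KIND_PTR";
        case KIND_ARRAY:  return "BTF_KIND_ARRAY";
        case KIND_STRUCT: return "BTF_KIND_STRUCT";
        default:          return "other";
        }
}
```

For instance, an <terminal inline>info<terminal inline> value of <terminal inline>0x04000003<terminal inline> would describe a struct with three members.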

The type information is stored in the <terminal inline>.BTF<terminal inline> section of the produced ELF. Other than the type descriptors, this section also encodes the strings. The function and line info is stored in the <terminal inline>.BTF.ext<terminal inline> section.
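The <terminal inline>.BTF<terminal inline> section starts with a small header that says where the type data and the string data live. The following sketch mirrors the header fields described in the kernel documentation and performs a few sanity checks on a raw buffer. It assumes the buffer uses the host's byte order, and the struct and function names are hypothetical, so treat it as an illustration rather than a full parser.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BTF_MAGIC   0xeB9F
#define SKETCH_BTF_VERSION 1

/* Field layout follows the BTF header described in the kernel docs. */
struct sketch_btf_header {
        uint16_t magic;
        uint8_t  version;
        uint8_t  flags;
        uint32_t hdr_len;
        uint32_t type_off;  /* offsets are relative to the end of the header */
        uint32_t type_len;
        uint32_t str_off;
        uint32_t str_len;
};

/* Return 1 if buf looks like a well-formed BTF blob, 0 otherwise. */
int btf_blob_looks_valid(const void *buf, size_t len)
{
        struct sketch_btf_header h;
        size_t body;

        if (len < sizeof(h))
                return 0;
        memcpy(&h, buf, sizeof(h));

        if (h.magic != SKETCH_BTF_MAGIC || h.version != SKETCH_BTF_VERSION)
                return 0;
        if (h.hdr_len < sizeof(h) || h.hdr_len > len)
                return 0;

        /* Both sections must fit in the bytes that follow the header. */
        body = len - h.hdr_len;
        if (h.type_off > body || h.type_len > body - h.type_off)
                return 0;
        if (h.str_off > body || h.str_len > body - h.str_off)
                return 0;
        return 1;
}
```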

For a detailed description of the BTF, you can check the Linux Kernel documentation.

BTF Quickstart

Now let’s get more hands-on with a tutorial for using BTF to pretty-print a BPF map, significantly improving debugging.

To get started, you need to have a Linux kernel compiled with the <terminal inline>CONFIG_DEBUG_INFO_BTF<terminal inline> option enabled. Most distributions come with this option enabled, but you can check by running the following command:


zgrep CONFIG_DEBUG_INFO_BTF=y /proc/config.gz
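A program can make a similar check at run time. Kernels built with <terminal inline>CONFIG_DEBUG_INFO_BTF<terminal inline> expose their type information at <terminal inline>/sys/kernel/btf/vmlinux<terminal inline>, so one simple approach is to test whether that file is readable. This is a minimal sketch; the function and macro names are hypothetical.

```c
#include <unistd.h>

/* Location where BTF-enabled kernels expose their own type information. */
#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/* Return 1 if the file at path exists and is readable, 0 otherwise. */
int btf_file_readable(const char *path)
{
        return access(path, R_OK) == 0;
}
```

Calling <terminal inline>btf_file_readable(KERNEL_BTF_PATH)<terminal inline> should return 1 on a kernel with BTF enabled.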

You’ll also need to install Clang and LLVM on your machine.

Since you’ll be writing XDP programs that will manipulate packets on your network devices, it’s a good idea to create a virtual network interface so you don’t end up losing internet connection on your physical interfaces. The easiest way to set up a virtual interface is with the test environment script from the xdp-tutorial repo.

Clone the repo and set up a virtual interface named <terminal inline>test1<terminal inline>.


git clone git@github.com:xdp-project/xdp-tutorial.git
cd xdp-tutorial/testenv
sudo ./testenv.sh setup --name=test1 --legacy-ip

Now write a BPF program that counts the number of IPv4 and IPv6 packets received on an interface. Save the following in <terminal inline>xdp_count.c<terminal inline>:


#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

// Legacy (non-BTF) map definition: an array holding two counters
struct bpf_map_def SEC("maps") cnt = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(__u32),
        .value_size = sizeof(long),
        .max_entries = 2,
};

SEC("xdp_count")
int xdp_count_prog(struct xdp_md *ctx)
{
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        __u32 ipv6_key = 0; // Key 0 counts IPv6 packets
        __u32 ipv4_key = 1; // Key 1 counts IPv4 packets
        long *value;
        __u16 h_proto;
        struct ethhdr *eth = data;

        // This bounds check is necessary to pass verification
        if (data + sizeof(struct ethhdr) > data_end)
                return XDP_DROP;

        h_proto = eth->h_proto; // EtherType, in network byte order
        if (h_proto == htons(ETH_P_IPV6)) { // Check if IPv6 packet
                value = bpf_map_lookup_elem(&cnt, &ipv6_key);
                if (value)
                        *value += 1;
        } else if (h_proto == htons(ETH_P_IP)) { // Check if IPv4 packet
                value = bpf_map_lookup_elem(&cnt, &ipv4_key);
                if (value)
                        *value += 1;
        }
        // Other EtherTypes (for example, ARP) are passed without being counted
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

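In the previous code, a BPF map named <terminal inline>cnt<terminal inline> stores the packet counts. <terminal inline>cnt<terminal inline> is an array of two elements: key 0 holds the number of IPv6 packets, and key 1 holds the number of IPv4 packets. Note that this legacy <terminal inline>bpf_map_def<terminal inline> style of map definition was removed in libbpf 1.0, so tools built against newer libbpf versions may refuse to load this object. The BTF-annotated definition introduced later in this article is the supported replacement.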

Compile the code with Clang:


clang -O2 -Wall -g -target bpf -c xdp_count.c -o xdp_count.o

Next, load the program using <terminal inline>bpftool<terminal inline>:


sudo bpftool prog load xdp_count.o /sys/fs/bpf/xdp_count type xdp

Run the following command and note down the ID of the program you just loaded and the ID of the map(s) being used by the program:


sudo bpftool prog list
Output of bpftool prog list


You can also get the map ID(s) by running <terminal inline>sudo bpftool map list<terminal inline>.

Output of the bpftool map list command


This command will also give you the name, type, key size, value size, and the max entries of the map.

Now, attach the program to a networking device.


sudo bpftool net attach xdpgeneric id <program_id> dev test1

Replace <terminal inline><program_id><terminal inline> with the ID of the program. If you aren’t using the test environment, also replace <terminal inline>test1<terminal inline> with the name of the networking device you want to attach the program to (e.g., <terminal inline>enp34s0<terminal inline>).

Now, send some packets to this device. The test environment script already provides a handy ping command to do so:


sudo ./testenv.sh ping # For IPv6
sudo ./testenv.sh ping --legacy-ip # For IPv4

Dump the map and check how many packets have been processed.


sudo bpftool map dump id <map_id>
The dumped value of the map


As you can see, you have two elements in the map as expected. The values are in hexadecimal format and will also depend on your machine’s endianness. In the screenshot, it’s in the little-endian format, which means 22 IPv6 and 4 IPv4 packets have been processed.
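If you want to convert such a raw dump by hand, the bytes have to be assembled least-significant first. The helper below is a small user-space sketch of that conversion; the function name is hypothetical, and the byte sequence in the note that follows is an assumed example consistent with the count of 22 mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble an unsigned value from up to 8 bytes stored in little-endian order. */
uint64_t le_bytes_to_u64(const unsigned char *bytes, size_t n)
{
        uint64_t v = 0;

        for (size_t i = 0; i < n && i < 8; i++)
                v |= (uint64_t)bytes[i] << (8 * i);
        return v;
}
```

Given the eight bytes <terminal inline>16 00 00 00 00 00 00 00<terminal inline>, the function returns 22, because only the lowest byte (0x16) is nonzero.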

Raw hexadecimal bytes in little-endian order are not easy to read at a glance, so you’ll need to annotate the map with BTF to allow better introspection.

Change the declaration of <terminal inline>cnt<terminal inline> as follows and save the new code in <terminal inline>xdp_count_btf.c<terminal inline>:


...
struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __type(key, __u32);
        __type(value, long);
        __uint(max_entries, 2);
} cnt SEC(".maps");
...

Observe that the section name is now <terminal inline>.maps<terminal inline>, and the map itself is declared with the BTF-enabled macros <terminal inline>__uint<terminal inline> and <terminal inline>__type<terminal inline>.
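It may help to see why these macros make the map self-describing. In libbpf's <terminal inline>bpf_helpers.h<terminal inline>, <terminal inline>__uint<terminal inline> declares a pointer to an int array whose length is the value, and <terminal inline>__type<terminal inline> declares a pointer to the given type, so every attribute ends up encoded in the struct's type information. The user-space sketch below reproduces that trick with renamed macros (<terminal inline>SKETCH_UINT<terminal inline> and <terminal inline>SKETCH_TYPE<terminal inline> are hypothetical names used to avoid reserved identifiers) and recovers the values with <terminal inline>sizeof<terminal inline>.

```c
#include <stddef.h>

/* Same shape as libbpf's __uint and __type, under non-reserved names. */
#define SKETCH_UINT(name, val) int (*name)[val]
#define SKETCH_TYPE(name, val) __typeof__(val) *name

/* Nothing is stored at run time; the attributes live in the member types. */
struct {
        SKETCH_UINT(type, 2);            /* 2 is the value of BPF_MAP_TYPE_ARRAY */
        SKETCH_TYPE(key, unsigned int);
        SKETCH_TYPE(value, long);
        SKETCH_UINT(max_entries, 2);
} cnt_sketch;

/* The array length encoded by SKETCH_UINT(max_entries, 2). */
size_t sketch_max_entries(void)
{
        return sizeof(*cnt_sketch.max_entries) / sizeof(int);
}

/* Sizes of the key and value types encoded by SKETCH_TYPE. */
size_t sketch_key_size(void)
{
        return sizeof(*cnt_sketch.key);
}

size_t sketch_value_size(void)
{
        return sizeof(*cnt_sketch.value);
}
```

A loader that reads the BTF of such a struct can obtain the map type, key and value types, and entry count without any separate definition table, which is also what lets a tool print values using their real types.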

Compile the code with Clang:


clang -O2 -Wall -g -target bpf -c xdp_count_btf.c -o xdp_count_btf.o

The use of the <terminal inline>-g<terminal inline> flag will create the debugging information and generate the BTF. Note that the <terminal inline>-g<terminal inline> flag has been used previously, too, as it’s required by <terminal inline>libbpf<terminal inline> to load the program; however, previously the map wasn’t annotated by BTF, so <terminal inline>bpftool<terminal inline> couldn’t pretty-print it.

Verify the BTF sections are present in the generated object file.


llvm-objdump -h xdp_count_btf.o
Output of llvm-objdump


As explained before, the <terminal inline>.BTF<terminal inline> section contains the type and string data, and the <terminal inline>.BTF.ext<terminal inline> section encodes the <terminal inline>func_info<terminal inline> and <terminal inline>line_info<terminal inline> data.

First, detach the previous program.


sudo bpftool net detach xdpgeneric dev test1

Then follow a similar procedure to load and attach the new program to the interface and send some packets to the interface.


sudo bpftool prog load xdp_count_btf.o /sys/fs/bpf/xdp_count_btf type xdp
sudo bpftool prog list
sudo bpftool net attach xdpgeneric id <program_id> dev test1
sudo ./testenv.sh ping
sudo ./testenv.sh ping --legacy-ip

Finally, dump the map corresponding to the new program. If you did everything correctly, the output will look quite different this time.

The dumped value of the map


Not only is it pretty-printed in JSON format, but the values are also in decimal, making it much more readable and understandable.

BTF and CO-RE

As mentioned before, BTF enables CO-RE, which makes your BPF programs portable across different kernel versions and configurations. As a first step, you can eliminate the need for local kernel headers by generating a single header from the kernel’s own BTF information:


bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

This will create a huge <terminal inline>vmlinux.h<terminal inline> file that contains all the kernel types, including types that are exposed as part of UAPI, types that are internal and available through <terminal inline>kernel-devel<terminal inline>, and some more internal types not available anywhere else. In your BPF program, you can just <terminal inline>#include "vmlinux.h"<terminal inline> and get rid of other kernel headers like <terminal inline><linux/fs.h><terminal inline>, <terminal inline><linux/sched.h><terminal inline>, and so on.

Getting rid of the kernel header dependency is just the tip of the iceberg of what BTF can achieve. For a thorough explanation of BTF and CO-RE, you can go through this article.
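One idea behind CO-RE is that field accesses are matched by name against the running kernel's BTF instead of relying on offsets fixed at compile time. The following user-space sketch imitates only that name-based lookup. The table type, the example layouts, and <terminal inline>resolve_offset<terminal inline> are hypothetical and are not part of libbpf; in a real BPF program, this job is done by the loader together with helpers such as <terminal inline>BPF_CORE_READ<terminal inline>.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a (hypothetical) field table derived from type information. */
struct field_desc {
        const char *name;
        size_t offset;
};

/* Return the offset of the named field in the given layout, or -1 if absent. */
long resolve_offset(const struct field_desc *layout, size_t n, const char *name)
{
        for (size_t i = 0; i < n; i++) {
                if (strcmp(layout[i].name, name) == 0)
                        return (long)layout[i].offset;
        }
        return -1;
}
```

Looking a field up in the target's table yields the offset that is correct on that machine, and a result of -1 signals that the field does not exist there (for example, because it was renamed or compiled out), which the program can then handle explicitly.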

Conclusion

BTF is an extremely powerful tool that can make BPF programs more debuggable and portable. Since it’s a relatively new technology, the development is ongoing, and you can expect to see plenty of improvements in the future.

This article gave you a glimpse of what BTF can achieve. You learned about the shortcomings of BPF, what BTF is, and how to annotate maps with BTF so that they can be pretty-printed. Finally, you also learned how BTF serves as the foundation for enhancing portability with CO-RE.

Aniket Bhattacharyea
Software Engineer

Aniket is a software engineer currently working towards a Master's in Mathematics from Ramakrishna Mission Vivekananda Educational and Research Institute. Previously, Aniket was Head of Technology at PiParadox, where he worked in C, C++, Python, JavaScript, and many other languages. Prior to that, in 2016, Aniket participated in Google Code where he finished in 6th position in TopCoder. Aniket has a Bachelor of Science degree in Mathematics from St. Xavier's College, Kolkata.
