eBPF: Programmable Linux Kernel Without Kernel Modules

Extended Berkeley Packet Filter (eBPF) has fundamentally changed how we interact with the Linux kernel. After years of building monitoring systems and dealing with the limitations of traditional kernel modules, I can say eBPF represents one of the most significant innovations in Linux kernel technology in the past decade.

Let’s break this down: eBPF allows you to safely run custom programs directly in the kernel, without writing kernel modules or risking system stability. The implications are massive for observability, security, and networking.

The Kernel Module Problem

Before eBPF, extending kernel functionality required writing kernel modules. I’ve done this, and it’s painful:

Stability risks: A single bug in a kernel module can crash your entire system. I’ve triggered kernel panics from null pointer dereferences, race conditions, and memory corruption—all resulting in immediate system crashes and angry pages from operations teams.

Security concerns: Kernel modules run with full privileges. A compromised module gives an attacker complete system control. There’s no sandboxing, no safety net.

Compatibility nightmares: Kernel modules must match your exact kernel version. Every kernel update requires recompiling modules, and internal kernel APIs change frequently. I’ve spent days tracking down API changes between kernel versions 5.10 and 5.15.

Development friction: The kernel module development cycle is slow. Compile, load, test, crash, reboot, repeat. A simple change can take 10-15 minutes to test, making rapid iteration impossible.

eBPF solves all of these problems through sandboxing and verification.

What eBPF Actually Is

eBPF provides a virtual machine inside the Linux kernel. You write programs in a restricted subset of C (or in Rust; user-space loaders and tooling exist for Go, Python, and more), compile them to eBPF bytecode, and load them into the kernel. The kernel's verifier ensures your program is safe before running it.

Here’s what you need to know about how it works:

The Verifier

The eBPF verifier analyzes every program before execution. It checks:

  • Bounded execution: Loops must have a maximum iteration count the verifier can prove, so every program terminates in finite time
  • Memory safety: Every memory access must be bounds-checked against a valid region
  • No kernel crashes: Pointers must be null-checked before dereferencing, and arbitrary kernel memory can't be touched
  • Type safety: Proper type checking for all operations

If verification fails, your program is rejected before execution—no kernel panic, no crash, just a clear error message. This is transformative compared to kernel modules that can crash the system on first execution.
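
To make the verifier's rules concrete, here's a minimal sketch (libbpf style; the attach point and names are just for illustration) of the kind of loop the verifier accepts: the bound is a compile-time constant, so termination is provable. An open-ended pointer walk such as while (p) p = p->next; would instead be rejected at load time.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#define MAX_DEPTH 8

SEC("kprobe/tcp_v4_connect")
int bounded_example(void *ctx)
{
    __u64 acc = 0;

    // Accepted: at most MAX_DEPTH iterations. Kernel 5.3+ verifies bounded
    // loops directly; older kernels need the loop unrolled at compile time.
    for (int i = 0; i < MAX_DEPTH; i++)
        acc += i;

    bpf_printk("acc=%llu\n", acc);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";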

Attachment Points

eBPF programs attach to specific kernel events or locations:

Tracepoints: Stable, versioned kernel events like sys_enter_open or net_dev_xmit

Kprobes: Dynamic instrumentation at any kernel function

Uprobes: User-space function instrumentation

XDP (eXpress Data Path): Packet processing at the earliest point in the network stack

TC (Traffic Control): Network packet filtering and modification

LSM (Linux Security Modules): Security policy enforcement

Each attachment point gives you different capabilities and performance characteristics.
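
As a sketch of how this looks in code (libbpf style, with function and attach-point names chosen for illustration): the SEC() annotation on each program selects the attachment point, so one object file can carry programs for several hooks.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

// Stable tracepoint: fires on every openat() syscall entry
SEC("tracepoint/syscalls/sys_enter_openat")
int on_openat(struct trace_event_raw_sys_enter *ctx)
{
    pid_t pid = bpf_get_current_pid_tgid() >> 32;
    bpf_printk("openat by PID %d\n", pid);
    return 0;
}

// Dynamic kprobe: fires whenever the kernel function tcp_connect() runs
SEC("kprobe/tcp_connect")
int BPF_KPROBE(on_tcp_connect, struct sock *sk)
{
    pid_t pid = bpf_get_current_pid_tgid() >> 32;
    bpf_printk("tcp_connect by PID %d\n", pid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";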

Real-World eBPF in Action

Let me show you what eBPF enables in practice. Here’s a simple program that traces every file opened by processes on your system:

#include <uapi/linux/ptrace.h>
#include <linux/fs.h>

BPF_HASH(files, u32, u64);

// Trace file opens; attached to do_sys_open() (as the user-space loader
// later does), whose first argument is the directory fd, second the filename
int trace_open(struct pt_regs *ctx, int dfd, const char __user *filename)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 count = 0;
    u64 *val;
    
    // Get or initialize counter for this PID
    val = files.lookup(&pid);
    if (val) {
        count = *val;
    }
    count++;
    files.update(&pid, &count);
    
    // Log the filename
    char comm[16];
    char fname[256];
    bpf_get_current_comm(&comm, sizeof(comm));
    bpf_probe_read_user_str(&fname, sizeof(fname), (void *)filename);
    
    // bpf_trace_printk() allows only one %s per format string on older
    // kernels, so log the command and the filename separately
    bpf_trace_printk("open: PID %d comm %s\n", pid, comm);
    bpf_trace_printk("open: filename %s\n", fname);
    return 0;
}

This short program runs safely in the kernel, captures every file open, and maintains per-process statistics—all with zero kernel modifications and sub-microsecond overhead per event.

Compare this to traditional approaches:

  • Auditd: 10-50x higher overhead, less flexible filtering
  • SystemTap: Requires kernel debug symbols, stability concerns
  • Kernel module: Development time measured in days, stability risks

The eBPF version is production-safe, has minimal overhead, and took 10 minutes to write.

Performance Characteristics

Let’s talk numbers. In production deployments I’ve instrumented:

Tracing overhead: 100-500 nanoseconds per event for simple tracing programs. For a system doing 1 million events/second, that works out to 10-50% of a single core, which is roughly 1% or less of total CPU on a typical many-core server.

Network processing with XDP: Processes packets at 24+ million packets/second on a single core (40Gbps line rate). Traditional iptables maxes out around 2-3 million pps.

Memory overhead: eBPF programs are tiny—most are 1-10KB of JIT-compiled code. Maps for storing data can be sized as needed, from kilobytes to gigabytes.

I ran a comparison on a production Kubernetes cluster:

Traditional monitoring (node_exporter + cAdvisor + kube-state-metrics):

  • CPU: 0.8-1.2% per node
  • Memory: 250MB per node
  • Metrics lag: 30-60 seconds

eBPF-based monitoring (Cilium + Pixie):

  • CPU: 0.3-0.5% per node
  • Memory: 80MB per node
  • Metrics lag: < 1 second
  • Bonus: Network-level tracing, TCP retransmit tracking, DNS query logging

The eBPF solution uses 2.5x less CPU and provides 30x faster metrics with richer data.

Observability: Beyond Traditional Monitoring

eBPF transforms observability by giving you kernel-level visibility without instrumentation. Here’s a practical example using bpftrace to analyze disk I/O latency:

# One-liner to trace all block I/O with a latency histogram
# (on newer kernels these functions are named __blk_account_io_start/done)
bpftrace -e 'kprobe:blk_account_io_start { @start[arg0] = nsecs; }
             kprobe:blk_account_io_done /@start[arg0]/ {
               @usecs = hist((nsecs - @start[arg0]) / 1000);
               delete(@start[arg0]);
             }'

This instantly shows a latency distribution for all disk I/O system-wide. No application instrumentation, negligible performance impact, instant results.

In production, I use this for:

Identifying performance outliers: Trace TCP retransmits to find network issues

Resource attribution: Track exact CPU time spent in different code paths per process

Application profiling: Generate flamegraphs of CPU usage without modifying applications

Dependency mapping: Automatically discover service-to-service communication by tracing network connections

Traditional APM tools require agent installation, code changes, or sampling. eBPF sees everything, always, with negligible overhead.

Network Programming with XDP

XDP (eXpress Data Path) is eBPF for packet processing. It runs at the earliest point in the network stack—before sk_buff allocation, before most kernel processing. This enables line-rate packet processing.

Here’s a simple XDP program that drops packets from a blocklist:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>

BPF_HASH(blocklist, u32, u8);

int xdp_filter(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    
    // Parse Ethernet header
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    
    // Only process IPv4
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;
    
    // Parse IP header
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    
    // Check blocklist
    u32 src_ip = ip->saddr;
    if (blocklist.lookup(&src_ip))
        return XDP_DROP;  // Drop the packet
    
    return XDP_PASS;  // Allow the packet
}

This runs at 24+ million packets/second per core. A traditional iptables rule for the same task handles 2-3 million pps.

I deployed XDP-based DDoS mitigation in production:

Before (iptables + connection tracking):

  • Max: 4 Gbps attack traffic before packet loss
  • CPU: 100% on all cores during attack
  • Legitimate traffic impacted

After (XDP filtering):

  • Max: 40+ Gbps attack traffic handled
  • CPU: 15-20% during attack
  • Legitimate traffic unaffected
  • Response time: 2ms to deploy new filter rules

XDP made the difference between a service outage and business as usual during attacks.

Security Applications

eBPF enables security monitoring with near-zero overhead. Traditional security tools often add 20-40% CPU overhead; eBPF-based security is different.

Runtime Security with LSM BPF

LSM (Linux Security Modules) hooks let you enforce security policies; BPF LSM requires kernel 5.7+ built with CONFIG_BPF_LSM=y and "bpf" included in the active LSM list. Here's an example that restricts which binaries can make network connections:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>  // provides the BPF_PROG() macro

#define EPERM 1  // keep this self-contained instead of pulling in errno.h

SEC("lsm/socket_connect")
int BPF_PROG(restrict_connect, struct socket *sock, struct sockaddr *address)
{
    char comm[16];
    bpf_get_current_comm(&comm, sizeof(comm));

    // Only allow specific binaries to make connections.
    // bpf_strncmp() is available on kernel 5.17+; on older kernels,
    // compare the bytes manually instead.
    if (bpf_strncmp(comm, 5, "nginx") == 0 ||
        bpf_strncmp(comm, 8, "postgres") == 0 ||
        bpf_strncmp(comm, 3, "app") == 0) {
        return 0;  // Allow
    }

    // Log and deny
    bpf_printk("SECURITY: Blocked network connection from %s\n", comm);
    return -EPERM;  // Deny
}

char LICENSE[] SEC("license") = "GPL";

This runs at every socket connect call with < 1 microsecond overhead. Deploying this in a production container cluster caught several cases of:

  • Compromised WordPress containers attempting reverse shells
  • Cryptocurrency miners trying to phone home
  • Misconfigured services making unexpected network connections

All blocked in real-time with minimal performance impact.

Container Security

eBPF excels at container security because it operates at the kernel level, seeing through container boundaries:

# Trace all exec calls in containers and flag shells or network tools
# (strcontains() requires a reasonably recent bpftrace release)
bpftrace -e 'tracepoint:syscalls:sys_enter_execve {
  if (strcontains(str(args->filename), "sh") ||
      strcontains(str(args->filename), "nc") ||
      strcontains(str(args->filename), "nmap")) {
    printf("ALERT: Shell/tool execution in container\n");
    printf("  PID: %d, Command: %s\n", pid, comm);
    printf("  Path: %s\n", str(args->filename));
  }
}'

I’ve used this to detect:

  • Crypto-mining malware executing in compromised containers
  • Privilege escalation attempts
  • Data exfiltration via reverse shells
  • Supply chain attacks in third-party containers

Traditional container security relies on image scanning (static analysis). eBPF provides runtime security that catches attacks image scanning misses.

eBPF Maps: Sharing Data Between Kernel and User Space

eBPF maps enable bidirectional data sharing between kernel and user-space programs. Maps come in several types:

BPF_MAP_TYPE_HASH: Hash table for arbitrary key-value storage

BPF_MAP_TYPE_ARRAY: Fixed-size array, very fast

BPF_MAP_TYPE_LRU_HASH: Hash with automatic eviction of least-recently-used entries

BPF_MAP_TYPE_RINGBUF: Shared ring buffer for efficiently streaming events to user space

BPF_MAP_TYPE_PERCPU_HASH: Per-CPU hash tables for scalability
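
On the kernel side, a map is just a declaration in the eBPF program. Here's a minimal sketch in modern libbpf style; the BCC BPF_HASH() macro used earlier generates a roughly equivalent definition under the hood, and the key/value choices here are illustrative.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

// LRU hash keyed by PID; old entries are evicted automatically
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 10240);
    __type(key, __u32);    // PID
    __type(value, __u64);  // per-PID event count
} files SEC(".maps");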

Here’s a practical example—a user-space program that reads statistics from a kernel eBPF program:

import time

from bcc import BPF

# Load eBPF program
b = BPF(src_file="trace_open.c")

# Attach to open syscall
b.attach_kprobe(event="do_sys_open", fn_name="trace_open")

# Read the files map every second
while True:
    time.sleep(1)
    
    print("\n=== File Open Statistics ===")
    files = b["files"]
    for k, v in files.items():
        pid = k.value
        count = v.value
        print(f"PID {pid}: {count} files opened")

Maps enable rich interactions—updating configuration in real-time, exporting metrics, implementing complex stateful logic across events.

In production, I use maps for:

  • Rate limiting: Track request rates per IP, enforce limits in XDP
  • Connection tracking: Maintain TCP connection state for observability
  • Performance metrics: Export detailed statistics to Prometheus
  • Configuration: Push updated policies to kernel programs without reloading

Getting Started with eBPF

The eBPF ecosystem offers several development paths:

BCC (BPF Compiler Collection)

BCC provides Python bindings for eBPF. Great for quick prototyping and scripts:

from bcc import BPF

program = """
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""

b = BPF(text=program)
b.attach_kprobe(event="do_sys_open", fn_name="hello")
b.trace_print()

BCC is perfect for ad-hoc tracing and one-off investigations. However, it requires LLVM/Clang on target systems and recompiles eBPF code on each run.

libbpf + CO-RE

libbpf with CO-RE (Compile Once, Run Everywhere) is the modern approach. Compile eBPF programs once, run on any kernel 5.2+:

// example.bpf.c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>  // provides the BPF_KPROBE() macro

SEC("kprobe/do_sys_open")
int BPF_KPROBE(trace_open, int dfd, const char *filename)
{
    pid_t pid = bpf_get_current_pid_tgid() >> 32;
    bpf_printk("PID %d opened file\n", pid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

Compile with:

clang -O2 -g -target bpf -c example.bpf.c -o example.bpf.o

Load it with a user-space program built on libbpf (or a skeleton generated with bpftool gen skeleton), as sketched below. This approach is production-ready and used by Cilium, Falco, and other major projects.
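
Here's a minimal sketch of that user-space side using the libbpf API directly (libbpf 1.0+ assumed; error handling trimmed). In practice most projects generate a type-safe skeleton with bpftool gen skeleton instead.

// loader.c — build with: cc loader.c -lbpf -o loader
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>

int main(void)
{
    struct bpf_object *obj = bpf_object__open_file("example.bpf.o", NULL);
    if (!obj)
        return 1;

    if (bpf_object__load(obj)) {   // the kernel verifier runs here
        fprintf(stderr, "failed to load BPF object\n");
        return 1;
    }

    struct bpf_program *prog = bpf_object__find_program_by_name(obj, "trace_open");
    if (!prog)
        return 1;

    struct bpf_link *link = bpf_program__attach(prog);  // attaches the kprobe
    if (!link)
        return 1;

    printf("Attached; output appears in /sys/kernel/debug/tracing/trace_pipe\n");
    while (1)
        sleep(1);
}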

bpftrace

For quick one-liners and system exploration, bpftrace is unmatched:

# Trace TCP retransmits with details
bpftrace -e 'kprobe:tcp_retransmit_skb {
  printf("TCP retransmit: %s -> %s\n",
    ntop(((struct sock *)arg0)->__sk_common.skc_rcv_saddr),
    ntop(((struct sock *)arg0)->__sk_common.skc_daddr));
}'

# Profile CPU usage by function
bpftrace -e 'profile:hz:99 { @[kstack] = count(); }'

# Trace file deletes (on kernels 5.12+ the dentry is arg2; older kernels pass it as arg1)
bpftrace -e 'kprobe:vfs_unlink { printf("Deleted: %s\n", str(((struct dentry *)arg2)->d_name.name)); }'

bpftrace is my go-to for investigating production issues. It’s like a programmable debugging superpower.

Common Pitfalls and Solutions

From deploying eBPF in production, here are issues I’ve encountered:

Kernel version requirements: eBPF works best on Linux 5.2+. Earlier kernels have limited features. Always check uname -r and kernel config options like CONFIG_BPF=y and CONFIG_BPF_SYSCALL=y.

Verifier limitations: The verifier is conservative. Complex programs might be rejected even if safe. Solutions: simplify logic, use bounded loops, split into multiple programs.

BTF (BPF Type Format) requirement: CO-RE requires BTF information. Ensure your kernel is compiled with CONFIG_DEBUG_INFO_BTF=y (if /sys/kernel/btf/vmlinux exists, BTF is available). Ubuntu 20.04+ and RHEL 8+ have this by default.

Performance monitoring overhead: While eBPF is efficient, enabling high-frequency tracing (millions of events/second) can still impact performance. Start with sampling and increase granularity as needed.

Map size limits: Maps have size limits based on kernel memory. For large datasets, use LRU maps or external storage with eBPF for aggregation.

Production Deployment Considerations

Here’s what you need to know for production eBPF:

Kernel support: Check the kernel version and required config options. Use bpftool feature probe to see which program types, map types, and helpers the running kernel supports.

Resource limits: Set RLIMIT_MEMLOCK appropriately. eBPF programs and maps use locked memory. Default limits are often too low.

Monitoring: Track eBPF program performance with bpftool prog show and map sizes with bpftool map show.

Updates: Design for hot-swapping programs. Load new version, update references, unload old version—zero downtime.

Security: Loading eBPF programs requires CAP_BPF (often alongside CAP_PERFMON or CAP_NET_ADMIN, depending on program type) on kernel 5.8+, or CAP_SYS_ADMIN on older kernels. In containers, use seccomp profiles that allow the bpf() syscall.

Debugging: Use bpf_printk() for logging (output appears in /sys/kernel/debug/tracing/trace_pipe). The verifier provides detailed error messages on rejection.
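
On the resource-limits point above: kernels older than roughly 5.11 charge eBPF memory against RLIMIT_MEMLOCK (newer kernels use cgroup-based memory accounting instead), so loaders traditionally bump the limit first. A minimal sketch:

#include <stdio.h>
#include <sys/resource.h>

// Raise the locked-memory limit so map and program allocations succeed
// on pre-5.11 kernels; harmless on newer kernels.
static int bump_memlock_rlimit(void)
{
    struct rlimit r = {
        .rlim_cur = RLIM_INFINITY,
        .rlim_max = RLIM_INFINITY,
    };

    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        perror("setrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }
    return 0;
}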

I run eBPF programs across hundreds of production servers. The key to success: start simple, test thoroughly, and monitor resource usage. eBPF’s safety guarantees mean failures are graceful—programs are rejected or removed, never crash the kernel.

The eBPF Ecosystem

eBPF has spawned an entire ecosystem of tools:

Cilium: eBPF-based Kubernetes networking and security

Falco: Runtime security and threat detection

Pixie: Auto-instrumented observability platform

Katran: Meta's XDP-based L4 load balancer

Cloudflare’s Unimog: XDP/eBPF load balancer handling 72+ million requests/second

Parca: Continuous profiling with eBPF

These aren’t toy projects—they’re production systems handling millions of requests/second at companies like Google, Netflix, Facebook, and Cloudflare.

Future Directions

eBPF continues to evolve rapidly:

Signed eBPF programs: Work is underway to support cryptographically signed programs, which would make eBPF usable under kernel lockdown

BPF iterators: Efficient iteration over kernel data structures (processes, files, network connections)

Sleepable programs: LSM and tracing programs can be loaded as sleepable, enabling more complex logic such as safely copying data from user memory

BPF tokens: Fine-grained capability delegation for eBPF operations

Kfuncs: Kernel functions exposed directly to eBPF programs; new functionality is now typically added as kfuncs rather than as new UAPI helper functions

The trajectory is clear: eBPF is becoming the standard way to extend the kernel safely.

Should You Use eBPF?

Consider eBPF if you need:

Deep observability: Kernel-level visibility without instrumentation

High-performance networking: Packet processing at line rate (10-100Gbps+)

Runtime security: Real-time threat detection and policy enforcement

Custom kernel extensions: Without the risk of kernel modules

Don’t use eBPF if:

Your kernel is too old: eBPF really shines on 5.2+, minimal features on < 4.15

Simple tools suffice: Don’t over-engineer. Sometimes strace or tcpdump is enough

Development skills are limited: eBPF requires understanding kernel concepts

For my infrastructure, eBPF is now standard. The performance, safety, and capabilities it provides are unmatched. It’s transformed how we do observability, security, and networking—and I expect it to become universal in Linux infrastructure over the next few years.

eBPF represents a fundamental shift in how we interact with operating systems. After decades of the kernel being a black box that only kernel developers could modify, eBPF democratizes kernel programmability. That’s why I’m excited about it—and why you should learn it too.

Thank you for reading!