Start your free 14-day ContainIQ trial

Troubleshooting SIGSEGV: Segmentation Fault in Linux Containers (exit code 139)

Segmentation faults are an unfortunate reality of software development. In this post you’ll learn about the SIGSEGV error, and how to debug it when you’re working with Linux containers.

July 13, 2022
James Walker
Software Engineer

The SIGSEGV Linux signal denotes a segmentation violation within a running process. Segmentation errors occur when a program tries to access memory it isn’t allowed to use, such as an address that was never allocated to it. This could be due to buggy code or intentional malicious activity.

SIGSEGV signals arise at the operating system level, but you’ll also encounter them in the context of containerization technologies like Docker and Kubernetes. When a container exits with status code 139, it’s because it received a SIGSEGV signal. The operating system terminated the container’s process to guard against a memory integrity violation.

It’s important to investigate what’s causing the segmentation errors if your containers are terminating with code 139. The cause is often a programming error in a language that gives you direct access to memory, such as C or C++. If the error occurs in containers running a third-party image, there could be a bug inside that software or an incompatibility with your environment.

In this article, we’ll explain what SIGSEGV signals are, their impact on your Linux containers in Kubernetes, and the ways you can troubleshoot and handle segmentation faults in your application.

What’s a Segmentation Fault?

“Segmentation fault” can seem an opaque term, but its meaning is simple: a process that receives a SIGSEGV signal tried to read or write memory it’s not allowed to access. By default, the kernel terminates the process to prevent memory corruption. A program can change this behavior by explicitly handling the signal in its code.

The name reflects the way memory is partitioned into segments by purpose. The text segment holds program instructions, the data segment stores global and static variables whose initial values are known at compile time, the stack holds local variables and function call frames, and the heap holds dynamically allocated variables created at runtime.

Many real-world segmentation faults involve heap memory, though they can occur in any segment. Dereferencing uninitialized or null pointers, writing to read-only memory, and reading or writing past the end of an array can all make a program touch an address it isn’t permitted to use.

Here’s a trivial example of a C program which exhibits a segmentation error:


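int main(void) {
  char *buffer;   // declared but never initialized: holds an arbitrary address
  buffer[0] = 0;  // writes through that garbage pointer, which can trigger SIGSEGV
  return 0;
}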

Save the program as <terminal inline>hello-world.c<terminal inline> and compile it with <terminal inline>make<terminal inline>:


$ make hello-world

Now run the compiled binary:


$ ./hello-world
Segmentation fault (core dumped)

The program terminates immediately and reports a segmentation fault. If you inspect the exit code, you’ll see it’s 139: a process killed by a signal is reported as 128 plus the signal number, and SIGSEGV is signal 11.


$ echo $?
139

Why did this happen? The program declared a pointer called <terminal inline>buffer<terminal inline> but never pointed it at any allocated memory. As a result, the assignment <terminal inline>buffer[0] = 0<terminal inline> wrote to whatever arbitrary address the uninitialized pointer happened to hold. You can fix the program by giving <terminal inline>buffer<terminal inline> real storage that’s large enough for the data it’ll store:


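int main(void) {
  char buffer[1];  // a one-byte array on the stack, not an uninitialized pointer
  buffer[0] = 0;   // writes to memory the program owns
  return 0;
}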

Declaring <terminal inline>buffer<terminal inline> as a one-byte array gives it enough memory to hold the assigned value. This program will run successfully and exit with status code 0.

Segmentation Faults in Containers

Now let’s look at what happens when a segmentation fault occurs within a container. Here’s a simple Dockerfile for the crashing application written above:


FROM alpine:latest
RUN apk add --no-cache build-base
COPY hello-world.c .
RUN make hello-world && mv hello-world /usr/bin/hello-world
CMD ["hello-world"]

Build your container image with the following command:


$ docker build -t segfault:latest .

Now start a container:


$ docker run segfault:latest

The container will start, run the command, and terminate immediately. Use <terminal inline>docker ps<terminal inline> with the <terminal inline>-a<terminal inline> flag to retrieve the stopped container’s details:

$ docker ps -a
CONTAINER ID   IMAGE             COMMAND         CREATED          STATUS
6e6944f7f339   segfault:latest   "hello-world"   17 seconds ago   Exited (139) 16 seconds ago

Exit code 139 is reported because of the segmentation error in the application.


Debugging Kubernetes Segmentation Errors

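You can troubleshoot segmentation faults in Kubernetes containers, too. Use a project such as MicroK8s or K3s to start a local Kubernetes cluster on your machine, and make your <terminal inline>segfault:latest<terminal inline> image available to the cluster’s container runtime (both projects let you import locally built images). Next, create a pod manifest that starts a container using your image: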


apiVersion: v1
kind: Pod
metadata:
  name: segfault
spec:
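  containers:
    - name: segfault
      image: segfault:latest
      # :latest defaults to "Always"; use the locally imported image instead
      imagePullPolicy: IfNotPresent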

Save the manifest as <terminal inline>pod.yaml<terminal inline>, then use kubectl to add the pod to your cluster:


$ kubectl apply -f pod.yaml

Now retrieve the pod’s details:

$ kubectl get pod/segfault
NAME       READY   STATUS             RESTARTS     AGE
segfault   0/1     CrashLoopBackOff   1 (7s ago)   19s

The pod is stuck crashing in a restart loop. Use the <terminal inline>describe<terminal inline> command to find out the cause:


$ kubectl describe pod/segfault
Name: segfault
Namespace: default
...
Containers:
  segfault:
    ...
    Last State:     Terminated
      Reason:       Error
      Exit Code:    139

The exit code is reported as 139, indicating that a segmentation error caused the application inside the container to crash.

Solving Segmentation Faults

Once you’ve identified segmentation errors as the cause of your container terminations, you can move on to mitigating them and preventing future recurrences.

If the error’s occurring inside a third-party container image, you will have limited options. You should raise an issue with the developer to investigate the cause of the unexpected memory access attempts. When the problem’s inside your own software, you can start more targeted troubleshooting efforts to work out what’s wrong.

Identifying Problem Code

First, look for any obvious areas of your code that could be impacted by segmentation issues. You might be able to use your container’s logs to work out the sequence of events leading up to the error:


$ docker logs my-container

$ kubectl logs pod/my-pod

Use the last messages logged before the crash to narrow down where in the source the error originates. If there’s an array access, pointer dereference, or unguarded memory write in that area, it could be the cause of the problem.

Environment Incompatibilities

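Another common cause of these errors is an update to a shared library that breaks compatibility with existing binaries. If a program loads a library version whose data structures or function signatures differ from the ones it was compiled against, it can read or write memory in ways the library doesn’t expect, causing memory access violations.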

Try to revert any recent changes to the dependencies inside your containers. This can help eliminate issues that have been provoked by third-party library updates.

In rare cases, persistent segmentation faults with no obvious explanation can be caused by incompatibilities with the machine’s physical hardware, or even by faulty memory. This is less likely to affect a typical Kubernetes cluster running on a public cloud provider, where the provider maintains the hardware. When you’re maintaining your own hardware, running memtester can help you rule out physical memory problems.

Targeted Debugging

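You can use Linux tools to debug SIGSEGV signals more precisely. By default, the kernel writes a log message whenever a process is killed by an unhandled segmentation fault (on x86 these messages are rate-limited and can be switched off with the <terminal inline>debug.exception-trace<terminal inline> sysctl). Because containers run as ordinary processes on the host’s kernel, these messages are written even when the error occurs inside a container.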

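On Debian and Ubuntu hosts, inspect the system log by viewing the contents of <terminal inline>/var/log/syslog<terminal inline> (on other distributions, <terminal inline>journalctl -k<terminal inline> or <terminal inline>dmesg<terminal inline> show the same kernel messages):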


$ sudo tail -f /var/log/syslog

This command will continually stream logs to your terminal until you use Ctrl+C to cancel it. Now, try to reproduce the event that caused the segmentation error. The SIGSEGV signal will look like this in the log:


hello-world[2631584]: segfault at 7f4624c6cfe0 ip 000055730c3621ed sp 00007ffce90e35f0 error 7 in hello-world[55730c362000+1000]

The log can be interpreted as follows:

  • <terminal inline bold>at <address><terminal inline bold>: The forbidden memory address that the code tried to access.
  • <terminal inline bold>ip <pointer><terminal inline bold>: The instruction pointer, i.e. the address of the instruction that attempted the forbidden access.
  • <terminal inline bold>sp <pointer><terminal inline bold>: The value of the stack pointer register when the fault occurred.
  • <terminal inline bold>error <code><terminal inline bold>: The x86 page-fault error code, which indicates the type of access that was attempted. Common codes include <terminal inline>4<terminal inline>, reading from an unmapped area; <terminal inline>5<terminal inline>, a read that violated the page’s protections; <terminal inline>6<terminal inline>, writing to an unmapped area; and <terminal inline>7<terminal inline>, writing to an area that is mapped but not writable, such as read-only memory.
  • <terminal inline bold>in <object>[<base>+<size>]<terminal inline bold>: The binary or shared library whose memory mapping contains the faulting instruction, followed by that mapping’s start address and size.

Accessing the kernel log gives you a better understanding of what the code’s doing at the point the error occurs. Although this log isn’t directly accessible from within containers, you should still be able to retrieve details of segmentation faults if you have root access to the host machine.

Gracefully Handling Segmentation Faults

Another way to deal with segmentation faults is to handle them gracefully inside your code. You can use C++ libraries like segvcatch to capture SIGSEGV signals and convert them into software exceptions. You can then handle them like any other exception, giving you the chance to log details to your error-monitoring platform and, in some cases, recover without a crash.

While handling SIGSEGV is a good way to prevent hard failures, it’s still worth fully investigating and resolving each occurrence of this error. A segmentation fault indicates that the program is doing something that the Linux kernel explicitly forbids, pointing to serious reliability or security defects in your code. Merely catching and ignoring the signal could cause other problems in your program if it expects to have read or written memory which proved to be out of bounds.

Final Thoughts

Segmentation faults occur when a program tries to use memory that it’s not allowed to access, including when it tries to write to read-only memory. In this article, you’ve seen how these errors are often the result of simple programming mistakes. You’ve also looked at how to identify a segmentation error as the cause of container terminations, and how you can start troubleshooting segmentation faults you experience in your programs.

Staying ahead of these errors ensures your applications run with maximum reliability and uptime. ContainIQ’s Kubernetes monitoring platform lets you track container terminations and their causes in real-time, helping you identify when things aren’t working as they should. You can use it to set up alerts for failed containers, then inspect their exit codes to identify segmentation errors within your cluster. This gives you the ability to react to segmentation faults as they occur, keeping your workloads in a healthy state.


James Walker is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs. He has experience managing complete end-to-end web development workflows with DevOps, CI/CD, Docker, and Kubernetes. James also writes technical articles on programming and the software development lifecycle, using the insights acquired from his industry career. He's currently a regular contributor to CloudSavvy IT and has previously written for DigitalJournal.com, OnMSFT.com, and other technology-oriented publications.
