The SIGSEGV Linux signal denotes a segmentation violation within a running process. Segmentation errors occur when a program tries to access memory it isn’t allowed to use, such as an address that was never allocated to it. The cause can be a bug in the code or deliberate malicious activity.
SIGSEGV signals arise at the operating system level, but you’ll also encounter them in the context of containerization technologies like Docker and Kubernetes. When a container exits with status code 139, it’s because it received a SIGSEGV signal. The operating system terminated the container’s process to guard against a memory integrity violation.
It’s important to investigate what’s causing the segmentation errors if your containers are terminating with code 139. This exit code often points to a programming error in a language that gives you direct access to memory. If the error occurs in containers running a third-party image, there could be a bug inside that software or an incompatibility with your environment.
In this article, we’ll explain what SIGSEGV signals are, their impact on your Linux containers in Kubernetes, and the ways you can troubleshoot and handle segmentation faults in your application.
What’s a Segmentation Fault?
A segmentation fault can seem like an opaque term, but the meaning is simple: a process that receives a SIGSEGV signal tried to read or write memory it’s not allowed to access. The kernel will normally terminate the process to avoid memory corruption. This behavior can be modified by explicitly handling the signal in the program’s code.
Segmentation faults are named to reflect the way in which memory is partitioned by purpose. Data segments store values that can be determined at compile time, text segments hold program instructions, and heap segments encapsulate dynamically allocated variables created at runtime.
Most real-world segmentation faults involve the last category. Operations such as dereferencing uninitialized or dangling pointers and out-of-bounds array accesses try to reach memory outside the regions allocated to the process, while writes to read-only memory violate the permissions of a region that does exist.
Here’s a trivial example of a C program which exhibits a segmentation error:
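Save the program as <terminal inline>hello-world.c<terminal inline> and compile it with <terminal inline>make<terminal inline>. Running <terminal inline>make hello-world<terminal inline> is enough, because make’s built-in rules know how to build a binary from a C source file of the same name.
Now run the compiled binary with <terminal inline>./hello-world<terminal inline>.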
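You’ll see the program immediately terminates, and a segmentation fault is reported. If you inspect the exit code, for example with <terminal inline>echo $?<terminal inline> in your shell, you’ll see it’s 139, corresponding to a segmentation error. The shell reports 128 plus the number of the signal that killed the process, and SIGSEGV is signal 11 on Linux.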
Why did this happen? The program created a pointer variable called <terminal inline>buffer<terminal inline>, but didn’t allocate it any memory. As a result, the assignment <terminal inline>buffer[0] = 0<terminal inline> ended up writing to unallocated memory. You can fix the program by making sure <terminal inline>buffer<terminal inline> points to an allocation large enough to cover the data it’ll store:
Allocating <terminal inline>buffer<terminal inline> one byte of memory is sufficient to handle the assigned value. This program will run successfully and exit with status code 0.
Segmentation Faults in Containers
Now let’s look at what happens when a segmentation fault occurs within a container. Here’s a simple Dockerfile for the crashing application written above:
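Build your container image from the Dockerfile with <terminal inline>docker build<terminal inline>.
Now start a container from the image with <terminal inline>docker run<terminal inline>.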
The container will start, run the command, and terminate immediately. Use <terminal inline>docker ps<terminal inline> with the <terminal inline>-a<terminal inline> flag to retrieve the stopped container’s details:
Exit code 139 is reported because of the segmentation error in the application.
Debugging Kubernetes Segmentation Errors
You can troubleshoot segmentation faults in Kubernetes containers, too. Use a project such as MicroK8s or K3s to start a local Kubernetes cluster on your machine. Next, create a pod manifest that starts a container using your image:
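Use <terminal inline>kubectl apply<terminal inline> to add the pod to your cluster.
Now retrieve the pod’s details with <terminal inline>kubectl get pods<terminal inline>.
The pod is stuck crashing in a restart loop, which kubectl typically reports as a <terminal inline>CrashLoopBackOff<terminal inline> status. Use the <terminal inline>kubectl describe pod<terminal inline> command and check the container’s last state to find out the cause.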
The exit code is reported as 139, indicating that a segmentation error caused the application inside the container to crash.
Solving Segmentation Faults
Once you’ve identified segmentation errors as the cause of your container terminations, you can move on to mitigating them and preventing future recurrences.
If the error’s occurring inside a third-party container image, you will have limited options. You should raise an issue with the developer to investigate the cause of the unexpected memory access attempts. When the problem’s inside your own software, you can start more targeted troubleshooting efforts to work out what’s wrong.
Identifying Problem Code
First, look for any obvious areas of your code that could be impacted by segmentation issues. You might be able to use your container’s logs, retrieved with <terminal inline>kubectl logs<terminal inline>, to work out the sequence of events leading up to the error.
Use the container’s activity to work out where in the source the error originates. If there’s an array access, pointer reference, or unguarded memory write in the area, it could be the cause of the problem.
Another common cause of these errors is when an update to a shared library introduces incompatibilities with existing binaries. This can cause memory access violations when the loaded versions differ from the compatible range.
Try to revert any recent changes to the dependencies inside your containers. This can help eliminate issues that have been provoked by third-party library updates.
In rare cases, persistent segmentation faults with no obvious explanation can be caused by incompatibilities with the machine’s physical hardware. They might even be symptomatic of a memory fault. This kind of issue is less likely in a typical Kubernetes cluster running on a managed service from a public cloud provider. Running memtester can help you rule out physical problems when you’re maintaining your own hardware.
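You can use Linux tools to more precisely debug SIGSEGV signals. Unhandled segmentation faults are normally recorded in the kernel log. As containers execute as processes within your host’s kernel, these messages are written even if the error occurred inside a container.
Inspect your system log by following the contents of <terminal inline>/var/log/syslog<terminal inline>, for example with <terminal inline>tail -f /var/log/syslog<terminal inline>. On distributions that don’t write this file, <terminal inline>dmesg<terminal inline> or <terminal inline>journalctl -k<terminal inline> show the same kernel messages.
This command will continually stream logs to your terminal until you use Ctrl+C to cancel it. Now, try to reproduce the event that caused the segmentation error. The SIGSEGV signal will appear in the log as a line containing fields of the form <terminal inline>segfault at <address> ip <pointer> sp <pointer> error <code><terminal inline>.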
The log can be interpreted as follows:
- <terminal inline bold>at <address><terminal inline bold>: The forbidden memory address that the code tried to access.
- <terminal inline bold>ip <pointer><terminal inline bold>: The instruction pointer, giving the memory address of the code that committed the violation.
- <terminal inline bold>sp <pointer><terminal inline bold>: The stack pointer, giving the address of the top of the stack at the moment the fault occurred.
- <terminal inline bold>error <code><terminal inline bold>: The error code gives an indication of the type of operation that was attempted. Common codes on x86 hosts include <terminal inline>6<terminal inline>, writing to an unmapped area; <terminal inline>7<terminal inline>, writing to an area that is mapped but can’t be written to; <terminal inline>4<terminal inline>, reading from an unmapped area; and <terminal inline>5<terminal inline>, reading from a mapped area that the process isn’t permitted to read.
Accessing the kernel log gives you a better understanding of what the code’s doing at the point the error occurs. Although this log isn’t directly accessible from within containers, you should still be able to retrieve details of segmentation faults if you have root access to the host machine.
Gracefully Handling Segmentation Faults
Another way to resolve segmentation faults is to gracefully handle them inside your code. You can use libraries like segvcatch to capture SIGSEGV signals and convert them into software exceptions. You can then handle them like any other exception, giving you the chance to log details to your error-monitoring platform and recover without a crash.
While handling SIGSEGV is a good way to prevent hard failures, it’s still worth fully investigating and resolving each occurrence of this error. A segmentation fault indicates that the program is doing something that the Linux kernel explicitly forbids, pointing to serious reliability or security defects in your code. Merely catching and ignoring the signal could cause other problems in your program if it expects to have read or written memory which proved to be out of bounds.
Segmentation faults occur when a program tries to use memory that it’s not allowed to access. They also arise when a program writes to read-only memory or otherwise violates the access permissions of a memory region. In this article, you’ve seen how these errors are often the result of simple programming mistakes. You’ve also looked at how to identify a segmentation error as the cause of container terminations, and how you can start troubleshooting segmentation faults you experience in your programs.
Staying ahead of these errors ensures your applications run with maximum reliability and uptime. ContainIQ’s Kubernetes monitoring platform lets you track container terminations and their causes in real-time, helping you identify when things aren’t working as they should. You can use it to set up alerts for failed containers, then inspect their exit codes to identify segmentation errors within your cluster. This gives you the ability to react to segmentation faults as they occur, keeping your workloads in a healthy state.