Unexpected Kubernetes pod terminations can be frustrating when you’re left with a vague error message. Unclear root causes often delay remediation, prolonging the duration of problems inside your cluster.
“Terminated with exit code 1” is one such generic problem that you might encounter from time to time. It occurs when the foreground process inside a container stops because of an error.
In this article, we’ll look at some of the possible causes, show how to identify when a pod terminates with exit code 1, and walk through your options for debugging the issue. This should equip you to address these errors in your cluster, reducing failure rates and maximizing your application’s uptime.
What is an Exit Code 1 Error?
Processes emit a numerical exit code when they terminate. A command that successfully runs to completion should emit a 0 exit code. All other codes (from 1 to 255) indicate the program stopped unexpectedly, often because of an internal error or invalid arguments.
You can view a command’s exit code by inspecting the <terminal inline>?<terminal inline> variable in your shell:
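As a minimal sketch, the session below asks <terminal inline>cat<terminal inline> to read a file that is assumed not to exist (the path is purely illustrative), then prints the exit code:

```shell
# Ask cat to read a file that is assumed not to exist (illustrative path).
cat /path/that/does/not/exist

# $? holds the exit code of the most recent command, so capture it right away.
status=$?
echo "exit code: $status"
```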
The <terminal inline>cat<terminal inline> command issued an exit code of <terminal inline>1<terminal inline> because it received an invalid argument. Had you specified a valid file path, <terminal inline>cat<terminal inline> would have successfully read its content, leading to an exit code of 0.
By convention, exit codes from 1 to 128 are available for applications to use internally, while codes from 129 to 255 indicate that the process was stopped by a signal. In that case, the code is 128 plus the signal number. One example is the code 137 (128 + 9): this means the process received a <terminal inline>SIGKILL<terminal inline> signal, perhaps because the operating system needed to resolve a low memory situation.
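The arithmetic behind codes above 128 can be checked without a cluster. This sketch assumes a POSIX-style shell, which reports a child killed by signal N as 128 + N; the child here sends <terminal inline>SIGKILL<terminal inline> (signal 9) to itself:

```shell
# Start a child shell that immediately kills itself with SIGKILL (signal 9).
sh -c 'kill -KILL $$'

# The parent shell reports 128 + 9 for a child terminated by signal 9.
status=$?
echo "exit code: $status"
echo "signal number: $((status - 128))"
```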
An exit code 1 error can mean many different things depending on the process you’re working with. It’s a generic code that applications can use freely. A loosely held convention among Unix utility commands sees exit code 1 used to report bad inputs, such as the invalid file path in the example above. Other programs may use exit code 1 for internal or unhandled errors.
Instances of this error may be surfaced as <terminal inline>Exited (1)<terminal inline> or <terminal inline>Terminated with exit code 1<terminal inline> in a container’s logs. Next, you’ll see how to identify and diagnose this exit code when working with Kubernetes applications.
Viewing Kubernetes Pod Exit Codes
Pods that have stopped because of a non-zero exit code will show an <terminal inline>Error<terminal inline> status when you list them using kubectl’s <terminal inline>get pods<terminal inline> command. You can see this by adding an intentionally broken pod to your cluster. Save the following YAML to <terminal inline>demo-pod.yaml<terminal inline> in your working directory:
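One manifest that would behave this way is sketched below. The pod name, container name, image tag, and file path are all assumptions chosen for illustration; the heredoc simply writes the YAML to <terminal inline>demo-pod.yaml<terminal inline>:

```shell
# Write an intentionally broken pod manifest to demo-pod.yaml.
cat > demo-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  # Never restart, so the failed container stays visible in the Error state.
  restartPolicy: Never
  containers:
    - name: demo
      image: busybox:latest
      # Reading a file that does not exist makes cat exit with code 1.
      command: ["cat", "/path/that/does/not/exist"]
EOF
```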
Next, use kubectl to add the pod to your cluster by running <terminal inline>kubectl apply -f demo-pod.yaml<terminal inline>.
List your pods by running <terminal inline>kubectl get pods<terminal inline>.
The pod has ended up in the <terminal inline>Error<terminal inline> state. This is because its restart policy is set to <terminal inline>Never<terminal inline>, so Kubernetes won’t automatically start a new container when one terminates. If you were using the <terminal inline>OnFailure<terminal inline> or <terminal inline>Always<terminal inline> (default) restart policy, the pod would instead be restarted and could show a status of <terminal inline>CrashLoopBackOff<terminal inline>.
A <terminal inline>CrashLoopBackOff<terminal inline> status means Kubernetes has tried to restart the pod’s container, but it has failed on multiple consecutive attempts. It’ll keep retrying, with an exponentially longer backoff delay before each attempt.
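To get a feel for how quickly the delay grows, the sketch below prints the wait before each restart attempt. It assumes the kubelet’s documented defaults of a 10 second initial delay that doubles after each failure and is capped at five minutes; the function name is hypothetical:

```shell
# Print the backoff delay (in seconds) before each of the first N restarts.
backoff_delays() {
  delay=10   # assumed initial delay in seconds
  cap=300    # assumed ceiling of five minutes
  i=1
  while [ "$i" -le "$1" ]; do
    echo "$delay"
    # Double the delay for the next attempt, without exceeding the cap.
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
}

backoff_delays 6
```

Under those assumptions, the first six delays are 10, 20, 40, 80, 160, and 300 seconds, so a crashing container quickly settles into one attempt every five minutes.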
Whether you’re allowing automatic restarts or not, you can inspect a pod’s last exit code using the <terminal inline>describe pod<terminal inline> command, for example <terminal inline>kubectl describe pod <pod-name><terminal inline>. Look for the <terminal inline>Exit Code<terminal inline> field in the container’s state.
The output is relatively verbose. Piping the command through <terminal inline>grep<terminal inline> and <terminal inline>awk<terminal inline> can display the exit code in isolation, without the extraneous supporting information:
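A small wrapper keeps that pipeline reusable. This is a sketch rather than a definitive recipe: the function name is hypothetical, and it assumes the describe output contains lines of the form <terminal inline>Exit Code:    1<terminal inline>:

```shell
# Print only the exit code(s) that kubectl reports for a pod's containers.
pod_exit_code() {
  # grep keeps the "Exit Code:" lines; awk prints the third field, the number.
  kubectl describe pod "$1" | grep "Exit Code" | awk '{print $3}'
}
```

Call it as <terminal inline>pod_exit_code <pod-name><terminal inline>. A pod with several containers, or one that reports both a current and a last state, may print more than one line.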
Troubleshooting Unexpected Exit Codes
Now that you’ve identified that a container is exiting with status code 1, it’s time to start solving the problem. There’s no guaranteed resolution path, because this is a catch-all error where the cause naturally varies between applications. Here are some techniques that should help uncover the problem.
Check Container Logs
As exit code 1 is issued from within a pod, checking its logs should be your first troubleshooting step. Although containers may seem to crash on startup, they will be briefly running until the termination occurs. Most applications will write logs that can help you debug.
Use the <terminal inline>kubectl logs<terminal inline> command to retrieve the logs for the first container in your pod. When the pod’s stuck in a restart loop, this will be the container created by the most recent restart attempt.
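When a pod keeps restarting, it can also be useful to read the logs of the container instance that ran before the current one. The helper below is a sketch with a hypothetical name: it prints the current logs, then uses kubectl’s <terminal inline>--previous<terminal inline> flag to print the prior instance’s logs when one exists:

```shell
# Show a pod's current logs, followed by the previous instance's logs if any.
pod_logs_with_previous() {
  echo "--- current instance ---"
  kubectl logs "$1"
  echo "--- previous instance ---"
  # --previous fails when the container has never restarted, so fall back
  # to a short note instead of an error message.
  kubectl logs "$1" --previous 2>/dev/null || echo "(no previous instance)"
}
```

Call it as <terminal inline>pod_logs_with_previous <pod-name><terminal inline>.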
The logs immediately reveal the root cause of the exit code 1 produced by our basic example. You can use this information to fix the <terminal inline>command<terminal inline> field in the pod’s YAML file, then re-apply it to your cluster with <terminal inline>kubectl apply<terminal inline>.
Carefully Inspect Names and Arguments
Sometimes the logs won’t help you. Perhaps the application’s simply crashing too early in its lifecycle to record something useful. In this situation, the best approach is to start with the basics.
Check your pod’s YAML file for simple typos that could be executing the wrong command or providing invalid arguments. Although it’s far from universal, many applications do use exit code 1 to signal an input error, so it’s worth looking for mistakes like passing <terminal inline>--host<terminal inline> when <terminal inline>--hostname<terminal inline> is expected.
Make sure the image tag reference is correct, too. Specifying the wrong version of an image, such as <terminal inline>my-image:1<terminal inline> instead of <terminal inline>my-image:2<terminal inline>, could trigger unexpected incompatibilities that leave your container unable to interpret your input.
Try Running the Command Yourself
Running the command on your local machine can help identify problems that stem from the container’s environment. The application might depend on certain external characteristics that aren’t satisfied by your container image. There may even be an incompatibility with other programs, libraries, or your Kubernetes distribution, although this is rare.
You can also try manually starting a container using the same image. This can help further narrow down the possibilities:
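One way to do this is sketched below, assuming Docker is installed locally. The helper name is hypothetical; it runs the image once with the arguments you pass and prints the exit code that <terminal inline>docker run<terminal inline> hands back, which normally mirrors the container’s own exit code:

```shell
# Run an image once outside Kubernetes and report the resulting exit code.
run_image_once() {
  image=$1
  shift
  # --rm removes the container afterwards so repeated runs leave nothing behind.
  docker run --rm "$image" "$@"
  echo "exit code: $?"
}
```

For example, <terminal inline>run_image_once busybox cat /path/that/does/not/exist<terminal inline> runs <terminal inline>cat<terminal inline> with a bad path inside a <terminal inline>busybox<terminal inline> container.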
Here, Docker is used to run the <terminal inline>busybox<terminal inline> image with equivalent arguments to our Kubernetes pod manifest. The application still failed in the same way, confirming the problem isn’t something specific to the Kubernetes deployment.
Completely Recreate the Pod
Sometimes an “off and on again” approach can prove effective. Delete the pod completely, then add it back into your cluster. This can help to resolve transient issues that could be specific to a single Kubernetes node.
This isn’t guaranteed to succeed, as an exit code of 1 originates from inside the container. However, it could help resolve any environmental issues that are preventing the command from successfully running.
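As a sketch, the two steps can be combined in a helper. The function name is hypothetical, and it assumes you still have the manifest file the pod was created from:

```shell
# Delete a pod, then create it again from its manifest.
recreate_pod() {
  pod=$1
  manifest=$2
  # Only re-apply the manifest if the delete succeeded.
  kubectl delete pod "$pod" && kubectl apply -f "$manifest"
}
```

Call it as <terminal inline>recreate_pod <pod-name> <manifest-file><terminal inline>.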
Manage Resource Consumption
Sometimes, you might find a pod only crashes after it’s been running for a while. This suggests that the application could have a memory leak, cache mismanagement, or another transient fault that occurs under specific conditions.
Checking the resource utilization of the hardware that hosts your Kubernetes cluster can be helpful in this situation. If your cluster’s routinely encountering low memory scenarios, your applications could break in unexpected ways. It’s possible this can provoke an exit code 1 error if the code crashes because it can’t use any more memory.
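If the metrics-server add-on is installed (an assumption; <terminal inline>kubectl top<terminal inline> does not work without a metrics source), current usage can be read directly from kubectl. The helper name below is hypothetical:

```shell
# Print current CPU and memory usage for nodes and for pods in all namespaces.
cluster_usage() {
  echo "Nodes:"
  kubectl top nodes
  echo "Pods:"
  kubectl top pods --all-namespaces
}
```

Running <terminal inline>cluster_usage<terminal inline> while the problem is occurring shows whether a node or pod is close to its memory capacity.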
Provisioning extra nodes to serve your workloads is a good way to address this problem. Kubernetes will be able to horizontally scale your application across additional hardware, making it less likely that faults will occur. You can also try increasing the resource limits on your individual pods—in this example, each container is limited to 100 MB of memory, which may not be enough for a busy workload:
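A manifest with such a limit might look like the sketch below. The pod name, container name, image, and file name are hypothetical; only the 100 MB memory limit comes from the text above:

```shell
# Write a pod manifest whose container is limited to 100 MB of memory.
cat > limited-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod
spec:
  containers:
    - name: app
      image: my-image:2
      resources:
        limits:
          # 100M means 100 megabytes; raise this if the workload needs more.
          memory: 100M
EOF
```

Raising the <terminal inline>memory<terminal inline> value and re-applying the manifest gives the container more headroom before it hits its limit.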
Conclusion
Kubernetes pod terminations that report exit code 1 indicate something has gone wrong inside the pod’s container. The application will have crashed, causing the container’s foreground process to stop and emit the exit code. This signals to Kubernetes that an error occurred.
These problems are usually caused by issues with your container image or the config parameters you supply. They can also be due to programming bugs that allow exceptions to propagate without being caught. Reviewing your Kubernetes pod logs can help you spot troublesome sections of code. Transient or recoverable issues could be mitigated by registering a catch-all error handler at the start of your program, allowing subsequent issues to be gracefully dealt with.
Failing to address exit code 1 errors could leave you facing downtime if pods keep terminating. ContainIQ’s monitoring platform lets you set up dashboards and alerts that track activity inside your Kubernetes cluster. This provides an effective way to spot failed pods, inspect their exit codes, and retrieve log data that’s automatically correlated with pod termination events.