
OOMKilled: Troubleshooting Kubernetes Memory Requests and Limits

November 30, 2021

Kubernetes is great when it works, but when things go wrong, like your application being shut down because it ran out of memory, you need to know how to troubleshoot the OOMKilled error.

Josh Alletto
Senior Site Reliability Engineer

If you’ve been working with Kubernetes for any period of time, you’ve probably come across the OOMKilled error. It can be a frustrating error to debug if you don’t understand how it works. In this article, we’ll take a closer look at the OOMKilled error: why it occurs, how to troubleshoot it when it happens, and what steps you can take to help prevent it.

Memory in Kubernetes

Let’s begin by understanding how Kubernetes thinks about memory allocation. When the scheduler is trying to decide how to place pods in the Kubernetes cluster, it looks at the capacity for each node.

You should note that a node with 8 GB of memory won’t necessarily have 8 GB available to run pods. Some of that memory is reserved for the operating system and Kubernetes system components; whatever is left over is reported as the node’s allocatable memory, which is what it can use to run pods.

You can see a breakdown of allocatable resources by taking a look at the YAML for a node: kubectl get node my_node -o yaml. You should see something like this:

allocatable:
  attachable-volumes-aws-ebs: 25
  cpu: 3920m
  ephemeral-storage: 95551679124
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 15104488Ki
  pods: 58

Based on these allocatable resources, the scheduler decides which pods to run where, and it tries to make sure that none of the nodes in the cluster end up running more pods than they can handle.
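If you want a quick, cluster-wide view of what the scheduler is working with, you can pull just the allocatable memory out of every node. Here’s a small sketch using kubectl’s custom-columns output; the column names are arbitrary:

# List each node alongside the memory the scheduler treats as allocatable
kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory

This is handy for spotting nodes whose allocatable memory is much smaller than their physical memory.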

When you define a container, you can set two different variables for memory. Whether or not you set these variables and what you set them to can have huge repercussions for your pod.

The first is the requests variable. This tells Kubernetes that this particular container needs, at minimum, this much memory. Kubernetes will guarantee the memory is available when it places the pod. If you don’t set it, Kubernetes will assume you don’t need any resources by default and so it won’t guarantee that your pod will be placed on a node with enough memory.

The next value you can set is the limit, which is the maximum amount of memory the container is allowed to use. The container won’t always need that much memory, but it can never go above the limit; if it tries to, it gets killed. Limits can be tricky: when Kubernetes places a pod, it only checks the requests variable, not the limit.

We’ll take a look at how this can contribute to an OOMKilled error in a bit. First, here’s what requests and limits look like in a pod spec:


---
apiVersion: v1
kind: Pod
metadata:
  name: test-application
spec:
  containers:
  - name: test-application
    image: ruby
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "250Mi"
        cpu: "500m"

The request and the limit are important because they play a big role in how Kubernetes decides which pods to kill when it needs to free up resources. Roughly speaking, pods are killed in the following order, from most likely to least likely (you can check where a given pod falls by looking at its QoS class, as shown after this list):

  • Pods that do not have the limit or the request set
  • Pods with no set limit
  • Pods that are over memory request but under limit
  • Pods using less than requested memory
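Kubernetes summarizes where a pod sits in this pecking order through its quality of service (QoS) class: Guaranteed, Burstable, or BestEffort, derived from the requests and limits you set. You can check the class Kubernetes assigned to a pod (the pod name below is just the example from the spec above):

# Print the QoS class derived from the requests and limits on the pod
# (Guaranteed, Burstable, or BestEffort)
kubectl get pod test-application -o jsonpath='{.status.qosClass}'

BestEffort pods, which set neither requests nor limits, are the first candidates to be killed, while Guaranteed pods are the last.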

So What Is OOMKilled?

OOMKilled is an error that actually has its origins in Linux. The Linux kernel includes a mechanism called the OOM killer (Out of Memory killer) that tracks memory usage per process. If the system is in danger of running out of available memory, the OOM killer steps in and starts killing processes to free up memory and prevent a crash. The goal of the OOM killer is to free up as much memory as possible while killing off the fewest processes.

Under the hood, the OOM killer assigns each running process a score. The higher the score, the more likely the process is to be killed. How this score is calculated is beyond the scope of this tutorial, but it’s good to know that Kubernetes adjusts the score to help decide which pods get killed first.
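If you have shell access to a node, you can see these scores for yourself. Every process exposes its current score, plus an adjustment value, under /proc; Kubernetes sets the adjustment (oom_score_adj) based on the pod’s QoS class so that, roughly speaking, BestEffort containers are the first candidates and Guaranteed containers the last. A minimal sketch, assuming 1234 is the PID of a container process on the node:

# The score the kernel currently assigns to the process
cat /proc/1234/oom_score

# The adjustment applied on top of the raw score; Kubernetes sets this
# based on the QoS class of the pod
cat /proc/1234/oom_score_adj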

The kubelet running on each node monitors memory consumption. If resources on a node become scarce, the kubelet will start evicting pods to reclaim memory. Essentially, the idea is to preserve the health of the node so that all the pods running on it won’t fail. The needs of the many outweigh the needs of the few, and the few get murdered.
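You can see whether the kubelet considers a node to be running low on memory by checking the node’s conditions; when memory gets tight, it sets the MemoryPressure condition to True. For example, reusing the node name from earlier:

# Check whether the kubelet has flagged the node as under memory pressure
kubectl get node my_node -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}'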

There are two main OOMKilled errors you’ll see in Kubernetes:

  • OOMKilled: Limit Overcommit
  • OOMKilled: Container Limit Reached

Let’s take a look at each one.

OOMKilled Because of Limit Overcommit

Remember that limit variable we talked about? Here is where it can get you into trouble.

The OOMKilled: Limit Overcommit error can occur when the sum of pod limits is greater than the available memory on the node. For example, on a node with 8 GB of available memory, the scheduler might place eight pods that each request 1 GB of memory. If even one of those pods is configured with a limit of, say, 1.5 GB, you run the risk of running out of memory. All it takes is for that one pod to see a spike in traffic or hit an unknown memory leak, and Kubernetes will be forced to start killing pods.

You might also want to check the host itself and see if there are any processes running outside of Kubernetes that could be eating up memory, leaving less for the pods.
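A quick way to check for overcommit on a specific node is to look at how the requests and limits of its pods add up against its allocatable memory. kubectl describe node reports exactly that, and the limits total can legitimately exceed 100%, which is the risky situation described above. The figures shown in the comment are purely illustrative:

# Show the totals of requests and limits for every pod on the node;
# a memory limits total above 100% means the node is overcommitted
kubectl describe node my_node

# Look for a section like this in the output (illustrative values):
#   Allocated resources:
#     (Total limits may be over 100 percent, i.e., overcommitted.)
#     Resource  Requests       Limits
#     memory    6144Mi (75%)   9216Mi (113%)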

OOMKilled Because of Container Limit Reached

While the Limit Overcommit error is related to the total amount of memory on the node, Container Limit Reached is usually confined to a single pod. When Kubernetes detects a container using more memory than its set limit, it will kill the pod with the error OOMKilled: Container Limit Reached.

When this happens, check the application logs to try to understand why the pod was using more memory than the set limit. It could be for a number of reasons, such as a spike in traffic or a long-running Kubernetes job that caused it to use more memory than usual.
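Two commands that help here: confirm that the container really was OOM killed by looking at its last state, and pull the logs from the killed instance rather than the restarted one (again using the example pod name from above):

# The previous container state shows Reason: OOMKilled and exit code 137
# when the container was killed for exceeding its memory limit
kubectl describe pod test-application

# Fetch logs from the previous, killed instance of the container
kubectl logs test-application --previous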

If during your investigation you find that the application is running as expected and that it just requires more memory to run, you might consider increasing the values for request and limit.
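If the pod is managed by a Deployment, one way to do that without editing the manifest by hand is kubectl set resources. The deployment name and values below are just an illustration; pick numbers based on what you observed:

# Raise the memory request and limit for the containers in a deployment
kubectl set resources deployment test-application \
  --requests=memory=256Mi --limits=memory=512Mi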

Conclusion

In this article, we took a closer look at the Kubernetes OOMKilled error, an error that has its origins in the Linux OOM killer. That same mechanism helps Kubernetes manage memory when scheduling pods and make decisions about which pods to kill when resources are running low. Don’t forget to consider the two flavors of the OOMKilled error, Container Limit Reached and Limit Overcommit. Understanding them both can go a long way toward successful troubleshooting and will help you avoid running into the error in the future.

