
Kubernetes CPU Limits and Throttling

In this post, you’ll learn how Kubernetes CPU limits and throttling work, including the core concepts, uses, how to assign resources to containers and pods, and how to troubleshoot issues.

August 25, 2022
Alexander Fashakin
Software Engineer

Resource allocation is an essential consideration when you’re running Kubernetes. When a pod is created on Kubernetes, you need to ensure the containers have sufficient resources to run. The CPU and RAM are two of the most critical cluster resources, and they need to be specified and allocated correctly.

Kubernetes provides a number of ways to configure CPU and RAM resource usage parameters. Setting up these usage parameters and limits ensures you don’t exhaust resources, starve out your apps, or incur high operational costs. It also helps to prevent resource hogging and out-of-memory (OOM) errors that may affect cluster health.

Additionally, if an application exceeds its CPU limit, it’s throttled, which affects the response rate and can cause problems.

In this article, you’ll learn more about Kubernetes CPU limits and throttling, why they are relevant, and how you can implement them.

What Are CPU Limits and Requests?

There are several ways to manage resource consumption in Kubernetes, including assigning priorities to pods so that critical workloads aren't terminated by the kernel under resource pressure, using namespaces and resource quotas, setting network policies, and configuring storage.
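For example, a ResourceQuota caps the total resources that all pods in a namespace can request or be limited to. A minimal sketch, assuming a namespace called team-a (the namespace and quota names here are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    # Totals across every pod in the namespace
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
```

With this quota in place, any pod creation that would push the namespace past these totals is rejected at admission time.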

To define how many resources will be allocated to a container, Kubernetes uses the concept of requests and limits.

Requests specify the minimum amount of a resource that is guaranteed to a container. When a request is defined, the Kubernetes scheduler only places the container on a compute node that can provide that amount.

Meanwhile, limits represent the maximum amount of a resource a container is permitted to use, which ensures that the container never goes above that value. This is illustrated in the diagram below.

Container limits and requests

Take note, however, that the value of a container's request must be less than or equal to its limit. Otherwise, Kubernetes rejects the pod with a validation error and the container won't run.

CPU Throttling and OOM Killed

Many enterprises today run their business-critical workloads in a Kubernetes multi-tenant environment. These multi-tenant environments use limits to regulate tenant workloads and to implement chargebacks.

CPU throttling is a mechanism that automatically slows a container down so it consumes no more CPU than it's allowed, and it is a side effect of setting CPU limits. Whenever an application runs close to the maximum CPU utilization it's permitted, it is throttled: the kernel restricts its CPU cycles, which slows down its response rate.
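You can see how often a workload is being throttled from the kernel's CFS accounting file (cpu.stat inside the container's cgroup): nr_periods counts scheduling periods, and nr_throttled counts the periods in which the container hit its quota. A rough sketch of interpreting those counters, using made-up values for illustration:

```shell
# Hypothetical values copied from a container's /sys/fs/cgroup/cpu/cpu.stat:
nr_periods=1000
nr_throttled=250

# Fraction of CFS scheduling periods in which the container hit its CPU quota:
awk -v p="$nr_periods" -v t="$nr_throttled" \
  'BEGIN { printf "%.0f%% of periods throttled\n", 100*t/p }'
# → 25% of periods throttled
```

A consistently high throttled fraction is a strong signal that a container's CPU limit is set too low for its actual workload.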

CPU throttling

Additionally, if you’re running a resource-intensive application on a node with low resources, there’s a chance the node will exhaust its CPU or memory resources and shut down. The process in which pods are terminated when they use more than the allocated memory resources is known as out-of-memory termination, or OOM Killed.

In essence, the OOM Killer monitors memory usage, identifies processes that use up too much memory, and terminates them. Note, however, that the OOM Killer can terminate a process even when there is memory available.

An application failure resulting from an OOM error can trigger a snowball effect that affects the entire cluster, making resource management a critical aspect of Kubernetes.

The Risks of Operating Kubernetes Without Limits

The following are typical problems that occur when you run workloads without any limits:

  • OOM errors: As explained earlier, out of memory errors can occur when you don’t set a specified limit, and the node ends up exhausting all the resources available. This causes the node to be shut down, and negatively impacts the cluster stability and health.
  • Excessive resource use: If a maximum CPU limit is not specified, a container can use the full CPU capacity of the node. Excess resource usage slows down other containers on the same node, and can even cause Kubernetes core components like the kubelet, the control plane, and kube-proxy to become unresponsive.
  • Increased expenses: This is mostly due to overutilization or underutilization of computing resources. For example, when you operate without any resource requests or limits, even if you’re running out of resources, you won’t get any errors or stopgaps. This can quickly lead to a costly, overprovisioned cluster.
  • Resource hogging: Without any limits set, you may end up with a resource-intensive application that consumes too many resources. This resource hogging affects other applications, starving them of resources, making them slow and sometimes unresponsive.

Side Effects of Setting Wrong Resource Limits

Using resource limits doesn’t come without potential complications. If you provide insufficient CPU resources, the application service may suffer high latency, because the container is throttled whenever it exhausts its quota for the current scheduling period.

If the requested CPU resource limit is too high, it may lead to underutilization of resources, artificially inflating the cost of running the cluster. Because of this, it’s strongly advised not to use configuration values that are too low, which hurts stability, or too high, which wastes resources.

How Resource Requests and Limits Work

Each pod or container can specify one or more of these settings in its deployment YAML file: CPU limits, CPU requests, memory limits, and memory requests. A typical sample YAML file is shown below, where pod resource limits and requests are specified:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-memorylimit-demo
spec:
  containers:
  - name: cpu-memorylimit-demo-container
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "1"

The pod requests 64 Mi of RAM and 0.25 CPU units (250 millicores), but can use up to twice that amount of RAM and one full CPU.

How to Set Resource Limits in Kubernetes

Kubernetes provides an excellent way to constrain the resources of pods and containers in a namespace. It is called LimitRange.

With LimitRange, you can specify the upper and lower threshold limits of resource consumption. The default in Kubernetes is to allow containers to run with unlimited computing resources. LimitRanges allow you to restrict how much CPU and memory a pod can consume.

A LimitRange provides constraints in the following ways:

  • Enforcing minimum and maximum resource usage per pod or container
  • Enforcing a ratio between request and limit for a resource in a namespace
  • Enforcing a default request and limit for compute resources
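As an example of the last point, a LimitRange can inject default values into containers that don't declare any of their own. A sketch, assuming a Container-scoped rule (the object name here is illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-defaults
spec:
  limits:
  - default:        # limit applied to containers that set none
      cpu: 500m
    defaultRequest: # request applied to containers that set none
      cpu: 250m
    type: Container
```

Any container created in this namespace without explicit CPU settings automatically receives a 250m request and a 500m limit.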

In the following sections, you will create a limit range in the local Kubernetes cluster and set CPU limits (minimum and maximum), which the containers and pods will use.


To follow along with this tutorial, you’ll need the following:

  • A Kubernetes cluster with at least one worker node, e.g., a local minikube cluster
  • kubectl command-line tool

The following steps will be performed in the cluster:

  • Set up a LimitRange in the default namespace
  • Deploy pods with assigned CPU Limit Range

Set Up a LimitRange in the Default Namespace

Create a YAML file to define a limit range in the default namespace with the following name and content:

Name: set-limit-range.yaml


apiVersion: v1
kind: LimitRange
metadata:
  name: set-limit-range
spec:
  limits:
  - max:
      cpu: "100m"
    min:
      cpu: "50m"
    type: Container

Run the following command to create a LimitRange in the cluster:

kubectl create -f set-limit-range.yaml

To confirm that the LimitRange was successfully created, enter the following command:

kubectl describe limitrange set-limit-range

On successful execution, it will display a limit range that defines the CPU as “Min=50m” and “Max=100m” in the terminal.

Deploy Pods with LimitRange in the Default Namespace

In this section, you’ll create a pod definition whose CPU request and limit fall between the 50m minimum and 100m maximum defined for the namespace.

Create a file called pod-with-cpu-within-range.yaml, and paste in the following contents:

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cpu-within-range
spec:
  containers:
  - name: pod-with-cpu-within-range
    image: nginx
    resources:
      limits:
        cpu: "80m"
      requests:
        cpu: "40m"

Now apply the YAML file you created above to create a pod in the cluster, and subsequently view the pod details using these commands:

kubectl create -f pod-with-cpu-within-range.yaml
kubectl describe pod pod-with-cpu-within-range

Once executed, you’ll see that the pod was created with a CPU resource request and limit. However, Kubernetes will never allow this pod to use more CPU than its defined limit.
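For contrast, a pod whose CPU limit exceeds the LimitRange maximum is rejected outright at admission time. A hypothetical example (this file is not part of the tutorial):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cpu-above-range
spec:
  containers:
  - name: pod-with-cpu-above-range
    image: nginx
    resources:
      limits:
        cpu: "200m"   # above the 100m max enforced by set-limit-range
      requests:
        cpu: "150m"
```

Attempting to create this pod fails with a forbidden error explaining that the maximum CPU usage per container in this namespace is 100m.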

You can also check CPU usage using the built-in Kubernetes dashboard. On minikube, enter the following command to open it:

minikube dashboard

If you navigate to Workloads > Pods, you can see the complete CPU and memory usage.

CPU and memory usage

As shown in the CPU usage dashboard below, Kubernetes was throttling it to 60m, or 0.06 CPU, every time the consumption load increased.

CPU usage

The image below depicts the HTTP response time after CPU throttling. The first section of the graph represents how API response times exponentially increase when the CPU is throttled. For API consumers, this will lead to service degradation.

CPU throttling and response

In the second section, API latency drops to zero as soon as usage exceeds a certain threshold. In other words, CPU throttling causes some API requests to time out immediately.

Similarly, you can set a memory request and limit for the pod, as shown in the following resources fragment:

    resources:
      limits:
        cpu: 60m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 20Mi

With this change, Kubernetes will never allow this pod to use more memory than its defined limit. In the image below, you can see how the pod was limited to 4.08m memory usage every time.

Memory usage

Troubleshooting Cluster Health Using ContainIQ

Monitoring plays an important role in maintaining the health of individual services, as well as the overall health of the cluster. If you have a large cluster running many services, it can be difficult to monitor their health and failures.

ContainIQ is a SaaS-based Kubernetes monitoring solution designed specifically to handle this. It enables developers to monitor the health of their clusters and troubleshoot problems more quickly.

The ContainIQ platform provides pre-built dashboards that capture live pod metrics and events in real time, and then lets you structure, visualize, and analyze this data and configure alerts.

Pod and Node Metrics

As soon as the ContainIQ agent is set up and deployed in the cluster, it starts displaying useful data about your application health in various dashboards. Here, it provides information on the cluster using a beautiful hexagonal, color-coded display. For instance, if a pod is using more memory than usual, it will be color coded, moving from the standard green to yellows, oranges, and reds, depending on how severe the change is.

ContainIQ dashboard with CPU per pod and MEM per pod

In the following image, the CPU Per Node section provides aggregated core usage across all nodes in your cluster and allows you to easily identify pods with CPU spikes. The MEM Per Node section does the same for memory.

ContainIQ dashboard showing CPU per node and MEM per node

Monitoring with ContainIQ

When using Kubernetes to manage your containerized apps, it’s essential to monitor the health of your clusters, pods, and nodes through consistent, automated alerts. For instance, you should be notified when a pod stops working, and when memory or CPU usage consumption reaches a certain threshold.

An effective alert system can help you identify issues early before they escalate and cause you to lose data, suffer downtime, or affect the overall health of your containers.

ContainIQ provides predefined alerts, making it easier and faster to identify pods that need attention. This is more efficient than conventional monitoring or command-line tools in identifying problematic pods.

When alerts are activated, you will receive real-time notifications via Slack or another notification hook of your choice. Using the monitor dashboard, you can enable and disable any alert as your app demands. You can also create custom alerts.

You can set up alerts to track network connectivity issues, display disk usage warnings, identify missing pods, and monitor node resource consumption.

ContainIQ monitor dashboard

In the screenshot below, there’s a custom monitor that triggers an alert when CPU usage is greater than eighty percent.

ContainIQ custom monitor alert

Additionally, ContainIQ provides an innovative latency dashboard that allows users to view reports and set alerts for specific application metrics such as P95 latency, P99 latency, average latency, and requests per second for each microservice. With this kind of data, you can easily handle traffic spikes, improve end-user experience, and ensure better overall performance for all your apps.

Final Thoughts

While a simple Kubernetes cluster may function fine without limits and resource requests, you may start experiencing stability issues as your application complexity and workload grows. Setting up limits in an effective manner helps the cluster function properly without compromising the health of one or more workloads.

Because there are multiple abstraction levels in the Kubernetes ecosystem, troubleshooting becomes a lot more difficult without proper tools. Having the ability to set up custom alerts, monitor container health, and track consumption is essential.

Unlike traditional infrastructure, a Kubernetes cluster’s resources are dynamic and constantly evolving, making real-time Kubernetes monitoring a necessity. With a comprehensive alert and monitoring tool like ContainIQ, you can get unique and clear insights into cluster health and performance with data such as logs, events, latencies, and traces.


Alexander Fashakin is a technical writer and engineer from Nigeria. He holds a Master’s degree in computer science from Shenyang Jianzhu University in China, as well as a Bachelor’s degree in computer science from Wesley University of Science and Technology. He specializes in building applications using React and has experience in DevOps.