Orchestrators like Kubernetes have abstracted servers away, and now you can manage your whole infrastructure in a multi-tenant, heterogeneous Kubernetes cluster. You package your application in containers, and then Kubernetes takes care of availability, scaling, monitoring, and more across various nodes featuring specialized hardware or logical isolation.
Kubernetes has a lot of options and flexibility depending on what you need from it. One such piece of functionality is the concept of taints and tolerations, which helps you achieve selective scheduling.
This guide will walk you through everything you need to know about Kubernetes taints and tolerations, showing how these concepts can be applied and how you can effectively make use of them for your cluster.
Why Use Taints and Tolerations?
Taints and tolerations help prevent your pods from scheduling to undesirable nodes. Suppose you want to reserve a particular node for your graphics-intensive pods or your frontend pods. Taints and tolerations make this possible by keeping every other pod off that node.
What is Scheduling?
In Kubernetes, scheduling isn’t about timing, but about ensuring that pods are matched to nodes. When you create a pod, the scheduler in the control plane looks at the nodes and evaluates available resources and other conditions before assigning the pod to a node.
If a node passes the evaluation, the pod will be scheduled on that node. If no node satisfies the conditions, the pod will be put in a <terminal inline>Pending<terminal inline> state.
You can use <terminal inline>kubectl describe pods name_of_pod<terminal inline> and scroll down to <terminal inline>Events<terminal inline> to find the precise reason for the pending state. For example, if the manifest requests more CPU than is available on the node, the events show a <terminal inline>FailedScheduling<terminal inline> warning that reports insufficient CPU.
If there were another node with adequate resources present, the scheduler would have scheduled the pod to it.
Note: Pods can be in a <terminal inline>Pending<terminal inline> state for a variety of other reasons. A scheduling fault is just one of many.
How Do Taints Work?
Taints are like mosquito repellent, if you think of mosquitoes as your pods and yourself as a node. The scheduler looks at the nodes, and if there are taints that the pods can’t tolerate, it doesn’t schedule the pod to that node.
Similar to labels, there can be many taints on your nodes, and the cluster’s scheduler will schedule a pod to your nodes only if it tolerates all of the taints.
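You can add a taint to your nodes with the following command: <terminal inline>kubectl taint nodes nodeName key1=value1:taint-effect<terminal inline>
Here, nodeName is the name of the node that you want to taint, and the taint is described with a key-value pair followed by an effect. In the above example, <terminal inline>key1<terminal inline> is the key, <terminal inline>value1<terminal inline> is the value, and <terminal inline>taint-effect<terminal inline> is the effect.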
The taints can produce three different outcomes depending on your <terminal inline>taint-effect<terminal inline> choice.
Taint effects define what will happen to pods if they don’t tolerate the taints. The three taint effects are:
- NoSchedule: A strong effect where pods already running on the node keep running, but new pods that don’t tolerate the taint are not scheduled on it.
- PreferNoSchedule: A soft effect where the system will try to avoid placing a pod that does not tolerate the taint on the node, but doesn’t guarantee it.
- NoExecute: A strong effect where pods already running on the node that don’t tolerate the taint are evicted, and new pods that don’t tolerate the taint will not be scheduled.
The three taint effects can be seen here:
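As a rough illustration, this is how one taint of each effect might appear under <terminal inline>spec.taints<terminal inline> in a node object, for instance in the output of <terminal inline>kubectl get node node1 -o yaml<terminal inline>. The node name, keys, and values are hypothetical and chosen only for this sketch.

```yaml
# Sketch only: node1 and every key/value below are hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: node1
spec:
  taints:
  # NoSchedule: new pods without a matching toleration are not scheduled here;
  # pods that are already running stay.
  - key: dedicated
    value: team-a
    effect: NoSchedule
  # PreferNoSchedule: the scheduler tries to avoid this node for pods that
  # don't tolerate the taint, but may still use it.
  - key: spot-instance
    value: "true"
    effect: PreferNoSchedule
  # NoExecute: running pods without a matching toleration are evicted,
  # and new ones are not scheduled.
  - key: maintenance
    value: "true"
    effect: NoExecute
```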
How Do Tolerations Work?
Taints keep pods off the nodes that carry them, so how do you schedule a pod on a tainted node?
That’s where tolerations come in. They help you schedule pods on the nodes with the taints. Tolerations are applied to your pods’ manifests in the following format:
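A minimal sketch of a pod manifest with a toleration is shown here, assuming the node was tainted with the hypothetical pair <terminal inline>key1=value1<terminal inline> and the <terminal inline>NoSchedule<terminal inline> effect. The pod name and image are placeholders.

```yaml
# Sketch only: assumes a node tainted with key1=value1:NoSchedule.
apiVersion: v1
kind: Pod
metadata:
  name: nginx            # placeholder pod name
spec:
  containers:
  - name: nginx
    image: nginx         # placeholder image
  tolerations:
  # Matches a taint with the same key, value, and effect.
  - key: "key1"
    operator: "Equal"    # "Equal" is the default operator
    value: "value1"
    effect: "NoSchedule"
```

With the <terminal inline>Exists<terminal inline> operator, the <terminal inline>value<terminal inline> field is omitted, and the toleration matches any taint that has the given key and effect.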
If the taints and tolerations match, the pods can be scheduled on the tainted nodes, but there’s not a requirement that the pods be scheduled on the tainted nodes. If there’s a node without taints, a pod with tolerations can be scheduled on that node, even if there’s also an available node with tolerable taints.
The nodeSelector property can help you schedule pods on specific nodes, but that’s out of the scope of this article.
Use Cases for Taints and Tolerations
Now that you understand what taints and tolerations are, you’re likely curious about how they are used in a cluster. There are many ways, and in this section, you’ll go through the four most prominent use cases.
Master Node Taints
The control plane components are hosted on your master node, and you don’t want your application to interfere with the core processes for Kubernetes. Kubernetes, by default, taints your master node with <terminal inline>NoSchedule<terminal inline> to prevent it from crashing under high loads. If your workers crash, new processes can be spawned, but there’s no such luxury if your master crashes, because the master controls everything else in your cluster.
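You can remove the taint, but that’s not recommended in production environments. To remove a taint, repeat the taint command with a trailing hyphen, for example <terminal inline>kubectl taint nodes nodeName node-role.kubernetes.io/master:NoSchedule-<terminal inline>. Newer Kubernetes releases use the key <terminal inline>node-role.kubernetes.io/control-plane<terminal inline> instead, so check the node’s taints with <terminal inline>kubectl describe node nodeName<terminal inline> first.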
Dedicated Nodes for Specific Users
For some business requirements, you need logical isolation for your pods. An example of this would be using different nodes for your internal tools, customers, or different teams in your organization. You can achieve this with the help of taints and tolerations.
After the taints are applied, you can instruct your teams to use specific tolerations for their pods, ensuring that their pods can be scheduled to the correct nodes. As long as all of your nodes are tainted, this is a reliable way to segregate specific teams onto specific nodes.
If there’s a possibility of nodes without taints existing in your cluster, you can use NodeAffinity or NodeSelectors to make sure pods are scheduled to the desired nodes.
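One way to combine the two is sketched below: the toleration lets the pod onto the team’s tainted node, and the nodeSelector keeps it from landing anywhere else. The <terminal inline>dedicated=team-a<terminal inline> taint and label, the node name, and the image are assumptions made up for this example.

```yaml
# Sketch only. Assumes the node was prepared with (hypothetical values):
#   kubectl taint nodes node1 dedicated=team-a:NoSchedule
#   kubectl label nodes node1 dedicated=team-a
apiVersion: v1
kind: Pod
metadata:
  name: team-a-app
spec:
  containers:
  - name: app
    image: nginx         # placeholder image
  # The toleration allows scheduling onto the tainted node...
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "team-a"
    effect: "NoSchedule"
  # ...and the nodeSelector restricts the pod to nodes carrying the label.
  nodeSelector:
    dedicated: team-a
```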
Nodes with Special Hardware
Earlier, you read about scheduling for limited resources. But what if you require special hardware for your nodes? For example, how can you prevent pods that don’t need a GPU from monopolizing resources on an expensive virtual machine with specialized hardware? Taints and tolerations are the answer to this problem.
With a taint on the specialized nodes, and the respective toleration applied to your pods, you can make sure your specialized resources (like GPUs) are utilized only by the pods that require them. The tolerations for this use case would look like this:
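The following sketch assumes the GPU nodes were tainted with the hypothetical pair <terminal inline>special=true<terminal inline> and the <terminal inline>NoSchedule<terminal inline> effect. The pod name and image are placeholders, and your cluster’s key and value may differ.

```yaml
# Sketch only. Assumes the GPU node was tainted with (hypothetical values):
#   kubectl taint nodes nodeName special=true:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  containers:
  - name: trainer
    image: example.com/trainer:latest   # placeholder image
  tolerations:
  # Only pods carrying this toleration can be scheduled on the tainted GPU nodes.
  - key: "special"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```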
Taint-Based Evictions
As previously stated, the master node manages all other components in the cluster, and has a taint applied to it by default.
In some scenarios, the Kubernetes node controller automatically adds a <terminal inline>NoExecute<terminal inline> taint to a node. Pods that don’t tolerate the taint are evicted from the node, and pods managed by a controller such as a Deployment are then recreated on other available nodes.
A good example is when a node has high disk utilization. When this happens, a taint is added to your node so that no further pods are scheduled. This process is automatically done via the node controller in the control plane, and no manual intervention is required.
The other conditions where Kubernetes will add taints automatically are as follows:
- <terminal inline>node.kubernetes.io/not-ready<terminal inline>: This indicates that the node is not ready, and the NodeCondition <terminal inline>Ready<terminal inline> is <terminal inline>False<terminal inline>.
- <terminal inline>node.kubernetes.io/unreachable<terminal inline>: This indicates that the node controller is unable to reach the node, and the NodeCondition <terminal inline>Ready<terminal inline> is <terminal inline>Unknown<terminal inline>.
- <terminal inline>node.kubernetes.io/memory-pressure<terminal inline>: The node is running out of available memory.
- <terminal inline>node.kubernetes.io/disk-pressure<terminal inline>: The node is running out of disk space, or is using disk space at an unexpected rate.
- <terminal inline>node.kubernetes.io/pid-pressure<terminal inline>: The node is running out of process IDs. If this happens, the node will be unable to start any new processes.
- <terminal inline>node.kubernetes.io/network-unavailable<terminal inline>: The node’s network is unavailable.
- <terminal inline>node.kubernetes.io/unschedulable<terminal inline>: The node is not schedulable.
- <terminal inline>node.cloudprovider.kubernetes.io/uninitialized<terminal inline>: This taint is applied to mark a node started with an external cloud provider as unusable. It’s removed when a controller from the cloud-controller-manager initializes the node.
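A pod can also limit how long it stays on a node after one of these <terminal inline>NoExecute<terminal inline> taints is added by setting <terminal inline>tolerationSeconds<terminal inline> on the toleration. The sketch below is illustrative: the pod name, image, and the 600-second value are arbitrary choices.

```yaml
# Sketch only: pod name, image, and the timeout value are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: patient-app
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  # Stay bound to an unreachable node for up to 600 seconds before eviction.
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 600
```

If the taint is removed before the period ends, the pod is not evicted.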
Conclusion
In this article, you learned how taints and tolerations help you schedule your pods to specific nodes. As you’ve seen, the properties are widely used across clusters when scheduling pods.
Taints and tolerations only keep pods away from nodes; they don’t attract pods to them. They are most useful alongside the NodeSelectors and NodeAffinity properties, and using these properties together makes pod placement much easier to manage.