When something goes wrong, it’s often the DevOps engineer who’s responsible for detecting and solving the issue immediately. While Kubernetes has many benefits when it comes to speed and resource efficiency, its added complexity can make debugging more difficult. In order to resolve problems efficiently and avoid interruptions for end users, it’s vitally important that you understand how to debug a Kubernetes cluster.
In this article, you’ll learn how to quickly and effectively debug your cluster, nodes, pods, and containers.
Cluster Level Debugging
Let’s say there’s an issue with your Kubernetes cluster. Because these clusters are made up of several components like nodes and control planes, any problem with them can lead to issues with your cluster. To successfully debug the cluster, you can try one of the suggestions in the following sections.
Obtaining Information About Your Clusters
The first step toward debugging your cluster is to gather more information about its components. Run the following command:
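```shell
kubectl cluster-info
```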
The <terminal inline>kubectl cluster-info<terminal inline> command outputs information about the status of the control plane and CoreDNS. As seen below, the command shows there are no issues with the control plane of the cluster and that CoreDNS is running correctly.
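On a minikube cluster, the output looks something like this (your addresses will differ):

```shell
Kubernetes control plane is running at https://192.168.49.2:8443
CoreDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
```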
To get more detailed information about your cluster, you can run:
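```shell
kubectl cluster-info dump
```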
The output of this command isn't included here due to its length.
The <terminal inline>kubectl cluster-info dump<terminal inline> command gives detailed information about your cluster and the activities carried out on it.
Getting the Status of Your Node
An unhealthy node is a problem and affects the cluster as a whole. To get the status of a node, run:
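```shell
kubectl get nodes
```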
The command’s output shows you the name, status, roles, age, and the Kubernetes version running on each node.
Below, the node named minikube has the Ready status, so it’s running fine and there are no issues. If the status of any of your nodes is anything other than Ready, you can assume there’s an issue with that node.
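On a single-node minikube cluster, the output looks something like this (the age and version will vary):

```shell
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   10d   v1.24.3
```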
Getting the Health Status of Cluster Components
Kubernetes clusters have different components, like the scheduler, controller manager, and etcd. Knowing the health status of these components will help save time when debugging your cluster. To get the health status of your cluster components, run:
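```shell
kubectl get componentstatuses
```

You can also use the short form <terminal inline>kubectl get cs<terminal inline>. Note that this API is deprecated in recent Kubernetes versions, but it still works for a quick health check.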
As you can see below, the scheduler is unhealthy, while the controller manager and etcd are healthy.
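Representative output (the exact error message depends on why the scheduler is unreachable):

```shell
NAME                 STATUS      MESSAGE                                                                 ERROR
scheduler            Unhealthy   Get "https://127.0.0.1:10259/healthz": dial tcp: connection refused
controller-manager   Healthy     ok
etcd-0               Healthy     {"health":"true"}
```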
Getting Activities in Your Cluster
Viewing all of the events that have taken place on your cluster is another effective way to debug it. You can spot any error that occurred while a particular action was carried out in a given namespace.
To get all the events that occurred in your cluster, you can run:
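```shell
kubectl get events
```

To see events across all namespaces rather than just the current one, add the <terminal inline>--all-namespaces<terminal inline> flag.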
This command shows you the details of all the events recorded on your cluster and why each one occurred.
Pods and Container Debugging
If your cluster and nodes are healthy, but you still have issues with your pods and containers, it’s time to examine them more closely. You may have mistakenly tried to run a pod with a nonexistent image or something similar. The suggested actions below can be a helpful starting point.
Describing Pods
You may have issues with your pods due to problems with the containers inside them. When you run <terminal inline>kubectl get pods<terminal inline>, you might notice a status like <terminal inline>ImagePullBackOff<terminal inline> or <terminal inline>Pending<terminal inline> instead of the normal <terminal inline>Running<terminal inline> status. The <terminal inline>kubectl describe pod podname<terminal inline> command can help you quickly get to the root of the trouble. Let’s dive into these two examples.
Pods with ImagePullBackOff Error Status
To simulate a pod with the <terminal inline>ImagePullBackOff<terminal inline> error status, create a pod with the image <terminal inline>busybox888<terminal inline>. Copy the YAML content below and store it inside a file with the name <terminal inline>pods.yaml<terminal inline>, then create the pod object with <terminal inline>kubectl create -f pods.yaml<terminal inline>.
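Here's a minimal pod definition for this; the container name demo-container is just a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  containers:
  - name: demo-container
    image: busybox888
```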
Get the pods running on your cluster with <terminal inline>kubectl get pods<terminal inline>. You can see the output of this command below. Notice that the demopod has <terminal inline>ImagePullBackOff<terminal inline> status.
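The output will look something like this:

```shell
NAME      READY   STATUS             RESTARTS   AGE
demopod   0/1     ImagePullBackOff   0          2m
```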
After seeing the <terminal inline>ImagePullBackOff<terminal inline> status, the next step is to correctly identify its cause. The <terminal inline>kubectl describe pod<terminal inline> command gives a comprehensive overview of your pods.
To get an overview of demopod, run:
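```shell
kubectl describe pod demopod
```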
If you scroll down after running the command, you’ll notice the output below showing the reason for the error. The error occurred because Kubernetes couldn’t pull the image from the Docker repository. The <terminal inline>ImagePullBackOff<terminal inline> error can occur if the image doesn’t exist or is hosted inside a private Docker registry that’s not accessible to Kubernetes.
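The Events section at the bottom of the output will look something like this (ages and retry counts will vary, and the full message text is abbreviated here):

```shell
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  2m                default-scheduler  Successfully assigned default/demopod to minikube
  Normal   Pulling    1m (x3 over 2m)   kubelet            Pulling image "busybox888"
  Warning  Failed     1m (x3 over 2m)   kubelet            Failed to pull image "busybox888": ... not found
  Warning  Failed     1m (x3 over 2m)   kubelet            Error: ErrImagePull
  Normal   BackOff    30s (x5 over 2m)  kubelet            Back-off pulling image "busybox888"
  Warning  Failed     30s (x5 over 2m)  kubelet            Error: ImagePullBackOff
```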
The error can also occur when Kubernetes tries to pull a container with an image that’s not hosted on the Docker public registry but in your local computer system. Kubernetes will try to pull the image multiple times without success, leading to an <terminal inline>ImagePullBackOff<terminal inline> error.
If you take a look at the output of the <terminal inline>kubectl describe pod demopod<terminal inline> command, you will notice an <terminal inline>ErrImagePull<terminal inline> error. This error occurs while pulling the image of the container that will run inside the pod. Notice also the <terminal inline>ImagePullBackOff<terminal inline> error, which happens when Kubernetes stops pulling the image due to several <terminal inline>ErrImagePull<terminal inline> errors.
In this scenario, the failure occurred because the busybox888 image doesn’t exist. Kubernetes will try to pull the busybox888 image, but the attempt fails with an <terminal inline>ErrImagePull<terminal inline> error. Kubernetes will retry the pull, and each failed attempt produces another <terminal inline>ErrImagePull<terminal inline> error, eventually resulting in the <terminal inline>ImagePullBackOff<terminal inline> error.
The correct name for the image is <terminal inline>busybox<terminal inline>, not <terminal inline>busybox888<terminal inline>. You’ll need to change the image name from <terminal inline>busybox888<terminal inline> to <terminal inline>busybox<terminal inline>.
If you’re sure that the image name is correct but you still see <terminal inline>ImagePullBackOff<terminal inline>, the error could have occurred because the container image is hosted in a private Docker registry. To resolve this, store the registry credentials in a Kubernetes Secret and reference it in the pod’s <terminal inline>imagePullSecrets<terminal inline> field.
Pods with Pending Status
When you run the <terminal inline>kubectl get pods<terminal inline> command, you can sometimes see pods with <terminal inline>Pending<terminal inline> status. To simulate that scenario, create 1,000 pods on your cluster.
Copy the content below and save it in a file named <terminal inline>deployment.yaml<terminal inline>. If you’re reusing an existing file, make sure you delete its previous contents first.
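Here's a deployment manifest that requests 1,000 replicas; the label values, container name, and image are placeholders chosen for illustration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demodeploy
spec:
  replicas: 1000
  selector:
    matchLabels:
      app: demodeploy
  template:
    metadata:
      labels:
        app: demodeploy
    spec:
      containers:
      - name: demo-container
        image: nginx
```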
Run <terminal inline>kubectl apply -f deployment.yaml<terminal inline> to apply the changes to your deployment, then run <terminal inline>kubectl get pods<terminal inline> to list your pods. Notice that a pod called <terminal inline>demodeploy-6df58566f5-2p969<terminal inline> has Pending status.
To find out why, describe that pod with <terminal inline>kubectl describe pod demodeploy-6df58566f5-2p969<terminal inline>. In your case, the pending pod will likely have a different name; substitute it into the command, i.e. <terminal inline>kubectl describe pod nameofyourpodwithpendingstatus<terminal inline>.
After running the command, scroll down to the bottom of its output. You’ll see information like that shown below.
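On a single-node minikube cluster, the relevant event looks something like this:

```shell
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  45s   default-scheduler  0/1 nodes are available: 1 Too many pods.
```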
The result shows that there are too many pods on the cluster (in this case, the cluster is running on minikube). The <terminal inline>demodeploy-6df58566f5-2p969<terminal inline> pod will remain in the Pending state because no node has capacity left to schedule it. If you encounter this error, make sure the cluster is not overloaded.
Other Pod Errors
When running <terminal inline>kubectl get pods<terminal inline>, you may encounter other errors as well. The <terminal inline>kubectl describe pod podname<terminal inline> command can get you more information. Some possible errors include:
- <terminal inline>RunContainerError<terminal inline>. Occurs when the container inside the pod can’t start due to the application’s configuration inside the container.
- <terminal inline>KillContainerError<terminal inline>. Occurs when the container running inside the pod fails to stop (or be killed).
- <terminal inline>SetupNetworkError<terminal inline>. Occurs if there is an issue setting up the network for the pod.
- <terminal inline>CrashLoopBackOff<terminal inline>. Occurs when your pods continuously crash in an endless loop after starting. It can be caused by an issue with the application inside the container, misconfigured pod or container parameters, or errors while creating your Kubernetes cluster, such as using the same port for two or more containers in a pod.
- Pods stuck in an error state. Occurs when the scheduler has scheduled the pod, but the pod fails to start. <terminal inline>kubectl describe pod<terminal inline> can help solve this.
Pod Level Logging
In addition to describing your pods, you can also check your pods’ logs. To view the events that have occurred inside a pod, run <terminal inline>kubectl logs podname<terminal inline>.
Use <terminal inline>kubectl get pods<terminal inline> to get all your pods, then run <terminal inline>kubectl logs anyrunningpodname<terminal inline>.
The <terminal inline>kubectl get pods<terminal inline> command will output something similar to what’s shown below.
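For example, assuming a healthy running pod (the pod name <terminal inline>mypod<terminal inline> here is hypothetical):

```shell
$ kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
mypod   1/1     Running   0          5m

$ kubectl logs mypod
```

The log output itself depends entirely on the application running inside the container.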
As you can see, the logs show details about the pod’s events, which can help you to understand how the container inside your pods is running.
Using Exec to Debug
Let’s say you want to fix errors that might be caused by the application running inside your container or the misconfiguration of your container. However, you can’t access the application unless you’re inside the container. To get inside and fix the errors, you need to use the <terminal inline>kubectl exec<terminal inline> command.
Copy the pod definition YAML below and save it inside any file of your choice, then create the pod with <terminal inline>kubectl create -f filename<terminal inline>.
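A minimal Redis pod definition looks like this (the container name is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  containers:
  - name: redis
    image: redis
```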
To get a shell inside the Redis container running in demopod, run <terminal inline>kubectl exec -it demopod -- /bin/bash<terminal inline>. The <terminal inline>-it<terminal inline> flags give you an interactive terminal session. The command will display something similar to what is shown below. Once you’re inside the Redis container, you can proceed to check other information like environment variables.
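A representative session (the prompt will vary by image):

```shell
$ kubectl exec -it demopod -- /bin/bash
root@demopod:/data# env   # for example, inspect environment variables
```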
Other Kubernetes Debugging Tips
Your pods and containers might be running fine; however, you may not be able to access the pods externally (over the internet). If you can’t access the application running inside your pods, it might be due to a misconfiguration in your service YAML file.
Copy the pod definition YAML content below into any file of your choice.
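Here's a pod definition for an NGINX container; the pod name and label values are assumptions chosen to match the service below:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
```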
Then, copy the service definition YAML content below into another file of your choice.
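And a matching NodePort service; the nodePort value here is an arbitrary choice from the allowed 30000–32767 range:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-myapp
spec:
  type: NodePort
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080
```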
Create both objects by running <terminal inline>kubectl create -f filename<terminal inline> for each file.
Finally, run <terminal inline>kubectl get service<terminal inline> to ensure your service has been created, as shown below.
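The output will look something like this (the cluster IP will differ):

```shell
NAME            TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service-myapp   NodePort   10.98.111.225   <none>        80:30080/TCP   1m
```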
To get the service URL, run <terminal inline>minikube service service-myapp --url<terminal inline>.
When you open the link, you should see the default NGINX welcome page. If you don’t, there’s an issue with your service. The error can occur if the service’s <terminal inline>targetPort<terminal inline> doesn’t match the port the container is actually listening on. Rectify this by specifying the correct port numbers in the service and pod YAML configuration files.
Another possible cause of the error is that the selector inside your service YAML file doesn’t match your pod’s labels.

Listing the API Version
If the API version of your Kubernetes objects has been misconfigured, you may be using an API version that’s outdated or incorrect for that object type. To troubleshoot this, list the apiVersion of each object.
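One quick, built-in way to see which API versions your cluster supports is:

```shell
kubectl api-versions
```

You can then compare this list against the apiVersion fields in your manifests.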
A tool called Move2Kube uses source artifacts like Docker Compose files or Cloud Foundry manifests to generate Kubernetes deployment artifacts, including object YAML, Helm charts, and operators. You can install Move2Kube with the command below.
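At the time of writing, the Move2Kube documentation suggests the install script below; check the project's docs for the current instructions:

```shell
bash <(curl https://raw.githubusercontent.com/konveyor/move2kube/main/scripts/install.sh)
```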
Next, run <terminal inline>move2kube collect<terminal inline> to gather information about your cluster. When the command finishes, it saves what it collected into an <terminal inline>m2k_collect<terminal inline> folder.
Navigate into the <terminal inline>m2k_collect<terminal inline> folder and find the clusters folder, which holds the collected information about your Kubernetes cluster. Inside is a YAML file named after your cluster; in this example, <terminal inline>minikube--77f0e6522d6f6d24.yaml<terminal inline>. View the file’s contents to double-check that you’re using the right apiVersion for each object; the full output is longer than can be shown here.
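For example, from inside the <terminal inline>m2k_collect<terminal inline> folder (the file name will differ on your cluster):

```shell
cat clusters/minikube--77f0e6522d6f6d24.yaml
```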
Final Thoughts
This article has introduced you to several ways to debug your Kubernetes clusters, pods, and containers. With a few simple kubectl commands, you can solve problems ranging from a misconfigured API version to a pod that was run with an image that doesn’t exist in the Docker repository.
ContainIQ’s Kubernetes monitoring platform can help identify and solve these issues even more efficiently. It allows you to monitor Kubernetes metrics and events within your clusters automatically, and its metric dashboards let you view node and pod CPU usage, so you can debug your Kubernetes clusters, pods, and containers quickly and easily.