
Debugging Your Kubernetes Cluster, Pods, and Containers | Tutorial & Kubectl Examples

Kubernetes is one of the most popular container orchestration systems, but what happens when something goes wrong? This guide shares high-level tips for debugging a Kubernetes cluster, pods, and containers.

June 24, 2022
Nate Matherson
Co-founder

When something goes wrong, it’s often the DevOps engineer who’s responsible for detecting and solving the issue immediately. While Kubernetes has many benefits when it comes to speed and resource efficiency, its added complexity can make debugging more difficult. In order to resolve problems efficiently and avoid interruptions for end users, it’s vitally important that you understand how to debug a Kubernetes cluster.

In this article, you’ll learn how to quickly and effectively debug your cluster, nodes, pods, and containers.

Cluster Level Debugging

Let’s say there’s an issue with your Kubernetes cluster. A cluster is made up of several components, such as nodes and the control plane, and a problem with any one of them can surface as a cluster-wide issue. To successfully debug the cluster, you can try the suggestions in the following sections.

Obtaining Information About Your Clusters

The first step toward debugging your cluster is to gather more information about its components. Run the following command:


kubectl cluster-info

The <terminal inline>kubectl cluster-info<terminal inline> command outputs information about the status of the control plane and CoreDNS. As seen below, there are no issues with the control plane of the cluster, and CoreDNS is running correctly.


Kubernetes control plane is running at https://192.168.49.2:8443
CoreDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To get more detailed information about your cluster, you can run:


kubectl cluster-info dump
The output of the previous command was not included due to its length.

The <terminal inline>kubectl cluster-info dump<terminal inline> command gives detailed information about your cluster and the activities carried out on it. Because the output is very long, you can write it to a directory of files instead with <terminal inline>kubectl cluster-info dump --output-directory=/path/to/dir<terminal inline>.

Getting the Status of Your Node

An unhealthy node is a problem and affects the cluster as a whole. To get the status of a node, run:


kubectl get nodes

The command’s output shows you each node’s name, status, roles, age, and the Kubernetes version running on it.

Below, the node named minikube has the Ready status, so it’s running fine. If the status of any of your nodes is NotReady or Unknown, you can assume there’s an issue with that node.

NAME STATUS ROLES AGE VERSION
minikube Ready control-plane,master 3d12h v1.22.2
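
When a cluster has many nodes, scanning this table by eye gets tedious. As a quick sketch (using a saved sample of the output rather than a live cluster), you can filter for any node whose status is not Ready; in practice you would pipe <terminal inline>kubectl get nodes<terminal inline> into the same filter.

```shell
# Hypothetical sample of `kubectl get nodes` output saved to a file;
# on a live cluster you would instead run: kubectl get nodes > /tmp/nodes.txt
cat <<'EOF' > /tmp/nodes.txt
NAME       STATUS     ROLES                  AGE     VERSION
minikube   Ready      control-plane,master   3d12h   v1.22.2
worker-1   NotReady   <none>                 2d      v1.22.2
EOF

# Skip the header row and print any node whose STATUS column is not "Ready".
awk 'NR > 1 && $2 != "Ready" {print $1, $2}' /tmp/nodes.txt
```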

Getting the Health Status of Cluster Components

Kubernetes clusters have different control plane components, such as the scheduler, the controller manager, and etcd. Knowing the health status of these components will help save time when debugging your cluster. (Note that <terminal inline>componentstatus<terminal inline> is deprecated as of Kubernetes v1.19, but it still works and remains a quick first check.) To get the health status of your cluster components, run:


kubectl get componentstatus

As you can see below, the scheduler is unhealthy, while the controller manager and etcd are healthy.

NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Healthy OK
etcd-0 Healthy {"health":"true","reason":""}

Getting Activities in Your Cluster

Viewing all of the events that have taken place in your cluster is another effective way to debug it. You can spot errors that occurred while a particular action was being carried out in your cluster.

To get all the events that occurred in your cluster, you can run:


kubectl get events 

The output shows the details of each event carried out on your cluster and why it occurred.

40s Normal Scheduled pod/worker-deploy-799b5fb489-7c4sx Successfully assigned default/worker-deploy-799b5fb489-7c4sx to minikube
37s Normal Pulled pod/worker-deploy-799b5fb489-7c4sx Container image "kodekloud/examplevotingapp_worker:v1" already present on machine
35s Normal Created pod/worker-deploy-799b5fb489-7c4sx Created container worker-app
35s Normal Started pod/worker-deploy-799b5fb489-7c4sx Started container worker-app
41s Normal Killing pod/worker-deploy-799b5fb489-gc9xr Stopping container worker-app
41s Normal SuccessfulCreate replicaset/worker-deploy-799b5fb489 Created pod: worker-deploy-799b5fb489-7c4sx
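
When you only care about failures, you can narrow the event list down to warnings. On a live cluster, <terminal inline>kubectl get events --field-selector type=Warning<terminal inline> does this filtering server-side; the sketch below simulates the same filter on a saved sample so that it runs without a cluster.

```shell
# Hypothetical sample of `kubectl get events` output; on a live cluster:
#   kubectl get events --field-selector type=Warning
cat <<'EOF' > /tmp/events.txt
40s Normal Scheduled pod/demopod Successfully assigned default/demopod to minikube
25m Warning Failed pod/demopod Error: ErrImagePull
25m Warning Failed pod/demopod Error: ImagePullBackOff
EOF

# The TYPE column is the second field; keep only Warning events.
awk '$2 == "Warning"' /tmp/events.txt
```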

Pods and Container Debugging

If your cluster and nodes are healthy but you still have issues with your pods, it’s time to examine the pods and the containers running inside them. You may have mistakenly tried to run a pod with a nonexistent image or something similar. The suggested actions below might be a helpful starting point.

Describing Pods

You may have issues with your pods due to problems with the containers inside them. When you run <terminal inline>kubectl get pods<terminal inline>, you might notice a status like <terminal inline>ImagePullBackOff<terminal inline> or <terminal inline>pending<terminal inline> instead of the normal <terminal inline>Running<terminal inline> status. The <terminal inline>kubectl describe pod podname<terminal inline> command can help you quickly get to the root of the trouble. Let’s dive into these two examples.

Pods with ImagePullBackOff Error Status

To simulate an example of a pod with the status <terminal inline>ImagePullBackOff<terminal inline> error, create a pod with the image <terminal inline>busybox888<terminal inline>. Copy the YAML content below and store it inside a file with the name <terminal inline>pods.yaml<terminal inline>. Then, create a pod object with <terminal inline>kubectl create -f pods.yaml<terminal inline>.


apiVersion: v1
kind: Pod 
metadata:
  name: demopod
  labels:
    app: app
spec:
  containers:
    - name: busybox
      image: busybox888

Get the pods running on your cluster with <terminal inline>kubectl get pods<terminal inline>. You can see the output of this command below. Notice that the demopod has <terminal inline>ImagePullBackOff<terminal inline> status.

NAME READY STATUS RESTARTS AGE
demopod 0/1 ImagePullBackOff 0 2m13s
postgres-deploy-8695749f5f-nnd67 1/1 Running 3 (3h4m ago) 9d
redis-deploy-5d7988b4bb-9kntq 1/1 Running 3 (3h4m ago) 9d
result-app-deploy-b8f4fc44b-8phh2 1/1 Running 3 (3h4m ago) 9d
voting-app-deploy-547678ccc7-67sh5 1/1 Running 3 (3h4m ago) 9d
worker-deploy-799b5fb489-7c4sx 1/1 Running 0 37m

After seeing the <terminal inline>ImagePullBackOff<terminal inline> status, you need to be able to identify its cause correctly. The <terminal inline>kubectl describe pod<terminal inline> command gives a comprehensive overview of your pods.

To get an overview of demopod, run:


kubectl describe pod demopod

If you scroll down after running the command, you’ll notice the output below showing the reason for the error. The error occurred because Kubernetes couldn’t pull the image from the Docker repository. The <terminal inline>ImagePullBackOff<terminal inline> error can occur if the image doesn’t exist or is hosted inside a private Docker registry that’s not accessible to Kubernetes.

The error can also occur when Kubernetes tries to pull a container with an image that’s not hosted on the Docker public registry but in your local computer system. Kubernetes will try to pull the image multiple times without success, leading to an <terminal inline>ImagePullBackOff<terminal inline> error.

In the output of the <terminal inline>kubectl describe pod demopod<terminal inline> command, you will notice an <terminal inline>ErrImagePull<terminal inline> error, which occurs each time Kubernetes fails to pull the container image. After several consecutive <terminal inline>ErrImagePull<terminal inline> failures, Kubernetes backs off from pulling the image and reports <terminal inline>ImagePullBackOff<terminal inline>. In this scenario, the pulls fail because the busybox888 image doesn’t exist.

Events:

TYPE REASON AGE FROM MESSAGE
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned default/demopod to minikube
Normal Pulling 25m (x4 over 27m) kubelet Pulling image "busybox888"
Warning Failed 25m (x4 over 27m) kubelet Failed to pull image "busybox888": rpc error: code = Unknown desc = Error response from daemon: pull access denied for busybox888, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Warning Failed 25m (x4 over 27m) kubelet Error: ErrImagePull
Warning Failed 25m (x4 over 27m) kubelet Error: ImagePullBackOff
Normal BackOff 2m26s (x101 over 27m) kubelet Back-off pulling image "busybox888"

The correct name for the image is <terminal inline>busybox<terminal inline>, not <terminal inline>busybox888<terminal inline>. You’ll need to change the image name from <terminal inline>busybox888<terminal inline> to <terminal inline>busybox<terminal inline>.

If you’re sure that the image name is correct, but you see <terminal inline>ImagePullBackOff<terminal inline>, the error could have occurred because the container image is hosted in a private Docker registry. To resolve this, you can specify the authorization details inside secrets.
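
As a sketch of that fix (the registry URL, the credentials, and the secret name <terminal inline>regcred<terminal inline> below are all placeholders), you first create a docker-registry secret and then reference it from the pod spec:

```yaml
# Create the secret first, with your real registry details in place of
# the placeholders:
#   kubectl create secret docker-registry regcred \
#     --docker-server=registry.example.com \
#     --docker-username=myuser \
#     --docker-password=mypassword
apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  containers:
    - name: app
      image: registry.example.com/myteam/app:v1  # private image (placeholder)
  imagePullSecrets:
    - name: regcred  # must match the secret created above
```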

Pods with Pending Status

When you run the <terminal inline>kubectl get pods<terminal inline> command, you can sometimes see pods with pending status. To simulate that scenario, create 1,000 pods in your cluster.

Copy the content below and save it inside a file named <terminal inline>deployment.yaml<terminal inline>.


apiVersion: apps/v1
kind: Deployment 
metadata:
  name: demodeploy 
  labels:
    app: demoapp
spec:
  selector:
    matchLabels:
      app: app

  replicas: 1000
  template:
    metadata:
      name: demopod
      labels:
        app: app
    spec:
      containers:
        - name: busybox
          image: busybox

Run <terminal inline>kubectl apply -f deployment.yaml<terminal inline> to create the deployment. Then run <terminal inline>kubectl get pods<terminal inline> to get your pods. Notice that a pod called <terminal inline>demodeploy-6df58566f5-2p969<terminal inline> has pending status. You can use <terminal inline>kubectl describe pod podname<terminal inline> to get more details about the pending status.

NAME READY STATUS RESTARTS AGE
demodeploy-6df58566f5-26jwq 0/1 Terminating 0 12m
demodeploy-6df58566f5-2p969 0/1 Pending 0 5m43s
demodeploy-6df58566f5-2vwn9 0/1 Pending 0 5m39s
demodeploy-6df58566f5-2xftp 0/1 Pending 0 4s
demodeploy-6df58566f5-4jrvc 0/1 Pending 0 7s
demodeploy-6df58566f5-4rp6m 0/1 Pending 0 5m42s
demodeploy-6df58566f5-4xb7n 0/1 Pending 0 5m33s
demodeploy-6df58566f5-4zpsk 0/1 Pending 0 5m50s

To find out why the <terminal inline>demodeploy-6df58566f5-2p969<terminal inline> pod is pending, run <terminal inline>kubectl describe pod demodeploy-6df58566f5-2p969<terminal inline>. In your case, the pending pod might have a different name; substitute it into the command. After running it, scroll to the bottom of the output, where you’ll see the information shown below.

Status: Pending
kube-api-access-np625:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true;
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Events:

Type Reason Age From Message
---- ----- ---- ---- -------
Warning FailedScheduling 5m44s (x4 over 6m37s) default-scheduler 0/1 nodes are available: 1 Too many pods.

The result shows that there are too many pods on the cluster (in this case, a single-node minikube cluster). The <terminal inline>demodeploy-6df58566f5-2p969<terminal inline> pod will remain pending because no node can accommodate it. If you encounter this error, make sure the cluster is not overloaded; scale the deployment down or add more nodes.

Other Pod Errors

When running <terminal inline>kubectl get pods<terminal inline>, you’ll naturally encounter other errors. The <terminal inline>kubectl describe pod podname<terminal inline> command can get you more information. Some possible errors include:

  • <terminal inline>RunContainerError<terminal inline>. Occurs when the container inside the pod can’t start due to the application’s configuration inside the container.
  • <terminal inline>KillContainerError<terminal inline>. Occurs when the container running inside the pod fails to stop (or be killed).
  • <terminal inline>SetupNetworkError<terminal inline>. Occurs if there is an issue setting up the network for the pod.
  • <terminal inline>CrashLoopBackOff<terminal inline>. Occurs when your pods continuously crash in an endless loop after starting. It can be caused by an issue with the application inside the container, misconfiguring parameters of the pod or container, or errors while creating your Kubernetes cluster. This could include using the same port for two or more containers in a pod.
  • Pods stuck in an error state. Occurs when the scheduler has scheduled the pod, but the pod fails to start. <terminal inline>kubectl describe pod<terminal inline> can help solve this.
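
To see <terminal inline>CrashLoopBackOff<terminal inline> in action, you can create a pod whose container exits immediately. This is a minimal sketch with a hypothetical pod name:

```yaml
# With the default restartPolicy of Always, the kubelet keeps restarting
# this container, so the pod soon reports CrashLoopBackOff.
apiVersion: v1
kind: Pod
metadata:
  name: crashdemo
spec:
  containers:
    - name: crasher
      image: busybox
      command: ["sh", "-c", "echo crashing; exit 1"]
```

Once the pod is in this state, <terminal inline>kubectl logs crashdemo --previous<terminal inline> shows the output of the last failed run, which is often enough to spot the cause.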

Pod Level Logging

In addition to describing your pods, you can also check your pods’ logs. To view the events that have occurred inside a pod, run <terminal inline>kubectl logs podname<terminal inline>.

Use <terminal inline>kubectl get pods<terminal inline> to get all your pods and then run <terminal inline>kubectl logs anyrunningpodname<terminal inline>. If the pod runs more than one container, specify which one with <terminal inline>kubectl logs podname -c containername<terminal inline>.

The <terminal inline>kubectl logs<terminal inline> command will output something similar to what’s shown below.

ERROR: relation "votes" does not exist at character 38
STATEMENT: SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR: relation "votes" does not exist at character 38
STATEMENT: SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR: relation "votes" does not exist at character 38
STATEMENT: SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR: relation "votes" does not exist at character 38
STATEMENT: SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR: relation "votes" does not exist at character 38
STATEMENT: SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR: relation "votes" does not exist at character 38
STATEMENT: SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote

As you can see, the logs show details about the pod’s events, which can help you to understand how the container inside your pods is running.
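
For noisy logs, a quick way to gauge how often an error repeats is to count matching lines. The sketch below works on a saved sample; on a live cluster you would first run <terminal inline>kubectl logs podname > /tmp/pod.log<terminal inline>.

```shell
# Hypothetical saved pod log containing three ERROR lines.
cat <<'EOF' > /tmp/pod.log
ERROR: relation "votes" does not exist at character 38
STATEMENT: SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR: relation "votes" does not exist at character 38
STATEMENT: SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR: relation "votes" does not exist at character 38
EOF

# Count the lines that start with ERROR.
grep -c '^ERROR' /tmp/pod.log
```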

Using Exec to Debug

Let’s say you want to fix errors that might be caused by the application running inside your container or the misconfiguration of your container. However, you can’t access the application unless you’re inside the container. To get inside and fix the errors, you need to use the <terminal inline>kubectl exec<terminal inline> command.

Copy the pod definition below, save it inside any file of your choice, and create the pod with <terminal inline>kubectl create -f filename<terminal inline>.


apiVersion: v1
kind: Pod 
metadata:
  name: demopod
  labels:
    app: app
spec:
  containers:
    - name: redis
      image: redis

To navigate into the Redis container running inside demopod, run <terminal inline>kubectl exec -it demopod -- /bin/bash<terminal inline>. The <terminal inline>-it<terminal inline> flags give you an interactive terminal session inside the container. The command will display something similar to what is shown below. You’re now inside the Redis container and can proceed to check other information, such as environment variables or configuration files.


root@demopod:/data# 

Other Kubernetes Debugging Tips

Your pods and containers might be running fine; however, you may not be able to access the pods externally (over the internet). If you can’t reach your application running inside your pods, it might be due to a misconfiguration in your service YAML file.

Copy the pod definition YAML content below into any file of your choice.


apiVersion: v1
kind: Pod 
metadata:
  name: poddemo
  labels:
    app: app
spec:
  containers:
    - name: nginx
      image: nginx
      ports: 
        - containerPort: 80

Then, copy the service definition YAML content below into another file of your choice.


apiVersion: v1
kind: Service 
metadata:
  name: service-myapp

spec:
  selector:
    app: app
  type: NodePort 
  ports: 
    - targetPort: 80
      port: 80
      nodePort: 30008

Create both files by running <terminal inline>kubectl create -f filename<terminal inline>.

Finally, run <terminal inline>kubectl get service<terminal inline> to ensure your service has been created, as shown below.

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service-myapp NodePort 10.96.95.52 <none> 80:30008/TCP 31s

To get the service URL, run <terminal inline>minikube service service-myapp --url<terminal inline>.

When you open the link, you should see the NGINX welcome page, as shown in the image below. If you don’t, there’s an issue with your service. This error can occur if the targetPort in your service doesn’t match the containerPort exposed by your pod. Rectify this by specifying matching port numbers in the service and pod YAML configuration files.

Another possible cause of the error is that the selector inside your service YAML file doesn’t match your pod’s labels. You can verify this with <terminal inline>kubectl get endpoints service-myapp<terminal inline>; if the ENDPOINTS column shows <none>, the selector isn’t matching any pods.

Output of minikube service

Listing the API Version

If the API version of one of your Kubernetes objects is misconfigured, you may be using an API version that’s outdated or no longer served, and the object will fail to create. A quick first check is <terminal inline>kubectl api-versions<terminal inline>, which lists every API version your cluster supports. To troubleshoot further, you can list the apiVersion for each object kind.

A tool called Move2Kube uses source artifacts, like Docker Compose or Cloud Foundry manifest files, to generate Kubernetes deployment artifacts, including object YAML, Helm charts, and operators. You can install Move2Kube with the command below.


bash <(curl https://raw.githubusercontent.com/konveyor/move2kube/main/scripts/install.sh)

Next, run <terminal inline>move2kube collect<terminal inline>. You will see output similar to what’s shown below, indicating that the information has been collected.


INFO[0045] Collection done                              
INFO[0045] Collect Output in [/home/idowu/m2k_collect]. Copy this directory into the source directory to be used for planning. 

You’ll see something similar to what is shown below.


apiVersion: move2kube.konveyor.io/v1alpha1
kind: ClusterMetadata
metadata:
  name: |
    minikube
spec:
  storageClasses:
    - standard
  apiKindVersionMap:
    APIService:
      - apiregistration.k8s.io/v1
    Binding:
      - v1
    CSIDriver:
      - storage.k8s.io/v1
    CSINode:
      - storage.k8s.io/v1
    CSIStorageCapacity:
      - storage.k8s.io/v1beta1
    CertificateSigningRequest:
      - certificates.k8s.io/v1
    ClusterRole:
      - rbac.authorization.k8s.io/v1
    ClusterRoleBinding:
      - rbac.authorization.k8s.io/v1
    ComponentStatus:
      - v1
    ConfigMap:
      - v1
    ControllerRevision:
      - apps/v1
    CronJob:
      - batch/v1
      - batch/v1beta1
    CustomResourceDefinition:
      - apiextensions.k8s.io/v1
    DaemonSet:
      - apps/v1
    Deployment:
      - apps/v1
    EndpointSlice:
      - discovery.k8s.io/v1
      - discovery.k8s.io/v1beta1
    Endpoints:
      - v1
    Event:
      - events.k8s.io/v1
      - events.k8s.io/v1beta1
      - v1
    Eviction:
      - v1
    FlowSchema:
      - flowcontrol.apiserver.k8s.io/v1beta1
    HorizontalPodAutoscaler:
      - autoscaling/v1
      - autoscaling/v2beta1
      - autoscaling/v2beta2
    Ingress:
      - networking.k8s.io/v1
    IngressClass:
      - networking.k8s.io/v1
    Job:
      - batch/v1
    Lease:
      - coordination.k8s.io/v1
    LimitRange:
      - v1
    LocalSubjectAccessReview:
      - authorization.k8s.io/v1
    MutatingWebhookConfiguration:
      - admissionregistration.k8s.io/v1
    Namespace:
      - v1
    NetworkPolicy:
      - networking.k8s.io/v1
    Node:
      - v1
    NodeProxyOptions:
      - v1
    PersistentVolume:
      - v1
    PersistentVolumeClaim:
      - v1
    Pod:
      - v1
    PodAttachOptions:
      - v1
    PodDisruptionBudget:
      - policy/v1
      - policy/v1beta1
    PodExecOptions:
      - v1 

Navigate into the <terminal inline>m2k_collect<terminal inline> folder and find the clusters folder, which holds the collected information about your Kubernetes cluster. Inside the clusters folder is a file with the name <terminal inline>minikube--77f0e6522d6f6d24.yaml<terminal inline>; open it to view its contents. While the full output is longer than what’s shown above, you can use this file to double-check that you’re using the right apiVersion for each object.

Final Thoughts

This article has introduced you to several ways to debug your Kubernetes clusters, pods, and containers. With a few simple kubectl commands, you can solve problems ranging from a misconfigured API version to a pod that references an image that doesn’t exist in the Docker registry.

ContainIQ’s Kubernetes monitoring platform can help identify and solve these issues even more efficiently. It’s a platform that allows you to monitor Kubernetes metrics and events within your clusters automatically. Its metric dashboards help you view your node/pod CPU, so you can debug your Kubernetes clusters, pods, and containers quickly and easily.


Nate Matherson is the Co-founder & CEO of ContainIQ. An experienced entrepreneur and technologist, he has founded multiple venture-backed companies and is a two-time Y Combinator Alum. Nate is also an active angel investor.
