
Troubleshooting the “Failed to Create Pod Sandbox” Error

June 27, 2022

The “failed to create pod sandbox” error is a common problem when you’re trying to create a pod in Kubernetes. This article will explain the possible causes of the problem as well as how to fix it.

Vinayak Pandey
Senior Systems Engineer

Kubernetes is a widely popular container orchestration tool, but its complexity can cause a number of problems as well. DevOps and cloud engineers deploying workloads to Kubernetes frequently deal with issues ranging from failed health checks and unhealthy nodes to volumes that fail to bind.

Luckily, Kubernetes has an active user base. Members of the community will readily answer questions and offer resources to help you through any situation.

This article will focus on one error message in particular: “failed to create pod sandbox”. You’ll learn what causes it to pop up when you’re creating a pod and how you can troubleshoot this error.

What Causes FailedCreatePodSandBox?

This error usually occurs due to networking issues, although it can also be caused by suboptimal system resource limit configuration. Since networking is one of the most complex parts of a Kubernetes setup, figuring out the exact cause requires a good understanding of Kubernetes networking.

If pods are stuck in the <terminal inline>ContainerCreating<terminal inline> state, your first step is to check the pod status and get more details with the <terminal inline>kubectl describe pod <pod-name><terminal inline> command. The output should provide a detailed error message, which you can use as a baseline for further investigation.
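As a quick triage sketch (assuming the stuck pod is named nginx, as in the examples below), you can pull just the Events section out of the <terminal inline>describe<terminal inline> output, since that's where sandbox errors appear:

```shell
# "nginx" is an assumed pod name; substitute your own. The guards keep
# the sketch from aborting where kubectl is unavailable.
POD=nginx
kubectl get pod "$POD" -o wide 2>/dev/null || true
# Print only the Events section, where FailedCreatePodSandBox shows up.
kubectl describe pod "$POD" 2>/dev/null | sed -n '/^Events:/,$p'
```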

It’s important that you’re familiar with such error messages. In a production environment, you will need to fix the issue as soon as possible, and understanding common error messages will help you find the root cause quickly.

Below are common messages you could see related to this error, along with likely root causes and methods for fixing them.

Scenario 1: CNI Not Working on the Node

The Kubernetes Container Network Interface (CNI) configures networking between pods. If the CNI isn’t running properly on a node, pod sandboxes can’t be created there and pods will be stuck in the <terminal inline>ContainerCreating<terminal inline> state.

Here’s how to simulate, troubleshoot, and fix this issue. Say you have a two-node kubeadm cluster (one control plane node and one worker node) running Kubernetes version 1.23.4, with Weave Net as your CNI. Weave Net runs as a DaemonSet.

Weave Net CNI


To simulate the issue, you’ll prevent the Weave Net pod from running on your worker node. Follow these steps:

Step 1

Label your control plane node with <terminal inline>weave=yes<terminal inline> (here, control-plane is the node’s name):


kubectl label node control-plane weave=yes

Step 2

Edit the weave-net DaemonSet and add a node selector under <terminal inline>spec.template.spec<terminal inline>. This node selector uses the <terminal inline>weave=yes<terminal inline> label, which ensures the CNI pod only runs on the control plane node:


kubectl edit ds weave-net -n kube-system
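Inside the editor, the added block looks roughly like the following (a sketch; the field names follow the standard DaemonSet schema, and the label matches the one applied in step 1):

```yaml
spec:
  template:
    spec:
      # "yes" must be quoted so YAML doesn't parse it as a boolean.
      nodeSelector:
        weave: "yes"
```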
Node selector


The CNI pod should be running only on the control plane node.

CNI running on control plane node

Step 3

Now, try to run a pod using <terminal inline>kubectl run nginx --image=nginx<terminal inline>. If you check the pod status, it should be stuck in the <terminal inline>ContainerCreating<terminal inline> state.

Pod in ContainerCreating state


If you run the command <terminal inline>kubectl describe pod nginx<terminal inline>, you’ll see the error message “FailedCreatePodSandBox: failed to setup network for sandbox”.

FailedCreatePodSandBox

Debugging and Resolution

The error message indicates that the CNI on the node where the nginx pod is scheduled isn’t functioning properly, so the first step is to check whether the CNI pod is running on that node. If it is, you’ve eliminated one possible root cause.
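That check can be scripted. The sketch below builds the command as a string first so it's easy to inspect; the <terminal inline>name=weave-net<terminal inline> label is the one a standard Weave Net install applies, and <terminal inline>node01<terminal inline> is a placeholder node name:

```shell
# Build the command that lists Weave Net pods scheduled on a given node.
cni_pods_cmd() {
  node=$1
  echo "kubectl get pods -n kube-system -l name=weave-net --field-selector spec.nodeName=$node"
}

# Run it against the node where the stuck pod was scheduled; ignore
# failures if kubectl is unavailable in the current environment.
eval "$(cni_pods_cmd node01)" 2>/dev/null || true
```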

In this case, once you remove the <terminal inline>nodeSelector<terminal inline> from the DaemonSet definition and ensure the CNI pod is running on the node, the nginx pod should be running fine.

Pod running on node

Scenario 2: Missing or Incorrect CNI Configuration Files

Even if the CNI pod is running, you may still experience problems if the CNI configuration files have errors. To simulate this, you’ll make some changes in the CNI configuration files, which are stored under the <terminal inline>/etc/cni/net.d<terminal inline> directory.

Step 1

Log in to the node and run the following commands:


mv /etc/cni/net.d/10-weave.conflist ~
vi /etc/cni/net.d/87-podman-bridge.conflist

Change <terminal inline>bridge<terminal inline> to <terminal inline>bridg<terminal inline> and <terminal inline>cni-podman0<terminal inline> to <terminal inline>cni-podman<terminal inline>.

CNI misconfiguration

At this point, the Weave pod will still be running on the node.
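For reference, the fields being corrupted sit in podman’s default bridge config, which (trimmed to the relevant keys, and with version numbers that may differ on your system) looks something like this:

```json
{
  "cniVersion": "0.4.0",
  "name": "podman",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni-podman0"
    }
  ]
}
```

The <terminal inline>type<terminal inline> field names the plugin binary that gets looked up under <terminal inline>/opt/cni/bin<terminal inline>, which is why corrupting it produces a “failed to find plugin” error.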

Step 2

If you run the <terminal inline>kubectl run nginx --image=nginx<terminal inline> command again, the pod will be stuck in the <terminal inline>ContainerCreating<terminal inline> state.

Pod in ContainerCreating state

Step 3

Check pod status with the <terminal inline>kubectl describe pod nginx<terminal inline> command, and you’ll see the error message: <terminal inline>failed to find plugin "bridg" in path [/opt/cni/bin]<terminal inline>.

FailedCreatePodSandBox

Debugging and Resolution

In this scenario, the CNI pod is running on the node, but you’re still facing the same issue, so the next logical step is to verify the CNI configuration files. Compare them against the configuration files on the cluster’s other nodes. If you find a problem, copy the files from a healthy node to the problematic one and try recreating the pod.
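Part of that verification can be scripted. The sketch below (assuming the standard <terminal inline>/etc/cni/net.d<terminal inline> and <terminal inline>/opt/cni/bin<terminal inline> locations and python3 on the node) flags config files that aren’t valid JSON or that name a plugin binary that doesn’t exist — which would catch the <terminal inline>bridg<terminal inline> typo above:

```shell
# Check every CNI config file on this node: it must parse as JSON, and
# every plugin "type" it names must exist as a binary in the bin dir.
check_cni_configs() {
  conf_dir=$1
  bin_dir=$2
  for f in "$conf_dir"/*.conf*; do
    [ -e "$f" ] || continue                     # no config files at all
    if ! python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "MALFORMED: $f"
      continue
    fi
    # A .conflist has a "plugins" array; a plain .conf is one plugin.
    plugins=$(python3 -c 'import json, sys
d = json.load(open(sys.argv[1]))
print(" ".join(p["type"] for p in d.get("plugins", [d])))' "$f" 2>/dev/null)
    for p in $plugins; do
      [ -x "$bin_dir/$p" ] || echo "MISSING PLUGIN: $p (from $f)"
    done
  done
}

check_cni_configs /etc/cni/net.d /opt/cni/bin
```

Silent output means every file parsed and every referenced plugin binary was found.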

Scenario 3: Insufficient Number of Available IP Addresses

You’ll use an AWS EKS cluster to demonstrate this issue, since it’s easy to simulate in a cloud environment. Here, the VPC where your EKS cluster is running has a limited number of IP addresses available.

VPC

Step 1

Connect to the EKS cluster and create some pods using the following command:


for i in {1..15}
do
  kubectl run pod$i --image=nginx
done

Step 2

Check the status of your pods. You’ll see that all of them are running except one (in this example, pod15).

Nginx pods

Step 3

If you check the status of the pod in question by running the command <terminal inline>kubectl describe pod pod15<terminal inline>, you’ll see it wasn’t able to get an IP address.

Failed to assign IP address

Debugging and Resolution

This problem happens when the VPC subnets run out of available IP addresses. If you check the AWS VPC console, you’ll see that none of the subnets where your nodes are running has any available IPs.

IP addresses not available


To fix this issue, you can either scale down some workloads by running fewer pods, or you can create new subnets with wider CIDR ranges to run the EKS cluster. The best solution is to always plan the VPC and subnet CIDR range ahead so that there are enough IP addresses available for the predicted workload.
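The arithmetic behind that planning is straightforward. A quick sketch (the 5-address deduction is AWS’s standard per-subnet overhead: the network and broadcast addresses plus three reserved by AWS):

```shell
# Usable IPs in a subnet of a given prefix length, accounting for the
# 5 addresses AWS reserves in every subnet.
usable_ips() {
  prefix=$1
  total=$(( 1 << (32 - prefix) ))
  echo $(( total - 5 ))
}

usable_ips 28   # a /28 has 16 addresses, 11 usable
usable_ips 24   # a /24 has 256 addresses, 251 usable
```

Note that each pod on EKS consumes a subnet IP, so the usable count bounds your pod capacity, not just your node count.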

Scenario 4: Suboptimal System Resource Limits Configuration

As noted earlier, networking issues aren’t the only root cause of this error; suboptimally configured system resource limits can also trigger it. To demonstrate, switch back to your kubeadm cluster.

Step 1

If you run the <terminal inline>kubectl run nginx --image=nginx<terminal inline> command, you’ll see it’s running fine.

Step 2

Connect to the node and run <terminal inline>echo 32 > /proc/sys/fs/file-max<terminal inline>. This caps the system-wide number of open files at 32, which is far too low for normal operation.

Step 3

Run the <terminal inline>kubectl run nginx2 --image=nginx<terminal inline> command and check the pod status. It will be stuck in the <terminal inline>ContainerCreating<terminal inline> state.

Nginx not running


If you check the pod status with <terminal inline>kubectl describe pod nginx2<terminal inline>, you’ll see the pod encountered some issues when creating new shim sockets.

Error creating shim socket

Debugging and Resolution

If you see socket-related errors in pod status messages, you can rule out the possibility of networking-related issues and check the system resource limits configured on the node. Make sure the maximum number of open files and processes is configured appropriately. If the number is too low, raise it to a sufficient value, and if there are some zombie processes, terminate them.
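On Linux, both the limit and the current usage are readable from <terminal inline>/proc<terminal inline>, so a quick check on the node might look like this (writing a new value back requires root):

```shell
# System-wide cap on open file handles:
cat /proc/sys/fs/file-max
# Allocated handles, unused-but-allocated handles, and the cap again:
cat /proc/sys/fs/file-nr
# To raise the limit persistently, set fs.file-max in /etc/sysctl.conf
# (or a drop-in under /etc/sysctl.d/) and apply it with:
#   sysctl -p
```

If the first number in <terminal inline>file-nr<terminal inline> is close to the cap, the node is about to hit the limit again.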

In this scenario, if you increase the open file limit by running the command <terminal inline>echo 4096 > /proc/sys/fs/file-max<terminal inline>, your pod will be running fine.

Final Thoughts

As you’ve seen, there are several different scenarios that can cause the <terminal inline>FailedCreatePodSandBox<terminal inline> error when you attempt to create a pod. You’ve also seen how misconfigured networking often plays a role in this error, although it may not be the only culprit.

In general, if you see the <terminal inline>FailedCreatePodSandBox<terminal inline> error, first check if CNI is working on the node and if all the CNI configuration files are correct. You should also verify that the system resource limits are properly set. If the cluster is running on a cloud environment, check if the network subnets have a sufficient number of IP addresses available. These details should help you solve this issue and avoid it in the future.

Vinayak Pandey
Senior Systems Engineer

Vinayak is an experienced cloud consultant with a knack for automation, currently working with Cognizant Singapore. Before that, Vinayak worked as a Senior Systems Engineer at Singapore Airlines. He has a Bachelor of Technology in Computer Science & Engineering from SRMS.
