Kubernetes probes are a mechanism for providing the Kubernetes control plane with information about the internal state of your applications. They let your cluster identify running pods that are in an unhealthy state.
Startup probes detect when a container’s workload has launched and the container is ready to use. Kubernetes relies on this information when determining whether a container can be targeted by liveness and readiness probes.
It’s important to add startup probes in conjunction with these other probe types. Otherwise, a container could be targeted by a liveness or readiness probe before it’s able to handle the request. This would cause the container to be restarted and flagged as unhealthy, when it was actually healthy but still initializing.
In this article, you’ll learn how to use a startup probe to prevent this scenario from occurring. We’ll also cover some of the common pitfalls and sticking points associated with these probes.
Kubernetes Probe Types
Kubernetes has three basic probe types:
- Liveness probes: Liveness probes detect whether a pod is healthy by running a command or making a network request inside the container. Containers that fail the check are restarted.
- Readiness probes: Readiness probes identify when a container is able to handle external traffic received from a service. Containers don’t become part of their services until they pass a readiness probe.
- Startup probes: Startup probes provide a way to defer the execution of liveness and readiness probes until a container indicates it’s able to handle them. Kubernetes won’t direct the other probe types to a container if it has a startup probe that hasn’t yet succeeded.
In this article, you’ll be focusing on startup probes. As their role is to prevent other probes from running, you’ll always be using them alongside liveness and readiness probes. A startup probe doesn’t alter your workload’s behavior on its own.
Startup probes should be used when the application in your container could take a significant amount of time to reach its normal operating state. Applications that would crash or throw an error if they handled a liveness or readiness probe during startup need to be protected by a startup probe. This ensures the container doesn’t enter a restart loop due to failing healthiness checks before it’s finished launching.
Configuring Startup Probes
Startup probes support the four basic Kubernetes probing mechanisms:
- Exec: Executes a command within the container. The probe succeeds if the command exits with a 0 code.
- HTTP: Makes an HTTP call to a URL within the container. The probe succeeds if the container issues an HTTP response in the 200-399 range.
- TCP: The probe succeeds if a specific container port is accepting traffic.
- gRPC: Makes a gRPC health checking request to a port inside the container and uses its result to determine whether the probe succeeded.
All these mechanisms share some basic parameters that control the probe’s success criteria and how frequently it’s checked:
- <terminal inline bold>initialDelaySeconds<terminal inline bold>: Sets a delay between the time the container starts and the first time the probe is executed. Defaults to zero seconds.
- <terminal inline bold>periodSeconds<terminal inline bold>: Defines how frequently the probe will be executed after the initial delay. Defaults to ten seconds.
- <terminal inline bold>timeoutSeconds<terminal inline bold>: Each probe will time out and be marked as failed after this many seconds. Defaults to one second.
- <terminal inline bold>failureThreshold<terminal inline bold>: Instructs Kubernetes to retry the probe this many times after a failure is first recorded. The container will only be restarted if the retries also fail. Defaults to three.
Effective configuration of a startup probe relies on these values being set correctly.
Creating a Startup Probe
Startup probes are created by adding a <terminal inline>startupProbe<terminal inline> field within the <terminal inline>spec.containers<terminal inline> portion of a pod’s manifest. Here’s a simple example of a startup probe using the <terminal inline>exec<terminal inline> mechanism. It runs a command inside the container:
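A manifest along these lines would look like the following (the pod, container, and image names are illustrative; the probe settings match the values discussed below):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: app
      image: nginx:latest
      startupProbe:
        exec:
          # Succeeds (exit code 0) once /etc/hostname exists in the container
          command:
            - cat
            - /etc/hostname
        periodSeconds: 10
        failureThreshold: 10
```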
Add the pod to your cluster using kubectl:
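Assuming the manifest is saved as <terminal inline>startup-probe-demo.yaml</terminal inline> (the filename is arbitrary):

```shell
# Create the pod from the manifest file
kubectl apply -f startup-probe-demo.yaml
```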
The container will start and run normally. You can verify this by viewing its details in kubectl:
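Using the illustrative pod name from above:

```shell
# Show the pod's status, conditions, and recent events
kubectl describe pod startup-probe-demo
```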
The probe in the example above uses the presence of the <terminal inline>/etc/hostname<terminal inline> file to determine whether the container has started. As this file exists inside the container, the startup probe will succeed without logging any events.
The values of <terminal inline>periodSeconds<terminal inline> and <terminal inline>failureThreshold<terminal inline> need to be adjusted to suit your own application. Together, they should cover the container’s maximum permitted startup time. In the example above, a <terminal inline>periodSeconds<terminal inline> of <terminal inline>10<terminal inline> and a <terminal inline>failureThreshold<terminal inline> of <terminal inline>10<terminal inline> means the container will have up to a hundred seconds in which to start—up to ten checks with ten seconds between them. The container will be restarted if the probe still doesn’t succeed after this time.
You can use the other config parameters to further tune your probe. If you know a container has a minimum startup time, setting <terminal inline>initialDelaySeconds<terminal inline> will prevent it from being probed immediately after creation, when you know the check will fail.
Adjusting and Troubleshooting Probes
Here’s an example of a pod with a startup probe that will fail:
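A sketch of such a manifest (names are illustrative; the probe settings match the description below):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-fail-demo
spec:
  containers:
    - name: app
      image: nginx:latest
      startupProbe:
        exec:
          # Fails (non-zero exit code) because /etc/foobar does not exist
          command:
            - cat
            - /etc/foobar
        periodSeconds: 10
        failureThreshold: 10
```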
In this case, the probe looks at <terminal inline>/etc/foobar<terminal inline>, which doesn’t exist in the container. The probe will run every ten seconds, as specified by the value of <terminal inline>periodSeconds<terminal inline>. Up to ten attempts will be made, as allowed by <terminal inline>failureThreshold<terminal inline>. If the container creates <terminal inline>/etc/foobar<terminal inline> before the last attempt, the probe will succeed, and Kubernetes will begin to direct liveness and readiness probes to the container. Otherwise, the startup probe will be marked as failed, and the container will be killed.
You can inspect failing startup probes by retrieving the pod’s events with kubectl:
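With the illustrative pod name used above, the events appear at the bottom of the describe output:

```shell
# The Events section lists each failed probe attempt
kubectl describe pod startup-probe-fail-demo
```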
This event log shows that the startup probe failed because of the missing <terminal inline>/etc/foobar<terminal inline> file. After ten attempts, the container’s status changed to <terminal inline>Killing<terminal inline>, and a restart was scheduled. Looking for <terminal inline>failed startup probe<terminal inline> lines in your pod’s events will help you find containers that have been restarted for this reason.
HTTP probes are created in a similar manner to exec commands. They’re considered failed when the issued response lies outside the 200-399 status range. Nest an <terminal inline>httpGet<terminal inline> field instead of <terminal inline>exec<terminal inline> in your <terminal inline>startupProbe<terminal inline> definition:
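For instance, a probe that polls port 80 might look like this (the port and path are illustrative):

```yaml
startupProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 10
  failureThreshold: 10
```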
The <terminal inline>startupProbe.httpGet<terminal inline> field supports optional <terminal inline>host<terminal inline>, <terminal inline>scheme<terminal inline>, <terminal inline>path<terminal inline>, and <terminal inline>httpHeaders<terminal inline> fields to customize the request that’s made. The <terminal inline>host<terminal inline> defaults to the pod’s internal IP address; the default scheme is <terminal inline>http<terminal inline>. The following pod manifest includes a startup probe that makes an HTTPS request with a custom header:
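A sketch of that manifest, using an illustrative pod name and header (note that <terminal inline>scheme<terminal inline> values are uppercase in the Kubernetes API):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-https-demo
spec:
  containers:
    - name: nginx
      image: nginx:latest
      startupProbe:
        httpGet:
          port: 443
          scheme: HTTPS
          httpHeaders:
            # Hypothetical custom header attached to each probe request
            - name: X-Startup-Probe
              value: "1"
        periodSeconds: 10
        failureThreshold: 10
```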
Apply the pod to your cluster with kubectl:
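Assuming the manifest is saved as <terminal inline>startup-probe-https-demo.yaml</terminal inline>:

```shell
kubectl apply -f startup-probe-https-demo.yaml
```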
Now retrieve the pod’s events to check whether the probe has succeeded:
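Using the illustrative pod name from the manifest above:

```shell
# The Events section shows whether probe requests are succeeding or failing
kubectl describe pod startup-probe-https-demo
```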
This example leaves the pod in an unhealthy state because the startup probe fails. The NGINX image is not configured to support HTTPS by default, so the probe received an invalid response.
TCP probes try to open a socket to your container on a specified port. Add a <terminal inline>tcpSocket.port<terminal inline> field to your <terminal inline>startupProbe<terminal inline> configuration to use this probe type:
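A minimal sketch, assuming the workload listens on port 80:

```yaml
startupProbe:
  tcpSocket:
    port: 80
  periodSeconds: 10
  failureThreshold: 10
```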
The probe will be considered failed if the socket can’t be opened.
gRPC probes are available with Kubernetes v1.23 when the <terminal inline>GRPCContainerProbe<terminal inline> feature gate is enabled. Add a <terminal inline>grpc.port<terminal inline> field to your pod’s <terminal inline>startupProbe<terminal inline> to define where health checks should be directed to:
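A sketch of such a manifest, using the <terminal inline>etcd<terminal inline> image described below (the pod name, image tag, and startup flags are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grpc-startup-probe-demo
spec:
  containers:
    - name: etcd
      image: registry.k8s.io/etcd:3.5.1-0
      # Run a single-node etcd listening for clients on port 2379
      command:
        - /usr/local/bin/etcd
        - --data-dir=/var/lib/etcd
        - --listen-client-urls=http://0.0.0.0:2379
        - --advertise-client-urls=http://127.0.0.1:2379
      ports:
        - containerPort: 2379
      startupProbe:
        grpc:
          # gRPC health check requests are sent to this port
          port: 2379
        periodSeconds: 10
        failureThreshold: 10
```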
The <terminal inline>etcd<terminal inline> container image is used here as an example of a gRPC-compatible service. Kubernetes will send gRPC health check requests to port 2379 in the container. The startup probe will be marked as failed if the container issues an unhealthy response.
Misconfigured startup probes can easily lead to restart loops. You must pay attention to your probe’s configuration to make sure it’s suited to your application.
If your container takes longer to start than the window offered by the probe’s <terminal inline>periodSeconds<terminal inline> and <terminal inline>failureThreshold<terminal inline>, it’ll be restarted before the probe completes. The replacement container won’t start in time either, creating an endless loop of restarts that prevents your workload from becoming operational. You should measure your application’s typical startup time and use that to determine your <terminal inline>periodSeconds<terminal inline>, <terminal inline>failureThreshold<terminal inline>, and <terminal inline>initialDelaySeconds<terminal inline> values.
Conversely, another common issue is startup probes that are too conservative, leading to excessive delays in new containers becoming available. You can avoid this by using a short <terminal inline>periodSeconds<terminal inline> in conjunction with a very high <terminal inline>failureThreshold<terminal inline>. This will let Kubernetes rapidly poll your container’s status, ensuring its startup is noticed with minimal delay, while avoiding premature failure due to the threshold being reached.
Should Startup Probes Match Liveness/Readiness Probes?
It’s often effective to configure startup probes with the same command or HTTP request as your liveness and readiness probes. By using this technique, you can guarantee that liveness and readiness probes will succeed once Kubernetes begins directing them to the container.
Depending on your application’s implementation, using a different command or request could create a situation where the startup probe succeeds, but subsequent probes still can’t be handled correctly. This can be confusing to debug. Mirroring liveness and readiness probe actions in your startup probe helps ensure reliability; failures in the action during the startup phase won’t have any negative effects, provided a success occurs before the startup probe’s <terminal inline>failureThreshold<terminal inline> is reached.
Startup probes let your containers inform Kubernetes when they’ve started up and are ready to be assessed for liveness and readiness. It’s good practice to add a startup probe wherever you’re using liveness and readiness probes, as otherwise, containers may get restarted before they’ve finished initializing.
In this guide, we’ve explored the use cases for startup probes and shown how you can create and troubleshoot them. You can also use ContainIQ to monitor your pods and their startup probes in real-time, offering a convenient way to visualize your applications and their healthiness. This can offer a simpler and more accessible management experience than the kubectl CLI.