Kubernetes Startup Probe | Practical Guide

April 18, 2022

Startup probes can prevent problems caused by long startup times. In this post, you’ll learn about startup probes and how to use them to ensure that your application is working as expected.

James Walker
Software Engineer

Kubernetes probes are a mechanism for providing the Kubernetes control plane with information about the internal state of your applications. They let your cluster identify running pods that are in an unhealthy state.

Startup probes detect when a container’s workload has launched and the container is ready to use. Kubernetes relies on this information when determining whether a container can be targeted by liveness and readiness probes.

It’s important to add startup probes in conjunction with these other probe types. Otherwise, a container could be targeted by a liveness or readiness probe before it’s able to handle the request. This would cause the container to be restarted and flagged as unhealthy, when it was actually healthy but still initializing.

In this article, you’ll learn how to use a startup probe to prevent this scenario from occurring. We’ll also cover some of the common pitfalls and sticking points associated with these probes.

Kubernetes Probe Types

Kubernetes has three basic probe types:

  • Liveness probes: These detect whether a pod is healthy by running a command or making a network request inside the container. Containers that fail the check are restarted.
  • Readiness probes: Readiness probes identify when a container is able to handle external traffic received from a service. Containers don’t become part of their services until they pass a readiness probe.
  • Startup probes: Startup probes provide a way to defer the execution of liveness and readiness probes until a container indicates it’s able to handle them. Kubernetes won’t direct the other probe types to a container if it has a startup probe that hasn’t yet succeeded.

In this article, you’ll be focusing on startup probes. As their role is to prevent other probes from running, you’ll always be using them alongside liveness and readiness probes. A startup probe doesn’t alter your workload’s behavior on its own.

Startup probes should be used when the application in your container could take a significant amount of time to reach its normal operating state. Applications that would crash or throw an error if they handled a liveness or readiness probe during startup need to be protected by a startup probe. This ensures the container doesn’t enter a restart loop due to failing health checks before it’s finished launching.
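
To illustrate how these probe types interact, here’s a minimal sketch of a pod that pairs a startup probe with a liveness probe. The pod name, image, and timing values are placeholders for illustration; the point is that the liveness probe only begins running once the startup probe has succeeded:

apiVersion: v1
kind: Pod
metadata:
  name: probe-ordering-demo
spec:
  containers:
  - name: probe-ordering-demo
    image: nginx:latest
    ports:
    - containerPort: 80
    # Held back until the startup probe below succeeds
    livenessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
    # Gives the container up to 60 seconds (12 checks x 5 seconds) to start
    startupProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
      failureThreshold: 12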

Configuring Startup Probes

Startup probes support the four basic Kubernetes probing mechanisms:

  • Exec: Executes a command within the container. The probe succeeds if the command exits with a 0 code.
  • HTTP: Makes an HTTP call to a URL within the container. The probe succeeds if the container issues an HTTP response in the 200-399 range.
  • TCP: The probe succeeds if a specific container port is accepting traffic.
  • gRPC: Makes a gRPC health checking request to a port inside the container and uses its result to determine whether the probe succeeded.

All these mechanisms share some basic parameters that control the probe’s success criteria and how frequently it’s checked:

  • initialDelaySeconds: Sets a delay between the time the container starts and the first time the probe is executed. Defaults to zero seconds.
  • periodSeconds: Defines how frequently the probe will be executed after the initial delay. Defaults to ten seconds.
  • timeoutSeconds: Each probe will time out and be marked as failed after this many seconds. Defaults to one second.
  • failureThreshold: Sets how many consecutive failures are tolerated before the probe is considered to have failed; for a startup probe, the container is then restarted. Defaults to three.

Effective configuration of a startup probe relies on these values being set correctly.
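
For reference, here’s what a startupProbe stanza (nested under an entry in spec.containers) looks like with all four parameters set; the values are arbitrary examples and should be tuned to your own workload:

    startupProbe:
      exec:
        command:
        - cat
        - /etc/hostname
      initialDelaySeconds: 5   # wait 5 seconds before the first check
      periodSeconds: 10        # then check every 10 seconds
      timeoutSeconds: 2        # each check must complete within 2 seconds
      failureThreshold: 10     # tolerate up to 10 failed checks before restarting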

Creating a Startup Probe

Startup probes are created by adding a startupProbe field within the spec.containers portion of a pod’s manifest. Here’s a simple example of a startup probe using the exec mechanism. It runs a command inside the container:


apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
  - name: startup-probe-demo
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - sleep 300
    startupProbe:
      exec:
        command:
        - cat
        - /etc/hostname
      periodSeconds: 10
      failureThreshold: 10

Add the pod to your cluster using kubectl:


$ kubectl apply -f startup-probe-demo.yml

The container will start and run normally. You can verify this by viewing its details in kubectl:

$ kubectl describe pod startup-probe-demo


<omitted>

Events:

TYPE REASON AGE FROM MESSAGE
---- ------ --- ---- ------
Normal Scheduled 9s default-scheduler Successfully assigned default/startup-probe-demo to default
Normal Pulling 8s kubelet Pulling image "busybox:latest"
Normal Pulled 7s kubelet Successfully pulled image "busybox:latest" in 860.669288ms
Normal Created 7s kubelet Created container startup-probe-demo
Normal Started 7s kubelet Started container startup-probe-demo

The probe in the example above uses the presence of the /etc/hostname file to determine whether the container has started. As this file exists inside the container, the startup probe will succeed without logging any events.

The values of periodSeconds and failureThreshold need to be adjusted to suit your own application. Together, they should cover the container’s maximum permitted startup time. In the example above, a periodSeconds of 10 and a failureThreshold of 10 means the container will have up to a hundred seconds in which to start (up to ten checks with ten seconds between them). The container will be restarted if the probe still doesn’t succeed after this time.

You can use the other config parameters to further tune your probe. If you know a container has a minimum startup time, setting initialDelaySeconds will prevent it from being probed immediately after creation, when you know the check will fail.
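
For example, if you know the container never starts in under 30 seconds, the probe from the manifest above could be adjusted as follows (the 30-second figure is purely illustrative):

    startupProbe:
      exec:
        command:
        - cat
        - /etc/hostname
      initialDelaySeconds: 30   # skip probing during the first 30 seconds
      periodSeconds: 10
      failureThreshold: 10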

Adjusting and Troubleshooting Probes

Here’s an example of a pod with a startup probe that will fail:


apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
  - name: startup-probe-demo
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - sleep 300
    startupProbe:
      exec:
        command:
        - cat
        - /etc/foobar
      periodSeconds: 10
      failureThreshold: 10

In this case, the probe looks at /etc/foobar, which doesn’t exist in the container. The probe will run every ten seconds, as specified by the value of periodSeconds. Up to ten attempts will be made, as allowed by failureThreshold. If the container creates /etc/foobar before the last attempt, the probe will succeed, and Kubernetes will begin to direct liveness and readiness probes to the container. Otherwise, the startup probe will be marked as failed, and the container will be killed.

You can inspect failing startup probes by retrieving the pod’s events with kubectl:

$ kubectl describe pod startup-probe-demo


<omitted>

Events:

TYPE REASON AGE FROM MESSAGE
---- ----- --- ---- -------
Normal Scheduled 2m42s default-scheduler Successfully assigned default/startup-probe-demo to default
Normal Pulling 2m41s kubelet Pulling image "busybox:latest"
Normal Pulled 2m40s kubelet Successfully pulled image "busybox:latest" in 860.669288ms
Normal Created 2m40s kubelet Created container startup-probe-demo
Normal Started 2m40s kubelet Started container startup-probe-demo
Warning Unhealthy 61s (x10 over 2m31s) kubelet Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
Normal Pulling 60s kubelet Pulling image "busybox:latest"
Normal Killing 59s kubelet Container startup-probe-demo failed startup probe, will be restarted

This event log shows that the startup probe failed because of the missing /etc/foobar file. After ten failed attempts, the kubelet recorded a Killing event and scheduled a restart of the container. Looking for failed startup probe lines in your pod’s events will help you find containers that have been restarted for this reason.
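
If you’d rather not scan the full describe output by hand, you can filter the pod’s events instead. Two illustrative commands, assuming the pod lives in the default namespace:

# List only warning events for the pod, which include startup probe failures
$ kubectl get events --field-selector involvedObject.name=startup-probe-demo,type=Warning

# Or grep the describe output for probe-related messages
$ kubectl describe pod startup-probe-demo | grep -i "startup probe"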

HTTP Probes

HTTP probes are created in a similar manner to exec probes. They’re considered failed when the issued response lies outside the 200-399 status range. Nest an httpGet field instead of exec in your startupProbe definition:


apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
  - name: startup-probe-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
      failureThreshold: 10

The startupProbe.httpGet field supports optional host, scheme, path, and httpHeaders fields to customize the request that’s made. The host defaults to the pod’s internal IP address; the default scheme is HTTP. The following pod manifest includes a startup probe that makes an HTTPS request with a custom header:


apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
  - name: startup-probe-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
        scheme: HTTPS
        httpHeaders:
        - name: X-Client-Identity
          value: Kubernetes-Startup-Probe

Apply the pod to your cluster with kubectl:


$ kubectl apply -f pod.yaml

Now retrieve the pod’s events to check whether the probe’s succeeded:

$ kubectl describe pod startup-probe-demo


<omitted>

Events:

TYPE REASON AGE FROM MESSAGE
---- ----- --- ---- -------
Normal Scheduled 12s default-scheduler Successfully assigned default/startup-probe-demo to default
Normal Pulling 11s kubelet Pulling image "nginx:latest"
Normal Pulled 10s kubelet Successfully pulled image "nginx:latest" in 797.884311ms
Normal Created 10s kubelet Created container startup-probe-demo
Normal Started 10s kubelet Started container startup-probe-demo
Warning Unhealthy 8s kubelet Startup probe failed: Get "https://10.244.0.163/": http: server gave HTTP response to HTTPS client

This example leaves the pod in an unhealthy state because the startup probe fails. The NGINX image is not configured to support HTTPS by default, so the probe received an invalid response.
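
Assuming plain HTTP is actually what you want to probe in this case, one fix is to drop the scheme back to its default. A sketch of just the corrected probe stanza:

    startupProbe:
      httpGet:
        path: /
        port: 80
        scheme: HTTP
        httpHeaders:
        - name: X-Client-Identity
          value: Kubernetes-Startup-Probe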

TCP Probes

TCP probes try to open a socket to your container on a specified port. Add a tcpSocket.port field to your startupProbe configuration to use this probe type:


apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
  - name: startup-probe-demo
    image: nginx:latest
    ports:
    - containerPort: 80
    startupProbe:
      tcpSocket:
        port: 80
      periodSeconds: 10
      failureThreshold: 10

The probe will be considered failed if the socket can’t be opened.

gRPC Probes

gRPC probes are available with Kubernetes v1.23 when the GRPCContainerProbe feature gate is enabled. Add a grpc.port field to your pod’s startupProbe to define where health checks should be directed:


apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
  - name: startup-probe-demo
    image: k8s.gcr.io/etcd:3.5.1-0
    command: ["/usr/local/bin/etcd", "--data-dir",  "/var/lib/etcd", "--listen-client-urls", "http://0.0.0.0:2379", "--advertise-client-urls", "http://127.0.0.1:2379"]
    ports:
    - containerPort: 2379
    startupProbe:
      grpc:
        port: 2379
      periodSeconds: 10
      failureThreshold: 10

The etcd container image is used here as an example of a gRPC-compatible service. Kubernetes will send gRPC health check requests to port 2379 in the container. The startup probe will be marked as failed if the container issues an unhealthy response.
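
If your server registers multiple services with the gRPC health checking protocol, the probe also accepts an optional service field naming the one to query. A sketch, with a hypothetical service name:

    startupProbe:
      grpc:
        port: 2379
        service: my-service   # hypothetical name registered with the gRPC health server
      periodSeconds: 10
      failureThreshold: 10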

Common Problems

Misconfigured startup probes can easily lead to restart loops. You must pay attention to your probe’s configuration to make sure it’s suited to your application.

If your container takes longer to start than the window offered by the probe’s periodSeconds and failureThreshold, it’ll be restarted before the probe completes. The replacement container won’t start in time either, creating an endless loop of restarts that prevents your workload from becoming operational. You should measure your application’s typical startup time and use that to determine your periodSeconds, failureThreshold, and initialDelaySeconds values.

Conversely, another common issue is startup probes that are too conservative, leading to excessive delays in new containers becoming available. You can avoid this by using a short periodSeconds in conjunction with a very high failureThreshold. This will let Kubernetes rapidly poll your container’s status, ensuring its startup is noticed with minimal delay, while avoiding premature failure due to the threshold being reached.
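
As an illustration of that trade-off, the following stanza polls every two seconds while still allowing roughly five minutes of total startup time (2 seconds x 150 checks); the numbers are examples rather than recommendations:

    startupProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 2        # poll frequently so a fast startup is noticed quickly
      failureThreshold: 150   # but allow up to ~300 seconds before restarting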

Should Startup Probes Match Liveness/Readiness Probes?

It’s often effective to configure startup probes with the same command or HTTP request as your liveness and readiness probes. By using this technique, you can guarantee that liveness and readiness probes will succeed once Kubernetes begins directing them to the container.

Depending on your application’s implementation, using a different command or request could create a situation where the startup probe succeeds, but subsequent probes still can’t be handled correctly. This can be confusing to debug. Mirroring liveness and readiness probe actions in your startup probe helps ensure reliability; failures in the action during the startup phase won’t have any negative effects, provided a success occurs before the startup probe’s failureThreshold is reached.
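
A sketch of what that mirroring might look like, with all three probes issuing the same request (the /healthz endpoint and the timing values are placeholders):

    startupProbe:
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 5
      failureThreshold: 30
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 10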

Final Thoughts

Startup probes let your containers inform Kubernetes when they’ve started up and are ready to be assessed for liveness and readiness. It’s good practice to add a startup probe wherever you’re using liveness and readiness probes, as otherwise, containers may get restarted before they’ve finished initializing.

In this guide, we’ve explored the use cases for startup probes and shown how you can create and troubleshoot them. You can also use ContainIQ to monitor your pods and their startup probes in real-time, offering a convenient way to visualize your applications and their health. This can offer a simpler and more accessible management experience than the kubectl CLI.

James Walker
Software Engineer

James Walker is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs. He has experience managing complete end-to-end web development workflows with DevOps, CI/CD, Docker, and Kubernetes. James also writes technical articles on programming and the software development lifecycle, using the insights acquired from his industry career. He's currently a regular contributor to CloudSavvy IT and has previously written for DigitalJournal.com, OnMSFT.com, and other technology-oriented publications.
