
Kubernetes Liveness Probe | Practical Guide

June 27, 2022

It’s often helpful to check if your Kubernetes application responds to requests in a healthy manner. In this post, you’ll learn about liveness probes, including when and how to use them.

James Walker
Software Engineer

Kubernetes liveness probes determine whether your pods are running normally. Setting up these probes helps you check whether your workloads are healthy. They can identify application instances that have entered a failed state, even when the pod that contains the instance appears to be operational.

Kubernetes automatically monitors your pods and restarts them when failures are detected. This handles issues where your application crashes, terminating its process and emitting a non-zero exit code. Not all issues exhibit this behavior, though. Your application could have lost its database connection, or be experiencing timeouts while communicating with a third-party service. In these situations, the pod will look from the outside like it’s running, but users won’t be able to access the application within.

Liveness probes are a mechanism for indicating your application’s internal health to the Kubernetes control plane. Kubernetes uses liveness probes to detect issues within your pods. When a liveness check fails, Kubernetes restarts the container in an attempt to restore your service to an operational state.

In this article, you’ll explore when liveness probes should be used, how you can create them, and some best practices to be aware of as you add probes to your cluster.

Why Do Liveness Probes Matter?

Liveness probes enhance Kubernetes’ ability to manage workloads on your behalf. Without probes, you need to manually monitor your pods to distinguish which application instances are healthy and which are not. This becomes time-consuming and error-prone when you’re working with hundreds or thousands of pods.

Allowing unhealthy pods to continue without detection degrades your service’s stability over time. Pods that are silently failing as they age, perhaps due to race conditions, deadlocks, or corrupted caches, will gradually reduce your service’s capacity to handle new requests. Eventually, your entire pod fleet could be affected, even though all the containers report as running.

Debugging this kind of issue is often confusing and inefficient. Since all your dashboards show your pods as operational, it’s easy to head down the wrong diagnostic path. Using liveness probes to communicate information about pods’ internal states to Kubernetes lets your cluster handle problems for you, reducing the maintenance burden and ensuring you always have serviceable pods available.

Types of Liveness Probes

There are four basic types of liveness probes:

  • Exec: The probe runs a command inside the container. The probe is considered successful if the command terminates with 0 as its exit code.
  • HTTP: The probe makes an HTTP GET request against a URL in the container. The probe is successful when the container’s response has an HTTP status code in the 200-399 range.
  • TCP: The probe tries to connect to a specific TCP port inside the container; if the port is open, the probe is deemed successful.
  • gRPC: gRPC health-checking probes are supported for applications that use gRPC. This type of probe is available in alpha as of Kubernetes v1.23.

The probe types share five basic parameters for configuring the frequency and success criteria of the checks (a combined example follows the list):

  • initialDelaySeconds: Sets a delay between the time the container starts and the first execution of the probe. Defaults to zero seconds.
  • periodSeconds: Defines how frequently the probe will be executed after the initial delay. Defaults to ten seconds.
  • timeoutSeconds: Each probe will time out and be marked as failed after this many seconds. Defaults to one second.
  • failureThreshold: The number of consecutive failed probes required before Kubernetes gives up and restarts the container. Defaults to three.
  • successThreshold: Sets the criteria for reverting an unhealthy container to a healthy state: the container must pass this number of consecutive checks before it’s deemed healthy again. Defaults to one, which is also the only value permitted for liveness probes.
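
Taken together, the parameters might look like the fragment below on an HTTP probe. The values are illustrative placeholders rather than recommendations, and the /healthz endpoint is an assumed example:


livenessProbe:
  httpGet:
    path: /healthz          # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 10   # wait ten seconds before the first probe
  periodSeconds: 10         # then probe every ten seconds
  timeoutSeconds: 2         # mark the probe failed if no response within two seconds
  failureThreshold: 3       # restart only after three consecutive failures
  successThreshold: 1       # must be 1 for liveness probes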

Successful liveness probes have no impact on your cluster. The targeted container will keep running, and a new probe will be scheduled after the configured periodSeconds delay. Failed probes count toward the failureThreshold; once that threshold is reached, Kubernetes restarts the container in the expectation that a fresh instance will be healthy.

Creating Liveness Probes

Liveness probes are defined by a pod’s spec.containers.livenessProbe field. Here’s a simple example of an exec (command) type liveness probe:


apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-demo
spec:
  containers:
  - name: liveness-probe-demo
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - touch /healthcheck; sleep 30; rm -rf /healthcheck; sleep 300
    livenessProbe:
      exec:
        command:
        - cat
        - /healthcheck
      initialDelaySeconds: 5
      periodSeconds: 15
      failureThreshold: 1

This pod’s container has a liveness probe with a five-second initial delay that then reads the content of the /healthcheck file every fifteen seconds. The container is configured to create the /healthcheck file when it starts up and to remove it after thirty seconds have elapsed. At that point the liveness probe’s cat command will begin to return non-zero exit codes, causing subsequent probes to be marked as failed.

Apply the YAML manifest to your cluster:


$ kubectl apply -f liveness-pod-demo.yml

Now inspect the events of the pod you’ve created:

$ kubectl describe pod liveness-probe-demo


<omitted>

Events:

Type    Reason     Age   From               Message
----    ------     ---   ----               -------
Normal  Scheduled  30s   default-scheduler  Successfully assigned liveness-probe-demo...
Normal  Pulling    29s   kubelet            Pulling image "busybox:latest"
Normal  Pulled     28s   kubelet            Successfully pulled image "busybox:latest" in 1.1243596453s
Normal  Created    28s   kubelet            Created container liveness-probe-demo
Normal  Started    28s   kubelet            Started container liveness-probe-demo

Everything looks good! The container was created and started successfully. There’s no sign of any failed liveness probes.

Now wait for thirty seconds before retrieving the events again:

$ kubectl describe pod liveness-probe-demo


<omitted>

Events:

Type     Reason     Age   From               Message
----     ------     ---   ----               -------
Normal   Scheduled  70s   default-scheduler  Successfully assigned liveness-probe-demo...
Normal   Pulling    69s   kubelet            Pulling image "busybox:latest"
Normal   Pulled     68s   kubelet            Successfully pulled image "busybox:latest" in 1.1243596453s
Normal   Created    68s   kubelet            Created container liveness-probe-demo
Normal   Started    68s   kubelet            Started container liveness-probe-demo
Warning  Unhealthy  10s   kubelet            Liveness probe failed: cat: can't open '/healthcheck': No such file or directory
Normal   Killing    10s   kubelet            Container liveness-probe-demo failed liveness probe, will be restarted

The event log now shows that the liveness probe began to fail after the container deleted its /healthcheck file. The event reveals the output from the liveness probe’s command. If you used a different probe type, such as HTTP or TCP, you’d see relevant information such as the HTTP status code instead.

HTTP Probes

HTTP probes are created in a similar manner to exec commands. Nest an httpGet field instead of exec in your livenessProbe definition:


apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-demo
spec:
  containers:
  - name: liveness-probe-demo
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 15
      failureThreshold: 1

This probe sends an HTTP GET request to /healthz on the container’s port 8080 every fifteen seconds. The image used is a minimal HTTP server provided by Kubernetes as an example liveness check provider. The server issues a successful response with a 200 status code for the first ten seconds of its life. After that point, it returns a 500, failing the liveness probe and causing the container to restart.

The livenessProbe.httpGet field supports optional host, scheme, path, and httpHeaders fields to customize the request that’s made. The host defaults to the pod’s internal IP address, and the scheme defaults to HTTP. The following snippet sets up a probe to make an HTTPS request with a custom header:


httpGet:
  path: /healthz
  port: 8080
  scheme: HTTPS
  httpHeaders:
  - name: X-Client-Identity
    value: Kubernetes-Liveness-Probe

TCP Probes

TCP probes try to open a socket to your container on a specified port. Add a tcpSocket.port field to your livenessProbe configuration to use this probe type:


apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-demo
spec:
  containers:
  - name: liveness-probe-demo
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 15
      failureThreshold: 1

The probe will be considered failed if the socket can’t be opened.

gRPC Probes

gRPC probes are the newest type of probe. The implementation is similar to the grpc-health-probe utility, which was commonly used before Kubernetes integrated the functionality.

To use a gRPC probe, ensure you’re running Kubernetes v1.23 or later and have the GRPCContainerProbe feature gate enabled. Add a grpc.port field to your pod’s livenessProbe to define where health checks should be directed:


apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-demo
spec:
  containers:
  - name: liveness-probe-demo
    image: k8s.gcr.io/etcd:3.5.1-0
    command: ["/usr/local/bin/etcd", "--data-dir",  "/var/lib/etcd", "--listen-client-urls", "http://0.0.0.0:2379", "--advertise-client-urls", "http://127.0.0.1:2379"]
    ports:
    - containerPort: 2379
    livenessProbe:
      grpc:
        port: 2379
      initialDelaySeconds: 5
      periodSeconds: 15
      failureThreshold: 1

The etcd container image is used here as an example of a gRPC-compatible service. Kubernetes will send gRPC health check requests to port 2379 in the container. The liveness probe will be marked as failed when the container issues an unhealthy response. The probe is also considered failed if the service doesn’t implement the gRPC health checking protocol.
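
How you enable the GRPCContainerProbe feature gate depends on how your cluster is provisioned. As a rough sketch, a kubelet configuration file might carry a featureGates entry like the one below; treat it as an illustrative fragment, and check whether other control plane components in your distribution also need the gate enabled:


apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  GRPCContainerProbe: true   # alpha gate required for gRPC probes in v1.23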


Best Practices for Effective Probes

Liveness probes have some pitfalls that you need to watch out for. Foremost among these is the impact misconfigured probes can have on your application. A probe that runs too frequently wastes resources and impedes performance; conversely, probing too infrequently can leave containers sitting in an unhealthy state for too long.

The periodSeconds, timeoutSeconds, and success and failure threshold parameters should be used to tune your probes to your application. Pay attention to how long your probe’s command, API request, or gRPC call takes to complete, and use that value plus a small buffer as your timeoutSeconds. The right periodSeconds is specific to your environment; a good rule of thumb is to use the smallest value possible for simple, short-running probes, while more intensive commands may need longer gaps between repetitions.
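
One rough way to gauge that timing is to run the check by hand inside a running pod and time it. The command below is only a sketch: it assumes the image ships a shell, wget, and time, that the health endpoint lives at /healthz on port 8080, and that <pod-name> is replaced with one of your own pods:


$ kubectl exec <pod-name> -- sh -c 'time wget -qO- http://localhost:8080/healthz'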

Probes themselves should be as lightweight as possible. To ensure that your checks can execute quickly and efficiently, avoid using expensive operations within your probes. The target of your probe’s command or HTTP request should be independent of your main application, so it can run to completion even during failure conditions. A probe that’s served by your standard application entry point could lead to inaccurate results if its framework fails to start or a required external dependency is unavailable.

Here are a few other best practices to keep in mind:

  • Probes are affected by restart policies. Container restart policies are applied after probes. This means your containers need restartPolicy: Always (the default) or restartPolicy: OnFailure so Kubernetes can restart them after a failed probe; using the Never policy will leave the container in its failed state (see the sketch after this list).
  • Probes should be consistent in their execution. You should be able to approximate the execution time of your probes, so you can configure their period, delay, and timeout correctly. Observe your real-world workloads instead of using the defaults that Kubernetes provides.
  • Not every container needs a probe. Simple containers that always terminate on failure don’t need a probe. You may also omit probes from low-priority services, where the command would need to be relatively expensive to accurately determine healthiness.
  • Revisit your probes regularly. New features, optimizations, and regressions in your app can all impact probe performance and what constitutes a “healthy” state. Set a reminder to regularly check your probes and make necessary adjustments.
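
To make the restart policy point concrete, here’s a minimal sketch of a pod that keeps the default Always policy alongside a probe. The image name and health endpoint are placeholders, not values from this article’s examples:


apiVersion: v1
kind: Pod
metadata:
  name: restart-policy-demo
spec:
  restartPolicy: Always                # the default; OnFailure also permits probe-driven restarts
  containers:
  - name: app
    image: example.com/my-app:latest   # placeholder image
    livenessProbe:
      httpGet:
        path: /healthz                 # hypothetical health endpoint
        port: 8080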

Other Types of Probes

Liveness probes aren’t your only option for disclosing a pod’s internal status to Kubernetes. Liveness probes exclusively focus on the ongoing health of your application; two other probes are better suited for detecting problems early in a pod’s lifecycle.

Readiness probes determine when new containers are able to receive traffic. Pods with one of these probes won’t be added to a service’s endpoints until the probe succeeds. You can use this mechanism to prevent a new container from handling user requests while its bootstrap scripts are running.
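
Readiness probes are declared with the same fields as liveness probes, just under the readinessProbe key. A minimal sketch, where the endpoint and port are assumptions, looks like this:


readinessProbe:
  httpGet:
    path: /ready          # hypothetical readiness endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10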

Startup probes are the final type of probe. They indicate whether the application inside a container has finished launching. When a container has this type of probe, its liveness and readiness probes won’t be executed until the startup probe has succeeded. This avoids continual container restarts caused by probes that fail simply because the application isn’t ready to handle them.
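
Startup probes reuse the same configuration fields. A common pattern, sketched below with assumed values and endpoint, is to pair a high failureThreshold with a modest periodSeconds so a slow-starting application gets a generous launch window before liveness checks begin:


startupProbe:
  httpGet:
    path: /healthz         # hypothetical health endpoint
    port: 8080
  failureThreshold: 30     # thirty attempts...
  periodSeconds: 10        # ...ten seconds apart allows up to five minutes to start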

Final Thoughts

Liveness probes are a Kubernetes mechanism for exposing whether the applications inside your containers are healthy. They address the disconnect between Kubernetes’ perception of your pods and the reality of whether users can actually access your service.

In this article, you’ve learned why you should use liveness probes, the types of probes that are available, and how you can start attaching them to your pods. We’ve also discussed some of the configuration tweaks necessary to prevent probes from becoming problems themselves.

You can keep tabs on your Kubernetes installation and watch for unhealthy pods using ContainIQ. The platform provides an integrated view of your cluster and its activities, including liveness probes and container restart events, and is a great way to check whether your probes are having the effect you intended.

James Walker
Software Engineer

James Walker is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs. He has experience managing complete end-to-end web development workflows with DevOps, CI/CD, Docker, and Kubernetes. James also writes technical articles on programming and the software development lifecycle, using the insights acquired from his industry career. He's currently a regular contributor to CloudSavvy IT and has previously written for DigitalJournal.com, OnMSFT.com, and other technology-oriented publications.
