
Kubernetes Jobs | Use Cases, Scheduling, and Failure

October 18, 2021

Learn about Kubernetes job use cases and best practices. This article will also show you how to create Kubernetes jobs and how to handle failures.

Tyler Charboneau
Software Engineer

Many organizations have turned to Kubernetes as an ideal way to manage their microservices. These microservice applications are undeniably intricate, featuring many more moving parts than their monolithic predecessors. Accordingly, Kubernetes' structure is a crucial element in managing them. Clusters encompass nodes (worker machines); nodes contain pods (deployable objects representing single processes); and pods play host to containers (application software packages). That hierarchy can be quite confusing to Kubernetes novices. As efficient as the software is, even seasoned professionals must wrangle these resources into prime working order.

Naturally, DevOps teams can’t just implement and forget a Kubernetes backend. With companies running 250+ Kubernetes containers on average, there are a multitude of processes happening in production at any given time. Keeping tabs on those processes helps with gauging overall cluster performance. That’s where jobs come into play. When pods carry out processes, a job’s role is to supervise them from start to finish.

Additionally, a job creates pods in conjunction with a termination condition; that is, a job governs process execution until “a specified number of successful completions is reached.” Deleting a job removes its associated pods, while suspending a job deletes its active pods until the job is resumed.
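For instance, suspension and deletion are both handled through <terminal inline>kubectl<terminal inline>. The snippet below is a minimal sketch, assuming a job named my-job already exists and that your cluster is recent enough to support job suspension:

# Suspend the job; its active pods are deleted until it resumes
kubectl patch job my-job --type=strategic --patch '{"spec":{"suspend":true}}'

# Resume the job later on
kubectl patch job my-job --type=strategic --patch '{"spec":{"suspend":false}}'

# Delete the job along with its associated pods
kubectl delete job my-job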

In this article, we’ll cover some job use cases, best practices, and show you how to create them.

Job Use Cases

At its simplest, a job runs a single pod. You may, however, increase a job’s scope to encompass a greater number of pods or active processes. Should a pod initially fail or be deleted, the job will replace that troublesome pod according to its instructions. Jobs are considered objects within the Kubernetes universe. Per the Kubernetes documentation, this means that jobs are persistent entities that represent cluster state. They describe what containerized apps are active, any behavioral policies, and the available resources at specific times. You might see where these jobs have immense introspective value on the infrastructure side. Because jobs have a <terminal inline>status<terminal inline>, you can continually assess a job’s current state, and because each job’s pods report their own phase (such as Running, Pending, Completed, or ContainerCreating), you may also view attributes like Restarts, Age, and Readiness.
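A few <terminal inline>kubectl<terminal inline> commands surface that information. This is a minimal sketch, using my-job as a hypothetical job name:

# List jobs and their completion counts
kubectl get jobs

# Show status, restarts, age, and readiness for the job's pods
kubectl get pods --selector=job-name=my-job

# Print the job object's full status block
kubectl get job my-job -o yaml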

You can run many jobs in Kubernetes. Here’s a shortened list of what they can oversee:

  • Countdowns
  • Computations
  • Prints
  • Message processing
  • Work queues
  • Node behaviors
  • Resource consumption
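As a concrete illustration, here is a minimal sketch of a countdown job. The name countdown-job and the busybox image are assumptions chosen for this example rather than anything Kubernetes prescribes:

apiVersion: batch/v1
kind: Job
metadata:
  name: countdown-job
spec:
  template:
    spec:
      containers:
      - name: countdown
        image: busybox
        # Count down from 5 to 1, then exit successfully
        command: ["/bin/sh", "-c", "for i in 5 4 3 2 1; do echo $i; done"]
      restartPolicy: Never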

You can kick off a job by using a <terminal inline>kubectl<terminal inline> command. Your job’s specification resides within a YAML or JSON file; it’s necessary to insert that resource’s URL (or local file path) within the command, as follows (originating from an official Kubernetes source, in this case):


kubectl apply -f https://www.kubernetes.io/queues/controllers/random-job.yaml

It’s then helpful to check your job’s status via the same command interface. Kubernetes returns human-readable output by default, though you can request machine-oriented formats such as JSON when the output needs to be processed programmatically, as is often the case with logs. Say we’re checking in on a job named “inspect.” This might be our output after running a command like <terminal inline>kubectl describe jobs/inspect<terminal inline>.

Name:           inspect
Namespace:      default
Selector:       controller-uid=c9068879-e47d-4c6d-9901-bg3c56h7ha3x
Labels:         controller-uid=c9068879-e47d-4c6d-9901-bg3c56h7ha3x
                job-name=inspect
Annotations:    kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"inspect","namespace":"default"}, "spec":{"backoffLimit":4, "template":...
Parallelism:    1
Completions:    1
Start Time:     Mon, 24 May 2021 23:09:05 -0600
Completed At:   Mon, 24 May 2021 23:10:15 -0600
Duration:       70s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=c9068879-e47d-4c6d-9901-bg3c56h7ha3x
           job-name=inspect
  Containers:
   inspect:
    Image:      perl
    Port:       <none>
    Host Port:  <none>
    Command:
      perl
      -Mbignum=bpi
      -wle
      print bpi(2000)
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  12m   job-controller  Created pod: inspect-7fgrp8
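To see the job’s actual output (the computed digits of pi, in this example), you can pull logs from the pod the job created. This is a small sketch using the job-name label that Kubernetes attaches to a job’s pods:

# Find the pod(s) created by the job
kubectl get pods --selector=job-name=inspect

# Print the output of the completed pod (substitute your actual pod name)
kubectl logs inspect-7fgrp8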

Understanding the Definition File

You cannot expect a job to execute properly without some predefined parameters. Accordingly, every job originates from a definition file or resource that determines a job object’s configuration. This determines how each job will function and should reflect your overall objective. It may be structured like the following, for a job named random-job:


---
apiVersion: batch/v1
kind: Job
metadata:
  name: random-job
spec:
  template:
    metadata:
      name: random-job
    spec:
      containers:
      - name: random-job
        image: busybox    # any small image that provides the echo command
        command: ["echo", "Running a job"]
      restartPolicy: OnFailure

What if you want to apply that definition to a certain Kubernetes cluster? Use the following command:


$ kubectl apply -f random-job.yaml
job.batch/random-job created

By checking the status of your new job, you can track its pod’s life cycle from <terminal inline>ContainerCreating<terminal inline> to <terminal inline>Completed<terminal inline>. Pay careful attention to <terminal inline>restartPolicy<terminal inline>, as this field accepts only two values: your options are Never and OnFailure, since there’s no sense in always restarting a pod following successful completion.
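One simple way to follow that life cycle, sketched here against the random-job example above, is to watch the job’s pods as they progress:

# Watch the pod move from ContainerCreating to Completed
kubectl get pods --selector=job-name=random-job --watch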

Job Types

There are three principal job types in Kubernetes, defined according to how one or more processes are handled:

  1. Multiple Parallel Jobs: Often called a work queue, this involves running multiple pods of a job concurrently, or in parallel, typically with each pod pulling its work from a shared queue. In many instances, it’s not practical to let one task finish before starting another. Parallel processing is highly efficient and favorable when compute resources adequately support it.
  2. Parallel Jobs with Fixed Completion Count: These jobs run pods concurrently, but require a set number of successful completions before terminating successfully. By setting <terminal inline>.spec.completions<terminal inline> to a value greater than one, you specify how many successful pods are required. You may also add an index to these jobs, meaning that each pod is assigned a portion of the overall task to complete.
  3. Non-parallel Jobs: This specifies a job that runs a single task independently. Only one pod is started, with additional pods forming only in response to failures. Once a pod terminates successfully, that specific job is complete.

Non-parallel jobs automatically default to <terminal inline>Completions: 1<terminal inline> when <terminal inline>.spec.completions<terminal inline> isn’t defined. Work queues must have an unset <terminal inline>.spec.completions<terminal inline> attribute, and none of these job types utilizes negative integers (as a job cannot be run fewer than zero times).

As a final note, you may control parallelism for parallel job types. By setting the <terminal inline>.spec.parallelism<terminal inline> attribute to 0, jobs are paused until that number increases. By leaving that same field unset, Kubernetes defaults to 1.
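To make those fields concrete, here is a sketch of a parallel, fixed-completion-count spec. The job name, image, and values are assumptions for illustration:

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-demo
spec:
  completions: 6    # the job succeeds once six pods have completed successfully
  parallelism: 2    # at most two pods run at the same time
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["echo", "Processing one unit of work"]
      restartPolicy: Never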

Handling Failures

Not every job runs smoothly, but thankfully there are ways to counteract failures within Kubernetes. We touched briefly on the <terminal inline>restartPolicy<terminal inline> feature earlier, but we’ll add more nuance here. Pod failures and container failures can cause problems across your ecosystem, and when a container fails, your applications might struggle. By stipulating <terminal inline>.spec.template.spec.restartPolicy = “OnFailure”<terminal inline>, pods will remain on their nodes while the failed container is restarted in place. Conversely, a container failure when <terminal inline>restartPolicy = “Never”<terminal inline> causes the entire pod to fail. Applications must be smart enough to deal with launching in a new pod, and to handle leftover files or processes seamlessly.

In parallel workloads, your pods, and the programs inside them, should handle concurrency well. It’s also possible that some programs will start twice, even when parallelism and completions are both set to 1 and <terminal inline>restartPolicy = “Never”<terminal inline>.

What if a job continues to fail time after time? When this happens, programming or configuration issues are almost certainly at fault, so it’s not beneficial to subject your pods to endless retries. The <terminal inline>.spec.backoffLimit<terminal inline> attribute is therefore effective at limiting these retries to a specific number. Should a job falter six times (the default number), it’s automatically considered a failure. The job controller forms replacement pods at exponentially increasing intervals following pod failures: 10s, 20s, 40s, and so on, capped at six minutes.
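In a manifest, that retry ceiling is just another spec field. Here is a minimal sketch, with the job name and image assumed for illustration:

apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-task
spec:
  backoffLimit: 4    # consider the job failed after four retries instead of the default six
  template:
    spec:
      containers:
      - name: flaky-task
        image: busybox
        command: ["/bin/sh", "-c", "exit 1"]    # always fails, purely to illustrate retries
      restartPolicy: Never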

Quick Notes on Cron Jobs

The cron job is quite useful when it comes to running automated tasks, or jobs. Promoted to stable in Kubernetes 1.21, the CronJob resource allows you to execute jobs on a schedule. You can define the elapsed time between jobs, or the frequency at which a job runs, using standard cron syntax, which operates at minute-level granularity.

Accordingly, cron jobs are to Kubernetes what orchestration tools are to IT. If there’s a task you run habitually, automate it. These types of jobs can otherwise be tedious or easy to forget. A cron job ensures this doesn’t happen while removing job orchestration from your plate. Say you want to create periodic backups, generate timely emails, or even schedule jobs around user-activity periods. Cron jobs help make that possible via configuration files and available <terminal inline>kubectl<terminal inline> commands.
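A minimal CronJob sketch might look like the following; the name, schedule, and image here are illustrative assumptions:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"    # run every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: busybox
            command: ["echo", "Running the nightly backup"]
          restartPolicy: OnFailure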

You may manually check the status of a job following its creation. Finally, you can delete any cron jobs that are no longer useful to you.
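Both of those chores are quick <terminal inline>kubectl<terminal inline> one-liners, sketched here against the hypothetical nightly-backup CronJob above:

# Check the cron job and any jobs it has spawned
kubectl get cronjob nightly-backup
kubectl get jobs --watch

# Remove the cron job once it's no longer needed
kubectl delete cronjob nightly-backup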

Conclusion

There’s little denying the integral role that jobs have in the Kubernetes world. By gaining an intricate understanding of these objects, it’s possible to unlock greater visibility and control over your ecosystem. As always, following best practices and adhering to the Kubernetes documentation is a surefire way to succeed with job utilization. Additionally, many third-party tutorials exist which can help flatten that learning curve.

Article by

Tyler Charboneau

Software Engineer

Tyler is a hardware-software devotee and researcher. He specializes in simplifying the complex while speaking effectively to all audiences.
