Start your free 14-day ContainIQ trial

Using Kubectl Scale | Tutorial and Best Practices

kubectl scale is one of the many tools that helps you manage your Kubernetes deployments. In this article, you'll learn how this tool can be used, as well as best practices for use.

March 13, 2023
James Walker
Software Engineer

The <terminal inline>kubectl scale<terminal inline> command is used to immediately scale your application by adjusting the number of running pods. This is the quickest and easiest way to change a deployment’s replica count, and it can be used to react to spikes in demand or prolonged quiet periods.

In this article, you’ll see how to use <terminal inline>kubectl scale<terminal inline> to scale a simple deployment. You’ll also learn about the options you can use when you need a more sophisticated change. Finally, you’ll look at the best practices for running <terminal inline>kubectl scale<terminal inline>, as well as at some alternative methods for adjusting Kubernetes replica counts.

kubectl scale Use Cases

The kubectl scale command is used to change the number of running replicas inside Kubernetes deployment, replica set, replication controller, and stateful set objects. When you increase the replica count, Kubernetes will start new pods to scale up your service. Lowering the replica count will cause Kubernetes to gracefully terminate some pods, freeing up cluster resources.

You can run <terminal inline>kubectl scale<terminal inline> to manually adjust your application’s replica count in response to changing service capacity requirements. Increased traffic loads can be handled by increasing the replica count, providing more application instances to serve user traffic. When the surge subsides, the number of replicas can be reduced. This helps keep your costs low by avoiding utilization of unneeded resources.

Using kubectl scale

The most basic usage of <terminal inline>kubectl scale<terminal inline> looks like this:


$ kubectl scale --replicas=3 deployment/demo-deployment

Executing this command will adjust the deployment called <terminal inline>demo-deployment<terminal inline> so it has three running replicas. You can target a different kind of resource by substituting its type for <terminal inline>deployment<terminal inline>:


# ReplicaSet
$ kubectl scale --replicas=3 rs/demo-replicaset

# ReplicationController
$ kubectl scale --replicas=3 rc/demo-replicationcontroller

# StatefulSet
$ kubectl scale --replicas=3 sts/demo-statefulset

Basic Scaling

Now we’ll look at a complete example of using <terminal inline>kubectl scale<terminal inline> to scale a deployment. Here’s a YAML file defining a simple deployment:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: nginx
          image: nginx:latest

Save this YAML to <terminal inline>demo-deployment.yaml<terminal inline> in your working directory. Next, use kubectl to add the deployment to your cluster:


$ kubectl apply -f demo-deployment.yaml
deployment.apps/demo-deployment created

Now run the <terminal inline>get pods<terminal inline> command to view the pods that have been created for the deployment:

$ kubectl get pods

NAME READY STATUS RESTARTS AGE
demo-deployment-86897ddbb-jl6r6 1/1 Running 0 33s

Only one pod is running. This is expected, as the deployment’s manifest declares one replica in its <terminal inline>spec.replicas<terminal inline> field.

A single replica isn’t sufficient for a production application. You could experience downtime if the node hosting the pod goes offline for any reason. Use <terminal inline>kubectl scale<terminal inline> to increase the replica count to provide more headroom:


$ kubectl scale --replicas=5 deployment/demo-deployment
deployment.apps/demo-deployment scaled

Repeat the <terminal inline>get pods<terminal inline> command to confirm that the deployment has been scaled successfully:

$ kubectl get pods

NAME READY STATUS RESTARTS AGE
demo-deployment-86897ddbb-66lzc 1/1 Running 0 46s
demo-deployment-86897ddbb-66s9d 1/1 Running 0 46s
demo-deployment-86897ddbb-jl6r6 1/1 Running 0 3m33s
demo-deployment-86897ddbb-sgcjb 1/1 Running 0 46s
demo-deployment-86897ddbb-tgvnw 1/1 Running 0 46s

There are now five pods running for the <terminal inline>demo-deployment<terminal inline> deployment. You can see from the <terminal inline>AGE<terminal inline> column that the <terminal inline>scale<terminal inline> command retained the original pod and added four new ones.
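
If you would rather confirm the result with numbers than by counting rows, the desired and ready replica counts can be read straight from the deployment object. The following is a minimal sketch: the function name <terminal inline>replica_counts<terminal inline> is hypothetical, and it assumes kubectl is already configured for your cluster.

```shell
# Hypothetical helper: print "<desired> <ready>" replica counts for a deployment.
# Usage: replica_counts <deployment-name>
replica_counts() {
  local name="$1" desired ready
  # spec.replicas is the count requested by kubectl scale or the manifest.
  desired=$(kubectl get deployment "$name" -o jsonpath='{.spec.replicas}')
  # status.readyReplicas is omitted while no pods are ready, so default to 0.
  ready=$(kubectl get deployment "$name" -o jsonpath='{.status.readyReplicas}')
  echo "${desired} ${ready:-0}"
}
```

Calling <terminal inline>replica_counts demo-deployment<terminal inline> shortly after a scale-up may show a ready count that is lower than the desired count until the new pods have started.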

After further consideration, you might decide five replicas are unnecessary for this application. It’s only running a static NGINX web server, so resource consumption per user request should be low. Use the <terminal inline>scale<terminal inline> command again to lower the replica count and avoid wasting cluster capacity:


$ kubectl scale --replicas=3 deployment/demo-deployment
deployment.apps/demo-deployment scaled

Repeat the <terminal inline>get pods<terminal inline> command:

$ kubectl get pods

NAME READY STATUS RESTARTS AGE
demo-deployment-86897ddbb-66lzc 1/1 Terminating 0 3m21s
demo-deployment-86897ddbb-66s9d 1/1 Terminating 0 3m21s
demo-deployment-86897ddbb-jl6r6 1/1 Running 0 6m8s
demo-deployment-86897ddbb-sgcjb 1/1 Running 0 3m21s
demo-deployment-86897ddbb-tgvnw 1/1 Running 0 3m21s

Kubernetes has marked two of the running pods for termination. This will reduce the running replica count to the requested three pods. The pods selected for removal are sent a SIGTERM signal and allowed to terminate gracefully. They’ll be removed from the pod list once they’ve stopped.


Conditional Scaling

Sometimes you might want to scale a resource, but only if there’s a specific number of replicas already running. This avoids unintentional overwrites of previous scaling changes, such as those made by other users in your cluster.

Include the <terminal inline>--current-replicas<terminal inline> flag in the command to use this behavior:


$ kubectl scale --current-replicas=3 --replicas=5 deployment/demo-deployment
deployment.apps/demo-deployment scaled

This example scales the <terminal inline>demo-deployment<terminal inline> deployment to five replicas, but only if it currently has exactly three replicas. The <terminal inline>--current-replicas<terminal inline> value is always matched exactly; you can’t express a condition as “less than” or “greater than” a particular count.
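
In a script, the conditional form is easiest to use when the outcome is checked explicitly. The sketch below wraps the command in a hypothetical function, <terminal inline>scale_if_unchanged<terminal inline>, and assumes kubectl exits with a non-zero status when the precondition is not met or the request otherwise fails.

```shell
# Hypothetical helper: scale a resource only if it still has the expected count.
# Usage: scale_if_unchanged <type/name> <expected-replicas> <desired-replicas>
scale_if_unchanged() {
  local target="$1" expected="$2" desired="$3"
  if kubectl scale --current-replicas="$expected" --replicas="$desired" "$target"; then
    echo "scaled $target from $expected to $desired replicas"
  else
    # Either the precondition did not hold or the request itself failed.
    echo "left $target unchanged (expected $expected replicas)" >&2
    return 1
  fi
}
```

For example, <terminal inline>scale_if_unchanged deployment/demo-deployment 3 5<terminal inline> attempts the same change as the command above and reports on standard error when it is skipped.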

Scaling Multiple Resources

The <terminal inline>kubectl scale<terminal inline> command can scale several resources at once when you supply more than one name as arguments. Each of the resources will be scaled to the same replica count set by the <terminal inline>--replicas<terminal inline> flag.


$ kubectl scale --replicas=5 deployment/app deployment/database
deployment.apps/app scaled
deployment.apps/database scaled

This command scales the <terminal inline>app<terminal inline> and <terminal inline>database<terminal inline> deployments to five replicas each.

You can scale every resource of a particular type by supplying the <terminal inline>--all<terminal inline> flag, such as this example to scale all the deployments in your <terminal inline>default<terminal inline> namespace:


$ kubectl scale --all --replicas=5 --namespace=default deployment
deployment.apps/app scaled
deployment.apps/database scaled

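This selects every matching resource inside the namespace given by <terminal inline>--namespace<terminal inline>. If you omit the flag, the currently active namespace is used. The objects that were scaled are shown in the command’s output.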

You can obtain granular control over the objects that are scaled with the <terminal inline>--selector<terminal inline> flag. This lets you use standard selection syntax to filter objects based on their labels. Here’s an example that scales all the deployments with an <terminal inline>app-name=demo-app<terminal inline> label:


$ kubectl scale --replicas=5 --selector=app-name=demo-app deployment
deployment.apps/app scaled
deployment.apps/database scaled
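
Because a selector can match more objects than you expect, it can help to list the matches before scaling them. The following is a sketch under stated assumptions: the function name <terminal inline>scale_by_label<terminal inline> is hypothetical, and it only handles deployments.

```shell
# Hypothetical helper: show which deployments a label selector matches, then scale them.
# Usage: scale_by_label <selector> <replicas>
scale_by_label() {
  local selector="$1" replicas="$2" matches
  # -o name prints one "deployment.apps/<name>" line per match.
  matches=$(kubectl get deployment --selector="$selector" -o name)
  if [ -z "$matches" ]; then
    echo "no deployments match $selector" >&2
    return 1
  fi
  echo "about to scale:"
  echo "$matches"
  kubectl scale --replicas="$replicas" --selector="$selector" deployment
}
```

Running <terminal inline>scale_by_label app-name=demo-app 5<terminal inline> prints the matched deployments first, so an overly broad selector is visible in the output alongside the scale results.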

Changing the Timeout

The <terminal inline>--timeout<terminal inline> flag sets the time kubectl will wait before it gives up on a scale operation. By default, there’s no waiting period. The flag accepts time values in human-readable format, such as <terminal inline>5m<terminal inline> or <terminal inline>1h<terminal inline>:


$ kubectl scale --replicas=5 --timeout=1m deployment/demo-deployment

This lets you avoid lengthy terminal hangs if a scaling change can’t be immediately fulfilled. Although <terminal inline>kubectl scale<terminal inline> is an imperative command, changes to scaling can sometimes take several minutes to complete while new pods are scheduled to nodes.
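
A related pattern is to issue the scale request and then wait, with an upper bound, for the deployment to settle. The sketch below uses <terminal inline>kubectl rollout status<terminal inline>, which has its own <terminal inline>--timeout<terminal inline> flag, rather than the timeout on <terminal inline>kubectl scale<terminal inline> itself. The function name <terminal inline>scale_and_wait<terminal inline> is hypothetical.

```shell
# Hypothetical helper: scale a deployment, then wait up to <timeout> for it to settle.
# Usage: scale_and_wait <deployment-name> <replicas> <timeout, e.g. 2m>
scale_and_wait() {
  local name="$1" replicas="$2" timeout="$3"
  kubectl scale --replicas="$replicas" "deployment/$name" || return 1
  # rollout status blocks until the deployment is fully available or the timeout expires.
  if ! kubectl rollout status "deployment/$name" --timeout="$timeout"; then
    echo "deployment/$name was not ready with $replicas replicas within $timeout" >&2
    return 1
  fi
}
```

For example, <terminal inline>scale_and_wait demo-deployment 5 2m<terminal inline> returns a non-zero status if the pods have not become available after two minutes, which makes it usable as a gate in a script.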

Best Practices

Using <terminal inline>kubectl scale<terminal inline> is generally the fastest and most reliable way to scale your workloads. However, there are some best practices to remember for safe operations. Here are a few tips.

  • Avoid scaling too often. Changes to replica counts should be in response to specific events, such as congestion that’s causing requests to run slowly or be dropped. It’s best to analyze your current service capacity, estimate the capacity needed to satisfactorily handle all the traffic, then add an extra buffer on top to anticipate any future growth. Each scaling operation can cause delays while pods are scheduled and terminated.
  • Scaling down to zero will stop your application. You can run <terminal inline>kubectl scale --replicas=0<terminal inline>, which will remove all the pods managed by the selected objects. You can scale back up again by repeating the command with a positive value.
  • Make sure you’ve selected the correct objects. There’s no confirmation prompt, so be sure to pay attention to the objects you’re selecting. Manually selecting objects by name is the safest approach, and prevents you from accidentally scaling other parts of your application, which could cause an outage or waste resources.
  • Use <terminal inline bold>--current-replicas<terminal inline bold> to avoid accidents. Using the <terminal inline>--current-replicas<terminal inline> flag increases safety by ensuring the scale only changes if the current count matches your expectation. Otherwise, you might unintentionally overwrite scaling changes applied by another user or the Kubernetes autoscaler.
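
Two of the tips above, scaling to zero and guarding changes with <terminal inline>--current-replicas<terminal inline>, can be combined when you want to pause a deployment and restore it later. This is a minimal sketch: the function names <terminal inline>pause_deployment<terminal inline> and <terminal inline>resume_deployment<terminal inline> are hypothetical, and the caller is responsible for remembering the printed count.

```shell
# Hypothetical helper: scale a deployment to zero and print its previous replica count.
# Usage: pause_deployment <deployment-name>
pause_deployment() {
  local name="$1" previous
  previous=$(kubectl get deployment "$name" -o jsonpath='{.spec.replicas}') || return 1
  # Only scale down if nobody changed the count since it was read.
  kubectl scale --current-replicas="$previous" --replicas=0 "deployment/$name" >/dev/null || return 1
  echo "$previous"
}

# Hypothetical helper: restore a paused deployment to a given replica count.
# Usage: resume_deployment <deployment-name> <replicas>
resume_deployment() {
  local name="$1" replicas="$2"
  # Only scale up if the deployment is still at zero.
  kubectl scale --current-replicas=0 --replicas="$replicas" "deployment/$name"
}
```

A typical use would be <terminal inline>saved=$(pause_deployment demo-deployment)<terminal inline> followed later by <terminal inline>resume_deployment demo-deployment "$saved"<terminal inline>.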

Alternatives to kubectl scale

Running <terminal inline>kubectl scale<terminal inline> is an imperative operation that has a direct effect on your cluster. You’re instructing Kubernetes to supply a specific number of replicas as soon as possible. This is logical if you created the object with the imperative <terminal inline>kubectl create<terminal inline> command, but it’s inappropriate if you originally ran <terminal inline>kubectl apply<terminal inline> with a declarative YAML file, as shown above. After you run the <terminal inline>scale<terminal inline> command, the number of replicas in your cluster will differ from that defined in your YAML’s <terminal inline>spec.replicas<terminal inline> field. It’s better practice to modify the YAML file instead, then re-apply it to your cluster.

First change the <terminal inline>spec.replicas<terminal inline> field to your new desired replica count:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: nginx
          image: nginx:latest

Now repeat the <terminal inline>kubectl apply<terminal inline> command with the modified file:


$ kubectl apply -f demo-deployment.yaml

kubectl will automatically diff the changes and take action to evolve the state of your cluster towards what is declared in the file. This will result in pods being automatically created or terminated, so the number of running instances matches the <terminal inline>spec.replicas<terminal inline> field again.
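
To see what would change before applying, you can run <terminal inline>kubectl diff<terminal inline> against the same file. The sketch below relies on the exit codes of <terminal inline>kubectl diff<terminal inline> (0 for no differences, 1 for differences, greater than 1 for an error); the function name <terminal inline>apply_if_changed<terminal inline> is hypothetical.

```shell
# Hypothetical helper: show the pending changes for a manifest and apply only if there are any.
# Usage: apply_if_changed <manifest-file>
apply_if_changed() {
  local manifest="$1" status=0
  # kubectl diff prints the differences and signals the result through its exit code.
  kubectl diff -f "$manifest" || status=$?
  case "$status" in
    0) echo "no changes to apply for $manifest" ;;
    1) kubectl apply -f "$manifest" ;;
    *) echo "kubectl diff failed for $manifest" >&2; return "$status" ;;
  esac
}
```

Running <terminal inline>apply_if_changed demo-deployment.yaml<terminal inline> after editing <terminal inline>spec.replicas<terminal inline> would print the replica count change and then apply it.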

Another alternative to <terminal inline>kubectl scale<terminal inline> is Kubernetes’ support for autoscaling. Configuring this mechanism allows Kubernetes to automatically adjust replica counts between a configured minimum and maximum based on metrics such as CPU and memory usage, or custom metrics such as network activity.
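
One way to set up CPU-based autoscaling from the command line is <terminal inline>kubectl autoscale<terminal inline>, which creates a HorizontalPodAutoscaler for the target. The following is a sketch rather than a recommended configuration: the function name <terminal inline>enable_autoscaling<terminal inline> is hypothetical, it assumes a metrics source such as metrics-server is installed and that the containers declare CPU requests, and flag names may vary between kubectl versions.

```shell
# Hypothetical helper: create a CPU-based autoscaler for a deployment.
# Usage: enable_autoscaling <deployment-name> <min-replicas> <max-replicas> <target-cpu-percent>
enable_autoscaling() {
  local name="$1" min="$2" max="$3" cpu="$4"
  if [ "$min" -gt "$max" ]; then
    echo "minimum replicas ($min) must not exceed maximum ($max)" >&2
    return 1
  fi
  kubectl autoscale "deployment/$name" --min="$min" --max="$max" --cpu-percent="$cpu"
}
```

For example, <terminal inline>enable_autoscaling demo-deployment 2 10 80<terminal inline> asks Kubernetes to keep between two and ten replicas, targeting 80% average CPU utilization. Once an autoscaler manages a deployment, manual <terminal inline>kubectl scale<terminal inline> changes are likely to be overridden by it.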

Final Thoughts

The <terminal inline>kubectl scale<terminal inline> command is an imperative mechanism for scaling your Kubernetes deployments, replica sets, replication controllers, and stateful sets. It targets one or more objects on each invocation and scales them so a specified number of pods are running. You can optionally set a condition, so the scale is only changed when there’s a specific number of existing replicas, avoiding unintentional resizes in the wrong direction.

You can track the number of replicas in your cluster by using a dedicated Kubernetes monitoring platform.

James Walker
Software Engineer

James Walker is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs. He has experience managing complete end-to-end web development workflows with DevOps, CI/CD, Docker, and Kubernetes. James also writes technical articles on programming and the software development lifecycle, using the insights acquired from his industry career. He's currently a regular contributor to CloudSavvy IT and has previously written for DigitalJournal.com, OnMSFT.com, and other technology-oriented publications.
