The <terminal inline>kubectl scale<terminal inline> command immediately scales your application by adjusting the number of running pod replicas. It's the quickest and easiest way to change a deployment's replica count, whether you're reacting to a spike in demand or a prolonged quiet period.
In this article, you’ll see how to use <terminal inline>kubectl scale<terminal inline> to scale a simple deployment. You’ll also learn about the options you can use when you need a more sophisticated change. Finally, you’ll look at the best practices for running <terminal inline>kubectl scale<terminal inline>, as well as at some alternative methods for adjusting Kubernetes replica counts.
kubectl scale Use Cases
The kubectl scale command is used to change the number of running replicas inside Kubernetes deployment, replica set, replication controller, and stateful set objects. When you increase the replica count, Kubernetes will start new pods to scale up your service. Lowering the replica count will cause Kubernetes to gracefully terminate some pods, freeing up cluster resources.
You can run <terminal inline>kubectl scale<terminal inline> to manually adjust your application’s replica count in response to changing service capacity requirements. Increased traffic loads can be handled by increasing the replica count, providing more application instances to serve user traffic. When the surge subsides, the number of replicas can be reduced. This helps keep your costs low by avoiding utilization of unneeded resources.
Using kubectl
The most basic usage of <terminal inline>kubectl scale<terminal inline> looks like this:
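```bash
kubectl scale --replicas=3 deployment/demo-deployment
```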
Executing this command will adjust the deployment called <terminal inline>demo-deployment<terminal inline> so it has three running replicas. You can target a different kind of resource by substituting its name instead of <terminal inline>deployment<terminal inline>:
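```bash
# For example, scaling a stateful set instead; the name demo-statefulset is illustrative
kubectl scale --replicas=3 statefulset/demo-statefulset
```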
Basic Scaling
Now we’ll look at a complete example of using <terminal inline>kubectl scale<terminal inline> to scale a deployment. Here’s a YAML file defining a simple deployment:
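```yaml
# A minimal manifest matching the walkthrough: a single NGINX replica.
# The label names and image tag are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: nginx
          image: nginx:latest
```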
Save this YAML to <terminal inline>demo-deployment.yaml<terminal inline> in your working directory. Next, use kubectl to add the deployment to your cluster:
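```bash
kubectl apply -f demo-deployment.yaml
```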
Now run the <terminal inline>get pods<terminal inline> command to view the pods that have been created for the deployment:
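```bash
kubectl get pods
```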
Only one pod is running. This is expected, as the deployment’s manifest declares one replica in its <terminal inline>spec.replicas<terminal inline> field.
A single replica isn’t sufficient for a production application. You could experience downtime if the node hosting the pod goes offline for any reason. Use <terminal inline>kubectl scale<terminal inline> to increase the replica count to provide more headroom:
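```bash
kubectl scale --replicas=5 deployment/demo-deployment
```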
Repeat the <terminal inline>get pods<terminal inline> command to confirm that the deployment has been scaled successfully:
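```bash
kubectl get pods
```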
There are now five pods running for the <terminal inline>demo-deployment<terminal inline> deployment. You can see from the <terminal inline>AGE<terminal inline> column that the <terminal inline>scale<terminal inline> command retained the original pod and added four new ones.
After further consideration, you might decide five replicas are unnecessary for this application. It’s only running a static NGINX web server, so resource consumption per user request should be low. Use the <terminal inline>scale<terminal inline> command again to lower the replica count and avoid wasting cluster capacity:
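```bash
kubectl scale --replicas=3 deployment/demo-deployment
```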
Repeat the <terminal inline>get pods<terminal inline> command:
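```bash
kubectl get pods
```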
Kubernetes has marked two of the running pods for termination, which will bring the replica count down to the requested three. The pods selected for removal are sent a SIGTERM signal and allowed to terminate gracefully. They'll be removed from the pod list once they've stopped.
Conditional Scaling
Sometimes you might want to scale a resource, but only if there’s a specific number of replicas already running. This avoids unintentional overwrites of previous scaling changes, such as those made by other users in your cluster.
Include the <terminal inline>--current-replicas<terminal inline> flag in the command to use this behavior:
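```bash
kubectl scale --current-replicas=3 --replicas=5 deployment/demo-deployment
```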
This example scales the <terminal inline>demo-deployment<terminal inline> deployment to five replicas, but only if there are currently three replicas running. The <terminal inline>--current-replicas<terminal inline> value is always matched exactly; you can't express the condition as "fewer than" or "greater than" a particular count.
Scaling Multiple Resources
The <terminal inline>kubectl scale<terminal inline> command can scale several resources at once when you supply more than one name as arguments. Each of the resources will be scaled to the same replica count set by the <terminal inline>--replicas<terminal inline> flag.
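```bash
kubectl scale --replicas=5 deployment/app deployment/database
```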
This command scales the <terminal inline>app<terminal inline> and <terminal inline>database<terminal inline> deployments to five replicas each.
You can scale every resource of a particular type by supplying the <terminal inline>--all<terminal inline> flag, such as this example to scale all the deployments in your <terminal inline>default<terminal inline> namespace:
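```bash
# The replica count of 3 is illustrative
kubectl scale deployments --all --replicas=3 --namespace=default
```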
This selects every matching resource inside the currently active namespace. The objects that were scaled are shown in the command’s output.
You can obtain granular control over the objects that are scaled with the <terminal inline>--selector<terminal inline> flag. This lets you use standard selection syntax to filter objects based on their labels. Here’s an example that scales all the deployments with an <terminal inline>app-name=demo-app<terminal inline> label:
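```bash
# The replica count of 3 is illustrative
kubectl scale deployments --selector=app-name=demo-app --replicas=3
```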
Changing the Timeout
The <terminal inline>--timeout<terminal inline> flag sets how long kubectl will wait before it gives up on a scale operation. By default, there's no waiting period. The flag accepts time values in human-readable format, such as <terminal inline>5m<terminal inline> or <terminal inline>1h<terminal inline>:
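```bash
# Give up if the scale operation hasn't completed within five minutes
kubectl scale --replicas=5 --timeout=5m deployment/demo-deployment
```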
This lets you avoid lengthy terminal hangs if a scaling change can’t be immediately fulfilled. Although <terminal inline>kubectl scale<terminal inline> is an imperative command, changes to scaling can sometimes take several minutes to complete while new pods are scheduled to nodes.
Best Practices
Using <terminal inline>kubectl scale<terminal inline> is generally the fastest and most reliable way to scale your workloads, but there are some best practices to keep in mind for safe operation:
- Avoid scaling too often. Changes to replica counts should be in response to specific events, such as congestion that’s causing requests to run slowly or be dropped. It’s best to analyze your current service capacity, estimate the capacity needed to satisfactorily handle all the traffic, then add an extra buffer on top to anticipate any future growth. Avoid scaling your application too often, as each operation can cause delays while pods are scheduled and terminated.
- Scaling down to zero will stop your application. You can run <terminal inline>kubectl scale --replicas=0<terminal inline>, which will remove all the containers across the selected objects. You can scale back up again by repeating the command with a positive value.
- Make sure you’ve selected the correct objects. There’s no confirmation prompt, so be sure to pay attention to the objects you’re selecting. Manually selecting objects by name is the safest approach, and prevents you from accidentally scaling other parts of your application, which could cause an outage or waste resources.
- Use <terminal inline bold>--current-replicas<terminal inline bold> to avoid accidents. Using the <terminal inline>--current-replicas<terminal inline> flag increases safety by ensuring the scale only changes if the current count matches your expectation. Otherwise, you might unintentionally overwrite scaling changes applied by another user or the Kubernetes autoscaler.
Alternatives to kubectl scale
Running <terminal inline>kubectl scale<terminal inline> is an imperative operation that has a direct effect on your cluster. You're instructing Kubernetes to supply a specific number of replicas as soon as possible. This is logical if you created the object with the imperative <terminal inline>kubectl create<terminal inline> command, but it's inappropriate if you originally ran <terminal inline>kubectl apply<terminal inline> with a declarative YAML file, as shown above. After you run the <terminal inline>scale<terminal inline> command, the number of replicas in your cluster will differ from that defined in your YAML's <terminal inline>spec.replicas<terminal inline> field. It's better practice to modify the YAML file instead, then re-apply it to your cluster.
First change the <terminal inline>spec.replicas<terminal inline> field to your new desired replica count:
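```yaml
# demo-deployment.yaml (excerpt); the new count of 3 is illustrative
spec:
  replicas: 3
```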
Now repeat the <terminal inline>kubectl apply<terminal inline> command with the modified file:
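```bash
kubectl apply -f demo-deployment.yaml
```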
kubectl will automatically diff the changes and take action to evolve the state of your cluster towards what is declared in the file. This will result in pods being automatically created or terminated, so the number of running instances matches the <terminal inline>spec.replicas<terminal inline> field again.
Another alternative to <terminal inline>kubectl scale<terminal inline> is Kubernetes’ support for autoscaling. Configuring this mechanism allows Kubernetes to automatically adjust replica counts between a configured minimum and maximum based on metrics such as CPU usage and network activity.
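One way to set this up is with the <terminal inline>kubectl autoscale<terminal inline> command, which creates a HorizontalPodAutoscaler for an existing workload. Here's a minimal sketch for the demo deployment; the minimum, maximum, and CPU threshold values are illustrative:

```bash
# Keep between 2 and 10 replicas, targeting 70% average CPU utilization
kubectl autoscale deployment demo-deployment --min=2 --max=10 --cpu-percent=70
```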
Final Thoughts
The <terminal inline>kubectl scale<terminal inline> command is an imperative mechanism for scaling your Kubernetes deployments, replica sets, replication controllers, and stateful sets. It targets one or more objects on each invocation and scales them so a specified number of pods are running. You can optionally set a condition, so the scale is only changed when there’s a specific number of existing replicas, avoiding unintentional resizes in the wrong direction.
You can track the number of replicas in your cluster by using a dedicated Kubernetes monitoring platform.