How to Use Kubectl Scale on Deployment

Kubernetes is a powerful platform for managing containerized applications. One of its key features is the ability to scale pods up or down to optimize performance and resource consumption.

In this tutorial, you will learn how to use the `kubectl scale` command to adjust the number of pods hosting your application.

Key Takeaways

  • You can scale a deployment declaratively, by editing the deployment manifest file and applying it to the cluster, or imperatively, by using the `kubectl scale` command to change the number of replicas directly.
  • When scaling a deployment in Kubernetes, consider the update strategy, network resources, monitoring and troubleshooting, the cluster autoscaler, and the horizontal Pod autoscaler.
  • Scaling a deployment helps you adapt your application to changing demand and resources, but it requires careful planning and execution to avoid errors or disruptions.

Prerequisites

To follow along with the examples in this article, you need a running Kubernetes cluster. You can create one with a tool such as Minikube or Kind. You also need the kubectl command-line tool installed and configured to talk to your cluster. Additionally, you need a basic knowledge of Kubernetes concepts, such as pods, deployments, services, and replica sets.
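
For example, if you have Minikube installed, you can start a local cluster and confirm that kubectl can reach it with the following commands:

minikube start
kubectl get nodes

The second command should list a single node in the Ready state.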

What is a Deployment?

A deployment is a Kubernetes resource that defines the desired state of your application. It specifies what container image to use, what ports to expose, and other configuration details. A deployment also creates and manages a ReplicaSet, which in turn ensures that the specified number of identical Pods is running at all times.

What is a Pod?

A Pod is the smallest deployable unit in Kubernetes. It consists of one or more containers that share the same network and storage resources. A Pod can run on any node in the cluster, and the Kubernetes scheduler places it on a node that has enough resources for it to function properly.
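
For illustration, here is a minimal Pod manifest that runs a single nginx container. In practice, you rarely create Pods directly; a deployment creates and manages them for you:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80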

Why Scale a Deployment?

Scaling a deployment means changing the number of replicas of your application. Replicas are copies of a Pod that are created and managed by the Kubernetes system.

Below are some of the reasons why you might want to scale a deployment:

  • To handle increased traffic or load on your application
  • To improve the availability and reliability of your application
  • To test the performance and resilience of your application
  • To save costs by reducing the resources used by your application

How to Scale a Deployment?

There are two ways to scale a deployment in Kubernetes: declaratively and imperatively.

Declarative Scaling

Declarative scaling means specifying the desired number of replicas in the deployment manifest file, and then applying it to the cluster using the `kubectl apply` command. For example, if you have a deployment file named deployment.yaml that looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

You can change the number of replicas from 3 to 5 by editing the `replicas` field in the file and then applying it to the cluster:

kubectl apply -f deployment.yaml

This will update the deployment and create two more Pods for your application.

Imperative Scaling

Imperative scaling means directly changing the number of replicas of a deployment using the `kubectl scale` command. For example, if you want to scale up your nginx-deployment from 3 to 5 replicas, you can run:

kubectl scale deploy nginx-deployment --replicas=5

This will update the deployment and create two more Pods for your application.
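
If you want to guard against racing with another change, `kubectl scale` also accepts a precondition via the `--current-replicas` flag. The command below scales to 5 replicas only if the deployment currently has exactly 3, and fails otherwise:

kubectl scale deploy nginx-deployment --current-replicas=3 --replicas=5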

You can also scale down your deployment by specifying a lower number of replicas. For example, if you want to scale down your nginx-deployment from 5 to 2 replicas, you can run:

kubectl scale deploy nginx-deployment --replicas=2

This will update the deployment and delete three Pods from your application.

You can also use the `kubectl scale` command to scale all deployments in a namespace by using the `--all` flag. For example, if you want to scale down all deployments in the default namespace to zero replicas, you can run the following:

kubectl scale deploy -n default --replicas=0 --all

This will update all deployments in the default namespace and delete all Pods from your applications.
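
Scaling to zero removes all Pods but keeps the deployment objects themselves, so you can bring everything back later. For example, to scale every deployment in the default namespace back up to one replica:

kubectl scale deploy -n default --replicas=1 --all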

How to Check the Status of Scaling?

You can use the `kubectl get deployment` command to check the status of your deployments. For example, if you want to see how many replicas of your nginx-deployment are running, you can run:

kubectl get deploy nginx-deployment

This will show output similar to the following (the exact AGE value will vary):
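
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   2/2     2            2           10m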

This means that there are two replicas of your nginx-deployment that are ready, up-to-date, and available.

You can use the `kubectl get pods` command to check the status of your Pods. For example, if you want to see how many Pods of your nginx-deployment are running, you can run:

kubectl get pods -l app=nginx

This will show output similar to the following (the generated Pod name suffixes will differ):
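
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-66b6c48dd5-4xk2p   1/1     Running   0          10m
nginx-deployment-66b6c48dd5-9fjq7   1/1     Running   0          10m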

This means that two Pods of your nginx-deployment are running, and each Pod has one ready and running container.

You can also use the `kubectl describe` command to get more details about your deployments and Pods. For example, if you want to see more details about your nginx-deployment, you can run:

kubectl describe deploy nginx-deployment

This will show detailed output similar to the following (abbreviated here):
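
Name:                   nginx-deployment
Namespace:              default
Labels:                 app=nginx
Selector:               app=nginx
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=nginx
  Containers:
   nginx:
    Image:        nginx:1.14.2
    Port:         80/TCP
...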

This shows you more information about your deployment, such as the labels, annotations, selector, strategy, template, conditions, events, and the replica sets that are associated with it.


Best Practices for Scaling a Deployment

Here are some of the best practices that you should follow when scaling your application in Kubernetes.

  • Using the cluster autoscaler: The cluster autoscaler is a tool that automatically adjusts the size of your node pool based on the demand for compute resources. It can help you optimize resource utilization by adding or removing nodes as needed, thereby reducing the cost incurred by the cluster. However, you should also be aware of its limitations and requirements, such as the minimum and maximum number of nodes, the node pool labels, the Pod disruption budget, and the node drain time.
  • Using the horizontal Pod autoscaler: The horizontal Pod autoscaler automatically adjusts the number of replicas of your deployment based on the CPU or memory usage of your Pods. It can help you improve the availability and reliability of your application by adding or removing Pods as needed. Use it when you have variable or unpredictable workload patterns that require scaling based on CPU or memory usage (see the `kubectl autoscale` sketch after this list).
  • Choosing the right update strategy: Kubernetes supports two update strategies for deployments: RollingUpdate and Recreate. RollingUpdate gradually replaces the old Pods with new ones, while Recreate deletes all the old Pods before creating the new ones. RollingUpdate is the default and preferred strategy, as it ensures zero downtime and minimal disruption to your application. However, in some cases you might need the Recreate strategy, such as when your application cannot run multiple versions simultaneously, or when you need to perform schema migrations or other one-time operations. You can specify the strategy in the deployment manifest using the .spec.strategy.type field (see the manifest snippet after this list).
  • Monitoring and troubleshooting: Scaling a deployment can introduce new issues or expose existing ones in your application. You should monitor the status and performance of your deployment and Pods using commands such as `kubectl get`, `kubectl describe`, `kubectl logs`, `kubectl top`, and `kubectl exec`. You should also use metrics, alerts, dashboards, and tracing tools to collect and analyze data about your application's behavior and health, so you can identify and resolve any errors or bottlenecks that occur during or after scaling.
  • Managing network resources: Scaling a deployment can affect the network resources that your application uses, such as IP addresses, load balancers, DNS records, and firewall rules. You should ensure that your network configuration can support the increased or decreased number of Pods and services that your application requires. For example, you should use a managed NAT gateway with at least two public IPs for cluster egress, and use internal load balancers or services for internal communication.
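
As referenced above, the horizontal Pod autoscaler can manage the replica count for you. As a minimal sketch, the following command creates an autoscaler that keeps nginx-deployment between 2 and 10 replicas, targeting 70% average CPU utilization. This requires the metrics-server to be installed in the cluster, and the thresholds shown here are illustrative:

kubectl autoscale deploy nginx-deployment --min=2 --max=10 --cpu-percent=70

And here is a sketch of how the update strategy might be set in a deployment manifest fragment; the maxSurge and maxUnavailable values are illustrative:

spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1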

Looking to polish your Kubernetes skills? Check out KodeKloud’s Kubernetes for the Absolute Beginner course.

Conclusion

In this tutorial, you learned how to use the `kubectl scale` command to adjust the number of replicas of your deployment. You also learned how to check the status of your deployment and Pods using the `kubectl get` and `kubectl describe` commands. Scaling a deployment is a common and useful operation in Kubernetes that allows you to adapt your application to changing needs and resources.

If you want to learn more about the kubectl scale command, you can check the official documentation here.

Looking to certify your Kubernetes skills? Check out KodeKloud's certification exam preparation courses.