Vertical Pod Autoscaler (VPA) in Kubernetes Explained through an Example

Kubernetes provides a shared pool of compute resources that is allocated based on how you configure your containerized applications. Allocation is handled by the scheduler, which checks the resource requests of each pod's containers and selects a node with enough capacity to run the pod.

You define a container’s resource requirements by specifying resource requests and limits (compute resources).

  • Resource requests specify the resources that have to be reserved for a container. The scheduler only places the container's pod on a node that has enough unreserved capacity to satisfy those requests.
  • Resource limits specify the maximum amount of resources a container is allowed to use. A container that tries to use more CPU than its limit is throttled; a container that exceeds its memory limit is terminated by the kernel's out-of-memory (OOM) killer and then restarted according to the pod's restart policy.

You can specify several types of resource limits and requests, but in this article, we’ll focus on CPU and memory.

Where do we configure these resource requests and limits? To answer that, we first need to understand a pod's role in resource allocation. A pod is the smallest deployable unit in the Kubernetes object model. It wraps one or more tightly coupled containers that share networking and storage; to run multiple instances of an application, you run multiple pods. Resource requests and limits are set per container in the pod's specification, for example in the pod definition YAML.

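Note: You can also deploy an application without specifying the resource requirements. In that case, Kubernetes does not pick values for you unless the namespace has a LimitRange that supplies defaults. Otherwise, the container runs with no requests or limits and the pod gets the BestEffort Quality of Service (QoS) class.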
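Whether requests and limits are set in full, in part, or not at all determines a pod's Quality of Service (QoS) class. Kubernetes uses the QoS class when deciding which pods to evict first if a node runs short of memory. The sketch below is a simplified, illustrative classifier. It assumes containers are plain Python dictionaries with "requests" and "limits" entries. The function name qos_class is hypothetical and is not part of any Kubernetes client library.

```python
from typing import Dict, List

# A container is modeled as {"requests": {...}, "limits": {...}}, with
# quantity strings such as "250m" or "64Mi". Only cpu and memory affect QoS.
Container = Dict[str, Dict[str, str]]
RESOURCES = ("cpu", "memory")


def qos_class(containers: List[Container]) -> str:
    """Return "Guaranteed", "Burstable", or "BestEffort" for a pod's containers.

    Simplification: quantities are compared as strings, so "1" and "1000m"
    are treated as different values even though Kubernetes treats them as equal.
    """
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = dict(container.get("requests", {}))
        limits = container.get("limits", {})
        for resource in RESOURCES:
            # A limit without a request makes the request default to the limit.
            if resource in limits and resource not in requests:
                requests[resource] = limits[resource]
            if resource in requests or resource in limits:
                any_set = True
            # Guaranteed needs a limit equal to the request for cpu and memory.
            if resource not in limits or requests.get(resource) != limits[resource]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    if all_guaranteed:
        return "Guaranteed"
    return "Burstable"


webapp = {
    "requests": {"memory": "64Mi", "cpu": "250m"},
    "limits": {"memory": "128Mi", "cpu": "500m"},
}
print(qos_class([webapp]))  # Burstable
```

A container whose requests sit below its limits, like the webapp example that follows, lands in the Burstable class. A container with no requests or limits at all is BestEffort.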

Let’s assume we have a containerized web application, with resource requests and limits specified, deployed in a pod defined by the YAML file below:

apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: webapp
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

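In this example, the webapp container requests 64Mi of memory and 250m of CPU (250 millicores, a quarter of a CPU core), and it has limits of 128Mi of memory and 500m of CPU. The scheduler will place the pod only on a node that has at least 64Mi of memory and 250m of CPU not already reserved by other pods' requests. At runtime, if the container's memory usage exceeds 128Mi, it is OOM-killed and restarted. If it tries to use more than 500m of CPU, it is throttled rather than restarted.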
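The scheduling decision above compares requests, not live usage. The rough idea behind the scheduler's resource-fit check can be sketched in a few lines. This is a simplified model under stated assumptions. The helpers cpu_millicores, memory_bytes, and fits are hypothetical. Only integer memory amounts with the common suffixes are handled. The node's "already requested" totals are passed in by hand instead of being read from a cluster.

```python
from typing import Dict

# Binary and decimal suffixes for memory quantities (binary ones checked first).
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_millicores(quantity: str) -> int:
    """'250m' -> 250, '1' -> 1000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """'64Mi' -> 67108864. Simplified: integer amounts only."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def fits(pod_requests: Dict[str, str], allocatable: Dict[str, str],
         already_requested: Dict[str, str]) -> bool:
    """True if the node's unreserved CPU and memory cover the pod's requests."""
    free_cpu = cpu_millicores(allocatable["cpu"]) - cpu_millicores(already_requested["cpu"])
    free_mem = memory_bytes(allocatable["memory"]) - memory_bytes(already_requested["memory"])
    return (cpu_millicores(pod_requests["cpu"]) <= free_cpu
            and memory_bytes(pod_requests["memory"]) <= free_mem)


webapp_requests = {"cpu": "250m", "memory": "64Mi"}
node = {"cpu": "2", "memory": "4Gi"}
print(fits(webapp_requests, node, {"cpu": "1500m", "memory": "3Gi"}))  # True
print(fits(webapp_requests, node, {"cpu": "1900m", "memory": "1Gi"}))  # False
```

In the second call, only 100m of CPU is still unreserved on the node, so the pod does not fit. This happens even though the node's actual CPU usage might be low.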

But what if our web app starts getting a lot more traffic, and we don't want to drop any users' requests? As traffic increases, so does the app's CPU and memory usage.

If the app hits its memory limit, the container will be OOM-killed and restarted, causing a temporary outage. If it hits its CPU limit, it will be throttled and respond more slowly. Either way, the users' experience degrades, and we don't want that! This is where the concept of autoscaling comes in.

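Autoscaling is the dynamic adjustment of resources to match applications' changing needs. Kubernetes offers three common forms of autoscaling: vertical pod autoscaling, horizontal pod autoscaling, and cluster autoscaling. The Horizontal Pod Autoscaler is built into Kubernetes. The Vertical Pod Autoscaler and the Cluster Autoscaler are add-ons maintained in the kubernetes/autoscaler project. This article will focus on the Vertical Pod Autoscaler (VPA).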


Vertical Pod Autoscaler (VPA)

The VPA automatically adjusts the CPU and memory requests of pods, and proportionally their limits, based on observed resource usage. This helps pods get the resources they need to operate effectively as demand increases, without over-reserving capacity when demand decreases.

VPA operates using three components: the VPA Recommender, the VPA Updater, and the VPA Admission Controller.

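  1. VPA Recommender: The VPA Recommender monitors the pod's historical and current resource usage, collected through the Kubernetes Metrics API, and computes recommended resource requests. For instance, suppose the webapp pod consistently uses 120Mi of its 128Mi memory limit and 450m of its 500m CPU limit. Based on this observation, the recommender might conclude that the pod needs more resources. It might then recommend raising the memory request to 128Mi and the CPU request to 500m. The recommender works with requests only. By default, VPA scales the limits proportionally so that the original limit-to-request ratio is preserved. For the webapp that ratio is 2:1, which puts the limits at 256Mi of memory and 1 CPU.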
  2. VPA Updater: The VPA Updater periodically compares running pods against the recommender's latest recommendations. Suppose the webapp pod's current requests fall outside the recommended range and the VPA's update mode allows evictions. In that case, the updater evicts the pod so that it can be recreated with the new resources. Evictions go through the Kubernetes Eviction API, so PodDisruptionBudgets are respected.
  3. VPA Admission Controller: After the eviction, the pod's controller (for example, a Deployment) creates a replacement pod. The VPA Admission Controller intercepts that pod creation request before the pod is persisted and scheduled. It overwrites the pod's resource requests and limits with the values derived from the recommender's recommendation. Our webapp pod will then be scheduled with new requests of 128Mi of memory and 500m of CPU, and limits of 256Mi of memory and 1 CPU.
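To make the division of labor concrete, here is a deliberately simplified Python model of the three steps. It is not the VPA's actual algorithm. Several parts are illustrative assumptions: the percentile-plus-margin recommender, its parameter values (a 90th percentile and a 15% margin), the outside-the-bounds eviction rule, and the function names recommend_request, should_evict, and admit. The proportional limit scaling in admit reflects VPA's default of keeping each container's limit-to-request ratio, which for the webapp is 2:1.

```python
import math
from typing import List, Tuple


def recommend_request(usage_samples: List[float], percentile: float = 0.9,
                      margin: float = 0.15) -> float:
    """Hypothetical recommender: a high percentile of observed usage plus a
    safety margin. The real Recommender uses decaying histograms instead."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples)
    rank = max(1, math.ceil(percentile * len(ordered)))  # nearest-rank percentile
    return ordered[rank - 1] * (1 + margin)


def should_evict(current_request: float, lower_bound: float,
                 upper_bound: float, update_mode: str) -> bool:
    """Simplified Updater rule: evict only in modes that allow eviction, and
    only when the current request lies outside the recommended range."""
    if update_mode not in ("Auto", "Recreate"):
        return False
    return current_request < lower_bound or current_request > upper_bound


def admit(request: float, limit: float,
          recommended_request: float) -> Tuple[float, float]:
    """Admission step: apply the recommended request and scale the limit so
    that the original limit-to-request ratio is kept."""
    ratio = limit / request
    return recommended_request, recommended_request * ratio


# Memory for the webapp container, in Mi.
print(should_evict(current_request=64, lower_bound=100, upper_bound=200,
                   update_mode="Auto"))                      # True
print(admit(request=64, limit=128, recommended_request=128))  # (128, 256.0)
```

Keeping the ratio, rather than recomputing limits from usage, is what turns the recommended 128Mi request into a 256Mi limit in the walkthrough above.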

Deploy VPA to the Kubernetes Cluster

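The Vertical Pod Autoscaler (VPA) needs to be deployed to your Kubernetes cluster before you can create VPA objects to manage your pods. All three VPA components (the Recommender, the Updater, and the Admission Controller) run as pods in your cluster, by default in the kube-system namespace. The Recommender reads usage data from the Kubernetes Metrics API, so the cluster also needs a Metrics API provider such as metrics-server.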

Here are the general steps to deploy the VPA to your cluster:

1. Clone the VPA Repository

The VPA components are open source, and their code is hosted on GitHub. The first step is to clone the repository. You can do this with the following command:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/

2. Deploy the VPA Components

Once you have the code, you can deploy the VPA components to your cluster. The repository contains several YAML files that define the VPA components and their configurations. It also includes a helper script, hack/vpa-up.sh, that applies them to the cluster your current kubectl context points to. Run it from the vertical-pod-autoscaler directory:

./hack/vpa-up.sh

This will create the necessary Custom Resource Definitions (CRDs) for the VPA, set up the necessary permissions with Role-Based Access Control (RBAC), and deploy the VPA components (the recommender, updater, and admission controller).

3. Verify the VPA Deployment

You can verify that the VPA components have been deployed correctly by checking that their pods are running:

kubectl get pods -n kube-system | grep vpa

This will list all pods in the kube-system namespace that have vpa in their name. You should see three pods, one for each VPA component: the recommender, the updater, and the admission controller.

Please note that these are general steps and might differ based on your specific Kubernetes setup and version. Always refer to the official VPA documentation for the most accurate and up-to-date information.

Configure Vertical Pod Autoscaler for your applications

To enable vertical autoscaling for a workload, you create a VerticalPodAutoscaler (VPA) object. The VPA object specifies which pods the VPA applies to and how it should apply its recommendations. Here is a simple example:

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  # VPA targets the controller that owns the pods, not a bare Pod.
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: webapp
  updatePolicy:
    updateMode: "Auto"

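In this example, the targetRef points the VPA at the pods managed by a Deployment named webapp. VPA manages pods through their owning controller, such as a Deployment, rather than targeting a bare pod, because the Updater relies on that controller to recreate any pod it evicts. The example therefore assumes the webapp container from earlier is run by a Deployment named webapp. Note that, by default, the Updater only evicts pods from workloads running at least two replicas. The updatePolicy is set to "Auto". This means the VPA applies its recommendations when pods are created and may evict running pods so they are recreated with updated resource requests and limits.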
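The same object can be generated programmatically, which also makes the available update modes explicit. This is a minimal sketch. vpa_manifest is a hypothetical helper, and it assumes the target is a Deployment, because VPA manages pods through their controller. The mode list reflects the modes described in the VPA documentation. Newer VPA releases may accept additional modes.

```python
import json
from typing import Dict

# updatePolicy.updateMode values described in the VPA documentation.
# Newer VPA releases may accept additional modes; check your version.
UPDATE_MODES = {
    "Off",       # compute recommendations only; never change pods
    "Initial",   # apply recommendations only when pods are created
    "Recreate",  # apply at creation and evict running pods to update them
    "Auto",      # currently equivalent to Recreate
}


def vpa_manifest(name: str, deployment: str, update_mode: str = "Auto",
                 namespace: str = "default") -> Dict:
    """Build a VerticalPodAutoscaler object that targets a Deployment."""
    if update_mode not in UPDATE_MODES:
        raise ValueError(f"unsupported updateMode: {update_mode!r}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                          "name": deployment},
            "updatePolicy": {"updateMode": update_mode},
        },
    }


print(json.dumps(vpa_manifest("webapp-vpa", "webapp"), indent=2))
```

kubectl apply accepts JSON as well as YAML, so this output could be piped to kubectl apply -f -. Starting with updateMode "Off" is a low-risk way to inspect recommendations before letting the VPA evict anything.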

You can create the VPA by saving this configuration to a file, say webapp-vpa.yaml, and then running the following command:

kubectl apply -f webapp-vpa.yaml

After the VPA has been running long enough to collect usage data, you can retrieve its recommendations using the kubectl describe command:

kubectl describe vpa webapp-vpa

The output includes the current recommendations for each container in the targeted pods. It might look something like this:

Name:         webapp-vpa
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
...
Status:
  Recommendation:
    Container Recommendations:
      Container Name:  webapp
      Lower Bound:
        Cpu:     500m
        Memory:  128Mi
      Target:
        Cpu:     1
        Memory:  256Mi
      Uncapped Target:
        Cpu:     1
        Memory:  256Mi
      Upper Bound:
        Cpu:     2
        Memory:  512Mi
...

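The "Target" values are the recommended resource requests, which the Admission Controller applies to new pods while scaling the limits proportionally. "Uncapped Target" is what the recommender would suggest if it ignored any minimum or maximum constraints set in the VPA's resource policy. The "Lower Bound" and "Upper Bound" values define the range of requests the VPA considers acceptable. The Updater uses this range to decide whether a running pod's requests have drifted far enough to warrant eviction.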
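The relationship between Uncapped Target and Target can be sketched as a clamp. In a VPA object, per-container minimums and maximums go under spec.resourcePolicy.containerPolicies as minAllowed and maxAllowed. The recommender computes the uncapped value first and then keeps it within those limits. The helper below is a hypothetical, single-resource simplification working in CPU millicores.

```python
from typing import Optional


def cap_target(uncapped: int, min_allowed: Optional[int] = None,
               max_allowed: Optional[int] = None) -> int:
    """Clamp an uncapped recommendation (e.g. CPU millicores) into the
    [min_allowed, max_allowed] range from a hypothetical resource policy."""
    target = uncapped
    if min_allowed is not None:
        target = max(target, min_allowed)
    if max_allowed is not None:
        target = min(target, max_allowed)
    return target


print(cap_target(1000))                   # 1000 (no policy: Target equals Uncapped Target)
print(cap_target(1000, max_allowed=500))  # 500
print(cap_target(100, min_allowed=250))   # 250
```

With no resource policy, as in the example VPA object, Target and Uncapped Target are the same, which matches the sample output.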

Please note that these are basic examples. Real-world usage may require more complex configuration, such as per-container resource policies, depending on your use case. The VPA components must also be installed in your cluster, and you need the necessary permissions to create and manage VPA objects.

Why does VPA use an admission controller to inject resource requests and limits? Why not just update the deployment?

The Vertical Pod Autoscaler's (VPA) Updater component is responsible for evicting pods whose resource requests need to be adjusted. It does not directly update the pod specification or the associated Deployment. There are two main reasons why the Updater doesn't directly modify the Deployment:

  1. Immutability of Pods: In Kubernetes, once a pod is created, most of its specification is immutable. Before the in-place resize feature discussed below, this included the resource requests and limits of its containers. To change these values, the pod has to be recreated. The VPA Updater does this by evicting the pod. Its controller (for example, the Deployment's ReplicaSet) then creates a replacement pod.
  2. Separation of Concerns: Deployments, StatefulSets, and other higher-level constructs are responsible for maintaining the desired state of pods. It's generally a good practice to keep these responsibilities separate. The VPA focuses on adjusting resource requests based on observed usage, while the Deployment controller focuses on maintaining the desired number of pod replicas.

The VPA Admission Controller runs as a mutating admission webhook. Its role is to inject the recommended resource values into a pod as the pod is being created. When the ReplicaSet controller, acting on behalf of a Deployment, submits a new pod to the API server, the webhook modifies that in-flight creation request. As a result, the pod is stored in etcd and scheduled with the recommended values. The Deployment's pod template itself is never changed.

This mechanism allows the VPA to adjust pod resources without modifying the original Deployment, StatefulSet, or other objects that manage the pods. These objects retain their original resource requests and limits, providing a baseline if the VPA is ever removed or disabled.
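A mutating webhook answers the API server with a JSON Patch that describes how to change the incoming object. The Python sketch below only illustrates the shape of that exchange; the VPA's real admission controller is a separate Go program. resource_patch and admission_response are hypothetical helpers, and limits are left out for brevity. The response fields follow the Kubernetes admission webhook API: uid, allowed, patchType, and a base64-encoded JSON Patch.

```python
import base64
import json
from typing import Dict, List


def resource_patch(pod: Dict, recommendations: Dict[str, Dict[str, str]]) -> List[Dict]:
    """Build RFC 6902 JSON Patch operations that set recommended requests on
    the containers of an incoming pod. Limits are omitted for brevity."""
    ops: List[Dict] = []
    for i, container in enumerate(pod["spec"]["containers"]):
        rec = recommendations.get(container["name"])
        if rec is None:
            continue  # no recommendation for this container: leave it alone
        base = f"/spec/containers/{i}/resources"
        resources = container.get("resources")
        if resources is None:
            ops.append({"op": "add", "path": base, "value": {}})
            resources = {}
        if "requests" not in resources:
            ops.append({"op": "add", "path": f"{base}/requests", "value": {}})
        for name, quantity in rec.items():
            # "add" on an existing member replaces its value.
            ops.append({"op": "add", "path": f"{base}/requests/{name}", "value": quantity})
    return ops


def admission_response(uid: str, ops: List[Dict]) -> Dict:
    """Wrap the patch in an admission.k8s.io/v1 AdmissionReview response."""
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(ops).encode()).decode(),
        },
    }


pod = {"spec": {"containers": [{"name": "webapp", "image": "nginx",
        "resources": {"requests": {"memory": "64Mi", "cpu": "250m"}}}]}}
ops = resource_patch(pod, {"webapp": {"memory": "128Mi", "cpu": "500m"}})
print(ops)
```

Only the pod in the request is patched. Nothing in this flow writes to the Deployment, which is why the Deployment keeps its original values as a baseline.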

It's worth noting that while the VPA can be highly beneficial, it's not always the best fit for every scenario. For example, stateless applications that can spread load across replicas are often better served by Horizontal Pod Autoscaling (HPA). The VPA documentation also advises against combining VPA with an HPA that scales on the same CPU or memory metrics. The right choice depends on the nature of your applications and the demands of your specific use case.

In-place Pod Resource Updates (Kubernetes 1.27 and 1.28) and Impact on VPA

Kubernetes 1.27 introduced support for changing a running pod's resource requests and limits without terminating it. This is the In-place Update of Pod Resources KEP (KEP-1287), released as an alpha feature behind the InPlacePodVerticalScaling feature gate. In Kubernetes 1.28, released on 15 August 2023, the feature is still alpha. Once the VPA can use this mechanism, the updater would no longer need to evict pods for new resource recommendations to take effect.

This will make the VPA's operation smoother and less disruptive, since pods would not need to be terminated and recreated to adjust their resources. However, in-place updates do not eliminate the need for the VPA's admission controller. In-place resizing only changes pods that are already running. The admission controller, by contrast, intercepts pod creation requests before they are persisted, so new pods (for example, those created when a Deployment scales up or rolls out) start with the recommended values from the outset.
