Scaling the deployment

What happens when you run oc scale deployment/frontend --replicas 3 ?

Manual Scaling 

  • With this command, you are essentially instructing the Kubernetes API server to update the desired state of the frontend deployment to have three replicas. The API server then propagates this update to the deployment controller, which is responsible for managing the lifecycle of deployments.
  • The deployment controller receives the update from the API server and checks the current state of the frontend deployment. It determines that there are currently two pods running (based on the oc get pods output) and that the desired state is three pods. Therefore, the deployment controller decides to create one new pod.
  • The deployment controller then submits the new pod spec to the Kubernetes scheduler, which is responsible for selecting an appropriate node to run the new pod on. It considers various factors such as node resource availability, pod affinity and anti-affinity rules, and node labels. Once the scheduler selects a node, it sends a binding request to the API server.

  • Our workhorse - the kubelet, which is the node agent - receives a notification about the new pod and its assigned node. It then sets up the container runtime environment, mounts the appropriate volumes, and starts the container process.
  • The Deployment controller acts as a vigilant guardian of the frontend deployment, ensuring that all pods are fully functional and ready to serve traffic. It relentlessly monitors the status of each pod, checking for any signs of distress. If a pod falters and falls out of the Ready state, the deployment controller swiftly intervenes, administering a restart to restore it.
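The reconciliation described in the steps above can be sketched in a few lines. This is an illustrative simplification, not actual controller code - the real controller works through ReplicaSets and the API server, and the pod names here are hypothetical:

```python
# Conceptual sketch of the controller's reconcile loop after
# `oc scale deployment/frontend --replicas 3`: compare desired vs
# observed state and converge toward the desired replica count.

def reconcile(desired_replicas: int, current_pods: list[str]) -> list[str]:
    """Bring the observed pod set in line with the desired replica count."""
    pods = list(current_pods)
    while len(pods) < desired_replicas:
        # Create a new pod spec; the scheduler then binds it to a node.
        pods.append(f"frontend-{len(pods) + 1}")
    while len(pods) > desired_replicas:
        # Surplus pods are deleted on scale-down.
        pods.pop()
    return pods

# Two pods running, desired state is three -> one new pod is created.
print(reconcile(3, ["frontend-1", "frontend-2"]))
```

The key idea is that the controller is level-triggered: it acts on the difference between desired and observed state, not on the command itself.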

------------------------------------- Horizontal Pod Autoscaler ---------------------------------------------

The Kubernetes HorizontalPodAutoscaler, a resourceful and adaptable guardian of workloads, tirelessly monitors the flow of demand, automatically adjusting the number of pods in a deployment or statefulset to maintain optimal performance. Think of it as an able military general, dispatching reinforcements to meet the growing needs of its troops in response to surging demand. When the demand subsides and the number of pods exceeds the specified minimum, the HPA acts as a wise advisor, instructing the workload to scale down, ensuring efficient resource utilisation.

Horizontal scaling --> deploy more troops ( quantity ) [ pods ]

Vertical scaling --> assign more resources to the already deployed troops ( quality ) [ cpu / memory ]


Now - oc autoscale deployment/frontend --cpu-percent=50 --min=3 --max=10


apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  maxReplicas: 10
  minReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  targetCPUUtilizationPercentage: 50
status:
  currentReplicas: 0
  desiredReplicas: 0
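The replica count the HPA converges on follows the algorithm described in the Kubernetes docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured bounds. A minimal sketch, using this deployment's target of 50% CPU, min 3 and max 10 (the function name and parameters are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas: int, current_cpu_pct: float,
                         target_cpu_pct: float = 50,
                         min_replicas: int = 3, max_replicas: int = 10) -> int:
    """Core HPA formula: desired = ceil(current * currentMetric / targetMetric),
    then clamp the result into [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))

print(hpa_desired_replicas(3, 100))  # CPU at double the target -> scale to 6
print(hpa_desired_replicas(6, 10))   # demand subsides -> clamped to min of 3
print(hpa_desired_replicas(9, 100))  # formula says 18, capped at max of 10
```

Note that the real HPA adds stabilisation windows and tolerance bands on top of this formula to avoid thrashing; this sketch shows only the core calculation.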


Here we are requesting Kubernetes to track the deployment's CPU utilisation: when average usage reaches 50% of the requested CPU, come to its aid and dynamically increase the number of pods to handle the surge in demand ( up to a maximum of 10 ).

And when the utilisation dips, scale down intelligently until the minimum threshold of 3 replicas is reached.

To know about scaling policies and behaviour during scale up and scale down, refer to the doc :

