OpenShift Virtualization employs a comprehensive VM scheduling strategy that encompasses node selection, affinity rules, taints and tolerations, scheduler profiles, and node failure handling mechanisms. This multi-layered approach ensures efficient VM placement, tolerance of node-level constraints, and resilience to node failures.
The virt-handler daemonset plays a crucial role in VM scheduling by communicating with the libvirt instances that manage the VM lifecycle on each node. If virt-handler loses its connection to the cluster's API server, the node cannot communicate its status. The node enters a failed state, and the VMs running on it cannot be live migrated to healthy nodes.
In the event of a node failure, the virt-handler's absence triggers a sequence of actions:
Node Detection: Kubernetes detects the virt-handler's absence within minutes, because the node stops reporting its status.
Node Marking: Control plane nodes (master nodes) mark the failed node as unschedulable.
Workload Migration: The failed node's workloads, including VMs, are migrated to healthy nodes according to resource placement and scheduling rules.
VM Placement Strategy: OpenShift Virtualization employs a two-pronged approach to VM placement: an eviction strategy and node placement rules. The eviction strategy dictates how VMs are redistributed when nodes become unavailable, while node placement rules govern the initial allocation of VMs to nodes. The available placement mechanisms are:
Node Selector
The node selector ensures that VMs are scheduled on nodes that match specific label criteria. This mechanism enables granular control over VM placement, allowing users to align VMs with specific hardware configurations or resource availability.
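For example, a minimal fragment of a VirtualMachine manifest that restricts scheduling to labeled nodes; the VM name and the tier=database label are illustrative choices, not values from this article:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rhel9-db-vm               # illustrative name
spec:
  template:
    spec:
      # The VM is scheduled only on nodes carrying this label,
      # e.g. after: oc label nodes <node-name> tier=database
      nodeSelector:
        tier: database
```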
Affinity and Anti-affinity Rules
Affinity and anti-affinity rules provide more nuanced control over VM placement. Affinity rules specify preferences for co-locating VMs with certain characteristics, while anti-affinity rules prevent VMs with specific labels from residing on the same node. These rules can be used to optimize resource utilization, enforce isolation requirements, or improve application performance.
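As a sketch, assuming VMs labeled app=db that must not share a node and a preference to co-locate with pods labeled app=cache (both labels are illustrative), the rules go under spec.template.spec.affinity of the VirtualMachine:

```yaml
spec:
  template:
    spec:
      affinity:
        # Prefer nodes that already run pods labeled app=cache (co-location).
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: cache
              topologyKey: kubernetes.io/hostname
        # Never place two VMs labeled app=db on the same node (isolation).
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: db
            topologyKey: kubernetes.io/hostname
```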
Tolerations and Taints
Taints and tolerations act as a filter for VM scheduling. A taint marks a node so that the scheduler avoids placing workloads on it unless they explicitly tolerate that taint, while a toleration added to a VM allows it to be scheduled on, and remain on, tainted nodes. This lets administrators reserve nodes for particular workloads while still allowing designated VMs to run there.
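For example, a node could be reserved for virtualization workloads with a taint, and the VM given a matching toleration; the virtualization=true key used here is an illustrative choice:

```yaml
# Taint the node so only workloads tolerating it are scheduled there:
#   oc adm taint nodes <node-name> virtualization=true:NoSchedule
# Matching toleration in the VirtualMachine template spec:
spec:
  template:
    spec:
      tolerations:
      - key: "virtualization"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
```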
Scheduler Profiles
Scheduler profiles offer a broader perspective on VM placement, influencing the overall distribution of VMs across the cluster. The three available profiles – LowNodeUtilization, HighNodeUtilization, and NoScoring – cater to different resource utilization strategies and scheduling priorities.
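A scheduler profile is set cluster-wide on the Scheduler resource and affects all pods, including the virt-launcher pods that run VMs; a minimal sketch:

```yaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  # One of: LowNodeUtilization (default), HighNodeUtilization, NoScoring
  profile: HighNodeUtilization
```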
Note:
1. The eviction strategy determines whether VMs on a node are live migrated to another node or terminated when the node is placed into maintenance or drained:
LiveMigrate: The VM is live migrated so that it is not interrupted if the node is placed into maintenance or drained.
Not defined: The VM is terminated if the node is placed into maintenance or drained.
2. The .spec.runStrategy field in a VirtualMachine manifest in OpenShift Virtualization defines the restart policy for the virtual machine. It specifies how the virtual machine should be handled if it enters a non-running state. Options are Always, RerunOnFailure, Manual, and Halted.
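As a minimal sketch, both settings can appear in the same VirtualMachine manifest (the VM name is illustrative):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rhel9-web-vm        # illustrative name
spec:
  # Restart policy: keep the VM running and recreate the VMI if it stops.
  runStrategy: Always
  template:
    spec:
      # Live migrate the VM instead of shutting it down when the node
      # is drained or placed into maintenance.
      evictionStrategy: LiveMigrate
```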
@Chetan_Tiwary_ Very good read, thank you
@Wasim_Raja Thanks!
@Chetan_Tiwary_ Great insight. Please send me the link to this.
I have a question.
According to the doc on ocp-v
If a VMI uses the LiveMigrate eviction strategy, it automatically migrates when the node that the VMI runs on is placed into maintenance mode.
Is this also applicable if the node becomes unhealthy and not ready due to a network/disk/kernel problem? Do we have any documentation or examples available anywhere?
Thanks.
@nihar_redhat AFAIK, no, it does not LiveMigrate due to unplanned issues on the node side, e.g. the node becomes unhealthy, unready, or unreachable.
When a node remains tainted as NotReady or unreachable for longer than the default five-minute toleration, the Kubernetes node controller evicts all pods on that node, including the VMI pod, by marking them for termination and deleting them.
Once the VMI pod has been removed, the virt-controller observes the VM's runStrategy (Always or RerunOnFailure) and spawns a new VMI on a healthy node. Additionally, if the node encounters memory or disk pressure, the kubelet may trigger node-pressure eviction, immediately terminating pods (including VMI pods) to reclaim resources.
https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/
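For reference, the automatically added tolerations on a pod (including the virt-launcher pod backing a VMI) look roughly like this; 300 seconds is the upstream default and can be tuned:

```yaml
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
```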
Thanks @Chetan_Tiwary_ .
I did try something like below.
1. The VM1 VMI was running on master01.
2. I blocked the outgoing traffic for the node m01.
3. The kubelet stopped sending heartbeats and the node became NotReady.
4. The pod for the VM is stuck in the Terminating phase and never gets evicted.
Have you noticed something like this before?