I've been looking at running Red Hat HA VMs on a VMware platform. The VMware platform has clusters of ESXi hosts; those clusters have vMotion enabled, which enables cluster-level VMware updates (VUM) as well as maintenance mode on individual ESXi hosts. I've found Red Hat documentation at Support Policies for RHEL High Availability Clusters - General Conditions with Virtualized Cluster M... which says, in part:
No support for live migration of active cluster nodes: Red Hat does not provide support for concerns or behaviors arising out of situations in which a node is or may have been live migrated across hypervisors or hosts.
Does anyone here run Red Hat HA VMs on VMware clusters? If so, do you disable vMotion on those clusters? Have you seen any issues related to vMotion in the HA VMs?
VM live migration and RHEL HA would introduce two levels of HA. The design documentation (https://access.redhat.com/articles/3349791) for RHEL-HA suggests adding an additional clustered VM, which is useful during failover/maintenance activity.
Having said that, I have not used live migration with RHEL-HA. Below are the reasons why live migration is not supported (ref: https://access.redhat.com/articles/3349791):
VM administration requirement: Prevent live migration while VM is active in cluster
See: Support policies - General conditions with virtualized cluster members
- If using DRS VM-host affinity (as in above recommendation), live migration is prevented.
- If manually administering VM distribution across hosts, or if not using DRS VM-host affinity - ensure policies and practices prevent any live migration while a VM is active. Always stop the RHEL HA cluster services (pcs cluster stop) before migrating any VM.
- Live migration introduces a pause in VM processing, which can disrupt High Availability membership (and is often observed to do so in real production environments). This pause is unpredictable, and configuring HA to definitively avoid it is difficult.
- Live migration by vMotion takes special measures to update the multicast group registration of VMs at their new host. This has not been assessed by Red Hat to determine if RHEL High Availability's UDP (multicast) transport protocol is compatible with these measures.
- Live migration may cause any static STONITH host <-> VM mappings in the cluster to become incorrect. Even with a plan to update the STONITH configuration before or after migration, there would be a window on one side of the migration where the STONITH settings are wrong, and STONITH could fail if the VM became unresponsive - potentially blocking cluster operations.
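The "stop cluster services before migrating" requirement above could be sketched as a small shell drain script. `pcs node standby` and `pcs cluster stop` are real pcs subcommands, but the node name and the DRY_RUN switch here are assumptions of this sketch, not part of the documented policy:

```shell
#!/bin/sh
# Sketch only: drain a RHEL HA node before its VM is live migrated.
# "node1" and the DRY_RUN switch are illustrative assumptions.
NODE="${NODE:-node1}"
DRY_RUN="${DRY_RUN:-1}"   # leave at 1 to print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Move resources off the node first, then stop cluster services, so a
# vMotion pause cannot be mistaken for a node failure and trigger fencing.
run pcs node standby "$NODE"
run pcs cluster stop "$NODE"
```

After the migration completes, the reverse (`pcs cluster start`, `pcs node unstandby`) would rejoin the node.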
I think we can continue to use VUM by following the Red Hat recommendations:
1) Create an additional clustered VM that can carry the RHEL-HA workload during a VM-host upgrade.
2) Stop the pcs cluster services on the VM located on the VM-host that is about to be upgraded.
3) In the long run, one can combine Ansible scripts with VUM for a seamless upgrade workflow.
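Steps 2 and 3 could be sketched as pre/post hooks that an Ansible playbook (or whatever drives VUM) invokes around each host's maintenance window. The pcs subcommands are real, but the function names, node name, and DRY_RUN switch are assumptions of this sketch:

```shell
#!/bin/sh
# Sketch only: hooks to wrap around an ESXi host entering and leaving
# maintenance mode. Names and flow are illustrative, not a tested procedure.
DRY_RUN="${DRY_RUN:-1}"   # leave at 1 to print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

pre_maintenance() {
    # Before the host enters maintenance mode: drain the node and stop
    # cluster services so no active cluster member gets live migrated.
    run pcs node standby "$1"
    run pcs cluster stop "$1"
}

post_maintenance() {
    # After the host exits maintenance mode: rejoin the node and allow
    # resources to move back.
    run pcs cluster start "$1"
    run pcs node unstandby "$1"
}

pre_maintenance node1   # example invocation ahead of patching node1's host
```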
Would you have an entirely separate vSphere cluster, then? Because VUM (at the vSphere cluster level) would expect to roll through each ESXi host and put it into maintenance mode, outside of your direct control. To make your steps work, I think that additional VM would need to be in a separate vSphere cluster.
This procedure could work, but it still means that you're running an HA VM on a vSphere cluster where vMotion is enabled. Is that something you're doing today, and have you seen any issues with it?