Hello everyone!
I was testing some taint commands, and after adding a NoExecute taint to node master01, I lost communication with the cluster API.
After some thought, I realized this evicted all the pods from the node, and since this is a single-node cluster, that is what caused the issue.
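The eviction comes from how NoExecute taints work: a pod already running on the node keeps running only if one of its tolerations matches the taint. Below is a simplified, hypothetical Python model of the toleration-matching rules as documented for Kubernetes. It ignores `tolerationSeconds` (delayed eviction). The class and function names and the `test=true` taint are made up for illustration and are not an OpenShift or Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    effect: str       # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass
class Toleration:
    key: str = ""            # empty key: matches any taint key
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect: matches any taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified model of the documented toleration/taint matching rules."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value  # Equal: values must match too
    if tol.operator == "Exists":
        return True  # Exists: any value for the key (or any key if key is empty)
    return False


def survives_noexecute(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """True if a running pod tolerates every NoExecute taint on its node."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect == "NoExecute"  # only NoExecute evicts running pods
    )


# Hypothetical taint, similar in spirit to the one added to master01.
taint = Taint(key="test", value="true", effect="NoExecute")

no_tolerations = []                          # a typical workload pod
catch_all = [Toleration(operator="Exists")]  # tolerates every taint
matching = [Toleration(key="test", value="true", effect="NoExecute")]

print(survives_noexecute(no_tolerations, [taint]))  # False -> evicted
print(survives_noexecute(catch_all, [taint]))       # True
print(survives_noexecute(matching, [taint]))        # True
```

In this model, a pod with no tolerations is evicted as soon as the NoExecute taint appears; only pods with a matching (or catch-all `operator: Exists`) toleration stay on the node.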
What are the ways to recover from this scenario other than recreating the lab environment?
I know this is outside the course scope, and I appreciate any help on this topic.
@Emanuel_Haine That is disastrous for a single-node OCP cluster, and you will no longer be able to run any oc commands. I think you need to recreate the cluster, i.e., recreate the labs here.
@Chetan_Tiwary_, let's suppose I have a similar environment outside the lab. If I ran the same command there, is there a way to recover the cluster? I am asking to learn how to get out of situations like this. Is there a way to edit this value inside the etcd pod?
@Emanuel_Haine An OpenShift cluster can only be recovered if at least one intact master node is still left. I don't think we can perform disaster recovery with the etcd backup in this case.
That said, you can recreate the cluster and redeploy your applications, for example with GitOps, so that you are back online in the original state as quickly as possible. For stateful applications, however, you also need backups of the persistent volumes, so a backup solution such as OADP (OpenShift API for Data Protection) is worth having. Keep in mind that OADP does not serve as a disaster recovery solution for etcd or OpenShift Operators.
Thank you, @Chetan_Tiwary_
Red Hat
Learning Community
A collaborative learning environment, enabling open source skill development.