ric
Flight Engineer
  • 596 Views

Feature Request: Pause / Suspend / Hibernate the Lab Environment instead of Stopping (shutting down)

Hi everyone,

As we know, the "Lab Environment" in the "Red Hat Online Learning (ROL)" courses allows the user to "Stop" that lab environment. Being able to "stop" the "Lab Environment" has the advantage, for instance with an RHLS (Red Hat Learning Subscription), of halting "Lab Hours" usage.

However, "stopping" a "Lab Environment", as far as I can tell, shuts down the Virtual Machines (VMs) of that Lab. That is bad, particularly when the VMs and the services take a long time to start, as in OCP (OpenShift Container Platform) based courses, where the Cluster can take more than 30 minutes to start (leading to that long-running "Verifying Cluster State" message when running a "lab start" command in some courses, such as the "DO188 - Introduction to Containers with Podman" course).

So, my request is the one in the title: instead of stopping the Lab Environment, I would like to "Pause" or "Suspend" it (in the sense of "suspend to RAM") and/or "Hibernate" it (in the sense of "suspend to Disk"), which should also stop the consumption of "Lab Hours". When I want to continue with the Lab Environment, I would simply resume it, and that would bring the VMs back to the exact state they were in before being "Paused" / "Suspended" / "Hibernated".

What do you think of this idea? By the way: I've searched these RHLC (Red Hat Learning Community) forums to see if this idea of pausing / suspending / hibernating the VM had been discussed before, but I couldn't find any similar discussion.

Thanks in advance!

Accepted Solutions
Travis
Moderator
  • 563 Views

@ric -

This isn't a bad idea, but unfortunately it won't work and wouldn't solve the problem behind your request. In fact, it could make the situation worse and leave your OpenShift lab environment completely unusable.

We are using Red Hat OpenStack Platform (RHOSP) to host the virtual lab environments, so in theory it would be technically possible to "suspend" systems. However, this would require a larger amount of disk space and, depending on the amount of RAM the systems have, could make the systems take longer to boot.

The OpenShift course lab environments are extremely tricky, whether in the SNO courses or in the multi-node cluster backed courses. Why is this a problem? Simply put, OpenShift isn't meant to be suspended (for that matter, it isn't really meant to be taken offline either). There are tons of containers making up the OpenShift infrastructure and tons of services (including etcd). Unfortunately, suspending the master nodes and compute nodes of an OCP cluster can cause the system API to become unstable, and nodes would most likely become marked as "NotReady", "Failed", or something else. When the environment is powered down instead, it can at least boot up and come up clean. Additionally, in an attempt to increase performance, we use local etcd storage that exists only on the compute node where the master was launched, so there is no guarantee that when you "resume" the system you are on the same OSP compute node as when the system was suspended. It is much more reliable (even though it is slow and takes almost 20 minutes, sometimes longer depending on load) to have the system safely powered down and then booted up.
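
For readers unfamiliar with the options being compared, here is a minimal sketch using the standard OpenStack CLI. In OpenStack, "pause" keeps the guest's state in RAM, "suspend" writes it to disk (hence the extra disk space mentioned above), and "stop" shuts the guest down. The `lab_power` helper and the server name `master01` are hypothetical and are not part of the ROL platform; the helper only prints the command unless `RUN=1` is set, so it can be tried without access to an OpenStack cloud.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not ROL tooling):
# maps a lab action to the matching `openstack server` subcommand.
# By default it only prints the command; set RUN=1 to actually execute it.
lab_power() {
    lp_action=$1
    lp_server=$2
    case "$lp_action" in
        pause|unpause|suspend|resume|stop|start) ;;
        *)
            echo "unknown action: $lp_action" >&2
            return 2
            ;;
    esac
    if [ "${RUN:-0}" = "1" ]; then
        openstack server "$lp_action" "$lp_server"
    else
        echo "openstack server $lp_action $lp_server"
    fi
}

# Dry run: a suspend/resume cycle versus a stop/start cycle.
# "master01" is a placeholder server name.
lab_power suspend master01
lab_power resume master01
lab_power stop master01
lab_power start master01
```

The sketch only shows what the operations are called. It does not address the problems described above, such as a resumed master landing on a different compute node than the one holding its local etcd storage.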

@bchardim - feel free to add anything else to this conversation if I missed something.

 

Travis Michette, RHCA XIII
https://rhtapps.redhat.com/verify?certId=111-134-086
SENIOR TECHNICAL INSTRUCTOR / CERTIFIED INSTRUCTOR AND EXAMINER
Red Hat Certification + Training

5 Replies
bchardim
Flight Engineer
  • 507 Views

@Travis, your analysis on the impact of suspending an OCP lab environment is right. It is much better for these labs to power down than suspend.

ric
Flight Engineer
  • 359 Views

Hi, @Travis and @bchardim 

Thank you very much for your replies, explaining why it would be troublesome to allow pausing / suspending / hibernating the (Virtual Machines of the) Lab Environment in the "Red Hat Online Learning (ROL)" courses, particularly in OCP (OpenShift Container Platform) based courses such as the "DO188 - Introduction to Containers with Podman" course.

In the "DO188 - Introduction to Containers with Podman" course, I've noticed that there is now a popup window that gives a related note (about the long time that the OpenShift Cluster takes to start). Here's the screenshot of that popup window:

[Screenshot: "Create your lab environment" popup window]

 

Text of that popup window screenshot for easier searching and reference:

-----------------------------------------------------------------------------------------------------

Create your lab environment

The lab environment to complete these guided exercises and labs uses an embedded OpenShift cluster. Start the environment now, from the Lab Environment tab, because it might take up to 30 minutes for the cluster to be ready. In the meantime, you can work through the initial sections in the first chapter.

To check the OpenShift cluster's status, after your workstation VM is running and you can access the console, run the following SSH command to the utility VM as the lab user: ssh lab@utility, and run the following script: ./wait.sh

 

This script checks the status of all of the required components of the OpenShift cluster and its underlying infrastructure and returns when everything is ready.
-----------------------------------------------------------------------------------------------------

And indeed, the Lab Environment in the DO188 course seems to take about 35 minutes to finish starting the "OpenShift Cluster", as we can see by running that wait.sh shell script in the utility VM (Virtual Machine) as the lab user, in a recently recreated Lab Environment (I'm based in Lisbon, Portugal, Europe, in case that matters):

[lab@utility ~]$ time ./wait.sh
Waiting for OpenShift cluster start...
API is up
Cluster version is 4.14.0
Router is up
Waiting for authentication...
Waiting for authentication...

(...) Many identical lines snipped ...

Waiting for authentication...
Waiting for authentication...
Waiting for authentication...
Authentication is ready
The ingress operator is ready
Waiting for the kube-apiserver operator to be ready...
Unable to connect to the server: EOF
Could not reach Openshift API, trying again...
Waiting for the kube-apiserver operator to be ready...
Unable to connect to the server: EOF
Could not reach Openshift API, trying again...
Waiting for the kube-apiserver operator to be ready...
Unable to connect to the server: EOF
Could not reach Openshift API, trying again...
Waiting for the kube-apiserver operator to be ready...
Unable to connect to the server: EOF
Could not reach Openshift API, trying again...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
The kube-apiserver operator is ready
Machine Config Operator changes applied
API is up
Cluster version is 4.14.0
Router is up
Authentication is ready
The ingress operator is ready
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Unable to connect to the server: EOF
Could not reach Openshift API, trying again...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
Waiting for the kube-apiserver operator to be ready...
The kube-apiserver operator is ready
Machine Config Operator changes applied


API is up
Cluster version is 4.14.0
Router is up
Authentication is ready
The ingress operator is ready
The kube-apiserver operator is ready
Machine Config Operator changes applied
[OK] OpenShift cluster ready.

real 35m20.356s
user 0m10.943s
sys 0m2.770s
[lab@utility ~]$
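
The contents of wait.sh are not shown in this thread. As a rough sketch of the polling pattern its output suggests (retry a check until it succeeds or a deadline passes), here is a small POSIX shell helper. The name `wait_until` and its arguments are hypothetical and are not taken from the course script.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0 or the timeout passes.
# Returns 0 if COMMAND succeeded in time, 1 otherwise.
wait_until() {
    wu_timeout=$1
    wu_interval=$2
    shift 2
    wu_deadline=$(( $(date +%s) + wu_timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$wu_deadline" ]; then
            return 1
        fi
        sleep "$wu_interval"
    done
    return 0
}

# Example (assumes the `oc` client is installed and logged in):
#   wait_until 2400 30 oc get clusterversion || echo "cluster not ready in time"
```

A deadline matters here because, as the transcript shows, the API can answer and then drop out again ("Unable to connect to the server: EOF") several times before the cluster settles, so a single successful check is not proof that the cluster is ready.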

 

garvchaudhary
Flight Engineer
  • 355 Views

Just the lab building process takes around 1 hour of the 80-hour lab time limit for your course. So every time you start, restart, or reset your machine, it costs you 1 hour of your time. This is not good. Lab preparation should not count against your total lab hours.

bchardim
Flight Engineer
  • 248 Views

I understand what you are saying, but keep in mind that OpenShift is not designed to have all of its OCP nodes stopped and started every day. Every time that happens (stop/start of all OCP nodes), the cluster takes approximately 35 minutes to reconcile all the OCP resources and complete the startup. So in our model, this is the best workaround that we can apply.
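
To put the figures from this thread side by side, here is a small arithmetic sketch of how much of a lab-hour budget goes to cluster startup. The 35-minute startup and the 80-hour budget come from the posts above; the count of 10 starts is only an assumed example, and `overhead_percent` is a hypothetical helper name.

```shell
#!/bin/sh
# overhead_percent MINUTES_PER_START STARTS BUDGET_HOURS
# Prints the share of the lab-hour budget spent waiting for startup,
# as a whole-number percentage (integer division, rounded down).
overhead_percent() {
    op_minutes=$(( $1 * $2 ))   # total minutes spent on startups
    op_budget=$(( $3 * 60 ))    # budget converted to minutes
    echo $(( op_minutes * 100 / op_budget ))
}

# 35 minutes per start, 10 starts (assumed), 80-hour budget
overhead_percent 35 10 80
```

With these inputs the script prints 7, meaning roughly 7% of the 80 hours would go to waiting for the cluster. Using the 1-hour-per-start estimate from the earlier post instead gives 12.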
