Just encountered what appears to be a catastrophic bug in either:
1. The "lab finish policy-review" command at the end of Chapter 3; or
2. The "lab start --training do0004l network-analyze" command in Chapter 4
At the end of Chapter 3, after running the "lab finish" command, I stopped the lab environment for breakfast. When I restarted it to attempt the "network-analyze" lab in Chapter 4, multiple RHACS components were reported as unhealthy in both the central and secured clusters.
[Screenshot: RHACS components reported as unhealthy. The central cluster appears healthy only because I had already fixed the problem there before taking the screenshot.]
Checking the RHACS components (deployments and daemonsets) in the "stackrox" namespace reveals that multiple components are not available. Listing all pods in the "stackrox" namespace reveals that pods for some components were not created at all.
[Screenshot: RHACS sensor and collector workloads not available]
[Screenshot: RHACS sensor and collector pods missing entirely; scanner pods in CrashLoopBackOff state]
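For anyone who wants to run the same checks from the CLI, here is a minimal sketch, assuming `oc` is already logged in to the affected cluster and RHACS is installed in the `stackrox` namespace as in this lab. The helper name `inspect_stackrox` is hypothetical.

```shell
# Hypothetical helper: list RHACS workloads and pods in a namespace.
# Assumes "oc" is already logged in to the cluster being inspected.
inspect_stackrox() {
  local ns="${1:-stackrox}"
  # Deployments (e.g. sensor, scanner) and daemonsets (e.g. collector):
  # compare the READY/AVAILABLE columns against the desired counts.
  oc get deployments,daemonsets -n "$ns"
  # All pods, including ones stuck in CrashLoopBackOff. Workloads whose pods
  # were never created simply have no pods listed here.
  oc get pods -n "$ns" -o wide
}
```

Run `inspect_stackrox` once per cluster. Workloads that show no pods at all, as with sensor and collector in the screenshots, suggest that pod creation is being rejected rather than that the pods are crashing.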
The Kubernetes events made the root cause apparent: some ServiceAccounts in the "stackrox" namespace were missing the ClusterRoleBindings that allow them to use their respective SecurityContextConstraints (SCCs).
[Screenshot: RHACS collector ServiceAccount missing a binding for the appropriate SCC]
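A minimal sketch of how to surface these events and map a workload to its ServiceAccount from the CLI, assuming the `stackrox` namespace and a logged-in `oc` session. The helper names are hypothetical, and the exact event wording may differ between OpenShift versions.

```shell
# Hypothetical helpers for diagnosing SCC-related pod creation failures.

show_scc_failures() {
  local ns="${1:-stackrox}"
  # Controllers (ReplicaSets, DaemonSets) that cannot create pods emit
  # FailedCreate events; SCC denials typically mention being "unable to
  # validate against any security context constraint".
  oc get events -n "$ns" --field-selector reason=FailedCreate \
    -o custom-columns=OBJECT:.involvedObject.name,MESSAGE:.message
}

workload_service_account() {
  # $1 = workload reference such as daemonset/collector or deployment/sensor
  local ref="$1" ns="${2:-stackrox}"
  # Print the ServiceAccount named in the workload's pod template.
  oc get "$ref" -n "$ns" -o jsonpath='{.spec.template.spec.serviceAccountName}'
}
```

For example, `workload_service_account daemonset/collector` prints the collector's ServiceAccount, and `oc adm policy who-can use scc <scc-name>` lists the subjects currently allowed to use a given SCC.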
I fixed the issue on the central cluster by identifying the SCC required by each affected ServiceAccount and binding it again with the usual flow:
1. Identify the ServiceAccount used for each workload
2. Identify the SCC appropriate for each workload with "oc adm policy scc-subject-review"
3. Bind the appropriate SCC to the ServiceAccount with "oc adm policy add-scc-to-user"
4. Restart the affected workloads with "oc rollout restart"
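The flow above can be sketched as a small shell helper. This is an illustration under assumptions, not the course's fix script: it assumes cluster-admin access, RHACS in `stackrox`, and that you have already chosen the SCC in step 2; `rebind_scc` is a hypothetical name.

```shell
# Hypothetical helper covering steps 1, 3 and 4 of the flow above.
# Step 2 (choosing the SCC) stays manual: review the output of
#   oc adm policy scc-subject-review -f <saved workload manifest>
# before deciding which SCC to bind.
# Assumes cluster-admin access and RHACS running in "stackrox".
rebind_scc() {
  local ref="$1" scc="$2" ns="${3:-stackrox}" sa
  # Step 1: find the ServiceAccount used by the workload's pod template.
  sa="$(oc get "$ref" -n "$ns" -o jsonpath='{.spec.template.spec.serviceAccountName}')"
  if [ -z "$sa" ]; then
    echo "could not determine ServiceAccount for $ref" >&2
    return 1
  fi
  # Step 3: allow that ServiceAccount to use the chosen SCC (-z = ServiceAccount).
  oc adm policy add-scc-to-user "$scc" -z "$sa" -n "$ns"
  # Step 4: restart the workload so its controller recreates the pods.
  oc rollout restart "$ref" -n "$ns"
}
```

For example, `rebind_scc daemonset/collector <scc-name>`, where `<scc-name>` is the SCC identified in step 2. Keeping step 2 manual is deliberate: binding an overly permissive SCC would grant the workload more privileges than it needs.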
While this issue can be fixed without re-creating the lab environment, I assume this isn't intended behavior. I hope the course authors can look into the scripts to identify and fix the bug. Thanks!
I just encountered the issue again, so it is likely a bug in the grading scripts, though I have yet to pinpoint the exact cause. Here is the order in which I attempted the labs, starting from a fresh lab environment, before running into the issue:
1. vulnerability-review
2. policy-review
3. network-review
4. compliance-review
5. policy-pipeline
For each lab, I ran both the "lab start" and "lab finish" commands. In addition, between labs I deleted all RHACS pods in the "stackrox" project on both the central and managed clusters, then logged in to RHACS Central to verify the platform health.
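The per-lab cycle above can be approximated with a script like the following, which may help anyone trying to reproduce the issue. It is a sketch under assumptions: `lab start` / `lab finish` take the exercise name as elsewhere in this post (add any course-selection option your environment requires), the kubeconfig context names are placeholders for the central and managed clusters, and the helper names are hypothetical. Verifying health in RHACS Central remains a manual step.

```shell
# Hypothetical helpers (not part of the course tooling) approximating the
# per-lab cycle: lab start, lab finish, then delete all RHACS pods.
# Context names passed in are placeholders for your kubeconfig contexts.

reset_stackrox_pods() {
  local ctx
  # With no contexts given, use the current one.
  if [ "$#" -eq 0 ]; then
    oc delete pods --all -n stackrox
    return
  fi
  for ctx in "$@"; do
    # Deleting the pods makes their Deployments/DaemonSets recreate them.
    oc --context="$ctx" delete pods --all -n stackrox
  done
}

run_lab_cycle() {
  local exercise="$1"
  shift
  lab start "$exercise" || return 1
  # A full walkthrough would happen between start and finish;
  # this helper does not pause for it.
  lab finish "$exercise" || return 1
  reset_stackrox_pods "$@"
}

run_labs_in_order() {
  local ex
  # Same order as the list above; all arguments are passed on as contexts.
  for ex in vulnerability-review policy-review network-review compliance-review policy-pipeline; do
    run_lab_cycle "$ex" "$@" || return 1
  done
}
```

For example, run `run_labs_in_order <central-context> <managed-context>`, then stop and restart the lab environment and check both clusters again.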
Across all five labs, completed in succession, both clusters reported a "Healthy" status between and after each lab attempt. However, after attempting the five labs in the order listed above and then stopping and restarting the environment, I ran into the same issue. This time only the managed cluster was affected, and the cause was identical: missing SCC bindings for RHACS-related ServiceAccounts.
I will fix the issue in the current lab environment and run a few tests to determine which lab is causing it.
I spent a few lab hours trying to narrow down the cause of the issue. Unfortunately, I was unable to reproduce it consistently with the following steps:
1. Stop and restart the lab environment
2. Run the "lab start" and "lab finish" commands for the lab "policy-pipeline", then stop and restart the lab environment
3. Run the "lab start" and "lab finish" commands for the lab "policy-review", then stop and restart the lab environment
4. Walk through the "policy-pipeline" lab, then stop and restart the lab environment
5. Walk through and complete the "policy-review" lab, then stop and restart the lab environment
If you encounter a similar issue, I recommend fixing it manually or re-creating the lab environment altogether. Feel free to share your experience if you have run into something similar.
Red Hat
Learning Community
A collaborative learning environment, enabling open source skill development.