I just encountered what appears to be a catastrophic bug in one of the following:
1. The "lab finish policy-review" command at the end of Chapter 3; or
2. The "lab start --training do0004l network-analyze" command in Chapter 4
At the end of Chapter 3, after running the "lab finish" command, I stopped the lab environment for breakfast. When I restarted the lab environment to attempt the "network-analyze" lab in Chapter 4, multiple RHACS components were reported as unhealthy in both the central and secured clusters.
[Screenshot: RHACS components unhealthy. The central cluster appears healthy only because I had already fixed the problem there before taking the screenshot.]
Checking the RHACS components (deployments and daemonsets) in the "stackrox" namespace reveals that multiple components are not available. Listing all pods in the "stackrox" namespace reveals that pods for some components were not created at all.
[Screenshot: RHACS sensor and collector workloads not available]
[Screenshot: RHACS sensor and collector pods missing entirely, scanners in CrashLoopBackOff state]
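A minimal sketch of these checks, assuming `oc` is logged in to the affected cluster and RHACS runs in the "stackrox" namespace (the hypothetical `NS` variable overrides it). The `run` helper and `DRY_RUN` switch are illustrative additions, not part of the lab: `run` prints each command and executes it only when `DRY_RUN` is not 1.

```shell
#!/usr/bin/env bash
# Minimal sketch: list RHACS workloads and pods to spot missing components.
# Assumes `oc` is logged in and RHACS runs in the "stackrox" namespace.
set -euo pipefail

NS="${NS:-stackrox}"

# Print a command string, then run it with pipefail unless DRY_RUN=1.
run() {
  echo "+ $1"
  if [[ "${DRY_RUN:-0}" != "1" ]]; then
    bash -o pipefail -c "$1"
  fi
}

check_workloads() {
  # READY/AVAILABLE below the desired count means a component is not available.
  # A ReplicaSet or DaemonSet whose CURRENT stays below DESIRED could not create
  # its pods at all, so the reason shows up in events rather than in pod logs.
  run "oc get deployments,daemonsets,replicasets -n $NS"
  # Pods that were never created are simply absent from this list.
  run "oc get pods -n $NS"
}
```

Call `check_workloads` to run the checks, or `DRY_RUN=1 check_workloads` to only preview the commands.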
The Kubernetes events made the root cause apparent: some ServiceAccounts in the "stackrox" namespace were missing the ClusterRoleBindings that grant them their respective SecurityContextConstraints (SCCs).
[Screenshot: RHACS collector ServiceAccount missing binding for the appropriate SCC]
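The events check can be sketched as below, assuming a logged-in `oc` session, the "stackrox" namespace, and the default RHACS object names (for example a `collector` DaemonSet); adjust names to whatever `oc get` shows. Filtering on `reason=FailedCreate` is an assumption about where the error surfaces: on OpenShift, a controller that cannot create a pod because no SCC admits it typically records a FailedCreate event containing "unable to validate against any security context constraint". The `run` helper and `DRY_RUN` switch are illustrative: commands are printed, and executed only when `DRY_RUN` is not 1.

```shell
#!/usr/bin/env bash
# Minimal sketch: surface pod-creation failures (e.g. SCC rejections) in a namespace.
set -euo pipefail

NS="${NS:-stackrox}"

# Print a command string, then run it with pipefail unless DRY_RUN=1.
run() {
  echo "+ $1"
  if [[ "${DRY_RUN:-0}" != "1" ]]; then
    bash -o pipefail -c "$1"
  fi
}

# Usage: check_events [kind/name]   (defaults to the assumed daemonset/collector)
check_events() {
  # FailedCreate events sorted by last occurrence; SCC rejections usually land here.
  run "oc get events -n $NS --field-selector reason=FailedCreate --sort-by=.lastTimestamp"
  # The Events section of describe shows the same message for a single workload.
  run "oc describe ${1:-daemonset/collector} -n $NS"
}
```

For example, `check_events deployment/sensor` inspects the sensor Deployment instead of the collector DaemonSet.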
I fixed the issue on the central cluster by identifying the required SCCs for the affected ServiceAccounts and re-binding them with the usual flow:
1. Identify the ServiceAccount used for each workload
2. Identify the SCC appropriate for each workload with "oc adm policy scc-subject-review"
3. Bind the appropriate SCC to the ServiceAccount with "oc adm policy add-scc-to-user"
4. Restart the affected workloads with "oc rollout restart"
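The four steps above can be sketched as two shell functions. This is a minimal sketch, not the course's tooling: it assumes a cluster-admin `oc` session and the "stackrox" namespace, and it leaves the SCC choice to you after reading the `scc-subject-review` output, since the right SCC differs per component. Run by a cluster administrator without `-u` or `-z`, the review should report an SCC that would admit the workload's pod template. The function names and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of the four-step SCC repair flow for one workload.
# Assumptions: cluster-admin `oc` session, RHACS in the "stackrox" namespace,
# and the SCC name chosen by hand from the scc-subject-review output.
set -euo pipefail

NS="${NS:-stackrox}"

# Print a command string, then run it with pipefail unless DRY_RUN=1.
run() {
  echo "+ $1"
  if [[ "${DRY_RUN:-0}" != "1" ]]; then
    bash -o pipefail -c "$1"
  fi
}

# Steps 1-2: print the workload's ServiceAccount, then review which SCC
# would admit its pod template.
review_workload() {
  local workload="$1"   # e.g. daemonset/collector
  run "oc get $workload -n $NS -o jsonpath='{.spec.template.spec.serviceAccountName}' && echo"
  run "oc get $workload -n $NS -o yaml | oc adm policy scc-subject-review -f -"
}

# Steps 3-4: bind the chosen SCC to the ServiceAccount, then restart the
# workload so new pods are created under the restored binding.
bind_and_restart() {
  local workload="$1" sa="$2" scc="$3"
  run "oc adm policy add-scc-to-user $scc -z $sa -n $NS"
  run "oc rollout restart $workload -n $NS"
}
```

For example, `review_workload daemonset/collector` followed by `bind_and_restart daemonset/collector <serviceaccount> <scc>`, where the last two arguments are placeholders for the values the review printed. Prefixing either call with `DRY_RUN=1` only prints the commands, which is a cheap way to double-check the names before changing RBAC.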
While this issue can be fixed without re-creating the lab environment, I assume it isn't intended behavior. I hope the course authors can look into the lab scripts to identify and fix the bug. Thanks!
Red Hat
Learning Community