Re: Topic dedicated to course errors

BogdanB · ‎12-13-2020

Hi

Since there are quite a few errors in the courses, I thought it would be a good idea to have a pinned topic where users can report the errors they find. This would give us one main topic for reporting errors and marking them as solved, and it would make it much easier for the team working on a course to pick up the errors.

Also, users can check there whether something is wrong with a script, whether there is a typo in a command, or something similar.

What do you think?

Lisenet · ‎12-16-2020

Agreed, it would be useful to have a place to report such issues. I don't think that raising a support request in this case is the best option because other members don't have visibility. Sharing findings with the learning community would benefit all RHLS users.

BogdanB · ‎12-23-2020

Since this was my idea, I thought I should go ahead and start adding the various errors. I think the best approach is to put each course's errors in a single post. I'll add my own RH318 findings next week when I get home (I have them on another PC). Feel free to add what I've missed; I'll scan the topics over the next few days for other things that could be put here.

BogdanB · ‎12-23-2020

RH294 review lab

The grading script for the review-roles lab is broken (see the attached screenshot). I had a look at the grading script, and it seems that the function which checks remote_user, become_user and become_method is looking in the directory specified by the workdir variable. The previous function, which checks the existence of ansible.cfg, is looking in the directory specified by the labdir variable. I've tried updating the script to use the labdir variable and it works.

Original report is Here

BogdanB · ‎12-23-2020

DO447 - Typos, errors and bugs

1.1 Implementing Recommended Practices

register: example_webpage
failed_when: example_webpage.status != 200

-> I do not consider that a best practice, since the uri module comes with a status_code parameter. You can probably find a better example here, such as looking for a text pattern in the web page.
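
To illustrate the point, here is a minimal sketch of both variants. The URL and the 'Hello' pattern are hypothetical placeholders, not taken from the course. As far as I know, 200 is already the default for status_code, so the first task only makes the expectation explicit.

```yaml
# Sketch only: the URL and the expected text are hypothetical placeholders.

# Let the uri module enforce the expected HTTP status itself.
- name: Check that the web page answers with HTTP 200
  uri:
    url: http://servera.lab.example.com/
    status_code: 200

# Keep failed_when for something uri does not check on its own,
# such as a text pattern in the page body.
- name: Check that the web page contains the expected text
  uri:
    url: http://servera.lab.example.com/
    return_content: true
  register: example_webpage
  failed_when: "'Hello' not in (example_webpage.content | default(''))"
```

Keep in mind that failed_when replaces the module's own failure verdict, which is why the second task falls back to an empty string when no content is returned.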

1.3 Managing Ansible Project Materials Using Git

"Bare repositories does not have a local working tree."

"In the preceding example, the most recent commit for the branch master (and HEAD at that time) was commit 5749661, which occurred at some point in the past. A user ran the git branch feature/1 command, creating a branch, feature/1."

-> the commit is actually 7900dd94

2.1 Writing YAML Inventory Files

"These servers themselves form their own groups, so they must end in a colon"

-> the reason why host definitions must end in a colon is that they are YAML dictionaries containing the host's own variables (if any), not that they form their own groups (or did you mean "their own blocks"?)

all:
  children:
    ungrouped:
      notinagroup.lab.example.com:
    mailserver:
      mail.lab.example.com:

-> the hosts keys are missing here
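
For reference, a sketch of how the same inventory would presumably look with the hosts keys added (same host and group names as in the course excerpt; I have not checked this against the course's own files):

```yaml
all:
  children:
    ungrouped:
      hosts:
        notinagroup.lab.example.com:
    mailserver:
      hosts:
        mail.lab.example.com:
```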

2.5 Lab: Managing Inventories

-> At item 5, it is unclear that the requested numbered naming scheme has to be static and not dynamic.

3.2 Guided Exercise: Controlling Privilege Escalation

force_handlers: True

-> Needless here (there is really no reason for any task to fail). Above all, it misleadingly suggests to students that handlers are going to be executed every time, which is false. I'd rather insert changed_when: true on the 'Ensure haproxy configuration is set' task of the 'haproxy' role instead.

4.3 Templating External Data using Lookups

"Note that this example may not the most efficient way to do this particular task"

4.6 Guided Exercise: Implementing Advanced Loops

-> At this step there is no IDM or IPA service actually running on utility.lab.example.com. It's installed later at guided exercise 6.4. Therefore the 'ipa_user' module fails in scenario 1. So does the lab data-loops script.

5.4 Guided Exercise: Managing Rolling Updates

"After the playbook deploys the web application, a smoke test ensures that each back-end web server is responds with a 200 HTTP status code."

5.6 Summary

-> Two orphan closing parentheses on this page

6.6 Guided Exercise: Accessing Red Hat Ansible Tower

"Review the output of the job execution to determine which tasks were executed. You should see that the msg module was used to successfully display a Hello World! message."

-> the module name is actually debug

7.3 Managing Users Efficiently with Teams

"(instead of read on individual Teams."

-> Closing parenthesis missing

9.5 Lab: Managing Projects and Launching Ansible Jobs

-> The lab's grading script checks that the Developers team has a use role on the Test inventory, which is neither required by the instructions nor a lab objective.

10.10 Summary

"Ansible Tower provides a browsable REST API that can easily be used to automate Ansible Tower operations and integrate it with third-party products."

-> misplaced and repeats 11.6 Summary

11.4 Guided Exercise: Interacting with APIs using Ansible Playbooks

-> In the first playbook 'tower_copy_template.yml', registering the first 'uri' call to grab the inventory id is not only needless but also confusing. Indeed, the retrieved value 'copy.json.inventory' is already an attribute of the newly copied template, not of the original one.

12.2 Guided Exercise: Importing External Static Inventories

(Step 4.7) "Click the double-arrow icon in the row for the git-inventory source to retrieve the changes. Wait until the cloud icon next to git-inventory is static and green."

-> this step is not needed because we checked the box : UPDATE ON PROJECT CHANGE

12.5 Filtering Hosts with Smart Inventories

"but that is not not the case"

12.6 Guided Exercise: Filtering Hosts with Smart Inventories

"These two systems' facts are available in Ansible Tower's cache because in a previous exercise we executed a job on those managed hosts with a job template that had fact caching enabled."

-> again, it would help a lot to at least specify which exercise (it's actually guided exercise 10.2), or even better: use the lab start script to enforce that

14.4 Guided Exercise: Configuring TLS/SSL for Ansible Tower

[root@tower ~]# semanage fcontext -a -t cert_t "/etc/tower(/.*)?"

-> that pretty loose pattern matches all Tower configuration files, most of which are unrelated to certificates

-> Unfortunately, I have never been able to complete this guided exercise, probably because of a previous reset of my utility VM. Even after having run 'lab tower-install start' (from guided exercise 6.4) to re-install IdM, I still missed the certmonger package on my tower VM. After having installed that package manually, I still missed the proper Kerberos configuration. I gave up at that point to save up some of my scarce lab time. So sad that the lab start script does not take care of all that.

15. Comprehensive review

-> The labs' solutions are not hidden, so the comprehensive review cannot serve as a mock exam

15.3 Lab: Privilege Escalation, Lookups, and Rolling Updates

"If a single host fails to update, the playbook must stop executing immediately."

-> as said before, any_errors_fatal: true should be a valid answer here, but it is not; only max_fail_percentage: 0 is accepted as a valid solution
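
A minimal sketch of the two candidate settings side by side. The play name, host group, batch size and task are hypothetical and only there to make the fragment complete; they are not the lab's actual solution.

```yaml
# Sketch only: the play name, host group, batch size and task are hypothetical.
- name: Rolling update of the web servers
  hosts: webservers
  serial: 2
  # Accepted by the grading script:
  max_fail_percentage: 0
  # Arguably meets the same requirement, but is not accepted:
  # any_errors_fatal: true
  tasks:
    - name: Placeholder for the real update tasks
      debug:
        msg: "Updating {{ inventory_hostname }}"
```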

(step 4) "Introduce logging tasks to register the start and end of the deployment on your control node."

-> Using lineinfile and delegate_to: localhost ends up with concurrent writing operations on the controller, leading to some lines missing, as explained here. Even though that buggy behaviour is more or less mitigated by the batch updates set up at the next step, it is not a very good practice.

15.9 Lab: Testing the Prepared Environment

-> This lab is broken because of leftover content in /var/lib/mysql/ on servere, which prevents mariadb from starting up on that server. That leftover content comes from scenario 2 of "4.6 Guided Exercise: Implementing Advanced Loops", where mysql-server (and not mariadb-server) was installed.

Workaround:

[root@servere ~]# rm -Rf /var/lib/mysql/*

+ relaunch the Full Stack Deployment workflow Job Template

I may have skipped a lab that cleans up that content at some point during the course.

Original report here - credit goes to @littlebigfab

BogdanB · ‎12-23-2020

DO425 - Typos, errors and bugs

1.5 Describing Least-Privilege Technology

"In UNIX systems, a process with UID 0 can change itself to a more restricted configuration, but any other process can only demote itself to a more restrictive configuration."

-> less?

1.8 Summary

"Model Linux kernels"

-> Modern?

2.3 Identifying Trusted Images

"The following except stores new signatures under the filesystem"

-> excerpt

3.1 Reviewing the OpenShift Automated Build Process

[user@demo ~]$ oc set build-hook bc/build-config-name \
--post-commit \
--command \
--command

-> here a critical space is missing at the last line which should read: -- command

3.1 Reviewing the OpenShift Automated Build Process

[user@demo ~]$ oc set triggers bc build-config-name \
--from-image="imagestream"

"NOTE
The command syntax is misleading. You should monitor for images stream changes, not container image changes."

-> the actual syntax is: --from-image="imagestream:tag" ; what is actually monitored is an image stream tag (thanks @oldbenko for having taught me that !)

3.2 Guided Exercise: Performing a Source-to-Image Build Process

At section 1.2:

 <profile>
    <id>openshift</id>
 </profile>

-> those 3 lines are actually not in security.txt

At section 2.2:

"From the terminal window, run the following command to build the image:"

-> the following command actually creates the project

In this guided exercise, the image trigger that is added to the bc at step 4.1 is misleadingly not the one that actually triggers the re-build, as clearly explained here by @oldbenko

4.4 Guided Exercise: Integrating Red Hat Identity Management and 4.7 Lab: Managing User Access Control

IDM labs are broken when IDM admin password has expired, as I previously mentioned here.

4.6 Guided Exercise: Installing Single Sign-on Authentication

for f in ~/sso72-dev; do oc delete -f $f; done
for f in ~/sso72-dev; do oc create -f $f; done

-> This accidentally works, but only because the -f flag also accepts a directory; each loop actually iterates only once and is therefore needless

4.7 Lab: Managing User Access Control

"This file already contains the oc adm group sync"

-> oc adm groups sync

5.1 Automating Policy-Based Deployments

"On master nodes, define the experimental-encryption-provider-config experimental-encryption-provider-config in the /etc/origin/master/master-config.yaml file."

-> "experimental-encryption-provider-config" is repeated

10.2 Lab: Securing Single Container Applications and 10.3 Lab: Securing Multiple Container Applications

-> The solutions of those 2 comprehensive review labs are not hidden, as previously mentioned here.

10.2 Lab: Securing Single Container Applications

"The jrdevs group has the edit and jruser1 users as members of that group. The password for jruser1 is redhat."

-> 'edit' is actually not a user but a role

-> Both the solution and the grading script refer to a jrdev1 user instead of jruser1

"create an Apache HTTP Server pod named testhttpd [...] Leave the testhttpd pod running..."

-> testhttp (not testhttpd) is the name used in all the rest of the course and in the grading script

"Application pods run with resource limits set to 200 millicores of CPU and 384 Mi of memory."

-> Such limits can only be set on containers, not on pods; both the lab solution and the grading script rely on 'type=Container' for those limits

"Deploy the compreview-singe/customapp container image from Quay"

-> "compreview-single/customapp"

At section 2.6:

"You should see a subdirectory named from the httpd-24-rhel7 image."

-> It's actually named compreview-single

At section 6.1:

    requests.cpu: "1000m"
    requests.memory: "1536Mi"

-> Such global requests quotas for the project are not requested by the lab instructions and not checked by the grading script either

Sections 7.10 to 7.17 erroneously repeat sections 7.2 to 7.8.

The whole section 9 erroneously repeats section 8.

10.3 Lab: Securing Multiple Container Applications

"Access Jenkins at htps://jenkins-cicd.apps.lab.example.com and log in to Jenkins as the developer user from OpenShift."

-> https

"Deploy the egress router pod using the egress-router-pod.yaml file [...]

Edit these files as required to provide access to the external database [...]

The stage environment accesses an external MariaDB database server on the services VM to best mimic the production environment for load testing. [...]"

-> The third (introductory) stanza should be put first here

"To allow secure storage access from compreview-multiple-dev and other development projects, create the devdb-scc security context constraint. This SCC allows containers to run with the GID range from 10000 to 20000."

-> A runAsAny policy (with appropriate range) matches the requirement, but the grading script erroneously expects a mustRunAs policy

The solution of the Comprehensive review (chapter 10) Lab: Securing Single Container Applications is actually not working. There seems to be a problem with the CA certificate for Quay Enterprise:

The folder /etc/docker/certs.d/quay.apps.lab.example.com/ already exists on all of the nodes and contains the correct ca.crt (an exact copy of /etc/pki/CA/cacert.pem from workstation).
When you follow the solution TO THE DOT (I tried 3 times, resetting all the machines in the lab each time), you won't get past step 4.4:

oc new-app --name=test --docker-image quay.apps.lab.example.com/compreview-single/customapp

Error:

[W0106 14:33:18.789889 6098 dockerimagelookup.go:233] Docker registry lookup failed: Get https://quay.apps.lab.example.com/v2/: x509: certificate signed by unknown authority

Chapter 6 GE1: "credencials" instead of "credentials"

Chapter 9 Lab section 1.1 - command is missing --discover at the end, correct command below

iscsiadm -m discovery -t st -p services.lab.example.com:3260 --discover

Chapter 10 Securing multiple applications

It is requested that you start httpd pods with certain labels. However, for the pods that should not match the network policies, no label is specified in the instructions, yet the grading script checks for a certain label.

Lab 10.1: The grading script checks whether the policy.json file is the same on all nodes. However, if you have the same rules but in a different order, the check fails even though the rules are the same. I discovered this while debugging other issues.

Original findings here, credit goes to @littlebigfab @LucianMaly for most of the findings

Jacek · ‎12-24-2020

regarding:
[W0106 14:33:18.789889 6098 dockerimagelookup.go:233] Docker registry lookup failed: Get https://quay.apps.lab.example.com/v2/: x509: certificate signed by unknown authority

solution was provided here:

https://learn.redhat.com/t5/Red-Hat-Learning-Subscription/DO425-Typos-errors-and-bugs/td-p/6170

The root cause for this is that there is a reference to an additional CA file in the ImageImportPolicy section of master-config.yaml, attribute AdditionalTrustedCA.

There are two possible solutions:

remove the AdditionalTrustedCA attribute and restart the controllers
append the Quay CA cert to that file and restart the controllers

vhoebel · ‎11-23-2021

DO180, running "lab-configure" on the workstation

I entered the credentials for OCP and after a while, lab-configure will throw an error, claiming:

ERROR:

Cannot login to OpenShift using your developer credentials.

Logging into the OCP web console with the same credentials is possible, though. I also double-checked the values for "API Endpoint", "Username" and "Password".

I can reach the API Endpoint from my workstation (tested by using curl).

mr-igor · ‎11-23-2021

@vhoebel wrote:
DO180, running "lab-configure" on the workstation
I entered the credentials for OCP and after a while, lab-configure will throw an error, claiming:
ERROR:
Cannot login to OpenShift using your developer credentials.
Logging into the OCP web console with the same credentials is possible, though. I also double-checked the values for "API Endpoint", "Username" and "Password".
I can reach the API Endpoint from my workstation (tested by using curl).

This might be related (at least it was in my case) to the recent issues with the labs. Re-creating the lab environment might help. Compare the data center region hosting your OpenShift cluster in your old and new environments (e.g. API Endpoint: https://api.<dc-region-code>.prod.nextcle.com:6443)