A playbook is configured to be executed on a single host.
That playbook has a single play.
That single play has 4 tasks.
If the first task fails on that single host, will the remaining 3 tasks execute?
If the first task fails on a host, Ansible won’t run the next tasks on that host; it stops right there. That’s the default behavior to avoid making things worse after something goes wrong. But if you want Ansible to keep going even when something fails, you can tell it to do so by adding ignore_errors: yes to that task. A better way, though, is to use a block with rescue and always sections, which lets you handle the failure more cleanly and still run follow-up steps if needed. This approach is clearer and is also recommended as a best practice.
@Trevor Very important concept on Ansible error handling in playbooks.
By default, Ansible stops executing tasks on a host when a task fails on that host.
You can use ignore_errors: yes on the failing task to allow the play to continue despite failure.
But the ignore_errors directive only takes effect if a task actually runs and returns a failed result. It won’t help with undefined variables, connection failures, execution problems such as missing packages, or syntax errors; Ansible will still stop when it hits any of those.
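To tie this back to the original question, here is a minimal sketch of a play where the first task fails but the next one still runs. The play and task names are made up for illustration, and it assumes the command `false` is on the PATH of the target (it is on typical Linux hosts).

```yaml
- name: Demonstrate ignore_errors on a single host
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Task 1 runs, fails, and the failure is ignored
      ansible.builtin.command: "false"   # exits non-zero, so the task fails
      ignore_errors: true

    - name: Task 2 still runs on the same host
      ansible.builtin.debug:
        msg: "Reached task 2 despite the earlier failure"
```

Without ignore_errors on the first task, the second task would not run on that host.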
You can use the ignore_unreachable keyword to skip task failures when a host is marked as ‘UNREACHABLE’. This lets Ansible bypass errors for those hosts and continue running tasks that follow, even if it couldn’t connect to them. Normally, when Ansible cannot reach a host, it flags the host as ‘UNREACHABLE’ and removes it from the active list for the remainder of the play.
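A small sketch of what that can look like at the task level. The host pattern and task names are assumptions for illustration; fact gathering is turned off here so that the first connection attempt happens in the task that carries the keyword.

```yaml
- name: Demonstrate ignore_unreachable
  hosts: all
  gather_facts: false
  tasks:
    - name: An unreachable error on this task is ignored
      ansible.builtin.ping:
      ignore_unreachable: true

    - name: This task is still attempted on the same host
      ansible.builtin.ping:
```

The second task does not set ignore_unreachable, so if the host is still unreachable at that point it is dropped from the play as usual.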
Also, Ansible gives you control over task failures with the failed_when conditional. By default, if you specify a list of conditions under failed_when, the task only fails if all are true (implicit “and”). If you want the task to fail when any one condition is met, you need to combine them in a string using the explicit “or” operator.
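Here is a sketch of both forms side by side. The script path /usr/local/bin/check_app and the strings being matched are hypothetical, purely to show the syntax.

```yaml
# List form: the task fails only if ALL conditions are true (implicit "and")
- name: Fail only when rc is 0 and the expected text is missing
  ansible.builtin.command: /usr/local/bin/check_app   # hypothetical script
  register: result
  failed_when:
    - result.rc == 0
    - '"OK" not in result.stdout'

# String form: the task fails if ANY condition is true (explicit "or")
- name: Fail when any one of the conditions is met
  ansible.builtin.command: /usr/local/bin/check_app   # hypothetical script
  register: result
  failed_when: >
    ("OK" not in result.stdout) or
    (result.stderr != '') or
    (result.rc == 10)
```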
Last but not least:
You can manage how Ansible handles task failures by grouping tasks in a block and using the rescue and always sections.
If a task inside a block fails, the tasks under rescue will run automatically, letting you handle errors in a way that is similar to exception handling in programming.
The always section runs tasks no matter what, whether there was an error or not.
Keep in mind, though: if a failure is due to an invalid task (like a syntax mistake) or an unreachable host, neither rescue nor always will be triggered during playbook execution.
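A minimal sketch of the block/rescue/always layout described above. The task names are illustrative, and the failing task simply runs `false` to force an error.

```yaml
- name: Demonstrate block, rescue, and always
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Handle a failing task
      block:
        - name: Task that fails
          ansible.builtin.command: "false"

        - name: Skipped, because the previous task in the block failed
          ansible.builtin.debug:
            msg: "Never printed"
      rescue:
        - name: Runs only because a task in the block failed
          ansible.builtin.debug:
            msg: "Recovering from: {{ ansible_failed_task.name }}"
      always:
        - name: Runs whether or not the block failed
          ansible.builtin.debug:
            msg: "Cleanup"
```

If the rescue tasks succeed, the host is not treated as failed for the rest of the play, so later tasks still run on it.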
https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_error_handling.html
https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_blocks.html#block-error-handling
A very good and thorough overview of failures and how to handle them. I would like to add one more thing here that I haven't always done, but leveraging larger jobs and workflows in AAP Controller has made me think more about handling failures (think of a job that could run for well over an hour, with a failure towards the end ... how do you recover?).
Some modules can fail for reasons that have nothing to do with a system or playbook or even your local networking, so while block/rescue/always can sometimes help you recover it might not be sufficient for network blips or remote system load. So while it doesn't involve the fail keyword, I have found this to be invaluable lately.
# Container image handling tasks
- name: Pull images from external registries
  containers.podman.podman_image:
    name: "{{ item }}"
    validate_certs: false
    pull: true
  loop: "{{ container_images }}"
  retries: 5
  delay: 20
What does this task do ... so glad you asked.
Essentially, this is a working task to pull images from remote container registries. What happens if it can't get an image? Well, it fails. So assuming you are pulling a bunch of container images, and those might be coming from the same registry, any number of things can happen ...
Any of the above scenarios could cause the image download to fail. However, you know the images exist and you know the playbook works. One thing you can add to the task is the retries and delay keywords: retries sets how many times to retry, and delay sets how many seconds to wait before each retry. So one could argue that this does some self-healing from failures. This is different from block/rescue/always. There, the rescue will often try to fix something and get you to the end goal (yes, rescue could just try the same thing again, like installing a service because it failed the first time), but not always, and block/rescue/always is still one and done.
The retries/delay combination gives the task a chance to try again until it works, and lets you set wait times that allow for things like API rate limiting and minor network glitches. This can be very valuable for long-running playbooks and workflows: rather than failing on the first error, the play stays at that task and keeps retrying, within the limits you've provided, until it completes successfully.
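The retry keywords are often paired with register and until, which makes the success condition explicit. As far as I know, older Ansible releases only honored retries when until was also set, so spelling out the condition is the safer form across versions. A sketch, where the health-check URL and the variable name health are hypothetical:

```yaml
- name: Wait for a service to answer before continuing
  ansible.builtin.uri:
    url: https://example.com/health   # hypothetical endpoint
  register: health
  until: health.status == 200         # stop retrying once this is true
  retries: 5                          # retry up to 5 times
  delay: 20                           # wait 20 seconds between attempts
```

If the condition is still not met after the last retry, the task fails like any other, so it can still be wrapped in a block with rescue when a fallback is needed.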