smi-andrew (Flight Engineer)

Controller schedule job help

I currently have an Ansible role I am working on that does os_patching for us. In that role I need **bleep** to take snapshots and then create a scheduled job in AAP to remove those snapshots after 1 week. I've got most of this working using the vmware.vmware or vmware_rest collections. Right now I'm running into an issue with the ansible.controller.schedule module. I am aware of the set_fact statement that has a date WAY in the past. My main issue comes down to making the rrule work. This seems to be where the role fails in my AAP instance, and I'm not understanding why.

Here is the code I have written:

- name: Calculate 'DTSTART' 7 days in the future using shell
  ansible.builtin.shell: |
    date -d "+7 days"
  register: future_rrule_shell
  delegate_to: localhost

- name: Set 'dynamic_rrule' fact from shell output
  ansible.builtin.set_fact:
    dynamic_rrule: "{{ future_rrule_shell.stdout }}"

- name: Create a ruleset for everyday except Sundays
  ansible.builtin.set_fact:
    complex_rule: "{{ lookup(awx.awx.schedule_rruleset, '2022-04-30 10:30:45', rules=rrules, timezone='UTC' ) }}"
  vars:
    rrules:
    - frequency: 'day'
      interval: 1
    - frequency: 'day'
      interval: 1
      byweekday: 'sunday'
      include: False

- name: Schedule a one-time snapshot cleanup in AAP (T+{{ snapshot_ttl_days|default(7) }} days)
  ansible.controller.schedule:
    controller_host: "{{ lookup ('env', 'CONTROLLER_HOST') }}"
    controller_username: "{{ lookup ('env', 'CONTROLLER_USERNAME') }}"
    controller_password: "{{ lookup ('env', 'CONTROLLER_PASSWORD') }}"
    validate_certs: "{{ controller_validate_certs | default(true) }}"
    enabled: true
    job_type: run
    unified_job_template: vmware_snapshot_cleanup # your JobTemplate name
    name: "Cleanup {{ vm_name | default(inventory_hostname_short) }} pre-change on {{ selected_aap_instance.name }}"
    rrule: "DTSTART:{{ dynamic_rrule }}"
    #extra_data:
    #  vcenter_hostname: "{{ _chosen_vcenter }}"
    #  vcenter_username: "{{ vcenter_username }}"
    #  vcenter_password: "{{ vcenter_password }}"
    #  vcenter_validate_certs: "{{ vcenter_validate_certs | default(false) }}"
    #  vm_id: "{{ _vm_id }}"
  delegate_to: localhost

Here is the output from AAP:

[ERROR]: Task failed: Module failed: 'NoneType' object has no attribute 'replace'
Origin: /runner/requirements_roles/os_patching/tasks/vmware/schedule_removal.yml:64:3
62     dynamic_rrule: "{{ future_rrule_shell.stdout }}"
63
64 - name: Schedule a one-time snapshot cleanup in AAP (T+{{ snapshot_ttl_days|default(7) }} days)
     ^ column 3

fatal: [testserver03.example.com -> localhost]: FAILED! => {
    "changed": false,
    "msg": "Task failed: Module failed: 'NoneType' object has no attribute 'replace'"
}
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1762377189.6961734-301-224640049938324/ > /dev/null 2>&1 && sleep 0'

fatal: [testserver01.example -> localhost]: FAILED! => {
    "changed": false,
    "msg": "Task failed: Module failed: 'NoneType' object has no attribute 'replace'"

Any help is greatly appreciated!

Accepted Solutions

Travis (Moderator)

@smi-andrew -

This will be a tough one to answer, as there aren't quite enough details to fully troubleshoot. I can offer some insight into how to troubleshoot it, along with suggestions for updating the post to get better responses.

You haven't shown a complete playbook or role structure. You mention using a ROLE to perform the work, so I can only assume that your snippet above comes from a file in the ROLE's tasks directory. Additionally, you aren't specifying the version of Ansible Automation Platform (AAP) being used, nor the versions of the collections. You are also capturing output without showing what is in it, so the facts you are setting and using as variables could be causing some or all of the issue. You also aren't showing line numbers in the task snippets, nor a complete output of the failure message. Generally, playbook/task/role failures at least show the task and playbook names, but those pieces have been cut out of your copy/paste.

From the pieces that are there ...

os_patching/tasks/vmware/schedule_removal.yml

This path indicates where the problem is, so it seems to be the task snippet within the role. The module is most likely not liking something in the future_rrule_shell.stdout captured output.

I'm sure you've looked at the module docs already, but for others and a complete answer, I'm including some links here.

https://docs.ansible.com/projects/ansible/latest/collections/awx/awx/schedule_rruleset_lookup.html - this is a lookup plugin and it generates a string (a very important detail).
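
For reference, a minimal sketch of how that lookup is typically invoked (note that the plugin name is passed as a quoted string, unlike the unquoted form in the snippet above; the start date and rules here are only placeholders):

- name: Build an rruleset string with the lookup plugin
  ansible.builtin.set_fact:
    complex_rule: "{{ lookup('awx.awx.schedule_rruleset', '2022-04-30 10:30:45', rules=rrules, timezone='UTC') }}"
  vars:
    rrules:
      - frequency: 'day'
        interval: 1

- name: Show that the lookup returns a plain string
  ansible.builtin.debug:
    msg: "{{ complex_rule }} ({{ complex_rule | type_debug }})"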

I would also caution against mixing community collections with AAP, especially as there is more divergence with AAP 2.5 and beyond. Whenever possible, try to use the officially supported versions of the collections, as the community ones may have slightly different syntax and formatting. The bigger issue is version alignment: the community collections follow the development of AWX and can often be ahead of Controller (on older AAP versions I could use them pretty much interchangeably, but it isn't quite as clear-cut now).

I've posted some of the official collections here ...

https://console.redhat.com/ansible/automation-hub/repo/published/ansible/controller/content/lookup/s...

https://console.redhat.com/ansible/automation-hub/repo/published/ansible/controller/content/lookup/s...

https://console.redhat.com/ansible/automation-hub/repo/published/ansible/controller/content/module/s...

So in your tasks above, you are mixing the community rruleset lookup with the official ansible.controller schedule module. I've placed the links for the official rruleset lookup above too. (I doubt this is the actual issue, but just in case, it can help eliminate variables and unknowns and clean up the role a bit.)

That being said, you might want to perform some debugging, and even comment out some of the tasks so you can see what each piece will do in isolation. At a minimum, I would definitely add debugging to see what is actually contained within your variables.

- name: Debug scheduling variables
  ansible.builtin.debug:
    msg:
      - "RRule stdout: {{ future_rrule_shell.stdout }}"
      - "RRule type: {{ future_rrule_shell.stdout | type_debug }}"
      - "Snapshot TTL: {{ snapshot_ttl_days | default(7) }}"

I think these are the variables/facts you are capturing, and this can give you an all-in-one output. It also might not hurt to see what complex_rule and some of the others evaluate to, as you are using those in the tasks as well. I suspect the module doesn't like what you are capturing: there are some null values in there or incorrect "types" and the module doesn't know what to do with them.

64 - name: Schedule a one-time snapshot cleanup in AAP (T+{{ snapshot_ttl_days|default(7) }} days)
     ^ column 3

This one is throwing me for a bit of a loop, as it presents like a YAML syntax issue. This could be the only error (again, I can't be sure without full line numbers and context), but it could also be an indentation error causing invalid or null data to be read by the module.

One quick fix to the schedule task, to see whether the variable is the culprit (and which shouldn't cause much harm, because you can delete the schedule afterwards), is to give the rrule a default empty string.

rrule: "{{ future_rrule_shell.stdout | default('') }}"

This would replace the rrule in the SCHEDULE task, so if the stdout is bad or not a string, you would get a blank/empty string in there instead.
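
For comparison, a schedule rrule that Controller accepts is a single iCal-style string containing a DTSTART and an RRULE part; a one-shot schedule would look roughly like this (the timestamp is only an illustration):

rrule: "DTSTART:20250101T060000Z RRULE:FREQ=DAILY;INTERVAL=1;COUNT=1"

Note that plain date -d "+7 days" (with no format string) prints a human-readable date rather than the YYYYMMDDTHHMMSSZ form above, which is likely part of why prefixing it with DTSTART: does not produce a valid rule.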

 

Travis Michette, RHCA XIII
https://rhtapps.redhat.com/verify?certId=111-134-086
SENIOR TECHNICAL INSTRUCTOR / CERTIFIED INSTRUCTOR AND EXAMINER
Red Hat Certification + Training

Travis (Moderator)

@smi-andrew -

I'm making this a separate post because I'm wondering if you can think about the problem and solution a little differently. It seems like you have a very solid infrastructure and a lot of tools in place already, as well as some other scripts and REST APIs out there.

You have this as a central component ...

I need **bleep** to take snapshots and then create a scheduled job in AAP to remove those snapshots after 1 week.

Basically, if we look at your actual problem, you are creating a lot of VMware snapshots when running backups, and you want to delete the older snapshots once they are 7 days (one week) old. I'm wondering if there is a much cleaner way to do this that gives you far more control, as I don't want you to accidentally get rid of things you might still be using or needing later. Have you considered an alternative approach? (You could still use Ansible Controller, just not with the scheduling method you are trying to use.)

A high-level overview of how I might tackle the problem: create a script, or use an existing API, that lists the snapshots in the date range I want to delete. I would run that multiple times by hand, error-check it, and verify it includes all the snapshots I think it should ... this is Part 1, getting and verifying the list of images I want to delete and clean up.

Part 2: I would modify vmware_snapshot_cleanup to accept a list of images to delete, or maybe make a new playbook and job template for this for testing. You can then take the list of VM snapshots from Part 1 and manually run the vmware_snapshot_cleanup playbook to see if you get the required results and a successful deletion.

Part 3: Integrate Parts 1 and 2 into the patching workflow. It can create snapshots and patch the systems, and when it gets to the cleanup part, instead of doing complex things with the Ansible Controller scheduler that are "hidden" from you, you would call the API or script you used in Part 1 and register the results as variables. You could then pass those variables and use an API call/webhook or something similar to trigger the job template that does the cleanup, passing in the images you wish to delete (see the sketch below).
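
As a rough sketch of what that trigger could look like from the patching play, assuming the existing vmware_snapshot_cleanup job template is modified to accept a snapshots_to_delete variable (that variable name is just a placeholder):

- name: Kick off the cleanup job template with the verified snapshot list
  ansible.controller.job_launch:
    controller_host: "{{ lookup('env', 'CONTROLLER_HOST') }}"
    controller_username: "{{ lookup('env', 'CONTROLLER_USERNAME') }}"
    controller_password: "{{ lookup('env', 'CONTROLLER_PASSWORD') }}"
    job_template: vmware_snapshot_cleanup
    extra_vars:
      snapshots_to_delete: "{{ snapshots_to_delete }}"
  delegate_to: localhost
  run_once: true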

While this approach might not be as fancy and might require more work, you have much more control and more ways to inspect what is happening. More importantly, it makes it much easier to see what is going on ... one other thing the OCD in me would do is put an approval node in the workflow and have it send the list of VM snapshots that would be deleted (since you've captured the entire list), then let someone manually approve the deletion. This adds an additional layer of protection and provides peace of mind that you know what is being deleted and when.

Travis Michette, RHCA XIII
https://rhtapps.redhat.com/verify?certId=111-134-086
SENIOR TECHNICAL INSTRUCTOR / CERTIFIED INSTRUCTOR AND EXAMINER
Red Hat Certification + Training

Travis (Moderator)

@smi-andrew -

Awesome, and you're welcome. Glad it was resolved. In these situations it is almost always how a variable gets parsed, and it is very easy to end up with an incorrect variable, or with a variable of the wrong type for a specific module. Hopefully the troubleshooting exercise you went through on this issue and playbook/role can be applied to other situations with new playbooks and roles (because this will most certainly happen again).

Travis Michette, RHCA XIII
https://rhtapps.redhat.com/verify?certId=111-134-086
SENIOR TECHNICAL INSTRUCTOR / CERTIFIED INSTRUCTOR AND EXAMINER
Red Hat Certification + Training

21 Replies

smi-andrew (Flight Engineer)

So this solution sounds awesome and I will look into it.  Might tap you for more at some point.

smi-andrew (Flight Engineer)

Holy cow, thank you for the response.  

To update you, I am getting a list of VMs and parsing that out from VMware. I can provide more of the code.

 

I got farther with this on Friday. I am now able to get the one-time schedule created. But when it creates it, it is just syncing the template, so it is just updating the code. So maybe I am using the wrong module? I need to go back and read through all of your stuff again.

smi-andrew (Flight Engineer)

I have since updated the code and removed the AWX stuff.  Still going through your notes.

Travis (Moderator)

@smi-andrew -

Great to hear, and hopefully everything was spelled correctly and easy enough to understand. One thing to mention on the follow-up ...

I got farther with this on Friday. I am now able to get the one-time schedule created. But when it creates it, it is just syncing the template, so it is just updating the code. So maybe I am using the wrong module? I need to go back and read through all of your stuff again.

I was somewhat afraid of how you were implementing things ... I didn't bring it up, but this somewhat confirms it. Let me see if I can explain things a little bit for you on what I'm pretty sure is happening ...

You have a scheduled job called "Cleanup Snapshots", and it is scheduled to run at some point in time. Now that you have updated everything to use collections that go together, you are using all the correct modules (I think). However, what is actually happening with ansible.controller.schedule is that you are updating the schedule each time you run the role, so each run essentially produces a new or modified schedule, and it is possible that it might never run, because a schedule is meant to be something that fires in the future. If you instead had a generic snapshot-cleanup playbook, it would be much better to create the schedule once, either with the WebUI or with a playbook. That schedule would run the generic cleanup playbook, which would look at all the current snapshots on the system and delete any that are older than 7 days, and you would set the schedule to run weekly. That is a fairly easy fix. By having the playbook and role create a schedule on every run, and by playing with the dates, it is possible that all you are doing is updating an existing schedule.
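
As a sketch of that create-once approach (connection options omitted, and the start time and weekly rule here are only placeholders), the schedule would be defined a single time, for example from a small setup playbook, and then left alone:

- name: Create a recurring weekly snapshot cleanup schedule
  ansible.controller.schedule:
    name: "Weekly snapshot cleanup"
    unified_job_template: vmware_snapshot_cleanup
    rrule: "DTSTART:20250105T060000Z RRULE:FREQ=WEEKLY;INTERVAL=1"
    state: present
    enabled: true

The cleanup playbook behind that job template would then be responsible for finding and deleting any snapshots older than 7 days, so nothing has to be scheduled per patching run.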

This is part of the reason I suggested a different approach where you kick off the delete process with a webhook or API call to Controller, passing in the VMs you want. Your backup process could identify and capture the items and then pass them along to the delete playbook (a Job Template, or a Workflow Job Template if you choose to use an Approval Node).

 

Travis Michette, RHCA XIII
https://rhtapps.redhat.com/verify?certId=111-134-086
SENIOR TECHNICAL INSTRUCTOR / CERTIFIED INSTRUCTOR AND EXAMINER
Red Hat Certification + Training
smi-andrew (Flight Engineer)

Here is the updated code.

- name: Calculate 'DTSTART' 7 days in the future using shell
  ansible.builtin.shell: |
    date -d "+7 days" +'%Y%m%dT%H%M%SZ'
  register: removal_date
  delegate_to: localhost
  run_once: true
  changed_when: false

- name: Calculate 'DTSTART' 2 hours in the future using shell
  ansible.builtin.shell: |
    date -d "+30 minutes" +'%Y%m%dT%H%M%SZ'
  register: removal_date_hour
  delegate_to: localhost
  run_once: true
  changed_when: false

- name: Determine AAP instance based on hostname
  ansible.builtin.set_fact:
    selected_aap_instance: "{{ item.value | combine({'name': item.key}) }}"
  loop: "{{ aap_instance_mapping | dict2items }}"
  when: item.value.host_string in ansible_hostname
  run_once: true
  delegate_to: localhost

- name: Aggregate VM IDs and vCenter info for bulk cleanup
  ansible.builtin.set_fact:
    all_vms_for_cleanup: "{{ all_vms_for_cleanup | default([]) + [{ 'vm_id': hostvars[item]._vm_id, 'moid': hostvars[item]._vm_id, 'vcenter_hostname': hostvars[item]._chosen_vcenter, 'vcenter_username': vcenter_username, 'vcenter_password': vcenter_password, 'vcenter_validate_certs': vcenter_validate_certs | default(false) }] }}"
  loop: "{{ ansible_play_hosts_all }}"
  when: hostvars[item]._vm_id is defined
  run_once: true
  delegate_to: localhost

- name: Set dynamic RRULE for one-time execution
  ansible.builtin.set_fact:
    dynamic_rrule: |
      "DTSTART:{{ removal_date_hour.stdout }}
      RRULE:FREQ=DAILY;INTERVAL=1;COUNT=1"
  run_once: true
  delegate_to: localhost

- name: Build Sanitized schedule job name
  ansible.builtin.set_fact:
    schedule_job_name: >-
      Bulk Cleanup for {{ ansible_play_hosts_all | length }} VMs - scheduled for {{ removal_date_hour.stdout }}
  run_once: true
  delegate_to: localhost

- name: Look up the snapshot cleanup job template ID
  ansible.controller.job_template:
    controller_host: "https://{{ selected_aap_instance.host }}"
    controller_oauthtoken: "{{ selected_aap_instance.oauth_token }}"
    validate_certs: "{{ controller_validate_certs | default(true) }}"
    name: vmware_snapshot_cleanup
    organization: $REDACTED
  register: cleanup_job_template_lookup
  run_once: true
  delegate_to: localhost

- name: Fail if job template was not found
  ansible.builtin.fail:
    msg: "The 'vmware_snapshot_cleanup' job template was not found in the '$REDACTED' organization. Please verify it exists."
  when: cleanup_job_template_lookup is not defined or not cleanup_job_template_lookup.id
  run_once: true
  delegate_to: localhost

- name: Schedule a one-time snapshot cleanup in AAP for 7 days from now
  ansible.controller.schedule:
    controller_host: "https://{{ selected_aap_instance.host }}"
    controller_oauthtoken: "{{ selected_aap_instance.oauth_token }}"
    validate_certs: "{{ controller_validate_certs | default(true) }}"
    # Use the ID to avoid ambiguous name lookup
    unified_job_template: "{{ cleanup_job_template_lookup.id }}"
    organization: $REDACTED
    name: "{{ schedule_job_name }}"
    job_type: run
    rrule: "{{ dynamic_rrule }}"
    state: present
    enabled: true
    extra_data:
      vms_to_cleanup: "{{ all_vms_for_cleanup }}"
  run_once: true
  delegate_to: localhost

 

Travis (Moderator)

So the question here is what rrule: "{{ dynamic_rrule }}" evaluates to, as this is the thing that determines the schedule. You need a "start date" and a "frequency" for schedules to run. Obviously, you want this to run a single time rather than on a recurring schedule, so it is acceptable for it not to repeat on a frequency like weekly.

However, one issue you might run into is the schedule changing multiple times and when/how the start date/time is evaluated, which is why I suggested using debug to actually see the values on screen. It takes a bit longer to do things this way, but you aren't troubleshooting in the dark: you know the exact values of all the variables and how the data is being processed.
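
For example, a quick check right before the schedule task (using the variable names from your snippet above) could look something like this:

- name: Show exactly what will be sent to the schedule module
  ansible.builtin.debug:
    msg:
      - "rrule: {{ dynamic_rrule }}"
      - "schedule name: {{ schedule_job_name }}"
  run_once: true
  delegate_to: localhost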

Also, one other thing: for playbook/task/role code, a code block is nice because the formatting is preserved when you paste. Debugging YAML files like playbooks is very difficult when everything is left-justified, since it becomes impossible to spot things like syntax mistakes or alignment issues within modules. I know that isn't the issue here, but it is worth keeping in mind for future reference.

However, the more I look at the playbook and the path you are taking, unless I'm reading things incorrectly, you just want a list of machines to be processed/deleted at a certain time, and you are setting this up as a one-time job using the schedule. Long term this might be more difficult to manage than something like an API/webhook call that deletes the old snapshots as part of running the new snapshot process. Seeing this again with more detail, it appears you are building a new one-time scheduled task each time you run the playbook, and that task is set to run against a specific set of VMs, so at some point you would probably want to clean up all the old entries in the task scheduler.
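
If you do stay with one-time schedules, the same module can remove them once they have fired; a rough sketch, assuming you know or have recorded the schedule name (connection options omitted):

- name: Remove an old one-time cleanup schedule after it has run
  ansible.controller.schedule:
    name: "{{ schedule_job_name }}"
    unified_job_template: vmware_snapshot_cleanup
    state: absent
  run_once: true
  delegate_to: localhost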

Travis Michette, RHCA XIII
https://rhtapps.redhat.com/verify?certId=111-134-086
SENIOR TECHNICAL INSTRUCTOR / CERTIFIED INSTRUCTOR AND EXAMINER
Red Hat Certification + Training
smi-andrew (Flight Engineer)

For some reason the code block wouldn't work yesterday.  No idea why.
