Mission Specialist

Openstack - POC - Instance Failure after controller reboot

Hi 

I hope this is the best place to ask for assistance on this. I'm currently busy with a proof of concept of OpenStack for our internal development team. I'm running into the known issue of instances failing after a reboot of the controller node (https://access.redhat.com/solutions/3524681). A related symptom is that the instance's drive becomes read-only shortly after the controller boots; thereafter the instance fails to reboot, unable to find its boot drive.

Deployment background:
- 3 node Ironic deployment (1 controller, 1 compute, 1 director)
- I did the entire deployment following the OSP13 product documentation (https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installati...)

The issues described above occur even after implementing the suggested workaround:
/sbin/losetup -fv /var/lib/cinder/cinder-volumes
as well as adding this to the /etc/rc.d/rc.local file to be executed on boot. That part works: after the reboot my LVM volumes are visible and my loopback device has been created. But instances that were launched before the reboot fail.
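For anyone following along, the rc.local addition looks roughly like this (a sketch; it assumes the default backing-file path and the volume-group name "cinder-volumes" from the docs):

```shell
# Sketch of the /etc/rc.d/rc.local addition (assumes the default
# cinder LVM loopback backing file and VG name "cinder-volumes").
BACKING_FILE=/var/lib/cinder/cinder-volumes
if [ -f "$BACKING_FILE" ]; then
    # Re-attach the backing file to a free loop device so the
    # cinder-volumes volume group becomes visible again
    /sbin/losetup -fv "$BACKING_FILE"
    # Activate the logical volumes in the volume group
    /sbin/vgchange -ay cinder-volumes
fi
```

Also note that /etc/rc.d/rc.local must be executable (chmod +x /etc/rc.d/rc.local), or systemd's rc-local service won't run it at boot.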

Attempting to start a failed instance returns the debug info shared at the bottom of this post. It looks like it's failing to log into the iSCSI target, which is the controller node, right? Or more accurately, a container on the controller node. I found some target configs in the iscsid container. The IQN info below matches a volume that was created after the last reboot, and my other volumes are missing. I haven't tried adding them back manually; I'll try that and report back. Any ideas?

[root@overcloud-controller-0 ~]# docker exec -it -u0 iscsid sh -c 'targetcli ls'
o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 1]
  | | o- iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9  [/dev/cinder-volumes/volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9 (10.0GiB) write-thru activated]
  | |   o- alua ................................................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | o- fileio ................................................................................................. [Storage Objects: 0]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 1]
  | o- iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9 ............................................. [TPGs: 1]
  |   o- tpg1 .......................................................................................... [no-gen-acls, auth per-acl]
  |     o- acls .......................................................................................................... [ACLs: 1]
  |     | o- iqn.1994-05.com.redhat:801759f4b866 ...................................................... [1-way auth, Mapped LUNs: 1]
  |     |   o- mapped_lun0 ................. [lun0 block/iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9 (rw)]
  |     o- luns .......................................................................................................... [LUNs: 1]
  |     | o- lun0  [block/iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9 (/dev/cinder-volumes/volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9) (default_tg_pt_gp)]
  |     o- portals .................................................................................................... [Portals: 1]
  |       o- 192.168.66.107:3260 .............................................................................................. [OK]
  o- loopback ......................................................................................................... [Targets: 0]
[root@overcloud-controller-0 ~]# 
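One thing worth checking: the LIO kernel target loses its runtime config on reboot, and targetcli normally saves its configuration to /etc/target/saveconfig.json (whether that path holds inside your iscsid container is an assumption to verify first). If that file still lists the pre-reboot volumes, re-importing it may bring the missing targets back:

```shell
# Sketch: re-import the saved LIO config inside the iscsid container.
# Assumptions: containerized OSP13 layout, and the saved config at
# targetcli's default location, /etc/target/saveconfig.json.
SAVECONFIG=/etc/target/saveconfig.json
if command -v docker >/dev/null 2>&1 && docker inspect iscsid >/dev/null 2>&1; then
    docker exec -u0 iscsid sh -c "targetcli restoreconfig $SAVECONFIG"
else
    echo "iscsid container not reachable from this host; run on the controller"
fi
```

If the saveconfig only contains the post-reboot volume, the targets would have to be recreated from the cinder database instead (e.g. by restarting the cinder-volume container), so check the file's contents before restoring.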

I understand that Red Hat does not support the Cinder iSCSI/LVM backend in production, but from what I can find online this setup should work, although not optimally for a production environment. Any tips or guidance will be greatly appreciated.

Alternatively, is there an easy way to migrate to a working storage solution?

 


Re: Openstack - POC - Instance Failure after controller reboot

Original post exceeded character limit.

Debug Info from nova_compute container (on compute node):

/var/log/nova/nova-compute.log

2018-12-19 07:32:53.346 1 INFO nova.compute.resource_tracker [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Final resource view: name=overcloud-compute-0.localdomain phys_ram=16338MB used_ram=10240MB phys_disk=557GB used_disk=20GB total_vcpus=16 used_vcpus=4 pci_stats=[]
2018-12-19 07:32:53.412 1 DEBUG nova.compute.resource_tracker [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Compute_service record updated for overcloud-compute-0.localdomain:overcloud-compute-0.localdomain _update_available_resource /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:764
2018-12-19 07:32:53.413 1 DEBUG oslo_concurrency.lockutils [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Lock "compute_resources" released by "nova.compute.resource_tracker._update_available_resource" :: held 0.256s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
2018-12-19 07:32:54.837 1 DEBUG oslo_service.periodic_task [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Running periodic task ComputeManager._poll_rescued_instances run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2018-12-19 07:32:57.872 189 DEBUG oslo_concurrency.processutils [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] CMD "iscsiadm -m node -T iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4 -p 192.168.66.107:3260 --login" returned: 8 in 120.053s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2018-12-19 07:32:57.873 189 DEBUG oslo_concurrency.processutils [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] u'iscsiadm -m node -T iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4 -p 192.168.66.107:3260 --login' failed. Not Retrying. execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:457
2018-12-19 07:32:57.873 189 DEBUG oslo.privsep.daemon [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] privsep: Exception during request[140339537775824]: Unexpected error while running command.
Command: iscsiadm -m node -T iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4 -p 192.168.66.107:3260 --login
Exit code: 8
Stdout: u'Logging in to [iface: default, target: iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4, portal: 192.168.66.107,3260] (multiple)\n'
Stderr: u'iscsiadm: Could not login to [iface: default, target: iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4, portal: 192.168.66.107,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n' loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:449
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 445, in loop
    reply = self._process_cmd(*msg)
  File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 428, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 209, in _wrap
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line 194, in execute_root
    return custom_execute(*cmd, shell=False, run_as_root=False, **kwargs)
  File "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line 143, in custom_execute
    on_completion=on_completion, *cmd, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 424, in execute
    cmd=sanitized_cmd)
ProcessExecutionError: Unexpected error while running command.
Command: iscsiadm -m node -T iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4 -p 192.168.66.107:3260 --login
Exit code: 8
Stdout: u'Logging in to [iface: default, target: iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4, portal: 192.168.66.107,3260] (multiple)\n'
Stderr: u'iscsiadm: Could not login to [iface: default, target: iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4, portal: 192.168.66.107,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'
2018-12-19 07:32:57.892 189 DEBUG oslo.privsep.daemon [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] privsep: reply[140339537775824]: (5, 'oslo_concurrency.processutils.ProcessExecutionError', (u'Logging in to [iface: default, target: iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4, portal: 192.168.66.107,3260] (multiple)\n', u'iscsiadm: Could not login to [iface: default, target: iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4, portal: 192.168.66.107,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n', 8, u'iscsiadm -m node -T iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4 -p 192.168.66.107:3260 --login', None)) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:456
2018-12-19 07:32:57.894 1 WARNING os_brick.initiator.connectors.iscsi [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Failed to login iSCSI target iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4 on portal 192.168.66.107:3260 (exit code 8).: ProcessExecutionError: Unexpected error while running command.
2018-12-19 07:32:57.894 1 WARNING os_brick.initiator.connectors.iscsi [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Failed to connect to iSCSI portal 192.168.66.107:3260.
2018-12-19 07:32:57.895 1 DEBUG os_brick.initiator.connectors.iscsi [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Getting connected devices for (ips,iqns,luns)=((u'192.168.66.107:3260', u'iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4', 0),) _get_connection_devices /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:791
2018-12-19 07:32:57.895 189 DEBUG oslo.privsep.daemon [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] privsep: request[140339537775824]: (3, 'os_brick.privileged.rootwrap.execute_root', ('iscsiadm', '-m', 'node'), {'check_exit_code': False}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
2018-12-19 07:32:57.895 189 DEBUG oslo_concurrency.processutils [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Running cmd (subprocess): iscsiadm -m node execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2018-12-19 07:32:57.902 189 DEBUG oslo_concurrency.processutils [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] CMD "iscsiadm -m node" returned: 0 in 0.007s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2018-12-19 07:32:57.902 189 DEBUG oslo.privsep.daemon [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] privsep: reply[140339537775824]: (4, ('192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-82e4262f-8adf-4a75-93e9-543608660548\n192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-c08108a8-ed72-4487-967a-51ec4616b7c7\n192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-9aa05f39-5b2a-44fb-8657-fc04b31fb018\n192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-b9284212-95d0-4e59-97c6-806471eea69b\n192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9\n192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4\n', '')) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:456
2018-12-19 07:32:57.904 189 DEBUG oslo.privsep.daemon [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] privsep: request[140339537775824]: (3, 'os_brick.privileged.rootwrap.execute_root', ('iscsiadm', '-m', 'session'), {'check_exit_code': (0, 1, 21, 255)}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
2018-12-19 07:32:57.904 189 DEBUG oslo_concurrency.processutils [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Running cmd (subprocess): iscsiadm -m session execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2018-12-19 07:32:57.910 189 DEBUG oslo_concurrency.processutils [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] CMD "iscsiadm -m session" returned: 0 in 0.006s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2018-12-19 07:32:57.911 189 DEBUG oslo.privsep.daemon [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] privsep: reply[140339537775824]: (4, ('tcp: [1] 192.168.66.107:3260,1 iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9 (non-flash)\n', '')) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:456
2018-12-19 07:32:57.912 1 DEBUG os_brick.initiator.connectors.iscsi [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] iscsiadm ('-m', 'session'): stdout=tcp: [1] 192.168.66.107:3260,1 iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9 (non-flash)
 stderr= _run_iscsiadm_bare /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1088
2018-12-19 07:32:57.912 1 DEBUG os_brick.initiator.connectors.iscsi [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] iscsi session list stdout=tcp: [1] 192.168.66.107:3260,1 iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9 (non-flash)
 stderr= _run_iscsi_session /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1077
2018-12-19 07:32:57.912 1 DEBUG os_brick.initiator.connectors.iscsi [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Resulting device map defaultdict(<function <lambda> at 0x7fa358210b90>, {(u'192.168.66.107:3260', u'iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4'): (set([]), set([]))}) _get_connection_devices /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:823
2018-12-19 07:32:57.913 1 DEBUG os_brick.initiator.connectors.iscsi [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Disconnecting from: [(u'192.168.66.107:3260', u'iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4')] _disconnect_connection /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1065
2018-12-19 07:32:57.914 189 DEBUG oslo.privsep.daemon [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] privsep: request[140339537775824]: (3, 'os_brick.privileged.rootwrap.execute_root', ('iscsiadm', '-m', 'node', '-T', u'iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4', '-p', u'192.168.66.107:3260', '--op', 'update', '-n', 'node.startup', '-v', 'manual'), {'attempts': 1, 'check_exit_code': (0, 21, 255), 'delay_on_retry': True}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
2018-12-19 07:32:57.914 189 DEBUG oslo_concurrency.processutils [req-86e9602c-b014-4db9-bb3e-ff1f36a91568 bc43daad19bd4c95a655fd0307a214f6 fcbc822518254c55a6102237d0683306 - default default] Running cmd (subprocess): iscsiadm -m node -T iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4 -p 192.168.66.107:3260 --op update -n node.startup -v manual execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
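The mismatch is visible directly in the log above: `iscsiadm -m node` knows six targets, but `iscsiadm -m session` shows a live session only for the post-reboot volume (...acdd9). A quick sketch (hypothetical helper, fed a subset of the two output formats from the log) to list the stale targets:

```python
def stale_targets(node_out: str, session_out: str) -> list:
    """Return IQNs present in `iscsiadm -m node` output but absent
    from `iscsiadm -m session` output (i.e. no live session)."""
    # node lines: "192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-..."
    known = {line.split()[-1] for line in node_out.splitlines() if line.strip()}
    # session lines: "tcp: [1] 192.168.66.107:3260,1 iqn.... (non-flash)"
    active = {line.split()[-2] for line in session_out.splitlines() if line.strip()}
    return sorted(known - active)

# Subset of the targets from the log above
node_out = """192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-82e4262f-8adf-4a75-93e9-543608660548
192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9
192.168.66.107:3260,-1 iqn.2010-10.org.openstack:volume-8da49a8b-1b12-438b-aa80-4bffb12440e4"""
session_out = "tcp: [1] 192.168.66.107:3260,1 iqn.2010-10.org.openstack:volume-3a6930bd-a80b-4f3c-8dca-10af717acdd9 (non-flash)"

# Prints the volumes that have no live session
print(stale_targets(node_out, session_out))
```

Each stale IQN here matches a volume that no longer has an export on the controller side, which is consistent with the `targetcli ls` output showing only one backstore.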