Bug 1602071 - [RFE] FFU [Improvement]: no recovery when ffwd-upgrade run command is missed and controller upgrade is started
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: zstream
Target Release: 10.0 (Newton)
Assignee: Sergii Golovatiuk
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-17 18:38 UTC by Valli Annamalai
Modified: 2019-04-14 05:35 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Valli Annamalai 2018-07-17 18:38:51 UTC
Description of problem:

OSP10 was deployed with 3 controllers and 2 computes.
The undercloud was upgraded from OSP10 to OSP13.
The ffwd-upgrade prepare command was run with all the templates included.
I then missed the ffwd-upgrade run command and went straight to the controller upgrade.

During the controller upgrade_steps, the task "Install docker packages on upgrade if missing" failed on all three controllers:

 u'TASK [Install docker packages on upgrade if missing] ***************************',
 u'Tuesday 17 July 2018  11:47:43 -0400 (0:00:00.101)       0:20:22.448 ********** ',
 u'fatal: [192.168.24.7]: FAILED! => {"changed": false, "msg": "There are no enabled repos.\\n Run \\"yum repolist all\\" to see the repos you have.\\n To enable Red Hat Subscription Management repositories:\\n     subscription-manager repos --enable <repo>\\n To enable custom repositories:\\n     yum-config-manager --enable <repo>\\n", "rc": 1, "results": []}',
 u'fatal: [192.168.24.15]: FAILED! => {"changed": false, "msg": "There are no enabled repos.\\n Run \\"yum repolist all\\" to see the repos you have.\\n To enable Red Hat Subscription Management repositories:\\n     subscription-manager repos --enable <repo>\\n To enable custom repositories:\\n     yum-config-manager --enable <repo>\\n", "rc": 1, "results": []}',
 u'fatal: [192.168.24.12]: FAILED! => {"changed": false, "msg": "There are no enabled repos.\\n Run \\"yum repolist all\\" to see the repos you have.\\n To enable Red Hat Subscription Management repositories:\\n     subscription-manager repos --enable <repo>\\n To enable custom repositories:\\n     yum-config-manager --enable <repo>\\n", "rc": 1, "results": []}',
 u'',
 u'PLAY RECAP *********************************************************************',
 u'192.168.24.12              : ok=354  changed=226  unreachable=0    failed=1   ',
 u'192.168.24.15              : ok=354  changed=226  unreachable=0    failed=1   ',
 u'192.168.24.7               : ok=354  changed=226  unreachable=0    failed=1   ',


When I then ran the ffwd-upgrade run command, it failed with this error:
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-3f978f6a-a1df-4d5d-a636-26e7d1b26bad)

The keystone log shows:
 [root@lorenzo stack]# tail /var/log/keystone/keystone.log
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1152, in _request_authentication
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi     auth_packet = self._read_packet()
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1014, in _read_packet
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi     packet.check_error()
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi     err.raise_mysql_exception(self._data)
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi     raise errorclass(errno, errval)
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi DBNonExistentDatabase: (pymysql.err.InternalError) (1049, u"Unknown database 'keystone'") (Background on this error at: http://sqlalche.me/e/2j85)
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi 


Because the upgrade_steps playbook failed partway through, after it had already disabled all the services, the openstack CLI commands now fail as well.

There should be a way to recover from this short of redeploying OSP 10 from scratch. The playbook could revert all of its changes when it fails partway through. Alternatively, the controller upgrade could begin with a validation step that checks whether the ffwd-upgrade run command completed successfully.
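
To illustrate the validation idea, here is a minimal Python sketch of an ordering check. It is not code from python-tripleoclient: the stage labels, the MissingStageError class, and both function names are hypothetical, and it assumes the set of completed stages is recorded somewhere the client can read.

```python
from typing import Iterable, List

# Stage order as described in this report. These labels are hypothetical
# and are not identifiers used by python-tripleoclient.
FFU_STAGES = [
    "undercloud_upgrade",
    "ffwd_upgrade_prepare",
    "ffwd_upgrade_run",
    "controller_upgrade_run",
]


class MissingStageError(RuntimeError):
    """Raised when a stage is started before its predecessors completed."""


def missing_prerequisites(stage: str, completed: Iterable[str]) -> List[str]:
    """Return the earlier stages that have not completed, in order."""
    done = set(completed)
    index = FFU_STAGES.index(stage)  # raises ValueError for an unknown stage
    return [s for s in FFU_STAGES[:index] if s not in done]


def require_prerequisites(stage: str, completed: Iterable[str]) -> None:
    """Refuse to start `stage` if any earlier stage has not completed."""
    missing = missing_prerequisites(stage, completed)
    if missing:
        raise MissingStageError(
            "cannot start %s; not completed: %s" % (stage, ", ".join(missing))
        )
```

In the scenario from this report, calling require_prerequisites("controller_upgrade_run", {"undercloud_upgrade", "ffwd_upgrade_prepare"}) would raise MissingStageError naming ffwd_upgrade_run, before any services are disabled on the controllers.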


Version-Release number of selected component (if applicable):


How reproducible:
Reproducible whenever the ffwd-upgrade run command is missed and the controller upgrade is started.


Steps to Reproduce:
1. Deploy OSP10
2. Upgrade undercloud from 10 to 13
3. openstack overcloud ffwd-upgrade prepare
4. openstack overcloud upgrade run --roles Controller
5. Step 4 fails at the task: Install docker packages on upgrade if missing
6. openstack overcloud ffwd-upgrade run --yes
7. Step 6 fails with a keystone error (HTTP 500)

Actual results:
When the upgrade steps on the controllers fail, it's impossible to recover the cloud.

Expected results:
When the upgrade steps fail, the changes should be reverted so the cloud is not disturbed. Alternatively, a validation step should be added to make sure all previous commands completed successfully.

Additional info:

