Bug 1597873 - FFU: ffwd-upgrade-prepare times out when the overcloud nodes are not able to validate undercloud's swift endpoint SSL certificate
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: zstream
Target Release: 13.0 (Queens)
Assignee: mathieu bultel
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-03 18:27 UTC by Marius Cornea
Modified: 2019-03-31 05:11 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Marius Cornea 2018-07-03 18:27:06 UTC
Description of problem:

FFU: ffwd-upgrade-prepare times out when the overcloud nodes are not able to validate the undercloud's Swift endpoint SSL certificate. The failure looks like the message below. As you can see, it doesn't indicate what the root cause is. We should probably add a validation that runs on the overcloud nodes during ffwd-upgrade-prepare and prevents the upgrade process from starting, pointing to the actual root cause: the overcloud nodes are not able to reach the undercloud's SSL-enabled Swift endpoint.

Waiting for messages on queue 'ffwdupgrade' with no timeout.
{u'config_name': u'ffwd-upgrade-prepare',
 u'execution': {u'created_at': u'2018-06-22 02:45:22',
                u'id': u'd05a82ef-c002-4f9b-9890-5d465a8a6bd2',
                u'input': {u'config': u'#!/bin/bash \nrm -f /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json || true \n',
                           u'config_name': u'ffwd-upgrade-prepare',
                           u'group': u'script',
                           u'queue_name': u'ffwdupgrade',
                           u'server_name': u'overcloud-controller-0',
                           u'server_uuid': u'afb6d2a8-0937-488b-85dd-157ac38ad6bf'},
                u'name': u'tripleo.deployment.v1.deploy_on_server',
                u'params': {u'index': 1,
                            u'namespace': u'',
                            u'root_execution_id': u'9bbb254a-2b83-4047-8dd1-e92f672ceadd',
                            u'task_execution_id': u'ce79401a-09e8-4b35-baed-adc5ba6dc0d4'},
                u'spec': {u'input': [u'server_uuid',
                                     u'server_name',
                                     u'config',
                                     u'config_name',
                                     u'group',
                                     {u'queue_name': u'tripleo'}],
                          u'name': u'deploy_on_server',
                          u'tags': [u'tripleo-common-managed'],
                          u'tasks': {u'deploy_config': {u'action': u'tripleo.deployment.config',
                                                        u'input': {u'config': u'<% $.config %>',
                                                                   u'group': u'<% $.group %>',
                                                                   u'name': u'<% $.config_name %>',
                                                                   u'server_id': u'<% $.server_uuid %>'},
                                                        u'name': u'deploy_config',
                                                        u'on-complete': u'send_message',
                                                        u'publish': {u'status_code': u'<% task().result.deploy_status_code %>',
                                                                     u'stderr': u'<% task().result.deploy_stderr %>',
                                                                     u'stdout': u'<% task().result.deploy_stdout %>'},
                                                        u'publish-on-error': {u'message': u'<% task().result %>',
                                                                              u'status': u'FAILED'},
                                                        u'type': u'direct',
                                                        u'version': u'2.0'},
                                     u'send_message': {u'action': u'zaqar.queue_post',
                                                       u'input': {u'messages': {u'body': {u'payload': {u'config_name': u'<% $.config_name %>',
                                                                                                       u'execution': u'<% execution() %>',
                                                                                                       u'message': u'<% $.get("message", "") %>',
                                                                                                       u'server_name': u'<% $.server_name %>',
                                                                                                       u'server_uuid': u'<% $.server_uuid %>',
                                                                                                       u'status': u'<% $.get("status", "SUCCESS") %>',
                                                                                                       u'status_code': u'<% $.get("status_code", "") %>',
                                                                                                       u'stderr': u'<% $.get("stderr", "") %>',
                                                                                                       u'stdout': u'<% $.get("stdout", "") %>'},
                                                                                          u'type': u'tripleo.deployment.v1.deploy_on_server'}},
                                                                  u'queue_name': u'<% $.queue_name %>'},
                                                       u'name': u'send_message',
                                                       u'on-success': [{u'fail': u'<% $.get(\'status\') = "FAILED" %>'}],
                                                       u'retry': u'count=5 delay=1',
                                                       u'type': u'direct',
                                                       u'version': u'2.0'}},
                          u'version': u'2.0'},
                u'updated_at': u'2018-06-22 02:45:22'},
 u'message': u"Timeout for heat deployment 'ffwd-upgrade-prepare'",
 u'server_name': u'overcloud-controller-0',
 u'server_uuid': u'afb6d2a8-0937-488b-85dd-157ac38ad6bf',
 u'status': u'FAILED',
 u'status_code': u'',
 u'stderr': u'',
 u'stdout': u''}
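
To make the point about the missing root cause concrete, here is a minimal Python sketch that condenses a payload like the one above into a single line. The function name `summarize_deploy_message` is hypothetical and not part of TripleO; the keys it reads (`server_name`, `status`, `message`, `stderr`) are the top-level keys shown in the message above.

```python
def summarize_deploy_message(payload: dict) -> str:
    """Condense a deploy_on_server queue payload into a single line.

    Hypothetical helper for illustration only; it is not part of TripleO.
    """
    parts = [
        payload.get("server_name", "<unknown server>"),
        payload.get("status", "<unknown status>"),
        payload.get("message") or "<no message>",
    ]
    # stderr is only appended when the deployment actually reported something.
    stderr = payload.get("stderr")
    if stderr:
        parts.append("stderr: " + stderr.strip())
    return " | ".join(parts)


# Top-level fields copied from the failure message in this report.
example = {
    "config_name": "ffwd-upgrade-prepare",
    "message": "Timeout for heat deployment 'ffwd-upgrade-prepare'",
    "server_name": "overcloud-controller-0",
    "status": "FAILED",
    "status_code": "",
    "stderr": "",
    "stdout": "",
}

print(summarize_deploy_message(example))
```

With `stderr` and `stdout` both empty, the timeout text is the only diagnostic an operator gets, and it says nothing about certificate validation.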

Comment 1 Sadique Puthen 2018-07-12 04:07:10 UTC
We are hitting this issue. The undercloud is SSL-enabled with a self-signed certificate. Is there a way to inject the CA certificate into the ffwd-upgrade prepare command so that it can verify the certificate for the undercloud Swift URL during prepare? Or a way to tell it to skip verification?

Comment 2 Marius Cornea 2018-07-12 12:05:36 UTC
As a workaround, we have a documented Ansible playbook [1] that can be used to validate that the overcloud nodes can reach the undercloud's SSL-enabled Swift endpoint. The playbook takes the CA certificate location as a variable, so the certificate can be injected if needed.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/fast_forward_upgrades/#preparing_access_to_the_undercloud_public_api_over_SSL_TLS
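
For illustration, the kind of check described here can be sketched in Python using only the standard library. This is not the documented playbook and not TripleO code: the function name `check_tls_endpoint` is hypothetical, and the host, port, and CA file path are whatever applies to the undercloud's public Swift endpoint in a given deployment.

```python
import socket
import ssl
from typing import Optional, Tuple


def check_tls_endpoint(host: str, port: int, cafile: Optional[str] = None,
                       timeout: float = 10.0) -> Tuple[bool, str]:
    """Attempt a verified TLS handshake and report why it failed, if it did.

    Hypothetical helper for illustration only.
    """
    # With cafile=None the system trust store is used; with a path, only
    # that CA bundle is trusted (for example a self-signed undercloud CA).
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "certificate verified"
    except ssl.SSLCertVerificationError as exc:
        # The case from this bug: reachable endpoint, untrusted certificate.
        return False, "certificate verification failed: " + exc.verify_message
    except ssl.SSLError as exc:
        return False, "TLS error: " + str(exc)
    except OSError as exc:
        # Covers refused connections, DNS failures, and timeouts.
        return False, "cannot connect: " + str(exc)


# Example call (placeholders, not real values):
#   ok, reason = check_tls_endpoint("undercloud.example.com", 443,
#                                   cafile="/path/to/undercloud-ca.pem")
```

Run from an overcloud node before the upgrade, a check along these lines would separate "certificate not trusted" from "endpoint unreachable", which the timeout message alone does not.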

Comment 4 Carlos Camacho 2018-10-17 13:00:30 UTC
Hi Harry,

Sorry about that; when updating the BZ I forgot to remove the Triaged keyword.

Comment 7 Harry Rybacki 2019-02-01 18:48:55 UTC
Updating the Severity/Priority of this bug. With the workaround for this issue living in official documentation (comment#2), an operator would need to explicitly skip it to encounter this bug.

However, we will need to a) add an inflight validation and/or b) ensure a clearer error message is propagated up to the operator at the time of failure.

