Bug 1598511 - FFU: restore procedure after successful 'openstack overcloud ffwd-upgrade run' fails with galera pcs resource not starting: exitreason='Failed initial monitor action'
Keywords:
Status: CLOSED DUPLICATE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: zstream
Target Release: 13.0 (Queens)
Assignee: RHOS Documentation Team
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-05 16:44 UTC by Marius Cornea
Modified: 2018-09-24 05:22 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-24 05:22:47 UTC
Target Upstream Version:


Attachments
restore commands output (deleted)
2018-07-05 16:44 UTC, Marius Cornea
sosreport (deleted)
2018-07-05 16:47 UTC, Marius Cornea


Links
System            ID       Priority  Status  Summary  Last Updated
Red Hat Bugzilla  1626086  None      None    None     2019-01-28 14:41:18 UTC

Description Marius Cornea 2018-07-05 16:44:35 UTC
Created attachment 1456801
restore commands output

Description of problem:
FFU: the restore procedure, run after a successful 'openstack overcloud ffwd-upgrade run', fails because the galera pcs resource does not start: exitreason='Failed initial monitor action'. The 'pcs status' output on controller-0:

[root@controller-0 heat-admin]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Jul  5 16:39:26 2018
Last change: Thu Jul  5 16:36:43 2018 by hacluster via crmd on controller-0

3 nodes configured
19 resources configured (7 DISABLED)

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-192.168.24.8	(ocf::heartbeat:IPaddr2):	Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     galera	(ocf::heartbeat:galera):	FAILED Master controller-0 (blocked)
     Masters: [ controller-1 controller-2 ]
 ip-172.17.4.14	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.3.18	(ocf::heartbeat:IPaddr2):	Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Stopped (disabled): [ controller-0 controller-1 controller-2 ]
 ip-172.17.1.12	(ocf::heartbeat:IPaddr2):	Started controller-2
 ip-10.0.0.108	(ocf::heartbeat:IPaddr2):	Started controller-1
 Master/Slave Set: redis-master [redis]
     Stopped (disabled): [ controller-0 controller-1 controller-2 ]
 ip-172.17.1.17	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Stopped (disabled)

Failed Actions:
* galera_promote_0 on controller-0 'unknown error' (1): call=66, status=complete, exitreason='Failed initial monitor action',
    last-rc-change='Thu Jul  5 16:36:44 2018', queued=0ms, exec=10412ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


Version-Release number of selected component (if applicable):
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/fast_forward_upgrades/restoring-the-overcloud

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers, 2 computes, and 3 Ceph nodes

2. Back up the controller nodes per:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/fast_forward_upgrades/assembly-preparing_for_openstack-platform_upgrade#backing_up_the_overcloud 

3. Run the FFU procedure through the end of the 'openstack overcloud ffwd-upgrade run' step. Make sure this step completes successfully.

4. Try to restore the controller nodes to the state backed up in step 2, following the restore procedure:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/fast_forward_upgrades/restoring-the-overcloud

Actual results:
At the step 'On the bootstrap Controller node, set Pacemaker to manage the Galera cluster:', the galera resource fails to start:

Failed Actions:
* galera_promote_0 on controller-0 'unknown error' (1): call=66, status=complete, exitreason='Failed initial monitor action',
    last-rc-change='Thu Jul  5 16:36:44 2018', queued=0ms, exec=10412ms
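
For comparing this failure against the similar bugs, the fields of the Failed Actions entry can be extracted programmatically. This is a rough Python sketch under stated assumptions: the regular expression is inferred only from the single entry quoted in this report, the names FAILED_ACTION_RE and parse_failed_action are hypothetical, and other Pacemaker versions may format the line differently.

```python
import re

# Pattern inferred from the one entry quoted in this report (assumption);
# it is not an official description of Pacemaker's output format.
FAILED_ACTION_RE = re.compile(
    r"\*\s+(?P<operation>\S+) on (?P<node>\S+) "
    r"'(?P<error>[^']*)' \((?P<rc>\d+)\):"
    r".*?exitreason='(?P<exitreason>[^']*)'"
)


def parse_failed_action(text):
    """Return a dict of fields from a Failed Actions entry, or None."""
    match = FAILED_ACTION_RE.search(text)
    return match.groupdict() if match else None


# The entry exactly as quoted in this report.
ENTRY = (
    "* galera_promote_0 on controller-0 'unknown error' (1): call=66, "
    "status=complete, exitreason='Failed initial monitor action',"
)

fields = parse_failed_action(ENTRY)
print(fields)
```

All values are returned as strings, including the return code, so convert rc with int() if it needs to be compared numerically.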


Expected results:

The galera resource starts as documented.

Additional info:

Attaching a sosreport and the output of the commands that I ran for the restore procedure.

Comment 1 Marius Cornea 2018-07-05 16:47:16 UTC
Created attachment 1456802
sosreport

Comment 5 Andrew Dahms 2018-09-24 05:22:47 UTC
Hi Marius,

Thank you for raising this bug.

My name is Andrew, and I am the documentation program manager investigating this issue.

After discussing this issue with the documentation team, we have decided that this issue must be reviewed by engineering before we can review the documentation impact.

Because there are several bugs of a similar nature, I will close this bug as a duplicate for now and move the main bug to engineering where it can be reviewed. We will then follow up and track any potential documentation impact coming out of that process.

Kind regards,

Andrew

*** This bug has been marked as a duplicate of bug 1626086 ***

