Bug 1380451 - Galera fails promotion after a stop-start sequence
Summary: Galera fails promotion after a stop-start sequence
Keywords:
Status: CLOSED DUPLICATE of bug 1360768
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Damien Ciabrini
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-29 16:00 UTC by Raoul Scarazzini
Modified: 2018-07-20 08:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-20 08:34:38 UTC
Target Upstream Version:



Description Raoul Scarazzini 2016-09-29 16:00:24 UTC
Description of problem:

While testing HA resource behavior in Newton, we run the following sequence of operations:

1 - Stop Galera;
2 - Poll every minute for the status of the other resources;
3 - Start Galera;
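
The sequence above could be scripted roughly as follows. This is only a sketch: the report does not say which commands were used, so `pcs resource disable` / `pcs resource enable` for stop and start, `pcs status` for polling, the resource name passed in, and the number of polls are all assumptions, and `run_pcs` and `stop_poll_start` are hypothetical helpers.

```python
import subprocess
import time
from typing import Callable, List


def run_pcs(args: List[str]) -> str:
    """Run `pcs <args>` and return stdout. Assumes pcs is installed on the node."""
    result = subprocess.run(
        ["pcs", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def stop_poll_start(
    resource: str,
    polls: int,
    interval_s: float = 60.0,
    run: Callable[[List[str]], str] = run_pcs,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Disable a resource, collect `polls` status snapshots, then re-enable it."""
    snapshots: List[str] = []
    run(["resource", "disable", resource])  # step 1: stop (assumed command)
    for _ in range(polls):
        sleep(interval_s)                   # step 2: poll once per interval
        snapshots.append(run(["status"]))
    run(["resource", "enable", resource])   # step 3: start (assumed command)
    return snapshots
```

A call such as `stop_poll_start("galera", polls=5)` would then be run with sufficient privileges on a controller. The `run` and `sleep` parameters are injectable so the loop can be exercised without a cluster.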

The problem is that Galera failed to start again:

 ip-172.18.0.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 ip-172.20.0.19 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-172.19.0.18 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 ip-172.17.0.11 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     galera     (ocf::heartbeat:galera):        FAILED Master overcloud-controller-2 (unmanaged)
     Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
 ip-172.17.0.19 (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-1 ]
     Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
 ip-192.0.2.15  (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started overcloud-controller-0

Failed Actions:
* galera_promote_0 on overcloud-controller-2 'unknown error' (1): call=179, status=complete, exitreason='MySQL server failed to start (pid=65378) (rc=0), please check your installation',
    last-rc-change='Wed Sep 28 15:20:24 2016', queued=0ms, exec=12635ms
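
When comparing runs to see how often the failure occurs, it may help to pull the fields out of a "Failed Actions" entry programmatically. The following is a minimal sketch that assumes entries keep exactly the layout shown above; `FAILED_ACTION_RE` and `parse_failed_action` are hypothetical names, and the pattern has only been written against this one sample.

```python
import re
from typing import Dict, Optional, Union

# Pattern modelled on the single entry quoted in this report (an assumption:
# other pacemaker versions may format the line differently).
FAILED_ACTION_RE = re.compile(
    r"\*\s+(?P<op>\S+) on (?P<node>\S+) '(?P<rc_text>[^']*)' \((?P<rc>\d+)\): "
    r"call=(?P<call>\d+), status=(?P<status>\w+), "
    r"exitreason='(?P<exitreason>[^']*)',\s+"
    r"last-rc-change='(?P<last_rc_change>[^']*)', "
    r"queued=(?P<queued_ms>\d+)ms, exec=(?P<exec_ms>\d+)ms"
)

SAMPLE = (
    "* galera_promote_0 on overcloud-controller-2 'unknown error' (1): "
    "call=179, status=complete, exitreason='MySQL server failed to start "
    "(pid=65378) (rc=0), please check your installation',\n"
    "    last-rc-change='Wed Sep 28 15:20:24 2016', queued=0ms, exec=12635ms"
)


def parse_failed_action(text: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the fields of the first failed-action entry in `text`, or None."""
    match = FAILED_ACTION_RE.search(text)
    if match is None:
        return None
    fields: Dict[str, Union[str, int]] = dict(match.groupdict())
    # Convert the numeric fields so they can be compared or aggregated.
    for key in ("rc", "call", "queued_ms", "exec_ms"):
        fields[key] = int(fields[key])
    return fields


parsed = parse_failed_action(SAMPLE)
```

Here `parsed["op"]`, `parsed["node"]` and `parsed["exec_ms"]` give the failed operation, the node it ran on and how long the promote ran before failing.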

In short, the promotion on overcloud-controller-2 failed. This may be a race condition, since it does not happen every time, but the logs should help explain what happened in this instance.

Comment 1 Raoul Scarazzini 2016-09-29 16:02:38 UTC
sosreports, logs and status are here: http://file.rdu.redhat.com/~rscarazz/BZ1380451/

Comment 3 Damien Ciabrini 2018-07-20 08:34:38 UTC

*** This bug has been marked as a duplicate of bug 1360768 ***

