Bug 1367813 - Shutting down N-1 nodes at once causes cluster with lms qdevice to lose quorum
Summary: Shutting down N-1 nodes at once causes cluster with lms qdevice to lose quorum
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 614122
Reported: 2016-08-17 14:28 UTC by Roman Bednář
Modified: 2016-11-04 06:50 UTC (History)

Fixed In Version: corosync-2.4.0-4.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 06:50:09 UTC


Attachments
Proposed patch (deleted)
2016-08-25 14:58 UTC, Jan Friesse
Patch with slightly better English comments (deleted)
2016-08-30 14:26 UTC, Christine Caulfield
Man: Fix corosync-qdevice-net-certutil link (deleted)
2016-08-31 09:21 UTC, Jan Friesse
man: mention qdevice incompatibilites in votequorum.5 (deleted)
2016-08-31 09:22 UTC, Jan Friesse


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2463 normal SHIPPED_LIVE corosync bug fix and enhancement update 2016-11-03 14:06:04 UTC

Description Roman Bednář 2016-08-17 14:28:46 UTC
Description of problem:

See subject.
Also, when the nodes are shut down one by one with a delay, this issue does not occur and the cluster retains quorum as expected, even with only one node online.

Version-Release number of selected component (if applicable):
corosync-qnetd-2.4.0-3.el7.x86_64
corosynclib-2.4.0-3.el7.x86_64
corosync-qdevice-2.4.0-3.el7.x86_64
corosync-2.4.0-3.el7.x86_64
pcs-0.9.152-6.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1) have a 3-node cluster with a qdevice set up on a separate node, set to the lms algorithm

2) kill 2 nodes at once; quorum is 3 at this point and it seems that we lose the vote from the qdevice, causing the cluster to lose quorum

3) cluster and qdevice status:

# pcs qdevice status net
QNetd address:			*:5403
TLS:				Supported (client certificate required)
Connected clients:		1
Connected clusters:		1
Cluster "STSRHTS10485":
    Algorithm:		LMS
    Tie-breaker:	Node with lowest node ID
    Node ID 1:
        Client address:		::ffff:192.168.0.137:47712
        Configured node list:	1, 2, 3
        Membership node list:	1
        Vote:			NACK (NACK)

# pcs quorum status
Quorum information
------------------
Date:             Wed Aug 17 16:09:55 2016
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          1
Ring ID:          1/192
Quorate:          No

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      1
Quorum:           3 Activity blocked
Flags:            Qdevice 

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1   A,NV,NMW virt-136 (local)
         0          0            Qdevice (votes 2)


# pcs status
Cluster name: STSRHTS10485
Stack: corosync
Current DC: virt-136 (version 1.1.15-9.el7-e174ec8) - partition WITHOUT quorum
Last updated: Wed Aug 17 15:49:02 2016		Last change: Tue Aug 16 17:41:30 2016 by root via crm_node on virt-136

3 nodes and 12 resources configured

Node virt-139: UNCLEAN (offline)
Node virt-140: UNCLEAN (offline)
Online: [ virt-136 ]

Full list of resources:

 fence-virt-136	(stonith:fence_xvm):	Started virt-136
 fence-virt-139	(stonith:fence_xvm):	Started virt-139 (UNCLEAN)
 fence-virt-140	(stonith:fence_xvm):	Started virt-140 (UNCLEAN)
 fence-virt-141	(stonith:fence_xvm):	Started virt-139 (UNCLEAN)
 Clone Set: dlm-clone [dlm]
     dlm	(ocf::pacemaker:controld):	Started virt-139 (UNCLEAN)
     dlm	(ocf::pacemaker:controld):	Started virt-140 (UNCLEAN)
     Started: [ virt-136 ]
 Clone Set: clvmd-clone [clvmd]
     clvmd	(ocf::heartbeat:clvm):	Started virt-139 (UNCLEAN)
     clvmd	(ocf::heartbeat:clvm):	Started virt-140 (UNCLEAN)
     Started: [ virt-136 ]
 IP	(ocf::heartbeat:IPaddr2):	Started virt-136
 Webserver	(ocf::heartbeat:apache):	Started virt-136

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


# pcs quorum device status
Qdevice information
-------------------
Model:			Net
Node ID:		1
Configured node list:
    0	Node ID = 1
    1	Node ID = 2
    2	Node ID = 3
Membership node list:	1

Qdevice-net information
----------------------
Cluster name:		STSRHTS10485
QNetd host:		192.168.0.136:5403
Algorithm:		LMS
Tie-breaker:		Node with lowest node ID
State:			Connected
======================================================

Actual results:
quorum lost

Expected results:
Quorum should be retained, since the remaining node still has a connection to the qdevice. This is basically the purpose of a qdevice in such a setup, and it is an advantage over a 'standard' LMS cluster.
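The vote arithmetic behind the expected result can be checked with a short sketch. It assumes the usual majority rule (quorum = expected votes // 2 + 1), which matches the "Expected votes: 5" and "Quorum: 3" lines in the output above. The function and variable names are illustrative and are not part of corosync.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Majority rule (assumed): more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """A partition is quorate when its votes reach the threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Values taken from the report: 3 nodes with 1 vote each, "Qdevice (votes 2)".
NODE_COUNT = 3
QDEVICE_VOTES = 2
EXPECTED_VOTES = NODE_COUNT + QDEVICE_VOTES  # 5, as in "Expected votes: 5"

# One surviving node without the qdevice vote (the reported state).
votes_without_qdevice = 1
# One surviving node plus the qdevice vote (the expected state).
votes_with_qdevice = 1 + QDEVICE_VOTES

print("threshold:", quorum_threshold(EXPECTED_VOTES))
print("quorate without qdevice vote:", is_quorate(votes_without_qdevice, EXPECTED_VOTES))
print("quorate with qdevice vote:", is_quorate(votes_with_qdevice, EXPECTED_VOTES))
```

With the qdevice vote the last node would hold 3 of 5 votes and stay quorate. Without it, the node holds 1 vote, which is the "Total votes: 1" state shown above.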

Comment 2 Jan Friesse 2016-08-17 15:38:36 UTC
Reassigning to Chrissie because LMS is her field.

Comment 4 Jan Friesse 2016-08-22 08:32:21 UTC
@Martin: Does the same problem also happen with ffsplit?

Comment 5 Jan Friesse 2016-08-22 08:40:08 UTC
QE guys, debug logs would be helpful. Qnetd logs to syslog, but debug output has to be enabled first, so open /etc/sysconfig/corosync-qnetd and change the line:

COROSYNC_QNETD_OPTIONS=""

to

COROSYNC_QNETD_OPTIONS="-dd"

Qdevice logging depends on the corosync.conf configuration, but syslog is generally enabled, so please use:

logging {
...
        logger_subsys {
                subsys: QDEVICE
                debug: on
        }
...
}

configuration in /etc/corosync/corosync.conf before reporting problems.

Comment 6 Jan Friesse 2016-08-22 08:54:19 UTC
Also a comment on both "reports": I'm unable to reproduce either of them (trying killall -9 corosync or a sysrq trigger). Would you mind sharing corosync.conf (together with debug logs)?

Comment 8 Jan Friesse 2016-08-25 14:22:15 UTC
As discussed with Martin, I was able to reproduce the issue. The node really has to crash; just stopping corosync and/or qdevice is not enough. Basically what happens:
- Node 1 dies but the disconnect cannot be sent
- Node 2 finds out node 1 is dead and starts forming a new membership, sending a membership change to qnetd
- Qnetd LMS algo sees Node 1 as alive and Node 2 as split but not leader -> sends NACK to Node 2
- Eventually Qnetd finds out Node 2 died

The solution used in ffsplit is that qnetd_algo_lms_client_disconnect is handled and the current status is re-evaluated. This is probably not a good choice for lms, because lms keeps the vote (if it has one) until a change (to overcome the problem of an accidental disconnect from qnetd).
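The race described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the qnetd implementation: it treats every client that qnetd still believes connected but that is outside the asking node's membership as one competing partition, and breaks ties in favour of the partition holding the lowest node ID. The function name toy_lms_vote is hypothetical.

```python
def toy_lms_vote(membership: set, connected: set) -> str:
    """Toy LMS-style decision for the node reporting `membership`.

    membership: node IDs the asking node reports as its membership.
    connected:  node IDs that qnetd still believes are connected.
    Assumption: all other connected clients form one competing partition.
    """
    mine = membership & connected
    others = connected - mine
    if not others or len(mine) > len(others):
        return "ACK"
    if len(mine) < len(others):
        return "NACK"
    # Tie: the partition containing the lowest node ID wins (tie-breaker).
    return "ACK" if min(mine) < min(others) else "NACK"


# Node 1 crashed without sending a disconnect, so qnetd still lists it.
print(toy_lms_vote(membership={2}, connected={1, 2}))
# Once the stale connection to node 1 is dropped, the same question is
# answered differently.
print(toy_lms_vote(membership={2}, connected={2}))
```

In this model the surviving node is refused a vote only for as long as the stale connection of the crashed node is still counted.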

Comment 9 Jan Friesse 2016-08-25 14:58:31 UTC
Created attachment 1194051 [details]
Proposed patch

Solves the situation when the tie-breaker node dies in a 2 node cluster.
Because the code contains two bugs, the other node got NACK instead of ACK.

- Algo timer is not a stack, so calling abort and schedule in the timer
callback without setting reschedule is a noop.
- It's necessary to check not only what the current node thinks about
membership, but also what the other nodes think. If views diverge -> wait.
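The second point, comparing membership views before deciding, might look roughly like the sketch below. It is not the code from the attached patch. The function name, the views mapping and the "WAIT"/"EVALUATE" return values are assumptions made for illustration.

```python
def decide_or_wait(asking: int, views: dict) -> str:
    """Return "WAIT" while membership views diverge, else "EVALUATE".

    views maps each connected node ID to the set of node IDs that node
    last reported as its membership (hypothetical data structure).
    """
    mine = views[asking]
    for node, view in views.items():
        # Another client still counts the asking node as a member but
        # reports a different membership: the views have not converged.
        if node != asking and asking in view and view != mine:
            return "WAIT"
    return "EVALUATE"


# Node 1 crashed; its last reported view still includes node 2.
print(decide_or_wait(2, {1: {1, 2}, 2: {2}}))
# After qnetd drops node 1, only node 2's own view remains.
print(decide_or_wait(2, {2: {2}}))
```

Waiting instead of answering immediately avoids handing out a NACK that is based on a view the dead node can no longer update.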

Comment 10 Jan Friesse 2016-08-26 07:14:42 UTC
Just a note, I'm still unable to reproduce the first bug reported by Roman. Roman, can you please paste logs as Martin did?

Comment 12 Jan Friesse 2016-08-30 07:12:05 UTC
Martin,
thanks for the logs. Next time, please make sure to set

logging {
...
        logger_subsys {
                subsys: QDEVICE
                debug: on
        }
...
}

(please note subsys: QDEVICE, not subsys: VOTEQ). Anyway, I believe the proposed patch also solves this problem. Would you mind testing a scratch build?

Comment 16 Jan Friesse 2016-08-30 12:32:54 UTC
Sounds great, thanks for testing!

Comment 17 Christine Caulfield 2016-08-30 14:26:24 UTC
Created attachment 1195934 [details]
Patch with slightly better English comments

ACK to the patch, thanks for spotting that. 

I've fixed the English in the comments somewhat but the logic seems fine to me.

Comment 18 Jan Friesse 2016-08-30 15:00:50 UTC
Chrissie, thanks for the review. The patch is now upstream as b0c850f308d44ddcdf1a1f881c1e1142ad489385

Comment 19 Jan Friesse 2016-08-31 09:21:58 UTC
Created attachment 1196271 [details]
Man: Fix corosync-qdevice-net-certutil link

Comment 20 Jan Friesse 2016-08-31 09:22:23 UTC
Created attachment 1196272 [details]
man: mention qdevice incompatibilites in votequorum.5

Comment 24 errata-xmlrpc 2016-11-04 06:50:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2463.html

