Bug 1067028 - CPG membership may be inconsistent after node pause
Summary: CPG membership may be inconsistent after node pause
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-19 14:25 UTC by Jan Friesse
Modified: 2014-06-18 00:31 UTC

Fixed In Version: corosync-2.3.3-2.el7
Doc Type: Bug Fix
Doc Text:
Cause: Corosync on one node is paused. While it is paused, cpg clients on other nodes are killed.
Consequence: After resuming, corosync on the paused node never learns that cpg clients on the other nodes were killed and still believes they are alive. The cpg membership then differs between nodes.
Fix: Corosync now properly updates its internal information about cpg clients on other nodes.
Result: Killed cpg clients are properly removed from internal structures, so cpg membership is consistent across nodes.
Clone Of:
: 1067043 (view as bug list)
Environment:
Last Closed: 2014-06-13 12:27:43 UTC
Target Upstream Version:


Attachments
Proposed patch - part 1 - cpg: Refactor mh_req_exec_cpg_procleave (deleted)
2014-02-19 14:27 UTC, Jan Friesse
Proposed patch - part 2 - cpg: Make sure nodid is always logged as hex num (deleted)
2014-02-19 14:28 UTC, Jan Friesse
Proposed patch - part3 - cpg: Make sure left nodes are really removed (deleted)
2014-02-19 14:29 UTC, Jan Friesse

Description Jan Friesse 2014-02-19 14:25:23 UTC
Description of problem:
When a node is paused and a cpg process on another node exits in the meantime,
the paused node does not update its membership correctly after it resumes, so
the exited cpg process is still visible on the previously paused node.

Version-Release number of selected component (if applicable):
Any

How reproducible:
100%

Steps to Reproduce:
1. Start 3 nodes and run a cpg client (for example testcpg) on every node.
2. Pause one of the nodes.
3. On another node, stop the cpg client and start it again.
4. Unpause the paused node.
5. On the previously paused node, membership will differ from the other nodes (check with corosync-cpgtool). The paused node will show 4 members (2 of them from the node where the cpg client was stopped and restarted), while the other nodes will show 3 members (each node has exactly one client).

Actual results:
Membership contains a process that no longer exists

Expected results:
Membership contains only existing processes
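
The member counts from the steps to reproduce can be illustrated with a small toy model. This is only a sketch, not corosync code: the `Node` and `Cluster` classes, the `resync_leaves` flag, and the node IDs and PIDs are all hypothetical, and the resume logic is an assumed simplification (a resumed node learns about new joins, and drops stale entries only when `resync_leaves` is set).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy view of one node's CPG membership list (hypothetical, not corosync)."""
    nodeid: int
    paused: bool = False
    members: set = field(default_factory=set)  # set of (nodeid, pid) tuples


class Cluster:
    def __init__(self, nodeids):
        self.nodes = {n: Node(n) for n in nodeids}

    def _broadcast(self, op, member):
        # A paused node misses every join/leave message sent while it is frozen.
        for node in self.nodes.values():
            if node.paused:
                continue
            if op == "join":
                node.members.add(member)
            else:
                node.members.discard(member)

    def join(self, nodeid, pid):
        self._broadcast("join", (nodeid, pid))

    def leave(self, nodeid, pid):
        self._broadcast("leave", (nodeid, pid))

    def resume(self, nodeid, resync_leaves):
        node = self.nodes[nodeid]
        node.paused = False
        # Use any other running node's list as the reference view.
        reference = next(
            n.members for n in self.nodes.values()
            if n.nodeid != nodeid and not n.paused
        )
        node.members |= reference      # assumed: new joins are always learned
        if resync_leaves:
            node.members &= reference  # assumed fix: stale entries are dropped


def reproduce(resync_leaves):
    cluster = Cluster([1, 2, 3])
    for nodeid, pid in [(1, 100), (2, 200), (3, 300)]:
        cluster.join(nodeid, pid)      # step 1: one client per node
    cluster.nodes[3].paused = True     # step 2: pause node 3
    cluster.leave(2, 200)              # step 3: stop the client on node 2 ...
    cluster.join(2, 201)               # ... and start it again (new PID)
    cluster.resume(3, resync_leaves)   # step 4: unpause node 3
    return {n: len(node.members) for n, node in cluster.nodes.items()}


print(reproduce(resync_leaves=False))  # {1: 3, 2: 3, 3: 4}
print(reproduce(resync_leaves=True))   # {1: 3, 2: 3, 3: 3}
```

Without the resync, node 3 keeps the stale entry for the old client on node 2 and reports 4 members, matching the actual results; with it, all nodes agree on 3 members.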

Additional info:

Comment 1 Jan Friesse 2014-02-19 14:27:10 UTC
Created attachment 865089
Proposed patch - part 1 - cpg: Refactor mh_req_exec_cpg_procleave

Comment 2 Jan Friesse 2014-02-19 14:28:45 UTC
Created attachment 865090
Proposed patch - part 2 - cpg: Make sure nodid is always logged as hex num

Comment 3 Jan Friesse 2014-02-19 14:29:15 UTC
Created attachment 865091
Proposed patch - part3 - cpg: Make sure left nodes are really removed

Comment 9 Ludek Smid 2014-06-13 12:27:43 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

