Bug 1512432 - Test bug-1483058-replace-brick-quorum-validation.t fails inconsistently
Summary: Test bug-1483058-replace-brick-quorum-validation.t fails inconsistently
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: tests
Version: 3.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On: 1511310 1512435
Blocks:
Reported: 2017-11-13 09:04 UTC by Atin Mukherjee
Modified: 2017-12-19 07:17 UTC (History)

Fixed In Version: glusterfs-glusterfs-3.12.4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1511310
Environment:
Last Closed: 2017-12-19 07:17:49 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Atin Mukherjee 2017-11-13 09:04:20 UTC
+++ This bug was initially created as a clone of Bug #1511310 +++

Description of problem:
I ran into this failure [1] during regression runs for patch [2]. When I ran the test on my local machine, it failed intermittently. The failed test was:


TEST 15 (line 49): gluster --mode=script --wignore --glusterd-sock=/d/backends/1/glusterd/gd.sock --log-file=/var/log/glusterfs/bug-1483058-replace-brick-quorum-validation.t_cli1.log volume replace-brick patchy 127.1.1.2:/d/backends/2/patchy1 127.1.1.1:/d/backends/1/patchy1_new commit force
volume replace-brick: failed: Quorum not met. Volume operation not allowed.
./tests/bugs/glusterd/bug-1483058-replace-brick-quorum-validation.t .. 15/15 RESULT 15: 1
./tests/bugs/glusterd/bug-1483058-replace-brick-quorum-validation.t .. Failed 1/15 subtests 

Test Summary Report
-------------------
./tests/bugs/glusterd/bug-1483058-replace-brick-quorum-validation.t (Wstat: 0 Tests: 15 Failed: 1)
  Failed test:  15
Files=1, Tests=15, 39 wallclock secs ( 0.03 usr  0.00 sys +  1.74 cusr  0.98 csys =  2.75 CPU)

On looking at one of glusterd logs, I found:
[2017-11-09 06:06:09.387014]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1483058-replace-brick-quorum-validation.t: TEST: 49 gluster --mode=script --wignore --glusterd-sock=/d/backends/1/glusterd/gd.sock --log-file=/var/log/glusterfs/bug-1483058-replace-brick-quorum-validation.t_cli1.log volume replace-brick patchy 127.1.1.2:/d/backends/2/patchy1 127.1.1.1:/d/backends/1/patchy1_new commit force ++++++++++
The message "I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req" repeated 5 times between [2017-11-09 06:06:03.713593] and [2017-11-09 06:06:09.371221]
[2017-11-09 06:06:09.511510] I [MSGID: 106505] [glusterd-replace-brick.c:67:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2017-11-09 06:06:09.511673] I [MSGID: 106503] [glusterd-replace-brick.c:148:__glusterd_handle_replace_brick] 0-management: Received replace-brick commit force request.
[2017-11-09 06:06:10.205940] E [MSGID: 106001] [glusterd-replace-brick.c:228:glusterd_op_stage_replace_brick] 0-management: Server quorum not met. Rejecting operation.
[2017-11-09 06:06:10.205972] W [MSGID: 106122] [glusterd-mgmt.c:168:gd_mgmt_v3_pre_validate_fn] 0-management: Replace-brick prevalidation failed.
[2017-11-09 06:06:10.205987] E [MSGID: 106122] [glusterd-mgmt.c:1036:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Replace brick on local node
[2017-11-09 06:06:10.206000] E [MSGID: 106122] [glusterd-replace-brick.c:660:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Pre Validation Failed

Note that I did not find any log entries for the temporary mount performed during replace-brick. glustershd.log also did not show that the replace-brick had succeeded; it still had the old brick in the graph.

Looking at this log, I do not see how [2] could have caused this failure. I am running the tests without [2] to rule it out as the root cause, and will report back once they are complete.

[1] https://build.gluster.org/job/centos6-regression/7327/console
[2] https://review.gluster.org/18681

Version-Release number of selected component (if applicable):
mainline

How reproducible:
inconsistently

--- Additional comment from Raghavendra G on 2017-11-09 01:39 EST ---



--- Additional comment from Raghavendra G on 2017-11-09 01:43:27 EST ---

I think [2] is not the cause because glusterd_validate_quorum() does not appear to touch the FOP path. If no fops (specifically stat/fstat) are issued, [2] has no effect.

--- Additional comment from Atin Mukherjee on 2017-11-09 03:00:02 EST ---

This is indeed a bad test. The issue is that the attribute used to check whether a peer is up differs from the one used to check whether quorum has been regained.

peer_count checks peerinfo->status, which is set to connected the moment glusterd receives an RPC_CLNT_CONNECT event from its peer, whereas the quorum check is based on whether peerinfo->quorum_contrib is set to QUORUM_UP, which is done in glusterd_friend_sm () and may happen after RPC_CLNT_CONNECT. If the replace-brick commit force is issued between these two events, it fails with a quorum rejection. I'll see how to handle this scenario in the test and will send a patch soon.
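
The window described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not glusterd code and not the fix posted for review. The Peer class, its two fields, the majority rule in quorum_met() and the wait_until() helper are all hypothetical names chosen for illustration. They stand in for peerinfo->status, peerinfo->quorum_contrib, the server quorum calculation and a test-side wait.

```python
import time
from dataclasses import dataclass


@dataclass
class Peer:
    # Hypothetical stand-ins for the two fields named in the comment above.
    connected: bool = False   # models peerinfo->status
    quorum_up: bool = False   # models peerinfo->quorum_contrib == QUORUM_UP


def on_rpc_connect(peer):
    """Models the RPC_CLNT_CONNECT event: only the connection status changes."""
    peer.connected = True


def on_friend_sm(peer):
    """Models the later state-machine step that marks the peer as QUORUM_UP."""
    if peer.connected:
        peer.quorum_up = True


def peer_count(peers):
    """Counts connected peers, the attribute the comment ties to peer_count."""
    return sum(p.connected for p in peers)


def quorum_met(peers):
    """Toy majority rule (an assumption): local node plus QUORUM_UP peers."""
    total = len(peers) + 1
    up = 1 + sum(p.quorum_up for p in peers)
    return 2 * up > total


def replace_brick(peers):
    """Returns True if the operation is allowed under the toy quorum rule."""
    return quorum_met(peers)


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Polls predicate until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


if __name__ == "__main__":
    peer = Peer()
    on_rpc_connect(peer)
    # The window: the peer counts as connected, but quorum is not yet met.
    print(peer_count([peer]), replace_brick([peer]))
    on_friend_sm(peer)
    print(peer_count([peer]), replace_brick([peer]))
```

In this model the first line printed shows one connected peer with the operation still rejected, and the second shows it allowed once the state-machine step has run. A test that waits on the same condition the operation checks, for example wait_until(lambda: quorum_met(peers)), avoids the window. A test that waits only on peer_count can race with it.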

--- Additional comment from Worker Ant on 2017-11-09 12:13:39 EST ---

REVIEW: https://review.gluster.org/18710 (tests: fix bug-1483058-replace-brick-quorum-validation.t spurious failure) posted (#1) for review on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2017-11-12 06:28:49 EST ---

COMMIT: https://review.gluster.org/18710 committed in master by  

------------- tests: fix bug-1483058-replace-brick-quorum-validation.t spurious failure

Change-Id: I04c35305bfb663eabbf715eee78695adfd4a2d20
BUG: 1511310
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>

Comment 1 Worker Ant 2017-11-13 09:05:41 UTC
REVIEW: https://review.gluster.org/18724 (tests: fix bug-1483058-replace-brick-quorum-validation.t spurious failure) posted (#1) for review on release-3.12 by Atin Mukherjee

Comment 2 Worker Ant 2017-11-30 06:48:37 UTC
COMMIT: https://review.gluster.org/18724 committed in release-3.12 by "Atin Mukherjee" <amukherj@redhat.com> with the commit message: tests: fix bug-1483058-replace-brick-quorum-validation.t spurious failure

> mainline patch : https://review.gluster.org/#/c/18710/

Change-Id: I04c35305bfb663eabbf715eee78695adfd4a2d20
BUG: 1512432
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
(cherry picked from commit 76a83f98b78a0bdf29bbb0f8e4c9ab74dae52be4)

Comment 3 Jiffin 2017-12-19 07:17:49 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-glusterfs-3.12.4, please open a new bug report.

glusterfs-glusterfs-3.12.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-devel/2017-December/054093.html
[2] https://www.gluster.org/pipermail/gluster-users/

