Bug 1360576 - [Disperse volume]: IO hang seen on mount with file ops
Summary: [Disperse volume]: IO hang seen on mount with file ops
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On: 1329466 1330132 1330997 1342426 1344836
Blocks: 1361402
 
Reported: 2016-07-27 05:17 UTC by Pranith Kumar K
Modified: 2016-08-12 09:48 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.8.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1344836
: 1361402
Environment:
Last Closed: 2016-08-12 09:48:11 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments:

Comment 1 Pranith Kumar K 2016-07-27 05:20:23 UTC
This is an issue we observed in internal testing.
The locks were being acquired just as bricks were going down because of ping timeouts; 4 of the 6 bricks went down at that point. On 2 of the 6 bricks the locks were never released and were left stale.
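(For reference, one way to confirm that such stale locks are present is to take a statedump of the bricks and look for lingering inodelk entries. The commands below are only a sketch; the volume name "ec-test" and the default dump directory are assumptions, not details from this report.)

# Trigger a statedump on all bricks of the volume, then inspect the dumps
# for inodelk entries that remain after the failed operation.
gluster volume statedump ec-test
grep -A3 'inodelk' /var/run/gluster/*.dump.*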

Steps to recreate the issue (a shell/gdb sketch of these steps follows the list):
1) Create a plain disperse volume.
2) Put a breakpoint at ec_wind_inodelk.
3) From the fuse mount, issue ls -laR <mount>.
4) As soon as the breakpoint is hit in gdb, kill 4 of the 6 bricks from another terminal.
5) Quit gdb.
6) Wait a second or two and confirm that there are stale locks on the remaining bricks.
7) In my case there were, so I issued ls -laR on the mount again and it hung.
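The same steps as a rough shell/gdb sketch. Everything here is assumed for illustration (volume name "ec-test", brick hosts and paths, mount point, and the way the fuse client process is located); it is not a verbatim transcript of the original test.

# Terminal 1: create, start, and mount a plain 4+2 disperse volume.
gluster volume create ec-test disperse 6 redundancy 2 \
    server{1..6}:/bricks/ec-test force
gluster volume start ec-test
mount -t glusterfs server1:/ec-test /mnt/ec-test

# Terminal 2: attach gdb to the fuse client and break on ec_wind_inodelk.
gdb -p "$(pgrep -f 'glusterfs.*ec-test')" \
    -ex 'break ec_wind_inodelk' -ex 'continue'

# Terminal 1: start the crawl; it stops once the breakpoint is hit.
ls -laR /mnt/ec-test

# Terminal 3: while the breakpoint is held, kill 4 of the 6 brick
# processes (PIDs are shown by `gluster volume status ec-test`),
# then quit gdb in terminal 2.

# After a second or two, check the surviving bricks for stale inodelk
# entries (statedump sketch above) and repeat the crawl; with stale
# locks left behind it hangs.
ls -laR /mnt/ec-test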

Relevant logs that led to this conclusion (these failures were on disperse-2 of a 6=4+2 setup):
[2016-06-10 17:21:44.690734] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ))))) 0-ec-nfsganesha-client-15: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-06-10 17:21:44.537422 (xid=0x274d7)

[2016-06-10 17:21:44.771235] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ))))) 0-ec-nfsganesha-client-17: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-06-10 17:21:44.537520 (xid=0x2740b)

[2016-06-10 17:21:44.773164] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ))))) 0-ec-nfsganesha-client-16: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-06-10 17:21:44.537487 (xid=0x2740b)

[2016-06-10 17:21:44.808576] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ))))) 0-ec-nfsganesha-client-14: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-06-10 17:21:44.537377 (xid=0x2740d)
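(As an aside, a quick way to spot these forced unwinds of INODELK frames is a grep like the one below. The log path is only an assumed default for a FUSE mount; the messages above actually come from an NFS-Ganesha gfapi client, whose log lives elsewhere.)

grep -E 'forced unwinding .*INODELK' /var/log/glusterfs/*.log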

Comment 2 Vijay Bellur 2016-07-27 05:53:04 UTC
REVIEW: http://review.gluster.org/15025 (cluster/ec: Unlock stale locks when inodelk/entrylk/lk fails) posted (#1) for review on release-3.8 by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 3 Vijay Bellur 2016-07-29 10:50:52 UTC
COMMIT: http://review.gluster.org/15025 committed in release-3.8 by Xavier Hernandez (xhernandez@datalab.es) 
------
commit e641ac9444d04399761a46ac6b05f28e5231c66e
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Sat Jun 11 18:43:42 2016 +0530

    cluster/ec: Unlock stale locks when inodelk/entrylk/lk fails
    
    Thanks to Rafi for hinting a while back that he had once seen this
    kind of problem. I didn't think the theory was valid; I could have
    caught it earlier if I had tested his theory.
    
     >Change-Id: Iac6ffcdba2950aa6f8cf94f8994adeed6e6a9c9b
     >BUG: 1344836
     >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
     >Reviewed-on: http://review.gluster.org/14703
     >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
     >Smoke: Gluster Build System <jenkins@build.gluster.org>
     >Tested-by: mohammed rafi  kc <rkavunga@redhat.com>
     >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
     >CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    
    BUG: 1360576
    Change-Id: If9ccf0b3db7159b87ddcdc7b20e81cde8c3c76f0
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/15025
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>

Comment 4 Niels de Vos 2016-08-12 09:48:11 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.2, please open a new bug report.

glusterfs-3.8.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000058.html
[2] https://www.gluster.org/pipermail/gluster-users/

