Bug 1065501 - AFR: Crash on client when creating files for self heal of 50k files testcase.
Summary: AFR: Crash on client when creating files for self heal of 50k files testcase.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 2.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assignee: Anuradha
QA Contact: Ben Turner
URL:
Whiteboard:
Depends On:
Blocks: 1035040 1162694
 
Reported: 2014-02-14 19:39 UTC by Ben Turner
Modified: 2016-09-20 02:00 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.6.0.28-1
Doc Type: Known Issue
Doc Text:
While self-heal is in progress on a mount, the mount may crash if cluster.data-self-heal is changed from "off" to "on" using the volume set operation. Workaround: Ensure that no self-heals are pending on the volume before changing the cluster.data-self-heal option.
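The pending-heal check described in the workaround can be run from any storage node before re-enabling data self-heal. A minimal sketch, assuming a volume named healtest (substitute the actual volume name); `gluster volume heal <vol> info` lists the entries still awaiting heal on each brick:

```shell
# Confirm nothing is pending heal before toggling the option
gluster volume heal healtest info

# Only once every brick reports "Number of entries: 0":
gluster volume set healtest cluster.data-self-heal on
```

These commands require a running Gluster cluster and must be run on a storage node.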
Clone Of:
Environment:
Last Closed: 2015-01-15 13:37:10 UTC


Attachments
sosreport from client. (deleted)
2014-02-14 19:44 UTC, Ben Turner
no flags
core (deleted)
2014-02-14 19:44 UTC, Ben Turner
no flags


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:0038 normal SHIPPED_LIVE Red Hat Storage 3.0 enhancement and bug fix update #3 2015-01-15 18:35:28 UTC

Description Ben Turner 2014-02-14 19:39:14 UTC
Description of problem:

When running the testcase "Test self-heal of 50k files (self-heal-daemon)" there was a crash when creating data.  Here is what I saw in the shell:

32768 bytes (33 kB) copied, 0.00226773 s, 14.4 MB/s
1+0 records in
1+0 records out
32768 bytes (33 kB) copied, 0.00233886 s, 14.0 MB/s
dd: opening `/gluster-mount/small/37773.small': Software caused connection abort
dd: opening `/gluster-mount/small/37774.small': Transport endpoint is not connected
dd: opening `/gluster-mount/small/37775.small': Transport endpoint is not connected

And in the gluster mount logs:

client-0 to healtest-client-1,  metadata - Pending matrix:  [ [ 0 2 ] [ 0 0 ] ], on /small/37757.small
[2014-02-14 18:56:15.169667] I [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 0-healtest-replicate-0:  metadata self heal  is successfully completed,   metadata self heal from source healtest-client-0 to healtest-client-1,  metadata - Pending matrix:  [ [ 0 2 ] [ 0 0 ] ], on /small/37771.small
[2014-02-14 18:56:15.275690] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-02-14 18:56:15.276117] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-02-14 18:56:15.278740] I [dht-shared.c:311:dht_init_regex] 0-healtest-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2014-02-14 18:56:15.278975] I [glusterfsd-mgmt.c:1379:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-02-14 18:56:15.279009] I [glusterfsd-mgmt.c:1379:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-02-14 18:56:15
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.59rhs
/lib64/libc.so.6(+0x32920)[0x7fd0fb464920]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/cluster/replicate.so(afr_sh_data_lock_rec+0x77)[0x7fd0f53a9a27]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/cluster/replicate.so(afr_sh_data_open_cbk+0x178)[0x7fd0f53ab398]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/protocol/client.so(client3_3_open_cbk+0x18b)[0x7fd0f560e82b]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7fd0fc1a7f45]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fd0fc1a9507]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7fd0fc1a4d88]
/usr/lib64/glusterfs/3.4.0.59rhs/rpc-transport/socket.so(+0x8d86)[0x7fd0f7a44d86]
/usr/lib64/glusterfs/3.4.0.59rhs/rpc-transport/socket.so(+0xa69d)[0x7fd0f7a4669d]
/usr/lib64/libglusterfs.so.0(+0x61ad7)[0x7fd0fc413ad7]
/usr/sbin/glusterfs(main+0x5f8)[0x4068b8]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd0fb450cdd]
/usr/sbin/glusterfs[0x4045c9]
---------


Version-Release number of selected component (if applicable):

glusterfs 3.4.0.59rhs

How reproducible:

I have only seen this crash once out of 2-3 runs of this or very similar testcases.

Steps to Reproduce:

I hit this during a batch run of automated testcases; they were:

TCMS - 198855 223406 226909 226912 237832 238530 238539

The testcase that hit the crash was 238530:

1.  Create a 1x2 volume across 2 nodes.

2.  Set the volume option 'self-heal-daemon' to "off" using the command "gluster volume set <vol_name> self-heal-daemon off" from one of the storage nodes.
 
3.  Bring all brick processes offline on one node.
 
4.  Create 50k files with ($3 and $4 are the file count and block size arguments of the automation script):

            mkdir -p $MOUNT_POINT/small
            for i in `seq 1 $3`; do
                dd if=/dev/zero of=$MOUNT_POINT/small/$i.small bs=$4 count=1
            done
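Since the loop above came from an automation script, its placeholders don't run as-is. A self-contained sketch of step 4, with hypothetical variable names standing in for the mount point, file count ($3), and block size ($4), and defaults small enough to run anywhere writable:

```shell
# Stand-ins for the automation script's arguments (hypothetical names/values)
MOUNT_POINT=${MOUNT_POINT:-$(mktemp -d)}
NUM_FILES=${NUM_FILES:-100}       # the testcase used 50000
BLOCK_SIZE=${BLOCK_SIZE:-32768}   # 32 KB per file, matching the dd output above

# Create NUM_FILES files of BLOCK_SIZE bytes each, as in step 4
mkdir -p "$MOUNT_POINT/small"
for i in $(seq 1 "$NUM_FILES"); do
    dd if=/dev/zero of="$MOUNT_POINT/small/$i.small" bs="$BLOCK_SIZE" count=1 2>/dev/null
done
echo "created $(ls "$MOUNT_POINT/small" | wc -l) files in $MOUNT_POINT/small"
```

To reproduce the bug, MOUNT_POINT would be the glusterfs FUSE mount of the 1x2 volume from steps 1-3.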


Actual results:

Crash on the client during file creation.

Expected results:

No crash.

Additional info:

I was only able to get the core file and sosreport from the client before the hosts were reclaimed.  I'll attempt to repro again for more data.

Comment 1 Ben Turner 2014-02-14 19:44:31 UTC
Created attachment 863404 [details]
sosreport from client.

Comment 2 Ben Turner 2014-02-14 19:44:54 UTC
Created attachment 863405 [details]
core

Comment 8 Shalaka 2014-02-18 11:20:25 UTC
Please review the edited doc text and sign off.

Comment 10 Anuradha 2014-10-27 07:16:04 UTC
This bug was fixed as part of a rebase for Denali.

Comment 11 Ben Turner 2014-12-15 19:07:57 UTC
Verified on glusterfs-3.6.0.38-1.

Comment 13 errata-xmlrpc 2015-01-15 13:37:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html

