Bug 986953 - quota: glusterd crash
Summary: quota: glusterd crash
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-07-22 13:44 UTC by Saurabh
Modified: 2016-01-19 06:12 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-06 10:53:16 UTC
Target Upstream Version:



Description Saurabh 2013-07-22 13:44:10 UTC
Description of problem:
Had a 6x2 distributed-replicate volume across four RHS nodes (nodes 1-4).
Set quota limits on the volume root and on a number of directories.
While I/O was in progress, took two of the nodes down.

After some time, ran gluster volume start <vol> force to bring the bricks back online.

Stopped the I/O and then started self-heal.
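
A rough reconstruction of those steps as commands (the volume name is taken from the status output below; the exact limit values and the heal invocation are assumptions, not the reporter's verbatim input):

gluster volume quota quota-dist-rep enable
gluster volume quota quota-dist-rep limit-usage / 30GB
gluster volume quota quota-dist-rep limit-usage /dir1 2GB
# ...similar limit-usage calls for /dir2 through /dir10, /foo and /bar

# run the data-creation script (see Additional info) on a client mount,
# take two of the four RHS nodes down while it runs, bring them back, then:
gluster volume start quota-dist-rep force

# stop the I/O and trigger self-heal:
gluster volume heal quota-dist-rep full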


[root@nfs1 ~]# gluster volume quota quota-dist-rep list
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/                                           30GB       90%       7.2GB  22.8GB
/dir2                                        1GB       90%    1023.9MB  64.0KB
/dir3                                        1GB       90%    1022.9MB   1.1MB
/dir4                                        1GB       90%    1023.9MB  64.0KB
/dir5                                        1GB       90%    1022.9MB   1.1MB
/dir6                                        1GB       90%    1022.9MB   1.1MB
/dir7                                        1GB       90%       1.0GB  0Bytes
/dir8                                        1GB       90%     104.0MB 920.0MB
/dir9                                        1GB       90%      0Bytes   1.0GB
/dir10                                       1GB       90%      0Bytes   1.0GB
/dir1                                        2GB       90%    1023.9MB   1.0GB
/bar                                        10MB       90%         N/A     N/A
/foo                                        10MB       90%      95.4MB  0Bytes


Version-Release number of selected component (if applicable):
[root@nfs1 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta4-1.el6rhs.x86_64


How reproducible:
Seen once so far; a reliable reproduction rate is not yet established.

Actual results:

glusterd crashed; core files were found on both node2 and node3.


Status of volume: quota-dist-rep
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.180:/rhs/bricks/quota-d1r1               49172   Y       23303
Brick 10.70.37.139:/rhs/bricks/quota-d2r2               49172   Y       17100
Brick 10.70.37.180:/rhs/bricks/quota-d3r1               49173   Y       23314
Brick 10.70.37.139:/rhs/bricks/quota-d4r2               49173   Y       17111
Brick 10.70.37.180:/rhs/bricks/quota-d5r1               49174   Y       23325
Brick 10.70.37.139:/rhs/bricks/quota-d6r2               49174   Y       17122
NFS Server on localhost                                 2049    Y       25673
Self-heal Daemon on localhost                           N/A     Y       25680
NFS Server on 10.70.37.139                              2049    Y       18714
Self-heal Daemon on 10.70.37.139                        N/A     Y       18721
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    9e281276-6e32-43d6-8028-d06c80dc3b18              3


(gdb) bt
#0  0x000000396ba328a5 in raise () from /lib64/libc.so.6
#1  0x000000396ba34085 in abort () from /lib64/libc.so.6
#2  0x000000396ba707b7 in __libc_message () from /lib64/libc.so.6
#3  0x000000396ba760e6 in malloc_printerr () from /lib64/libc.so.6
#4  0x000000348f415715 in data_destroy (data=0x7f041c24f200) at dict.c:147
#5  0x000000348f416309 in _dict_set (this=<value optimized out>, key=0x7f041a21b8ff "features.limit-usage", value=0x7f041c25088c, replace=_gf_true) at dict.c:262
#6  0x000000348f41654a in dict_set (this=0x7f041c431144, key=0x7f041a21b8ff "features.limit-usage", value=0x7f041c25088c) at dict.c:334
#7  0x00007f041a1f9ff7 in glusterd_quota_limit_usage (volinfo=0x19ad930, dict=0x7f041c4327b0, op_errstr=0x1c4f538) at glusterd-quota.c:717
#8  0x00007f041a1faf78 in glusterd_op_quota (dict=0x7f041c4327b0, op_errstr=0x1c4f538, rsp_dict=0x7f041c432468) at glusterd-quota.c:1019
#9  0x00007f041a1c6046 in glusterd_op_commit_perform (op=GD_OP_QUOTA, dict=0x7f041c4327b0, op_errstr=0x1c4f538, rsp_dict=0x7f041c432468) at glusterd-op-sm.c:3899
#10 0x00007f041a1c7843 in glusterd_op_ac_commit_op (event=<value optimized out>, ctx=0x7f0410000c70) at glusterd-op-sm.c:3645
#11 0x00007f041a1c3281 in glusterd_op_sm () at glusterd-op-sm.c:5309
#12 0x00007f041a1b137d in __glusterd_handle_commit_op (req=0x7f041a12602c) at glusterd-handler.c:750
#13 0x00007f041a1ae53f in glusterd_big_locked_handler (req=0x7f041a12602c, actor_fn=0x7f041a1b1280 <__glusterd_handle_commit_op>) at glusterd-handler.c:75
#14 0x000000348f447292 in synctask_wrap (old_task=<value optimized out>) at syncop.c:131
#15 0x000000396ba43b70 in ?? () from /lib64/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb) 


Expected results:
glusterd should not crash.

Additional info:
Script used for creating the data:
#!/bin/bash
set -x

create_data()
{
    # Fill dir1..dir10 with 1MB files of random data until each
    # directory's quota is exceeded.
    for i in $(seq 1 10)
    do
        while true
        do
            # Nanoseconds in the filename avoid collisions within one second.
            out=$(dd if=/dev/urandom of="dir$i/$(date +%s.%N)" bs=1024 count=1024 2>&1)
            echo "$out"
            if echo "$out" | grep -q 'Disk quota exceeded'
            then
                echo "quota limit reached"
                break
            fi
        done
    done
}

create_data

Comment 4 Krutika Dhananjay 2013-07-23 05:11:18 UTC
Looking at the backtrace, the cause of this crash appears to be the same as that of https://bugzilla.redhat.com/show_bug.cgi?id=983544.

CAUSE:

This happens because, in the earlier code (in glusterd_quota_limit_usage()), the pointer @quota_limits aliased the memory owned by the 'value' stored against the key 'features.limit-usage' in volinfo->dict. At some point the function calls GF_FREE on @quota_limits, which frees that dict-owned memory too and leaves 'value' dangling. Some time later, the same function calls dict_set_str() for the key 'features.limit-usage', and dict_set_str() first GF_FREEs the object 'value' still points to before installing the new value. That second free of the same memory is what crashes the process: the bug is a double free.
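
To make the pattern concrete, here is a minimal, self-contained C sketch of the same double-free sequence; the names (entry, entry_set, quota_limits) are illustrative stand-ins, not the actual glusterd dict code:

#include <stdlib.h>
#include <string.h>

/* Stand-in for a dict entry whose value owns heap memory. */
struct entry {
    char *value;
};

/* Like dict_set_str() replacing an existing key: the old value is
 * freed before the new one is installed. */
static void entry_set(struct entry *e, const char *new_value)
{
    free(e->value);
    e->value = strdup(new_value);
}

int main(void)
{
    struct entry e = { .value = strdup("old-limits") };

    /* Bug pattern: a second pointer aliases the entry-owned memory... */
    char *quota_limits = e.value;

    /* ...and is freed directly, leaving e.value dangling. */
    free(quota_limits);

    /* Replacing the value now frees the same memory a second time;
     * glibc detects the double free and aborts, which matches the
     * malloc_printerr()/abort() frames in the backtrace above. */
    entry_set(&e, "new-limits");

    free(e.value);
    return 0;
}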




The fix for 983544 is available in glusterfs-3.4.0.12rhs.beta5. Could you please check whether this bug still occurs on that version?

Comment 5 Krutika Dhananjay 2013-09-02 06:59:14 UTC
As per the root-cause analysis in comment #4, the bug was fixed as part of the build glusterfs-3.4.0.12rhs.beta5, and the fix holds under the new quota design as well. Hence moving the state of the bug to ON_QA.

