Bug 1066960 - DHT: REBALANCE - Remove-brick without commit followed by rebalance causes migration failures
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286123
Reported: 2014-02-19 11:42 UTC by shylesh
Modified: 2015-11-27 11:35 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1286123
Environment:
Last Closed: 2015-11-27 11:35:29 UTC



Description shylesh 2014-02-19 11:42:08 UTC
Description of problem:
Starting a remove-brick operation, stopping it partway through, and then running add-brick followed by rebalance causes migration failures for some of the files.

Version-Release number of selected component (if applicable):
3.4.0.59rhs-1.el6rhs.x86_64

How reproducible:
Tried once

Steps to Reproduce:
1. Created a distribute volume with 6 bricks.
2. Created some directories and files on the mount.
3. Started removing 2 bricks:
gluster v remove-brick <vol> b1 b2 start
4. While migration was in progress, stopped the remove-brick operation:
gluster v remove-brick <vol> b1 b2 stop
5. Added 2 more bricks to the volume and started rebalance.
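The steps above can be spelled out as concrete CLI calls. The sketch below is a hypothetical dry run that only prints commands. It assumes the hosts and brick paths from the `gluster v info dt` output further down (dt0 to dt5 at creation, dt4/dt5 as b1/b2, dt6/dt7 as the added bricks); the original `volume create` options were not recorded, and step 2 (populating the mount) and the timing of step 4 are left to whoever runs it.

```python
# Hypothetical dry run of the reproduction steps; it prints commands only.
# Hosts and brick paths are assumed from the `gluster v info dt` output in
# this report; the original `volume create` options were not recorded.
HOSTS = [
    "rhs-client9.lab.eng.blr.redhat.com",
    "rhs-client39.lab.eng.blr.redhat.com",
    "rhs-client4.lab.eng.blr.redhat.com",
]
VOL = "dt"


def brick(i: int) -> str:
    """Brick dt<i> sits on HOSTS[i % 3], as in the volume info output."""
    return f"{HOSTS[i % 3]}:/home/dt{i}"


def repro_steps() -> list[list[str]]:
    removed = [brick(4), brick(5)]  # b1 and b2 (the decommissioned bricks)
    return [
        # Step 1: distribute volume with 6 bricks (dt0..dt5).
        ["gluster", "volume", "create", VOL] + [brick(i) for i in range(6)],
        ["gluster", "volume", "start", VOL],
        # Step 2 (create directories and files on a client mount) is manual.
        # Step 3: start the remove-brick migration.
        ["gluster", "volume", "remove-brick", VOL, *removed, "start"],
        # Step 4: stop it while files are still being migrated.
        ["gluster", "volume", "remove-brick", VOL, *removed, "stop"],
        # Step 5: add two bricks, then rebalance and check the result.
        ["gluster", "volume", "add-brick", VOL, brick(6), brick(7)],
        ["gluster", "volume", "rebalance", VOL, "start"],
        ["gluster", "volume", "rebalance", VOL, "status"],
    ]


if __name__ == "__main__":
    for cmd in repro_steps():
        print(" ".join(cmd))
```

Keeping the commands as argument lists makes it easy to hand them to `subprocess.run` later, but running them back to back would skip the manual data population and the "stop while migrating" timing that the report depends on.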

Actual results:
Some files fail to migrate: rebalance status reports 9 failures on rhs-client39 and 4 on rhs-client4.
 


Additional info:

[root@rhs-client9 ~]# gluster v info dt
 
Volume Name: dt
Type: Distribute
Volume ID: b3bc1409-8f46-48dc-8b50-10c1e66d9528
Status: Started
Number of Bricks: 8
Transport-type: tcp
Bricks:
Brick1: rhs-client9.lab.eng.blr.redhat.com:/home/dt0
Brick2: rhs-client39.lab.eng.blr.redhat.com:/home/dt1
Brick3: rhs-client4.lab.eng.blr.redhat.com:/home/dt2
Brick4: rhs-client9.lab.eng.blr.redhat.com:/home/dt3
Brick5: rhs-client39.lab.eng.blr.redhat.com:/home/dt4  * decommissioned brick
Brick6: rhs-client4.lab.eng.blr.redhat.com:/home/dt5   * decommissioned brick
Brick7: rhs-client9.lab.eng.blr.redhat.com:/home/dt6
Brick8: rhs-client39.lab.eng.blr.redhat.com:/home/dt7


[root@rhs-client9 mnt]# #gluster v remove-brick dt rhs-client39.lab.eng.blr.redhat.com:/home/dt4 rhs-client4.lab.eng.blr.redhat.com:/home/dt5 stop
[root@rhs-client9 mnt]# gluster v rebalance dt status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost              431         4.1GB          1811             0             0            completed             166.00
     rhs-client39.lab.eng.blr.redhat.com              240         1.1GB          1650             9             0            completed             128.00
      rhs-client4.lab.eng.blr.redhat.com              274         1.1GB          1667             4             0            completed             129.00
volume rebalance: dt: success: 
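When comparing runs, it can help to pull the per-node failure counts out of the rebalance status output programmatically. This is a minimal sketch that assumes the eight-column layout shown above (other releases may add or rename columns); the parser and its names are illustrative and not part of any Gluster tooling.

```python
from dataclasses import dataclass

# Output of `gluster v rebalance dt status` from this report
# (column spacing compressed; the parser splits on whitespace).
STATUS = """\
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 431 4.1GB 1811 0 0 completed 166.00
rhs-client39.lab.eng.blr.redhat.com 240 1.1GB 1650 9 0 completed 128.00
rhs-client4.lab.eng.blr.redhat.com 274 1.1GB 1667 4 0 completed 129.00
volume rebalance: dt: success:
"""


@dataclass
class NodeStatus:
    node: str
    rebalanced: int
    size: str  # kept as printed, e.g. "4.1GB"
    scanned: int
    failures: int
    skipped: int
    status: str
    run_secs: float


def parse_status(text: str) -> list[NodeStatus]:
    rows = []
    for line in text.splitlines():
        f = line.split()
        # Data rows have exactly 8 columns and a numeric file count; this
        # skips the header, the dashed rule and the trailing summary line.
        if len(f) != 8 or not f[1].isdigit():
            continue
        rows.append(NodeStatus(f[0], int(f[1]), f[2], int(f[3]),
                               int(f[4]), int(f[5]), f[6], float(f[7])))
    return rows


nodes = parse_status(STATUS)
print({n.node: n.failures for n in nodes if n.failures})
print(sum(n.failures for n in nodes))  # -> 13
```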



The rebalance log shows:
======================
[2014-02-19 11:07:35.678282] I [dht-layout.c:646:dht_layout_normalize] 0-dt-dht: found anomalies in /another/2/2/1/0/2/1. holes=4 overlaps=1 missing=2 down=0 misc=0
[2014-02-19 11:07:35.679255] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-dt-client-6: remote operation failed: File exists. Path: /another/2/2/1/0/2/1
[2014-02-19 11:07:35.689081] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-dt-client-7: remote operation failed: File exists. Path: /another/2/2/1/0/2/1
[2014-02-19 11:07:35.730577] I [dht-common.c:2646:dht_setxattr] 0-dt-dht: fixing the layout of /another/2/2/1/0/2/1



[2014-02-19 11:07:37.574121] I [dht-common.c:1142:dht_lookup_linkfile_cbk] 0-dt-dht: lookup of /another/2/2/2/0/file.0 on dt-client-6 (following linkfile) reached link

[2014-02-19 11:07:37.574926] W [dht-common.c:1022:dht_lookup_everywhere_cbk] 0-dt-dht: multiple subvolumes (dt-client-2 and dt-client-6) have file /another/2/2/2/0/file.0 (preferably rename the file in the backend, and do a fresh lookup)
[2014-02-19 11:07:37.575420] W [client-rpc-fops.c:256:client3_3_mknod_cbk] 0-dt-client-6: remote operation failed: File exists. Path: /another/2/2/2/0/file.0
[2014-02-19 11:07:37.575907] W [dht-linkfile.c:44:dht_linkfile_lookup_cbk] 0-dt-dht: got non-linkfile dt-client-6:/another/2/2/2/0/file.0



[2014-02-19 11:07:40.992898] E [dht-rebalance.c:1276:gf_defrag_migrate_data] 0-dt-dht: /another/2/2/2/2/0/file.0: failed to get trusted.distribute.linkinfo key - No such file or directory
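The excerpt mixes informational (I), warning (W) and error (E) messages. A small parser can tally them and extract the two symptoms that matter here: the anomaly counts reported by `dht_layout_normalize` and the "multiple subvolumes ... have file" duplicates. The regular expressions below are inferred from these lines only, not from a documented log format, so treat them as a sketch.

```python
import re
from collections import Counter

# A few lines from the rebalance log excerpt in this report.
LOG = """\
[2014-02-19 11:07:35.678282] I [dht-layout.c:646:dht_layout_normalize] 0-dt-dht: found anomalies in /another/2/2/1/0/2/1. holes=4 overlaps=1 missing=2 down=0 misc=0
[2014-02-19 11:07:35.679255] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-dt-client-6: remote operation failed: File exists. Path: /another/2/2/1/0/2/1
[2014-02-19 11:07:37.574926] W [dht-common.c:1022:dht_lookup_everywhere_cbk] 0-dt-dht: multiple subvolumes (dt-client-2 and dt-client-6) have file /another/2/2/2/0/file.0 (preferably rename the file in the backend, and do a fresh lookup)
[2014-02-19 11:07:40.992898] E [dht-rebalance.c:1276:gf_defrag_migrate_data] 0-dt-dht: /another/2/2/2/2/0/file.0: failed to get trusted.distribute.linkinfo key - No such file or directory
"""

# [timestamp] LEVEL [source.c:line:function] translator: message
# (pattern inferred from the lines above, not from a format specification)
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)
ANOMALY_RE = re.compile(
    r"found anomalies in (\S+)\. holes=(\d+) overlaps=(\d+) "
    r"missing=(\d+) down=(\d+) misc=(\d+)"
)
DUPLICATE_RE = re.compile(r"multiple subvolumes \((\S+) and (\S+)\) have file (\S+)")
FIELDS = ("holes", "overlaps", "missing", "down", "misc")


def summarize(text: str):
    """Return (level counts, layout anomalies, duplicate-file reports)."""
    entries = [m.groupdict() for m in map(LINE_RE.match, text.splitlines()) if m]
    levels = Counter(e["level"] for e in entries)
    anomalies, duplicates = [], []
    for e in entries:
        if m := ANOMALY_RE.search(e["msg"]):
            path, *counts = m.groups()
            anomalies.append((path, dict(zip(FIELDS, map(int, counts)))))
        if m := DUPLICATE_RE.search(e["msg"]):
            duplicates.append(m.groups())  # (subvol A, subvol B, path)
    return levels, anomalies, duplicates


levels, anomalies, duplicates = summarize(LOG)
print(levels)
print(anomalies)
print(duplicates)
```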


[root@rhs-client9 mnt]# getfattr -d -m . -e hex /home/dt*/another/2/2/2/0
getfattr: Removing leading '/' from absolute path names
# file: home/dt0/another/2/2/2/0
trusted.gfid=0x23eae90f876043749c2bc842a3b48bcc
trusted.glusterfs.dht=0x0000000100000000dffffff9ffffffff

# file: home/dt3/another/2/2/2/0
trusted.gfid=0x23eae90f876043749c2bc842a3b48bcc
trusted.glusterfs.dht=0x00000001000000005ffffffd7ffffffb

# file: home/dt6/another/2/2/2/0
trusted.gfid=0x23eae90f876043749c2bc842a3b48bcc
trusted.glusterfs.dht=0x00000001000000001fffffff3ffffffd

[root@rhs-client39 dt4]# getfattr -d -m . -e hex /home/dt*/another/2/2/2/0
getfattr: Removing leading '/' from absolute path names
# file: home/dt1/another/2/2/2/0
trusted.gfid=0x23eae90f876043749c2bc842a3b48bcc
trusted.glusterfs.dht=0x0000000100000000000000001ffffffe

# file: home/dt4/another/2/2/2/0
trusted.gfid=0x23eae90f876043749c2bc842a3b48bcc
trusted.glusterfs.dht=0x00000001000000009ffffffbbffffff9

# file: home/dt7/another/2/2/2/0
trusted.gfid=0x23eae90f876043749c2bc842a3b48bcc
trusted.glusterfs.dht=0x00000001000000003ffffffe5ffffffc

[root@rhs-client4 ~]# getfattr -d -m . -e hex /home/dt*/another/2/2/2/0
getfattr: Removing leading '/' from absolute path names
# file: home/dt2/another/2/2/2/0
trusted.gfid=0x23eae90f876043749c2bc842a3b48bcc
trusted.glusterfs.dht=0x00000001000000007ffffffc9ffffffa

# file: home/dt5/another/2/2/2/0
trusted.gfid=0x23eae90f876043749c2bc842a3b48bcc
trusted.glusterfs.dht=0x0000000100000000bffffffadffffff8
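The `trusted.glusterfs.dht` values can be decoded to check whether a directory's hash ranges tile the 32-bit hash space. This sketch assumes each value is four big-endian 32-bit words (count, hash type, range start, range stop). That matches how the values above look, but it should be confirmed against the DHT source for this release.

```python
import struct

# trusted.glusterfs.dht values for /another/2/2/2/0, copied from the
# getfattr output above (brick directory -> hex value).
XATTRS = {
    "dt0": "0x0000000100000000dffffff9ffffffff",
    "dt3": "0x00000001000000005ffffffd7ffffffb",
    "dt6": "0x00000001000000001fffffff3ffffffd",
    "dt1": "0x0000000100000000000000001ffffffe",
    "dt4": "0x00000001000000009ffffffbbffffff9",
    "dt7": "0x00000001000000003ffffffe5ffffffc",
    "dt2": "0x00000001000000007ffffffc9ffffffa",
    "dt5": "0x0000000100000000bffffffadffffff8",
}


def decode(value: str) -> tuple[int, int, int, int]:
    """Unpack a 16-byte value as four big-endian uint32 words.

    Assumed meaning: (count, hash type, range start, range stop).
    """
    raw = bytes.fromhex(value.removeprefix("0x"))
    return struct.unpack(">4I", raw)


def check_layout(xattrs: dict[str, str]):
    """Sort ranges by start and report holes/overlaps over 0..0xffffffff."""
    ranges = []
    for brick, value in xattrs.items():
        _count, _hash_type, start, stop = decode(value)
        ranges.append((start, stop, brick))
    ranges.sort()

    problems, expected = [], 0  # expected = first hash not yet covered
    for start, stop, brick in ranges:
        if start > expected:
            problems.append(f"hole {expected:#010x}-{start - 1:#010x} before {brick}")
        elif start < expected:
            problems.append(f"overlap at {start:#010x} ({brick})")
        expected = max(expected, stop + 1)
    if expected <= 0xFFFFFFFF:
        problems.append(f"hole {expected:#010x}-0xffffffff at the end")
    return ranges, problems


ranges, problems = check_layout(XATTRS)
for start, stop, brick in ranges:
    print(f"{brick}: {start:#010x}-{stop:#010x}")
print(problems or "no holes or overlaps")
```

For `/another/2/2/2/0` the check finds no holes or overlaps: the eight ranges run contiguously from 0x00000000 to 0xffffffff, and the decommissioned bricks dt4 and dt5 still own ranges (0x9ffffffb-0xbffffff9 and 0xbffffffa-0xdffffff8).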

Comment 3 Susant Kumar Palai 2015-11-27 11:35:29 UTC
Cloning this to 3.1. To be fixed in future.

