Bug 1065221 - add-brick : After add-brick and rebalance observing more number of files from mount point
Summary: add-brick : After add-brick and rebalance observing more number of files from mount point
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286124
 
Reported: 2014-02-14 07:16 UTC by spandura
Modified: 2015-11-27 11:36 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1286124
Environment:
Last Closed: 2015-11-27 11:36:03 UTC
Target Upstream Version:


Attachments: none

Description spandura 2014-02-14 07:16:07 UTC
Description of problem:
==========================
On a 2 x 3 distributed-replicated volume, after all the bricks were 100% filled and the volume's disk usage reached 100%, 3 more bricks were added to the volume to change it to 3 x 3, and rebalance was started on the volume. Before performing the add-brick, the brick ports (49152) of rep-0-client-1 and rep-1-client-0 were blocked.

After rebalance, more files are visible from the mount point than before. Also, once the disk filled up, all I/O from the mount was stopped.
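
For reference, the port blocking was done with iptables DROP rules; a minimal sketch of the rules applied on the two nodes, consistent with the iptables -L output in the additional info below:

# run on rep-0-client-1 and rep-1-client-0; 49152/49153 are the brick ports
iptables -A INPUT -p tcp --dport 49152 -j DROP
iptables -A INPUT -p tcp --dport 49153 -j DROP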

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.4.0.59rhs built on Feb  4 2014 08:44:13

How reproducible:


Steps to Reproduce:
====================
1. Create a 2 x 3 distributed-replicated volume (EC2 instances on AWS) and start it.

2. From the FUSE mount, create files/directories.

3. Let the volume disk usage reach 100%, then stop all I/O from the mount.

4. Record the number of files/directories visible from the mount point.

5. Block the brick ports (49152) on rep-0-client-1 and rep-1-client-0.

6. Add 3 more bricks to the volume and start rebalance (see the command sketch after this list).

7. After rebalance is complete, record the number of files/directories visible from the mount point again.
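
A rough command-level sketch of steps 4-7 (volume name and brick paths taken from the volume info below; the new-host-N brick hostnames are placeholders):

# 4. count entries visible from the FUSE mount before add-brick
find /mnt/exporter/ | wc -l
# 5. on rep-0-client-1 and rep-1-client-0: block port 49152 (see the iptables
#    rules sketched in the description above)
# 6. add one more replica set of 3 bricks and start rebalance
gluster volume add-brick exporter new-host-1:/rhs/bricks/exporter \
    new-host-2:/rhs/bricks/exporter new-host-3:/rhs/bricks/exporter
gluster volume rebalance exporter start
# 7. once "gluster volume rebalance exporter status" reports completed, recount
find /mnt/exporter/ | wc -l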

Actual results:
====================
Comparing the number of files/directories before and after rebalance shows more files after rebalance. The extra files are duplicate entries for the same file: the same file exists on two sub-volumes after rebalance completed successfully without any failures.
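
One way to confirm the duplicates brick-side is to list the real data files per replica set and look for the same relative path on two different replica sets. DHT link-to pointers are zero-byte files with only the sticky bit set (mode 1000) and the trusted.glusterfs.dht.linkto xattr, so they can be filtered out. A sketch, using the brick path from the volume info below:

# on one brick of each replica set: list data files relative to the brick root,
# skipping the .glusterfs metadata directory and dht link-to files (mode 1000)
find /rhs/bricks/exporter -path '*/.glusterfs' -prune -o \
     -type f ! -perm 1000 -printf '%P\n' | sort > /tmp/files-$(hostname).txt
# gather the per-replica-set listings on one node; any path that appears in
# two different listings is a duplicated file
sort /tmp/files-*.txt | uniq -d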

Number of files on mount before rebalance:
===========================================
root@ip-10-168-33-85 [Feb-13-2014- 8:24:58] >find /mnt/exporter/ | wc
  20329   20329 1211913
root@ip-10-168-33-85 [Feb-13-2014- 8:25:21] >
root@ip-10-168-33-85 [Feb-13-2014- 8:27:47] >find /mnt/exporter/ -type f | wc
  19542   19542 1170048
root@ip-10-168-33-85 [Feb-13-2014- 8:28:07] >
root@ip-10-168-33-85 [Feb-13-2014- 8:41:33] >find /mnt/exporter/ -type d| wc
    787     787   41865
root@ip-10-168-33-85 [Feb-13-2014- 8:42:12] >

Number of files on mount after rebalance :
===========================================
root@ip-10-168-33-85 [Feb-13-2014-16:56:39] >find /mnt/exporter/ | wc
  20389   20389 1215369
root@ip-10-168-33-85 [Feb-13-2014-16:59:20] >find /mnt/exporter/ -type f | wc
  19602   19602 1173504
root@ip-10-168-33-85 [Feb-13-2014-17:03:55] >find /mnt/exporter/ -type d | wc
    787     787   41865

Observed 60 extra entries after rebalance (20389 vs. 20329 total entries, 19602 vs. 19542 regular files; the directory count is unchanged at 787).
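
To pin down exactly which entries are the extras, the listings taken in steps 4 and 7 can be diffed; a sketch, assuming the listings are saved to files (the /tmp paths are placeholders):

find /mnt/exporter/ | sort > /tmp/before.txt   # taken before add-brick (step 4)
find /mnt/exporter/ | sort > /tmp/after.txt    # taken after rebalance (step 7)
comm -13 /tmp/before.txt /tmp/after.txt        # entries present only after rebalance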

Expected results:
===================
The number of files/directories should be the same before and after rebalance.

Additional info:
===================

root@ip-10-168-193-243 [Feb-14-2014- 6:47:54] >gluster v info exporter
 
Volume Name: exporter
Type: Distributed-Replicate
Volume ID: 13d2f482-7fa8-45ae-b236-00a76625c5dc
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: ip-10-168-193-243.us-west-1.compute.internal:/rhs/bricks/exporter
Brick2: ip-10-170-141-106.us-west-1.compute.internal:/rhs/bricks/exporter
Brick3: ip-10-178-22-237.us-west-1.compute.internal:/rhs/bricks/exporter
Brick4: ip-10-168-207-167.us-west-1.compute.internal:/rhs/bricks/exporter
Brick5: ip-10-171-49-143.us-west-1.compute.internal:/rhs/bricks/exporter
Brick6: ip-10-171-121-88.us-west-1.compute.internal:/rhs/bricks/exporter
Brick7: ip-10-178-66-57.us-west-1.compute.internal:/rhs/bricks/exporter
Brick8: ip-10-170-250-206.us-west-1.compute.internal:/rhs/bricks/exporter
Brick9: ip-10-176-19-140.us-west-1.compute.internal:/rhs/bricks/exporter
root@ip-10-168-193-243 [Feb-14-2014- 6:48:00] >gluster v status exporter
Status of volume: exporter
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick ip-10-168-193-243.us-west-1.compute.internal:/rhs
/bricks/exporter					49152	Y	30470
Brick ip-10-170-141-106.us-west-1.compute.internal:/rhs
/bricks/exporter					49152	Y	18199
Brick ip-10-178-22-237.us-west-1.compute.internal:/rhs/
bricks/exporter						49152	Y	18628
Brick ip-10-168-207-167.us-west-1.compute.internal:/rhs
/bricks/exporter					49152	Y	18748
Brick ip-10-171-49-143.us-west-1.compute.internal:/rhs/
bricks/exporter						49152	Y	18713
Brick ip-10-171-121-88.us-west-1.compute.internal:/rhs/
bricks/exporter						49152	Y	18414
Brick ip-10-178-66-57.us-west-1.compute.internal:/rhs/b
ricks/exporter						49152	Y	15332
Brick ip-10-170-250-206.us-west-1.compute.internal:/rhs
/bricks/exporter					49152	Y	15414
Brick ip-10-176-19-140.us-west-1.compute.internal:/rhs/
bricks/exporter						49152	Y	15423
NFS Server on localhost					2049	Y	32740
Self-heal Daemon on localhost				N/A	Y	32747
NFS Server on ip-10-176-19-140.us-west-1.compute.intern
al							2049	Y	15475
Self-heal Daemon on ip-10-176-19-140.us-west-1.compute.
internal						N/A	Y	15482
NFS Server on ip-10-178-22-237.us-west-1.compute.intern
al							2049	Y	5333
Self-heal Daemon on ip-10-178-22-237.us-west-1.compute.
internal						N/A	Y	5340
NFS Server on ip-10-168-207-167.us-west-1.compute.inter
nal							2049	Y	5510
Self-heal Daemon on ip-10-168-207-167.us-west-1.compute
.internal						N/A	Y	5517
NFS Server on ip-10-178-66-57.us-west-1.compute.interna
l							2049	Y	24919
Self-heal Daemon on ip-10-178-66-57.us-west-1.compute.i
nternal							N/A	Y	25226
NFS Server on ip-10-170-141-106.us-west-1.compute.inter
nal							2049	Y	3619
Self-heal Daemon on ip-10-170-141-106.us-west-1.compute
.internal						N/A	Y	3626
NFS Server on ip-10-171-49-143.us-west-1.compute.intern
al							2049	Y	4591
Self-heal Daemon on ip-10-171-49-143.us-west-1.compute.
internal						N/A	Y	4598
NFS Server on ip-10-170-250-206.us-west-1.compute.inter
nal							2049	Y	15466
Self-heal Daemon on ip-10-170-250-206.us-west-1.compute
.internal						N/A	Y	15473
NFS Server on ip-10-171-121-88.us-west-1.compute.intern
al							2049	Y	5938
Self-heal Daemon on ip-10-171-121-88.us-west-1.compute.
internal						N/A	Y	5945
 
Task Status of Volume exporter
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 2148f874-bcba-4663-9685-f5ab86876aad
Status               : completed           
 
root@ip-10-168-193-243 [Feb-14-2014- 6:48:04] >gluster v rebalance exporter status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             2             0             0            completed               0.00
ip-10-170-141-106.us-west-1.compute.internal                0        0Bytes         19542             0             0            completed             253.00
ip-10-178-22-237.us-west-1.compute.internal                0        0Bytes             2             0             0            completed               0.00
ip-10-168-207-167.us-west-1.compute.internal             2381       119.7GB         21962             0          1017            completed           13737.00
ip-10-171-49-143.us-west-1.compute.internal              132         6.4GB          1265             0            73            completed            1189.00
ip-10-171-121-88.us-west-1.compute.internal                0        0Bytes            77             0             0            completed               1.00
ip-10-178-66-57.us-west-1.compute.internal                0        0Bytes             2             0             0            completed               0.00
ip-10-170-250-206.us-west-1.compute.internal                0        0Bytes            77             0             0            completed               1.00
ip-10-176-19-140.us-west-1.compute.internal                0        0Bytes           127             0             0            completed               2.00
volume rebalance: exporter: success: 
root@ip-10-168-193-243 [Feb-14-2014- 6:48:11] >


rep-0-client-1
==============
root@ip-10-170-141-106 [Feb-14-2014- 6:50:36] >iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  anywhere             anywhere            tcp dpt:49152 
DROP       tcp  --  anywhere             anywhere            tcp dpt:49153 

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
root@ip-10-170-141-106 [Feb-14-2014- 6:50:38] >


rep-1-client-0
==================
root@ip-10-168-207-167 [Feb-14-2014- 6:51:05] >iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  anywhere             anywhere            tcp dpt:49152 
DROP       tcp  --  anywhere             anywhere            tcp dpt:49153 

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
root@ip-10-168-207-167 [Feb-14-2014- 6:51:07] >

Comment 3 Susant Kumar Palai 2015-11-27 11:36:03 UTC
Cloning this to 3.1. To be fixed in a future release.

