
Bug 1479446

Summary: Rebalance estimate (ETA) shows wrong details (the initial 10-minute wait message reappears) while rebalance is still in progress
Product: Red Hat Gluster Storage
Reporter: nchilaka <nchilaka>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA
QA Contact: Prasad Desala <tdesala>
Severity: high
Docs Contact:
Priority: medium
Version: rhgs-3.3
CC: amukherj, apaladug, nchilaka, pasik, rhs-bugs, sanandpa, saraut, sheggodu, storage-qa-internal, tdesala
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.4.z Batch Update 2
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-27 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1479528 (view as bug list) Environment:
Last Closed: 2018-12-17 17:07:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1479528, 1511271    

Description nchilaka 2017-08-08 14:50:56 UTC
Description of problem:
==============================
I ran a remove-brick operation to convert a 2x2 volume to 1x2 while I/O was in progress from 3 different Ganesha mounts.

I noticed that at a later stage (maybe >80% complete), the message "The estimated time for rebalance to complete will be unavailable for the first 10 minutes." appeared again.

I think this happens when the estimated rebalance time has elapsed but the rebalance itself has not yet completed.
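
The hypothesis above can be pictured with the sketch below. It is purely illustrative and is not taken from the glusterfs source: the function names, the 600-second constant, and the fall-through branch are all assumptions, chosen only to show how a remaining-time estimate of zero could end up printing the initial message.

```python
# Hypothetical model of the status footer; NOT the actual glusterfs CLI code.
INITIAL_WAIT_SECS = 10 * 60  # assumed from the "first 10 minutes" wording

INITIAL_MSG = ("The estimated time for rebalance to complete will be "
               "unavailable for the first 10 minutes.")


def format_hms(secs):
    """Format a number of seconds as h:mm:ss, as in the status output."""
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def eta_message(elapsed_secs, time_left_secs):
    """Return the footer line for an in-progress rebalance (assumed logic)."""
    if time_left_secs > 0:
        return ("Estimated time left for rebalance to complete : "
                + format_hms(time_left_secs))
    # Assumed flaw: a zero estimate is treated like "no estimate yet".
    # elapsed_secs is deliberately not compared against INITIAL_WAIT_SECS,
    # so the initial message can come back late in the run.
    return INITIAL_MSG
```

Under these assumptions, an estimate that runs out at around 0:40 of run time would produce the same footer as a rebalance that started a few minutes ago, which matches the output pasted below.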







Last login: Tue Aug  8 19:32:38 2017 from 10.70.35.77
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             5145         7.4MB         10594             0             0          in progress        0:06:38
       dhcp46-101.lab.eng.blr.redhat.com             4142        21.7MB          8722             0             0          in progress        0:06:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             5993        31.3MB         11970             0             0          in progress        0:08:38
       dhcp46-101.lab.eng.blr.redhat.com             5050        26.6MB         10415             0             0          in progress        0:08:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             8059        62.0MB         16022             0             0          in progress        0:13:13
       dhcp46-101.lab.eng.blr.redhat.com             7208        76.2MB         14071             0             0          in progress        0:13:13
Estimated time left for rebalance to complete :        0:47:28
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            10699       110.9MB         21188             0             0          in progress        0:19:58
       dhcp46-101.lab.eng.blr.redhat.com             9949       119.4MB         16739             0             0          in progress        0:19:58
Estimated time left for rebalance to complete :        0:47:25
volume rebalance: nrep2: success

[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            16839       151.7MB         28114             0             0          in progress        0:33:23
       dhcp46-101.lab.eng.blr.redhat.com            16754       184.3MB         27528             0             0          in progress        0:33:23
Estimated time left for rebalance to complete :        0:00:48
volume rebalance: nrep2: success



[root@dhcp46-42 ~]# 
[root@dhcp46-42 ~]# 
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            20687       192.2MB         32058             0             0          in progress        0:39:16
       dhcp46-101.lab.eng.blr.redhat.com            20965       189.6MB         32669             0             0          in progress        0:39:16
Estimated time left for rebalance to complete :        0:00:06
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# 

============== SEE FROM BELOW ==================

[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21521       192.8MB         33069             0             0          in progress        0:40:28
       dhcp46-101.lab.eng.blr.redhat.com            22456       189.6MB         35708             0             0          in progress        0:40:28
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21669       192.8MB         33372             0             0          in progress        0:40:36
       dhcp46-101.lab.eng.blr.redhat.com            22614       189.6MB         35708             0             0          in progress        0:40:36
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21718       192.8MB         33372             0             0          in progress        0:40:40
       dhcp46-101.lab.eng.blr.redhat.com            22667       189.6MB         36020             0             0          in progress        0:40:40
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# 
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            23842       194.1MB         37488             0             0          in progress        0:43:47
       dhcp46-101.lab.eng.blr.redhat.com            23440       285.5MB         39635             0             0            completed        0:43:29
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success


Version-Release number of selected component (if applicable):
[root@dhcp46-42 ~]# rpm -qa|grep gluster
glusterfs-api-3.8.4-38.el7rhgs.x86_64
python-gluster-3.8.4-34.el7rhgs.noarch
glusterfs-server-3.8.4-38.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-3.8.4-38.el7rhgs.x86_64
glusterfs-cli-3.8.4-38.el7rhgs.x86_64
glusterfs-rdma-3.8.4-38.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.2.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
glusterfs-libs-3.8.4-38.el7rhgs.x86_64
glusterfs-fuse-3.8.4-38.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-38.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-38.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-38.el7rhgs.x86_64





Steps to Reproduce:
1. Started with a 1x2 volume, ran add-brick to convert it to 2x2, and ran a rebalance (some files were skipped).
2. Ran a Linux untar from one client and lookups from another client (running until the end). Ran rename, move, chmod, and chgrp from a third client, but only for some time; these operations completed well before the rebalance reached this state.
3. Observed the rebalance ETA.

Actual results:
==========
The ETA output reverts to the initial 10-minute wait message.
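
When re-testing, the symptom can be detected automatically from captured status output. The following is a minimal sketch that assumes the output layout shown in the description ("in progress" followed by the run time at the end of each node row); the function names and the 600-second default are illustrative, not part of any gluster tooling.

```python
import re

# Substring of the initial-wait message, copied from the output above.
INITIAL_MSG = "unavailable for the first 10 minutes"

# Matches the tail of an in-progress node row, e.g. "in progress        0:40:28".
RUN_TIME_RE = re.compile(r"in progress\s+(\d+):(\d{2}):(\d{2})\s*$")


def max_run_time_secs(status_output):
    """Largest run time, in seconds, among the in-progress node rows."""
    best = 0
    for line in status_output.splitlines():
        match = RUN_TIME_RE.search(line)
        if match:
            hours, minutes, seconds = (int(g) for g in match.groups())
            best = max(best, hours * 3600 + minutes * 60 + seconds)
    return best


def shows_stale_initial_message(status_output, wait_secs=600):
    """True if the initial-wait message appears after the wait period is over."""
    return (INITIAL_MSG in status_output
            and max_run_time_secs(status_output) > wait_secs)
```

Feeding it the text of one `gluster v rebal nrep2 status` invocation at a time flags the samples taken from 0:40:28 onwards in the description, and does not flag the samples taken at 0:06:38 and 0:08:38.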

Comment 2 nchilaka 2017-08-08 14:52:04 UTC
Rebalance status at the end:
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            23842       194.1MB         37488             0             0            completed        0:44:21
       dhcp46-101.lab.eng.blr.redhat.com            23440       285.5MB         39635             0             0            completed        0:43:29
volume rebalance: nrep2: success

Comment 3 Nithya Balachandran 2017-08-08 17:29:11 UTC
Is this reproducible?

Comment 6 nchilaka 2018-04-05 09:19:14 UTC
Prasad, can you check this as part of your testing (see comment #3, i.e. whether this is reproducible)?

Comment 17 Prasad Desala 2018-12-05 12:44:33 UTC
Verified this BZ on glusterfs version 3.12.2-30. Followed the same steps as in the description; the rebalance ETA was displayed as expected.

Moving this BZ to Verified.

Comment 18 errata-xmlrpc 2018-12-17 17:07:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3827