Bug 1516876 - Rebalance panel status in Grafana
Summary: Rebalance panel status in Grafana
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Darshan
QA Contact: Lubos Trilety
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-23 13:24 UTC by Lubos Trilety
Modified: 2017-12-18 04:37 UTC
CC List: 7 users

Fixed In Version: tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-18 04:37:58 UTC
Target Upstream Version:


Attachments
Rebalance status as completed for the volume ssssssh (arbiter volume) (deleted), 2017-12-06 09:25 UTC, Bala Konda Reddy M
Rebalance status as not started for the volume arbiter which is just started (arbiter volume) (deleted), 2017-12-06 12:50 UTC, Bala Konda Reddy M
rebalance status (deleted), 2017-12-06 13:14 UTC, Lubos Trilety
On the volumes tab, I am able to see NA and Not started respectively (deleted), 2017-12-06 13:51 UTC, Bala Konda Reddy M


Links
Red Hat Product Errata RHEA-2017:3478 (normal, SHIPPED_LIVE): RHGS Web Administration packages, last updated 2017-12-18 09:34:49 UTC
GitHub: https://github.com/Tendrl/gluster-integration/issues/497, last updated 2017-11-23 14:39:30 UTC
GitHub: https://github.com/Tendrl/monitoring-integration/issues/283, last updated 2017-11-23 14:39:09 UTC

Description Lubos Trilety 2017-11-23 13:24:50 UTC
Description of problem:
Rebalance panel displays NA when rebalance is run manually from CLI

# gluster volume rebalance <volume_name> start
volume rebalance: <volume_name>: success: Rebalance on <volume_name> has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: b7ab4d78-9e2a-4b44-a456-9a4e1e20440f

# gluster volume rebalance <volume_name> status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             7             0             0            completed        0:00:01
                             <hostname1>                0        0Bytes             5             0             0            completed        0:00:01
...

The same NA status is displayed on the volume details page in the RHGSWA UI.
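
For reference, the panel is expected to mirror the aggregate of the per-node statuses shown above. A quick way to pull a machine-readable status out of the CLI for comparison is the --xml output mode; a minimal sketch (the grep pattern is illustrative and assumes the <statusStr> elements emitted by the gluster 3.x CLI):

# gluster volume rebalance <volume_name> status --xml | grep -o '<statusStr>[^<]*</statusStr>'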


Version-Release number of selected component (if applicable):
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create volume (e.g. arbiter volume)
2. Start rebalance
3. Check Rebalance panel in Grafana, and rebalance status on Volume Details page
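
For concreteness, a minimal sketch of steps 1-2 (host names, brick paths, and the volume name are illustrative assumptions; adjust to your setup):

# gluster volume create arbiter_vol replica 3 arbiter 1 \
      host1:/mnt/brick1/b host2:/mnt/brick1/b host3:/mnt/brick1/b \
      host4:/mnt/brick2/b host5:/mnt/brick2/b host6:/mnt/brick2/b
# gluster volume start arbiter_vol
# gluster volume rebalance arbiter_vol start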

Actual results:
Rebalance panel shows NA as the last rebalance status; the same is displayed on the Volume Details page

Expected results:
Rebalance status should correspond to the status returned by
gluster volume rebalance <volume_name> status

Additional info:
Rebalance takes almost no time as there is no data (or only very little data) loaded on the volume.
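
If the goal is to catch the transient 'In Progress' state, the rebalance can be made to last longer by loading some data into the volume first; a hedged sketch (mount host, mount point, and file count are illustrative assumptions):

# mount -t glusterfs host1:/arbiter_vol /mnt/test
# for i in $(seq 1 10000); do dd if=/dev/urandom of=/mnt/test/file_$i bs=64k count=1 status=none; done
# gluster volume rebalance arbiter_vol start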

Comment 3 Lubos Trilety 2017-11-28 13:29:44 UTC
Tested with:
tendrl-gluster-integration-1.5.4-6.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-8.el7rhgs.noarch

It was not working for me; the panel still showed NA after more than 20 minutes of waiting.

Comment 7 Lubos Trilety 2017-12-05 14:36:58 UTC
Tested with:
tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch

It was not working for me when the volume was an arbiter volume. Interestingly, when the volume is a disperse volume, RHGSWA shows the status as it should.

Comment 8 Nishanth Thomas 2017-12-05 18:48:15 UTC
We would like to have a look at your setup.
Can you run 'gluster volume info' on the cluster where you are testing this scenario and paste the output here?

My suspicion is that you are trying to run rebalance on an invalid volume type; I just want to confirm that before we do any debugging.

Comment 9 Lubos Trilety 2017-12-06 08:37:29 UTC
(In reply to Nishanth Thomas from comment #8)
> We would like to have a look at your setup.
> Can you run 'gluster volume info' on the cluster where you are testing
> this scenario and paste the output here?
> 
> My suspicion is that you are trying to run rebalance on an invalid volume
> type; I just want to confirm that before we do any debugging.

OK, that makes sense. Here's what I get:
# gluster volume info
 
Volume Name: volume_beta_arbiter_2_plus_1x2
Type: Distributed-Replicate
Volume ID: 30fc5ce2-8c10-4d28-b7f9-8a3126ef5ff8
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp
Bricks:
Brick1: <hostname1>:/mnt/brick_beta_arbiter_1/1
Brick2: <hostname2>:/mnt/brick_beta_arbiter_1/1
Brick3: <hostname3>:/mnt/brick_beta_arbiter_1/1 (arbiter)
Brick4: <hostname4>:/mnt/brick_beta_arbiter_1/1
Brick5: <hostname5>:/mnt/brick_beta_arbiter_1/1
Brick6: <hostname6>:/mnt/brick_beta_arbiter_1/1 (arbiter)
Brick7: <hostname1>:/mnt/brick_beta_arbiter_2/2
Brick8: <hostname2>:/mnt/brick_beta_arbiter_2/2
Brick9: <hostname3>:/mnt/brick_beta_arbiter_2/2 (arbiter)
Brick10: <hostname4>:/mnt/brick_beta_arbiter_2/2
Brick11: <hostname5>:/mnt/brick_beta_arbiter_2/2
Brick12: <hostname6>:/mnt/brick_beta_arbiter_2/2 (arbiter)
Brick13: <hostname1>:/mnt/brick_beta_arbiter_3/3
Brick14: <hostname2>:/mnt/brick_beta_arbiter_3/3
Brick15: <hostname3>:/mnt/brick_beta_arbiter_3/3 (arbiter)
Brick16: <hostname4>:/mnt/brick_beta_arbiter_3/3
Brick17: <hostname5>:/mnt/brick_beta_arbiter_3/3
Brick18: <hostname6>:/mnt/brick_beta_arbiter_3/3 (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on

Comment 10 Bala Konda Reddy M 2017-12-06 09:24:20 UTC
(In reply to Lubos Trilety from comment #7)
> Tested with:
> tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch
> tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch
> 
> It was not working for me when the volume was an arbiter volume.
> Interestingly, when the volume is a disperse volume, RHGSWA shows the
> status as it should.

Lubos, I tried the same scenario on an arbiter volume.

gluster vol info ssssssh
 
Volume Name: ssssssh
Type: Distributed-Replicate
Volume ID: 66c88f57-5a05-4b15-aeb6-0412b225cf8e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: dhcp42-119.lab.eng.blr.redhat.com:/gluster/brick10/first
Brick2: dhcp42-129.lab.eng.blr.redhat.com:/gluster/brick10/second
Brick3: dhcp42-127.lab.eng.blr.redhat.com:/gluster/brick10/third (arbiter)
Brick4: dhcp42-125.lab.eng.blr.redhat.com:/gluster/brick10/fourth
Brick5: dhcp42-129.lab.eng.blr.redhat.com:/gluster/brick3/fifth
Brick6: dhcp42-127.lab.eng.blr.redhat.com:/gluster/brick3/sixth (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: disable
nfs-ganesha: disable


I am able to see the rebalance information on the Grafana dashboard.

Steps I performed:
1. Created an arbiter volume with 6 bricks: 2 x (2 + 1).
2. Started the rebalance on one of the storage nodes.
3. I am able to see the rebalance information on the dashboard.

Am I missing something here?

Please see the attachment.

Comment 11 Bala Konda Reddy M 2017-12-06 09:25:14 UTC
Created attachment 1363571 [details]
Rebalance status as completed for the volume ssssssh (arbiter volume)

Comment 12 Lubos Trilety 2017-12-06 09:52:14 UTC
Hmm, it seems it doesn't depend on the volume type then, but on the speed of the rebalance operation. Mine was very quick as there was no data; I never saw the 'In Progress' state at all. I could see that state when I tried the same scenario with a disperse volume.

# gluster volume rebalance volume_beta_arbiter_2_plus_1x2 status
       Node Rebalanced-files          size   ...       status  run time in h:m:s
  ---------      -----------   -----------   ... ------------     --------------
  localhost                0        0Bytes   ...    completed        0:00:01
<hostname1>                0        0Bytes   ...    completed        0:00:01
<hostname2>                0        0Bytes   ...    completed        0:00:01
<hostname4>                0        0Bytes   ...    completed        0:00:01
<hostname5>                0        0Bytes   ...    completed        0:00:01
<hostname6>                0        0Bytes   ...    completed        0:00:01

Comment 13 Nishanth Thomas 2017-12-06 10:07:01 UTC
Based on comment #10 (https://bugzilla.redhat.com/show_bug.cgi?id=1516876#c10), moving the bug back to ON_QA. In the development setup as well, we see the results as expected.

Also make sure that the setup meets the requirements specified at https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.4-(install-guide)#tendrl-server-system-requirements

Comment 14 Lubos Trilety 2017-12-06 11:01:48 UTC
Bala, did you create the volume before or after the RHGSWA install? I have the volume present in Gluster before the RHGSWA install. BTW, I checked and found that the rebalance status differs from the beginning: when I had a disperse volume prepared, the status was 'Not started'; when I had an arbiter volume prepared, the status was 'NA'.

Comment 15 Bala Konda Reddy M 2017-12-06 12:48:30 UTC
Lubos, I created the volume after the RHGSWA install. I haven't tried the scenario you mentioned, but I feel it shouldn't make any difference; correct me if I am wrong.

I am able to see information like 'Not started' on the rebalance panel.

Please see the attachment.

Comment 16 Bala Konda Reddy M 2017-12-06 12:50:24 UTC
Created attachment 1363649 [details]
Rebalance status as not started for the volume arbiter which is just started (arbiter volume)

Comment 17 Lubos Trilety 2017-12-06 13:10:18 UTC
(In reply to Bala Konda Reddy M from comment #15)
> Lubos, I created the volume after the RHGSWA install. I haven't tried the
> scenario you mentioned, but I feel it shouldn't make any difference;
> correct me if I am wrong.
> 
> I am able to see information like 'Not started' on the rebalance panel.
> 
> Please see the attachment.

I thought so too, but when I created a new arbiter volume the rebalance status was correct. So it does make a difference whether the volume is created before or after the install.
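
If the difference really comes down to when tendrl-gluster-integration first sees the volume, one diagnostic worth trying (an assumption on my side, not a verified workaround; the systemd unit name is taken from the installed tendrl packages) is to force a fresh sync on the storage nodes and see whether the NA clears:

# systemctl restart tendrl-gluster-integration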

Comment 18 Lubos Trilety 2017-12-06 13:14:20 UTC
Created attachment 1363662 [details]
rebalance status

volume_beta_arbiter_2_plus_1x2: arbiter volume created before RHGSWA was installed
volume_gamma_arbiter_2_plus_1x2: arbiter volume created after RHGSWA was installed and the cluster imported

Comment 20 Bala Konda Reddy M 2017-12-06 13:51:00 UTC
Created attachment 1363680 [details]
On the volumes tab, I am able to see NA and Not started respectively

Comment 23 Nishanth Thomas 2017-12-12 02:00:01 UTC
Please test with the latest builds.

Comment 24 Lubos Trilety 2017-12-12 08:20:20 UTC
Tested with:
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch

Working properly; the rebalance status is displayed on the Grafana dashboard.

Comment 26 errata-xmlrpc 2017-12-18 04:37:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478

