|Summary:||Rebalance panel status in Grafana|
|Product:||Red Hat Gluster Storage||Reporter:||Lubos Trilety <ltrilety>|
|Status:||CLOSED ERRATA||QA Contact:||Lubos Trilety <ltrilety>|
|Version:||rhgs-3.3||CC:||anbehl, bmekala, ltrilety, nthomas, rhs-bugs, sanandpa, sankarshan|
|Fixed In Version:||tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch||Doc Type:||If docs needed, set a value|
|Doc Text:||Story Points:||---|
|Last Closed:||2017-12-18 04:37:58 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description Lubos Trilety 2017-11-23 13:24:50 UTC
Description of problem:
The Rebalance panel displays NA when rebalance is run manually from the CLI.

# gluster volume rebalance <volume_name> start
volume rebalance: <volume_name>: success: Rebalance on <volume_name> has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: b7ab4d78-9e2a-4b44-a456-9a4e1e20440f

# gluster volume rebalance <volume_name> status
       Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
  ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
  localhost                 0       0Bytes            7            0            0     completed            0:00:01
<hostname1>                 0       0Bytes            5            0            0     completed            0:00:01
...

The same NA status is displayed on the volume details page in the RHGSWA UI.

Version-Release number of selected component (if applicable):
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a volume (e.g. an arbiter volume)
2. Start rebalance
3. Check the Rebalance panel in Grafana and the rebalance status on the Volume Details page

Actual results:
The Rebalance panel shows NA as the last rebalance status; the same is displayed on the Volume Details page.

Expected results:
The rebalance status should correspond to the status returned by gluster volume rebalance <volume_name> status.

Additional info:
The rebalance takes almost no time because the volume holds no data or very little data.
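To compare what the panel shows with what the CLI reports, the status table above can be reduced to a per-node status map. The following is only a minimal sketch: `parse_rebalance_status` and the `ROW` pattern are hypothetical names, the column layout is assumed from the output quoted in this report, and none of it is taken from the Tendrl code base.

```python
import re

# Hypothetical helper, not part of Gluster or Tendrl. The column layout is an
# assumption based on the `gluster volume rebalance <volume_name> status`
# output quoted in this report.
ROW = re.compile(
    r"^\s*(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)"
    r"\s+(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>.+?)"
    r"\s+(?P<runtime>\d+:\d+:\d+)\s*$"
)


def parse_rebalance_status(text):
    """Return {node: status} for every data row of the status table."""
    statuses = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header, separator and "..." lines do not match
            statuses[match.group("node")] = match.group("status").strip()
    return statuses


SAMPLE = """\
       Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in h:m:s
  ---------       -----------  ------  -------  --------  -------  ---------     --------------
  localhost                 0  0Bytes        7         0        0  completed            0:00:01
<hostname1>                 0  0Bytes        5         0        0  completed            0:00:01
"""

statuses = parse_rebalance_status(SAMPLE)
print(statuses)
```

If every node reports `completed` here while the panel still shows NA, the discrepancy is on the monitoring side rather than in gluster itself.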
Comment 3 Lubos Trilety 2017-11-28 13:29:44 UTC
Tested with:
tendrl-gluster-integration-1.5.4-6.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-8.el7rhgs.noarch

It was not working for me. There's still NA after more than 20 minutes of waiting.
Comment 7 Lubos Trilety 2017-12-05 14:36:58 UTC
Tested with:
tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch

It was not working for me if the volume was arbiter. Interestingly when the volume is disperse RHGSWA shows the status as it should.
Comment 8 Nishanth Thomas 2017-12-05 18:48:15 UTC
We want to have a look at your setup. Can you run 'gluster volume info' on the cluster where you are trying to test this scenario and paste the output here?

I think you are trying to run rebalance on an invalid volume type. I just want to confirm that before we do any debugging.
Comment 9 Lubos Trilety 2017-12-06 08:37:29 UTC
(In reply to Nishanth Thomas from comment #8)
> We want to have a look at your setup.
> Can you run 'gluster volume info' on the cluster where you are trying to
> test this scenario and paste the output here?
>
> I think you are trying to run rebalance on an invalid volume type.
> I just want to confirm that before we do any debugging.

OK, makes sense. Here's what I get:

# gluster volume info

Volume Name: volume_beta_arbiter_2_plus_1x2
Type: Distributed-Replicate
Volume ID: 30fc5ce2-8c10-4d28-b7f9-8a3126ef5ff8
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp
Bricks:
Brick1: <hostname1>:/mnt/brick_beta_arbiter_1/1
Brick2: <hostname2>:/mnt/brick_beta_arbiter_1/1
Brick3: <hostname3>:/mnt/brick_beta_arbiter_1/1 (arbiter)
Brick4: <hostname4>:/mnt/brick_beta_arbiter_1/1
Brick5: <hostname5>:/mnt/brick_beta_arbiter_1/1
Brick6: <hostname6>:/mnt/brick_beta_arbiter_1/1 (arbiter)
Brick7: <hostname1>:/mnt/brick_beta_arbiter_2/2
Brick8: <hostname2>:/mnt/brick_beta_arbiter_2/2
Brick9: <hostname3>:/mnt/brick_beta_arbiter_2/2 (arbiter)
Brick10: <hostname4>:/mnt/brick_beta_arbiter_2/2
Brick11: <hostname5>:/mnt/brick_beta_arbiter_2/2
Brick12: <hostname6>:/mnt/brick_beta_arbiter_2/2 (arbiter)
Brick13: <hostname1>:/mnt/brick_beta_arbiter_3/3
Brick14: <hostname2>:/mnt/brick_beta_arbiter_3/3
Brick15: <hostname3>:/mnt/brick_beta_arbiter_3/3 (arbiter)
Brick16: <hostname4>:/mnt/brick_beta_arbiter_3/3
Brick17: <hostname5>:/mnt/brick_beta_arbiter_3/3
Brick18: <hostname6>:/mnt/brick_beta_arbiter_3/3 (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on
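The question of whether the volume type is valid for rebalance can be checked mechanically from this output, since rebalance only makes sense when files are distributed across more than one subvolume. Below is a rough sketch under that assumption. `describe_layout` is a hypothetical helper, and the regular expression is derived only from the `Number of Bricks` line format quoted in this report.

```python
import re

# Hypothetical helper for illustration. The pattern assumes the
# "N x (D + A) = T" form of the "Number of Bricks" line seen in this report,
# where A is the number of arbiter bricks per subvolume.
BRICKS = re.compile(
    r"Number of Bricks:\s*(\d+)\s*x\s*\((\d+)\s*\+\s*(\d+)\)\s*=\s*(\d+)"
)


def describe_layout(volume_info):
    """Summarise volume type and brick layout, or return None if not found."""
    type_match = re.search(r"^Type:\s*(.+)$", volume_info, re.MULTILINE)
    bricks_match = BRICKS.search(volume_info)
    if type_match is None or bricks_match is None:
        return None
    subvolumes, data, arbiter, total = map(int, bricks_match.groups())
    return {
        "type": type_match.group(1).strip(),
        "subvolumes": subvolumes,
        "arbiter_per_subvolume": arbiter,
        "total_bricks": total,
        # Layout is self-consistent when the product matches the total.
        "consistent": subvolumes * (data + arbiter) == total,
        # More than one subvolume means files are distributed.
        "distributed": subvolumes > 1,
    }


SAMPLE = """\
Volume Name: volume_beta_arbiter_2_plus_1x2
Type: Distributed-Replicate
Status: Started
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp
"""

layout = describe_layout(SAMPLE)
print(layout)
```

For the volume above the sketch reports six subvolumes, so the layout is distributed and rebalance is a valid operation on it.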
Comment 10 Bala Konda Reddy M 2017-12-06 09:24:20 UTC
(In reply to Lubos Trilety from comment #7)
> Tested with:
> tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch
> tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch
>
> It was not working for me if the volume was arbiter. Interestingly when the
> volume is disperse RHGSWA shows the status as it should.

Lubos, I tried the same scenario on an arbiter volume.

gluster vol info ssssssh

Volume Name: ssssssh
Type: Distributed-Replicate
Volume ID: 66c88f57-5a05-4b15-aeb6-0412b225cf8e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: dhcp42-119.lab.eng.blr.redhat.com:/gluster/brick10/first
Brick2: dhcp42-129.lab.eng.blr.redhat.com:/gluster/brick10/second
Brick3: dhcp42-127.lab.eng.blr.redhat.com:/gluster/brick10/third (arbiter)
Brick4: dhcp42-125.lab.eng.blr.redhat.com:/gluster/brick10/fourth
Brick5: dhcp42-129.lab.eng.blr.redhat.com:/gluster/brick3/fifth
Brick6: dhcp42-127.lab.eng.blr.redhat.com:/gluster/brick3/sixth (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: disable
nfs-ganesha: disable

I am able to see the rebalance information on the Grafana dashboard.

Steps I performed:
1. Created an arbiter volume with 6 bricks, 2 x (2 + 1).
2. Started the rebalance on one of the storage nodes.
3. Checked the dashboard, where I am able to see the rebalance information.

Am I missing something here? Please see the attachment.
Comment 11 Bala Konda Reddy M 2017-12-06 09:25:14 UTC
Created attachment 1363571 [details] Rebalance status as completed for the volume ssssssh(arbiter volume)
Comment 12 Lubos Trilety 2017-12-06 09:52:14 UTC
Hmm, it seems it doesn't depend on the volume type then, but on the speed of the rebalance action. Mine was very quick as there was no data. I did not see the 'In Progress' state at all. I could see that state when I tried the same scenario with a disperse volume.

# gluster volume rebalance volume_beta_arbiter_2_plus_1x2 status
       Node  Rebalanced-files         size  ...        status  run time in h:m:s
  ---------       -----------  -----------  ...  ------------     --------------
  localhost                 0       0Bytes  ...     completed            0:00:01
<hostname1>                 0       0Bytes  ...     completed            0:00:01
<hostname2>                 0       0Bytes  ...     completed            0:00:01
<hostname4>                 0       0Bytes  ...     completed            0:00:01
<hostname5>                 0       0Bytes  ...     completed            0:00:01
<hostname6>                 0       0Bytes  ...     completed            0:00:01
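Since the CLI reports one status per node while the panel shows a single value per volume, some aggregation has to happen in between. The sketch below shows one plausible rule. It is purely illustrative: `overall_rebalance_status` is a hypothetical function, the status strings are assumed to match the `status` column of the CLI output, and this is not claimed to be the logic Tendrl uses.

```python
# Hypothetical aggregation rule for illustration only; not taken from Tendrl.
# Status strings are assumed to match the `status` column of the gluster CLI.
def overall_rebalance_status(node_statuses):
    """Collapse per-node rebalance statuses into one volume-level status."""
    normalized = [status.strip().lower() for status in node_statuses]
    if not normalized:
        return "unknown"
    if "failed" in normalized:
        return "failed"
    if "in progress" in normalized:
        return "in progress"
    if all(status == "completed" for status in normalized):
        return "completed"
    if all(status == "not started" for status in normalized):
        return "not started"
    return "unknown"


# Six nodes, all finished within a second, as in the output above.
nodes = ["completed"] * 6
print(overall_rebalance_status(nodes))
```

Under a rule like this, a rebalance that finishes within one second should still end up as `completed` even if no poll ever observed `in progress`, which is the behaviour expected in this report.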
Comment 13 Nishanth Thomas 2017-12-06 10:07:01 UTC
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1516876#c10, moving the bug back to ON_QA. In the development setup we also see the expected results. Also make sure that the setup meets the requirements specified at https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.4-(install-guide)#tendrl-server-system-requirements
Comment 14 Lubos Trilety 2017-12-06 11:01:48 UTC
Bala, did you create the volume before the RHGSWA install or after? I had the volume present in gluster before the RHGSWA install. BTW, I checked and found that the rebalance status differs from the very beginning: when I had a disperse volume prepared the status was 'Not started', and when I had an arbiter volume prepared the status was 'NA'.
Comment 15 Bala Konda Reddy M 2017-12-06 12:48:30 UTC
Lubos, I created the volume after the RHGSWA install. I haven't tried the scenario you mentioned, and I feel it doesn't make any difference. Correct me if I am wrong.

I am able to see information like 'Not started' on the rebalance panel.

Please see the attachment.
Comment 16 Bala Konda Reddy M 2017-12-06 12:50:24 UTC
Created attachment 1363649 [details] Rebalance status as not started for the volume arbiter which is just started (arbiter volume)
Comment 17 Lubos Trilety 2017-12-06 13:10:18 UTC
(In reply to Bala Konda Reddy M from comment #15)
> Lubos, I created the volume after the RHGSWA install. I haven't tried the
> scenario you mentioned, and I feel it doesn't make any difference. Correct
> me if I am wrong.
>
> I am able to see information like 'Not started' on the rebalance panel.
>
> Please see the attachment.

I thought so too, but when I created a new arbiter volume the rebalance status was correct. So it does make a difference whether the volume is created before or after the RHGSWA install.
Comment 18 Lubos Trilety 2017-12-06 13:14:20 UTC
Created attachment 1363662 [details] rebalance status

volume_beta_arbiter_2_plus_1x2: arbiter volume created before RHGSWA is installed
volume_gamma_arbiter_2_plus_1x2: arbiter volume created after RHGSWA is installed and the cluster imported
Comment 20 Bala Konda Reddy M 2017-12-06 13:51:00 UTC
Created attachment 1363680 [details] On the volumes tab, I am able to see NA and Not started respectively
Comment 23 Nishanth Thomas 2017-12-12 02:00:01 UTC
Please test with the latest builds.
Comment 24 Lubos Trilety 2017-12-12 08:20:20 UTC
Tested with:
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch

Working properly, rebalance status is displayed on Grafana dashboard.
Comment 26 errata-xmlrpc 2017-12-18 04:37:58 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3478