Bug 1060138 - [RHSC] - Rebalance icon in the activities column changes to unknown (?)
Summary: [RHSC] - Rebalance icon in the activities column changes to unknown (?)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Kanagaraj
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-01-31 10:58 UTC by RamaKasturi
Modified: 2016-04-18 10:06 UTC
CC List: 11 users

Fixed In Version: CB17
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1060994 (view as bug list)
Environment:
Last Closed: 2014-02-25 08:15:54 UTC


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0208 normal SHIPPED_LIVE Red Hat Storage 2.1 enhancement and bug fix update #2 2014-02-25 12:20:30 UTC
oVirt gerrit 23936 None None None Never

Description RamaKasturi 2014-01-31 10:58:57 UTC
Description of problem:
The rebalance icon in the activities column changes to unknown (?) when rebalance is started, or if glusterd is stopped and brought back up during the rebalance process.

Version-Release number of selected component (if applicable):
glusterfs-server-3.4.0.58rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Log in to the RHS console.
2. Create 2 distribute and 2 distribute-replicate volumes.
3. Start rebalance on the volumes.
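
For reference, a rough CLI equivalent of these UI steps, sketched in Python via subprocess (similar to how vdsm shells out to the gluster CLI). The node names, brick paths, and volume names below are hypothetical placeholders and may differ from the original test setup:

    #!/usr/bin/env python
    # Hypothetical sketch only: the actual test used the RHSC UI, not these commands.
    import subprocess

    def gluster(*args):
        # Run "gluster volume <args...>" and return the exit code.
        return subprocess.call(["gluster", "volume"] + list(args))

    # Step 2: two distribute volumes and two distribute-replicate volumes
    for name in ("vol_dis_1", "vol_dis_2"):
        gluster("create", name,
                "rhs-node1:/bricks/%s/b1" % name,
                "rhs-node2:/bricks/%s/b2" % name)
        gluster("start", name)

    for name in ("vol_dis_rep_1", "vol_dis_rep_2"):
        gluster("create", name, "replica", "2",
                "rhs-node1:/bricks/%s/b1" % name,
                "rhs-node2:/bricks/%s/b2" % name,
                "rhs-node3:/bricks/%s/b3" % name,
                "rhs-node4:/bricks/%s/b4" % name)
        gluster("start", name)

    # Step 3: start rebalance on all four volumes
    for name in ("vol_dis_1", "vol_dis_2", "vol_dis_rep_1", "vol_dis_rep_2"):
        gluster("rebalance", name, "start")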


Actual results:
After a few seconds the rebalance icon in the activities column changes to unknown (?), with an event message saying "Could not find information for rebalance on volume <volName> of Cluster cluster_test_setup from CLI. Marking it as unknown."

Expected results:
The rebalance icon should not change to "?"; rebalance should run successfully.

Additional info:

Comment 1 RamaKasturi 2014-01-31 11:04:50 UTC
Another way of reproducing the issue is:

1) Start rebalance on all the volumes.

2) While rebalance is going on, stop glusterd on one of the nodes.

3) Bring glusterd back up on the node where it was stopped.
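
A minimal sketch of steps 2 and 3, assuming an EL6 RHS node reachable over ssh as root (the node name and the 60-second window are placeholders):

    import subprocess
    import time

    node = "rhs-node2"  # hypothetical node running glusterd
    subprocess.call(["ssh", node, "service", "glusterd", "stop"])
    time.sleep(60)      # leave glusterd down while rebalance keeps running
    subprocess.call(["ssh", node, "service", "glusterd", "start"])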


Actual Results :

After a few seconds the rebalance icon in the activities column changes to unknown (?), with an event message saying "Could not find information for rebalance on volume <volName> of Cluster cluster_test_setup from CLI. Marking it as unknown."

Expected Results :

The rebalance icon should not change to "?"; rebalance should run successfully.

Comment 3 RamaKasturi 2014-01-31 11:50:25 UTC
Attaching the sos reports for the same.

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1060138/

Comment 4 Kanagaraj 2014-01-31 11:59:07 UTC
The following error is seen in vdsm.log:

Thread-74940::ERROR::2014-01-31 16:17:32,771::BindingXMLRPC::1000::vds::(wrapper) vdsm exception occured
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 989, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/gluster/api.py", line 53, in wrapper
    rv = func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/api.py", line 295, in tasksList
    status = self.svdsmProxy.glusterTasksList(taskIds)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in glusterTasksList
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
GlusterCmdExecFailedException: Command execution failed
return code: 2

glusterd.log
[2014-01-31 11:13:00.716556] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716611] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716633] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716648] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716677] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716708] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716742] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716757] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716785] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716828] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716861] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716879] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716916] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716959] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.716975] E [glusterd-syncop.c:747:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from  node/brick
[2014-01-31 11:13:00.719175] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_dis_rep_2bricks_16hosts
[2014-01-31 11:13:00.724860] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_dis_1brick_16hosts
[2014-01-31 11:13:00.730196] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume vol_dis_1brick_8hosts


engine.log

2014-01-31 05:47:15,161 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand] (DefaultQuartzScheduler_Worker-52) START, GlusterTasksListVDSCommand(HostName = 10.16.159.164, HostId = f705263a-b85d-4d24-be12-82fa3f4e158f), log id: 21d9e39e

2014-01-31 05:47:32,773 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand] (DefaultQuartzScheduler_Worker-52) FINISH, GlusterTasksListVDSCommand, log id: 21d9e39e
2014-01-31 05:47:32,773 ERROR [org.ovirt.engine.core.bll.gluster.tasks.GlusterTasksService] (DefaultQuartzScheduler_Worker-52) org.ovirt.engine.core.common.errors.VDSError@5f2f0482
2014-01-31 05:47:32,773 ERROR [org.ovirt.engine.core.bll.gluster.GlusterTasksSyncJob] (DefaultQuartzScheduler_Worker-52) Error updating tasks from CLI: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Command execution failed
return code: 2 (Failed with error GlusterVolumeStatusAllFailedException and code 4161)
        at org.ovirt.engine.core.bll.gluster.tasks.GlusterTasksService.getTaskListForCluster(GlusterTasksService.java:43) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GlusterTasksSyncJob.updateGlusterAsyncTasks(GlusterTasksSyncJob.java:84) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor77.invoke(Unknown Source) [:1.7.0_51]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]

2014-01-31 05:47:32,795 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-52) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Could not find information for rebalance on volume vol_dis_1brick_8hosts of Cluster cluster_test_setup from CLI. Marking it as unknown.
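
As a cross-check while the console shows "?", the rebalance state can be queried directly on a storage node. A minimal sketch, assuming the gluster CLI is available locally and using one of the volume names from the log above:

    import subprocess

    # Prints per-node file counts and the overall rebalance status for the volume
    vol = "vol_dis_1brick_8hosts"
    subprocess.call(["gluster", "volume", "rebalance", vol, "status"])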

Comment 5 Kanagaraj 2014-01-31 12:02:46 UTC
Apart from the failure in glusterd:
When the engine is unable to fetch the task list due to failures in a node (the case above), it is not supposed to clear the task information.
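
The intended guard, expressed as an illustrative Python sketch (the real engine fix is in Java; the function and parameter names below are hypothetical): on a task-list fetch failure, keep the last known task state instead of flipping it to UNKNOWN.

    def update_gluster_async_tasks(cluster, fetch_task_list, known_tasks):
        # known_tasks: dict mapping task id -> task object with a .status field
        try:
            fetched_ids = fetch_task_list(cluster)
        except Exception:
            # Fetch failed (e.g. glusterd down on a node): do NOT clear or mark
            # the existing tasks as UNKNOWN; keep the last known state.
            return known_tasks
        for task_id, task in known_tasks.items():
            if task_id not in fetched_ids:
                # Only mark UNKNOWN when the CLI responded but the task is gone.
                task.status = "UNKNOWN"
        return known_tasks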

Comment 6 RamaKasturi 2014-01-31 13:22:25 UTC
1) Added a brick to the 2 distribute and 2 distribute-replicate volumes, and started rebalance.

2) Rebalance starts on all the volumes except one.

3) The event message says the Gluster volume rebalance could not be started.

This is what I see in the engine log:


2014-01-31 07:20:49,444 ERROR [org.ovirt.engine.core.vdsbroker.gluster.StartRebalanceGlusterVolumeVDSCommand] (pool-4-thread-38) [77fa27db] Command StartRebalanceGlusterVolumeVDS execution failed. Exception: VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.TypeError'>:sequence item 0: expected string, NoneType found
2014-01-31 07:20:49,444 INFO  [org.ovirt.engine.core.vdsbroker.gluster.StartRebalanceGlusterVolumeVDSCommand] (pool-4-thread-38) [77fa27db] FINISH, StartRebalanceGlusterVolumeVDSCommand, log id: 16105a0f
2014-01-31 07:20:49,444 ERROR [org.ovirt.engine.core.bll.gluster.StartRebalanceGlusterVolumeCommand] (pool-4-thread-38) [77fa27db] Command org.ovirt.engine.core.bll.gluster.StartRebalanceGlusterVolumeCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.apache.xmlrpc.XmlRpcException: <type 'exceptions.TypeError'>:sequence item 0: expected string, NoneType found (Failed with error VDS_NETWORK_ERROR and code 5022)
2014-01-31 07:20:49,449 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-38) [77fa27db] Correlation ID: 77fa27db, Job ID: c30b19a3-ec5c-402f-95f8-f5ae264befb9, Call Stack: null, Custom Event ID: -1, Message: Could not start Gluster Volume vol_dis_rep_2bricks_16hosts rebalance.
2014-01-31 07:20:49,453 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-86) FINISH, GlusterServersListVDSCommand, return: [10.16.159.67:CONNECTED, 10.16.159.101:CONNECTED, 10.16.159.106:CONNECTED, 10.16.159.14:CONNECTED, 10.16.159.95:CONNECTED, 10.16.159.123:CONNECTED, 10.16.159.45:CONNECTED, 10.16.159.200:CONNECTED, 10.16.159.118:CONNECTED, 10.16.159.56:CONNECTED, 10.16.159.23:CONNECTED, 10.16.159.164:CONNECTED, 10.16.159.98:CONNECTED, 10.16.159.51:CONNECTED, 10.16.159.7:CONNECTED, 10.16.159.43:CONNECTED], log id: 7eb07627
2014-01-31 07:20:49,456 INFO  [org.ovirt.engine.core.bll.gluster.StartRebalanceGlusterVolumeCommand] (pool-4-thread-38) [77fa27db] Lock freed to object EngineLock [exclusiveLocks= key: 7b644b0e-627f-4dd4-93b8-8c9b099d02ef value: GLUSTER
, sharedLocks= ]
2014-01-31 07:20:49,465 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-86) START, GlusterVolumesListVDSCommand(HostName = 10.16.159.67, HostId = 13440d27-f08f-4ad6-b6ed-b64962c0a70a), log id: 20f2a732
2014-01-31 07:20:49,468 INFO  [org.ovirt.engine.core.bll.gluster.StartRebalanceGlusterVolumeCommand] (pool-4-thread-38) [484ed866] Running command: StartRebalanceGlusterVolumeCommand internal: false. Entities affected :  ID: 4f1509d6-74d3-4957-9c24-99bb2a538abc Type: GlusterVolume
2014-01-31 07:20:49,472 INFO  [org.ovirt.engine.core.vdsbroker.gluster.StartRebalanceGlusterVolumeVDSCommand] (pool-4-thread-38) [484ed866] START, StartRebalanceGlusterVolumeVDSCommand(HostName = 10.16.159.32, HostId = 18983482-2c0a-413f-85ba-3f2b9c8a68ec), log id: 6cc6560e
2014-01-31 07:20:49,669 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-86) FINISH, GlusterVolumesListVDSCommand, return: {7b644b0e-627f-4dd4-93b8-8c9b099d02ef=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@6603aa9e, 108d1300-bd62-4d24-82d8-13b15b590da4=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@6c592578, 8854d24f-aebc-4f9f-8968-d42b8c40602d=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@5fbadb47}, log id: 20f2a732
2014-01-31 07:20:54,704 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-68) [8e51e5a] START, GlusterServersListVDSCommand(HostName = 10.16.159.32, HostId = 18983482-2c0a-413f-85ba-3f2b9c8a68ec), log id: 63b16e34
2014-01-31 07:20:56,850 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-68) [8e51e5a] FINISH, GlusterServersListVDSCommand, return: [10.16.159.32:CONNECTED, 10.16.159.125:CONNECTED, 10.16.159.128:CONNECTED, 10.16.159.190:CONNECTED], log id: 63b16e34
2014-01-31 07:20:56,854 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-68) [8e51e5a] START, GlusterVolumesListVDSCommand(HostName = 10.16.159.32, HostId = 18983482-2c0a-413f-85ba-3f2b9c8a68ec), log id: 3d7481af
2014-01-31 07:20:57,013 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-68) [8e51e5a] FINISH, GlusterVolumesListVDSCommand, return: {4f1509d6-74d3-4957-9c24-99bb2a538abc=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@bc46912d}, log id: 3d7481af
2014-01-31 07:20:57,035 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler_Worker-68) [8e51e5a] START, GlusterServersListVDSCommand(HostName = 10.16.159.67, HostId = 13440d27-f08f-4ad6-b6ed-b64962c0a70a), log id: 3e6d6bf2

Comment 8 RamaKasturi 2014-01-31 13:32:29 UTC
I have seen the following behaviour as well.

After a few minutes the unknown (?) icon in the volume activities column changes to "rebalance running / completed" and then changes back to "?".

Comment 9 Kanagaraj 2014-02-03 12:08:22 UTC
Patch sent to fix the issue in the engine: http://gerrit.ovirt.org/#/c/23936/
Now the engine will not change the task to UNKNOWN (the ? mark) if there is a failure in fetching the task list.

No updates yet on the glusterfs issue.

Comment 10 Sahina Bose 2014-02-03 12:20:12 UTC
On node 10.16.159.95 - rebalance of vol_dis_rep_2bricks_16hosts results in:

MainProcess|Thread-76324::DEBUG::2014-01-31 17:50:49,300::utils::509::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-76324::ERROR::2014-01-31 17:50:49,300::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) Error in wrapper
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 97, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 367, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/__init__.py", line 31, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/cli.py", line 595, in volumeRebalanceStart
    err=e.err)
GlusterVolumeRebalanceStartFailedException: <unprintable GlusterVolumeRebalanceStartFailedException object>

Comment 12 RamaKasturi 2014-02-10 10:00:58 UTC
Verified and works fine with the following builds:

RHSC - rhsc-2.1.2-0.36.el6rhs.noarch

Glusterfs - glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64

vdsm - vdsm-4.13.0-24.el6rhs.x86_64

Ran the following tests:

1) Created two distribute and two distribute-replicate volumes.

2) Started rebalance on all the volumes.

The rebalance icon does not change to "?" after a few seconds.

While rebalance was running on all the volumes, stopped glusterd on one of the nodes and started it again.

The rebalance icon does not change to "?" after a few seconds.

If rebalance has completed on a volume, the rebalance icon changes to the running state when some rebalance is still running on the node where glusterd was stopped and started.

Comment 14 errata-xmlrpc 2014-02-25 08:15:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html

