Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1035891 - cinder: if cinder-volume is restarted during the execution of a command, the volume is left in a non-consistent status
Summary: cinder: if cinder-volume is restarted during the execution of a command, the ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 4.0
Hardware: x86_64
OS: Linux
high
high
Target Milestone: z2
: 4.0
Assignee: Flavio Percoco
QA Contact: Dafna Ron
URL:
Whiteboard:
: 1051606 1066955 (view as bug list)
Depends On: 1016224 1051605 1066955
Blocks: 977865 1016216 1051606
TreeView+ depends on / blocked
 
Reported: 2013-11-28 18:14 UTC by Dafna Ron
Modified: 2016-04-26 16:08 UTC (History)
7 users (show)

Fixed In Version: openstack-cinder-2013.2.1-6.el6ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, the driver initialization check was done prior to calling the method responsible for processing the RPC request. Consequently, volumes would enter an inconsistent state ('creating' instead of 'error'), which resulted in unusable and stuck volumes. This issue has been resolved with this update, and the driver check has been moved into the method itself.
Clone Of:
: 1051606 1071184 (view as bug list)
Environment:
Last Closed: 2014-03-04 20:13:20 UTC


Attachments (Terms of Use)
logs (deleted)
2013-11-28 18:14 UTC, Dafna Ron
no flags Details
logs (deleted)
2014-02-19 11:37 UTC, Dafna Ron
no flags Details


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 61088 None None None Never
OpenStack gerrit 67097 None None None Never
Launchpad 1256064 None None None Never
Red Hat Product Errata RHBA-2014:0213 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform 4 Bug Fix and Enhancement Advisory 2014-03-05 01:11:55 UTC

Description Dafna Ron 2013-11-28 18:14:57 UTC
Created attachment 830340 [details]
logs

Description of problem:

if I extend a volume+ restart cinder-volumes service the volume state remains in "extending" while in actuality the action failed with the following error:

DriverNotInitialized: Volume driver 'GlusterfsDriver' not initialized

Version-Release number of selected component (if applicable):

openstack-cinder-2013.2-2.el6ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. create a volume
2. run the following commands from two sessions simultaneously:
session1: cinder extend <volume_id> <new_size>
session2: /etc/init.d/openstack-cinder-volume restart

Actual results:

volume status will be "extending" 
there is no warning and user will have no inclination that the action failed 

Expected results:

if a service restarts there should be a state change for the volume to error_extending

Additional info:

>>>>>extend>>>>>
2013-11-28 20:01:08.210 12269 ERROR cinder.openstack.common.rpc.amqp [req-d060442d-61b0-48ce-b926-c177f7608790 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Exception during message handling
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     **args)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     return getattr(proxyobj, method)(ctxt, **kwargs)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 807, in wrapper
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     raise exception.DriverNotInitialized(driver=driver_name)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp DriverNotInitialized: Volume driver 'GlusterfsDriver' not initialized.
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp 
/var/log/cinder/volume.log (END)

Comment 1 Dafna Ron 2013-11-28 18:44:51 UTC
volume state remains in extending when we try to extend while the volume status is down: 

[root@cougar06 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 |   in-use  |    small     |  2   |     None    |  false   | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 |   in-use  |    dafna     |  6   |     None    |  false   | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available |     user     |  3   |     None    |  false   |                                      |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available |     test     |  4   |     None    |  false   |                                      |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | available |    dafna1    |  9   |     None    |  false   |                                      |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
[root@cougar06 ~(keystone_admin)]# cinder list ^C
[root@cougar06 ~(keystone_admin)]# echo '>>>>>extend>>>>>' >> /var/log/cinder/scheduler.log 
[root@cougar06 ~(keystone_admin)]# cinder extend ea526067-a6bd-4e58-9db2-d0e58d166022 11^C
[root@cougar06 ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume status
openstack-cinder-volume is stopped
[root@cougar06 ~(keystone_admin)]# cinder extend ea526067-a6bd-4e58-9db2-d0e58d166022 11
[root@cougar06 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 |   in-use  |    small     |  2   |     None    |  false   | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 |   in-use  |    dafna     |  6   |     None    |  false   | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available |     user     |  3   |     None    |  false   |                                      |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available |     test     |  4   |     None    |  false   |                                      |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | extending |    dafna1    |  9   |     None    |  false   |                                      |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
[root@cougar06 ~(keystone_admin)]# less /var/log/cinder/volume.log 
[root@cougar06 ~(keystone_admin)]# 
[root@cougar06 ~(keystone_admin)]# 
[root@cougar06 ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 1121404d-372f-4c9e-9f00-8fa8e253c064 |   in-use  |    small     |  2   |     None    |  false   | 8401c157-032b-47d5-bcf9-e2fc413f333c |
| 2f2abeea-44b7-43f5-9536-655077875bf5 |   in-use  |    dafna     |  6   |     None    |  false   | 4ab92098-6587-4afa-9c41-c7adf93d76e0 |
| 990d2111-240d-4184-bb76-e6a82b2fda81 | available |     user     |  3   |     None    |  false   |                                      |
| a6b9603d-1047-4965-86a0-4f15d34876b5 | available |     test     |  4   |     None    |  false   |                                      |
| ea526067-a6bd-4e58-9db2-d0e58d166022 | extending |    dafna1    |  9   |     None    |  false   |                                      |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+


>>>>>extend>>>>>
2013-11-28 20:01:08.210 12269 ERROR cinder.openstack.common.rpc.amqp [req-d060442d-61b0-48ce-b926-c177f7608790 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Exception during message handling
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     **args)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     return getattr(proxyobj, method)(ctxt, **kwargs)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 807, in wrapper
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp     raise exception.DriverNotInitialized(driver=driver_name)
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp DriverNotInitialized: Volume driver 'GlusterfsDriver' not initialized.
2013-11-28 20:01:08.210 12269 TRACE cinder.openstack.common.rpc.amqp 
2013-11-28 20:27:42.585 12269 WARNING cinder.quota [req-9a315f82-53db-4e86-b2c1-9df042ea1ed5 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:28:51.112 12269 WARNING cinder.quota [req-ca0c5611-ec0f-44ea-8137-560c5134309e f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:29:54.860 12269 WARNING cinder.quota [req-1614bec8-762d-4979-957d-274b7ca6ec68 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:31:18.986 12269 WARNING cinder.quota [req-59fc9d53-0d33-4304-ab3d-257dd24e8c0c f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:32:19.883 12269 WARNING cinder.quota [req-18396ff0-3550-4acc-8e22-bf25feb4fe16 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] Deprecated: Default quota for resource: gigabytes is set by the default quota flag: quota_gigabytes, it is now deprecated. Please use the the default quota class for default quota.
2013-11-28 20:32:20.139 12269 ERROR cinder.volume.manager [req-18396ff0-3550-4acc-8e22-bf25feb4fe16 f40889a38c4644fab9d91b73fcfde1bd ddbce226991a45bf97642120207897cb] volume ea526067-a6bd-4e58-9db2-d0e58d166022: Error trying to extend volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Traceback (most recent call last):
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 875, in extend_volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     self.driver.extend_volume(volume, new_size)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 919, in extend_volume
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     info = self._qemu_img_info(volume_path)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 132, in _qemu_img_info
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     info = image_utils.qemu_img_info(path)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/image/image_utils.py", line 191, in qemu_img_info
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     out, err = utils.execute(*cmd, run_as_root=True)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/utils.py", line 142, in execute
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     return processutils.execute(*cmd, **kwargs)
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/processutils.py", line 173, in execute
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager     cmd=' '.join(cmd))
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C LANG=C qemu-img info /var/lib/cinder/mnt/d1c4e0e3273b195c7d311f8303edd391/volume-ea526067-a6bd-4e58-9db2-d0e58d166022
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Exit code: 1
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Stdout: ''
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager Stderr: "Could not open '/var/lib/cinder/mnt/d1c4e0e3273b195c7d311f8303edd391/volume-ea526067-a6bd-4e58-9db2-d0e58d166022': No such file or directory\n"
2013-11-28 20:32:20.139 12269 TRACE cinder.volume.manager 
(END)

Comment 2 Flavio Percoco 2013-12-04 08:25:34 UTC
This seems related to bug #1016224. The volume creation command fails before getting into the volume manager where the status management should happen.

Comment 4 Eric Harney 2014-01-13 15:58:43 UTC
*** Bug 1051606 has been marked as a duplicate of this bug. ***

Comment 5 Flavio Percoco 2014-01-16 10:31:10 UTC
Patch backported upstream for stable/havana

Comment 6 Flavio Percoco 2014-01-23 09:31:02 UTC
Moving to high since this is quite frustrating from users and leaves the volumes in inconsistent status.

Comment 8 Dafna Ron 2014-02-19 11:34:32 UTC
I am still able to reproduce this on openstack-cinder-2013.2.2-1.el6ost.noarch

moving back to dev with logs. 

it seems that it's a case of scheduler not knowing the status of a service (race).

Comment 9 Dafna Ron 2014-02-19 11:37:32 UTC
Created attachment 865043 [details]
logs

Comment 10 Dafna Ron 2014-02-19 11:49:15 UTC
[root@orange-vdsf ~(keystone_admin)]# cinder list 
+--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+
|                  ID                  |   Status  |    Display Name   | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+
| 1be7427d-bec7-4afb-8891-34b350b1f613 | extending |       dafna3      |  2   |     None    |  false   |             |
| 2b441306-0064-4da9-bf8a-ac19bf4342ba | extending |       dafna2      |  20  |     None    |  false   |             |
| 885ab1a8-e9e3-4478-a39a-aa7af9af05dc | extending |    extend-dafna   |  12  |     None    |  false   |             |
| 9681b5da-975b-4cfe-af5a-a418b19fec81 | available | copy-of-dafna-vol |  12  |     None    |  false   |             |
+--------------------------------------+-----------+-------------------+------+-------------+----------+-------------+
 here is the output from cinder list (with volume UUID and state)

Comment 11 Flavio Percoco 2014-02-19 13:01:37 UTC
The title of this bug seems quite misleading to me now.

The issue here is that volume operations interrupted when cinder-volume goes down are not rolled back, which leaves the volume status in a not consistent state. However, there are a few other implications here:

1. There's no easy way to know when something has gone wrong with one of the cinder-volumes node after the rpc.cast is sent unless we add a sentinel that watches over volume's status changes.

2. If the cinder-volume goes down *while* extending a volume, we can't simply restore the volume status because we don't actually know what the status is. The volume at this point could be completely broken and there's no way - without manual inspection - to know that.

So, we could think about possible solutions for this issue but it would go way beyond this bug and Icehouse. It requires discussions upstream - if it is even considered a real issue.

Thing is, if cinder-scheduler sends a cast command to cinder-volume and then cinder-volume goes down *while* executing that command, I think the wrong status in the volume is the least important issue a cloud operator would have.

Comment 12 Flavio Percoco 2014-02-19 13:02:22 UTC
*** Bug 1066955 has been marked as a duplicate of this bug. ***

Comment 13 Flavio Percoco 2014-03-03 10:45:30 UTC
This bug was originally created to track a failure with driver initialization and evolved to ensure volume's status consistency between `cinder-volume` crashes. Based on what I've explained in c#11, I've cloned this bug to track the changes that will need to happen in cinder and I'll close this bug as verified since the original issues with regards to driver initialization where indeed fixed - as show in other bugs related to the same issue.

Comment 15 errata-xmlrpc 2014-03-04 20:13:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0213.html


Note You need to log in before you can comment on or make changes to this bug.