Bug 1510039 - openstack upscale failed (database inconsistent ? )
Summary: openstack upscale failed (database inconsistent ? )
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Zane Bitter
QA Contact: Ronnie Rasouli
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-06 15:14 UTC by Eduard Barrera
Modified: 2017-11-20 09:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-20 09:53:37 UTC
Target Upstream Version:



Description Eduard Barrera 2017-11-06 15:14:00 UTC
Description of problem:

The installation of the 13th compute node failed with the error: The Referenced Attribute (13 nova_server_resource) is incorrect

Could you please help us resolve this issue?


2017-09-25 10:17:14Z [overcloud.ComputeG9]: UPDATE_FAILED  resources.ComputeGROUPO9: resources[13]: Updating a stack when it is deleting is not supported.
2017-09-25 10:17:14Z [overcloud.Controller]: UPDATE_FAILED  UPDATE aborted
2017-09-25 10:17:14Z [overcloud]: UPDATE_FAILED  resources.ComputeGROUP9: resources[13]: Updating a stack when it is deleting is not supported.
2017-09-25 10:27:46Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 10:28:00Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
2017-09-25 10:31:54Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 10:32:07Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
2017-09-25 12:25:23Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 12:25:36Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
2017-09-25 12:35:07Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 12:35:21Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
2017-09-25 12:59:57Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 13:00:12Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
2017-09-25 13:52:43Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 13:52:57Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
2017-09-25 14:00:14Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 14:00:28Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
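
When triaging an event log like the one above, it can help to collapse the repeated failures into one line per distinct reason. The following is a minimal sketch that only parses text in the format shown here; the function name `summarize_failures` and the regular expression are illustrative assumptions, not part of any OpenStack tool.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt in this report:
#   "<date> <time>Z [<stack or resource>]: <STATUS>  <reason>"
EVENT_RE = re.compile(r"^(\S+ \S+Z) \[([^\]]+)\]: (\S+)\s+(.*)$")


def summarize_failures(log_text):
    """Count UPDATE_FAILED events per (name, reason) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match and match.group(3) == "UPDATE_FAILED":
            counts[(match.group(2), match.group(4))] += 1
    return counts


# Four lines copied from the log excerpt in this report.
SAMPLE = """\
2017-09-25 10:27:46Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 10:28:00Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
2017-09-25 10:31:54Z [overcloud]: UPDATE_IN_PROGRESS  Stack UPDATE started
2017-09-25 10:32:07Z [overcloud]: UPDATE_FAILED  The Referenced Attribute (13 nova_server_resource) is incorrect.
"""

if __name__ == "__main__":
    for (name, reason), count in summarize_failures(SAMPLE).most_common():
        print(f"{count}x [{name}] {reason}")
```

Grouping by reason makes it easier to see that every retry after the first failure hits the same validation error rather than a new one.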


The customer also reported that they undeleted an instance because the deployment was stuck on a node that did not exist.

$ openstack software deployment list | grep PROGRESS
| 6691a035-7a4d-4432-8fbc-f41731bc2db9 | 199cb1c3-c434-4167-aaf5-f476461a0aed | 3193c7b2-9728-4e3f-9268-c76a5cf8790a | CREATE | IN_PROGRESS |

$ openstack software deployment show 6691a035-7a4d-4432-8fbc-f41731bc2db9
+---------------+--------------------------------------------------------+
| Field         | Value                                                  |
+---------------+--------------------------------------------------------+
| id            | 6691a035-7a4d-4432-8fbc-f41731bc2db9                   |
| server_id     | 3193c7b2-9728-4e3f-9268-c76a5cf8790a                   |
| config_id     | 199cb1c3-c434-4167-aaf5-f476461a0aed                   |
| creation_time | 2017-08-18T16:43:51Z                                   |
| updated_time  |                                                        |
| status        | IN_PROGRESS                                            |
| status_reason | Deploy data available                                  |
| input_values  | {u'interface_name': u'nic1', u'bridge_name': u'br-ex'} |
| action        | CREATE                                                 |
+---------------+--------------------------------------------------------+
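
To spot deployments like this one, the table printed by `openstack software deployment list` can be cross-checked against the servers that still exist. The sketch below is an assumption-laden illustration: it parses only the five-column text layout seen in the output above (id, config_id, server_id, action, status), the helper names are hypothetical, and the set of live server IDs must be supplied by the operator. The second sample row uses placeholder IDs and is not real data.

```python
def parse_deployment_rows(table_text):
    """Parse pipe-delimited rows; the column order is assumed from the output above."""
    rows = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip border lines, blank lines, and the header row.
        if len(cells) != 5 or cells[0].lower() == "id":
            continue
        keys = ("id", "config_id", "server_id", "action", "status")
        rows.append(dict(zip(keys, cells)))
    return rows


def find_stuck_deployments(rows, live_server_ids):
    """Return IN_PROGRESS deployments whose server is not in the live set."""
    return [
        row for row in rows
        if row["status"] == "IN_PROGRESS" and row["server_id"] not in live_server_ids
    ]


# First data row is from this report; the second is a made-up placeholder.
SAMPLE_TABLE = """\
| id | config_id | server_id | action | status |
| 6691a035-7a4d-4432-8fbc-f41731bc2db9 | 199cb1c3-c434-4167-aaf5-f476461a0aed | 3193c7b2-9728-4e3f-9268-c76a5cf8790a | CREATE | IN_PROGRESS |
| 11111111-1111-1111-1111-111111111111 | 22222222-2222-2222-2222-222222222222 | 00000000-0000-0000-0000-0000000000aa | CREATE | COMPLETE |
"""

if __name__ == "__main__":
    live = {"00000000-0000-0000-0000-0000000000aa"}  # hypothetical live server
    for row in find_stuck_deployments(parse_deployment_rows(SAMPLE_TABLE), live):
        print(row["id"], "references missing server", row["server_id"])
```

A deployment flagged this way is only a candidate for investigation; whether it is safe to remove depends on the state of the stack.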

The customer also reported problems with the network cards, especially on the node they are trying to reinstall:

"We had some issues with network cards bricking during the deployment on some nodes, specially on the one we're trying to reinstall right now, and I think what happened is the node lost connectivity during the deployment due to the network cards dying and deployment got stuck in a weird state."

Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Scale up the overcloud by one node.

Actual results:
The stack update fails with the error message shown above.

Expected results:
The scale-up completes without failing.

Comment 3 Zane Bitter 2017-11-09 21:04:27 UTC
There's certainly nothing inconsistent about the database here - the server fails to delete because the request to Nova timed out, and it remains in the database in a DELETE_FAILED state as you'd expect. The API will say the same thing.

I suspect that the likely issue here is that the server has, in fact, been deleted, but there are still software deployments that reference it. They'd be deleted in any successful stack update, but the new template gets validated first. Validation tries to resolve the inputs to each software deployment; those inputs reference the resource that is already gone, so resolution fails with an error saying the attribute (of the resource group) is incorrect.

Had the original update (the one that removed the resource) continued, everything would have been fine, since the new template had already been validated. If my guess is right, the error comes from trying to validate an updated template after a partially-completed update that removes a server.
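
The failure mode guessed at above can be illustrated with a toy model. This is not Heat's code: the class `ReferencedAttributeError`, the functions, and the data layout are all hypothetical, and the sketch only mimics the idea that validation resolves every deployment input before any stale deployment has been cleaned up.

```python
class ReferencedAttributeError(Exception):
    """Hypothetical stand-in for the error text seen in this report."""

    def __init__(self, resource, key):
        super().__init__(
            f"The Referenced Attribute ({resource} {key}) is incorrect."
        )


def resolve_group_attr(members, index, key):
    """Look up an attribute of one group member; fail if the member is gone."""
    member = members.get(index)
    if member is None or key not in member:
        raise ReferencedAttributeError(index, key)
    return member[key]


def validate_deployment_inputs(members, deployments):
    """Toy validation pass: resolve every input before any update work starts."""
    for deployment in deployments:
        resolve_group_attr(
            members, deployment["server_index"], deployment["server_attr"]
        )


# Member "13" was removed by the earlier, partially completed update ...
MEMBERS = {"12": {"nova_server_resource": "server-12"}}
# ... but a leftover deployment (hypothetical name) still points at it.
DEPLOYMENTS = [
    {"name": "deployment-for-13", "server_index": "13",
     "server_attr": "nova_server_resource"},
]

if __name__ == "__main__":
    try:
        validate_deployment_inputs(MEMBERS, DEPLOYMENTS)
    except ReferencedAttributeError as exc:
        print("validation failed:", exc)
```

In this model every retry fails at the same point, because the stale reference is only removed by an update that gets past validation.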

In a way, one could consider this the same issue as bug 1430753, which is fixed in OSP12 - starting with the Pike release, Heat saves the attribute values at each update and uses these stored values for validating the updated template. It appears to me that this issue wouldn't be resolved though, since the resources in question are actually going into a FAILED state, which will clear out the stored attribute values.

In any event, none of that helps you right now... if you have a backup of the Heat database then I'd consider restoring it.

