Bug 1511874 - OSP11 -> OSP12 upgrade: unable to scale out compute nodes post upgrade
Summary: OSP11 -> OSP12 upgrade: unable to scale out compute nodes post upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: instack-undercloud
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assignee: Dmitry Tantsur
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-10 10:45 UTC by Marius Cornea
Modified: 2018-02-05 19:15 UTC
CC List: 10 users

Fixed In Version: instack-undercloud-7.4.3-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 22:20:31 UTC


Attachments


Links
System ID Priority Status Summary Last Updated
Launchpad 1731885 None None None 2017-11-13 11:15:53 UTC
Red Hat Product Errata RHEA-2017:3462 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC
OpenStack gerrit 519312 None None None 2017-11-13 11:33:06 UTC

Description Marius Cornea 2017-11-10 10:45:19 UTC
Description of problem:
OSP11 -> OSP12 upgrade: unable to scale out compute nodes post upgrade. Trying to deploy with an additional node fails with:

2017-11-10 10:30:35Z [overcloud]: UPDATE_FAILED  resources.Compute: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
 

Version-Release number of selected component (if applicable):
2017-11-09.2 build

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with 3 controllers, 2 computes, 3 ceph nodes
2. Upgrade to OSP12
3. Remove one compute node from deployment:
openstack overcloud node delete --stack overcloud efd8563d-7619-40f9-ac4f-67cf7b6798a1
4. Wait for stack to get UPDATE_COMPLETE
5. Rerun the openstack overcloud deploy command with ComputeCount: 2 to reprovision the deleted compute node (see the example below)
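
For reference, step 5 amounts to re-running the original deploy command with the compute count restored. A minimal sketch (the environment file name and path are illustrative, not taken from this report; the rest of the command must match the original deployment):

openstack overcloud deploy --templates \
  <original environment files> \
  -e ~/templates/node-counts.yaml

where ~/templates/node-counts.yaml contains:

parameter_defaults:
  ComputeCount: 2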

Actual results:
The deploy command fails with:

2017-11-10 10:30:35Z [overcloud]: UPDATE_FAILED  resources.Compute: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"

 Stack overcloud UPDATE_FAILED 

overcloud.Compute.2.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 492f864f-76bf-4acf-9f89-8148b4ed427b
  status: CREATE_FAILED
  status_reason: |
    ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
Heat Stack update failed.
Heat Stack update failed.

Expected results:
The deploy command completes successfully.

Additional info:
Attaching the sosreport on the undercloud.

Comment 2 Marius Cornea 2017-11-10 11:00:38 UTC
(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| eadafa81-0ce3-48ef-9101-ae80e3509e71 | ceph-0       | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 8fad8238-c463-4807-992b-19a0bdfe840f | ceph-1       | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| 88826ab3-fd49-4866-9f18-daa3be19bcd1 | ceph-2       | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 2e145e34-c57e-4a75-a59b-1c19bd58f289 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| 492f864f-76bf-4acf-9f89-8148b4ed427b | compute-2    | ERROR  | -          | NOSTATE     |                        |
| 61a4692f-8acc-418b-a3da-3e5294b58d37 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| b230be0b-1699-4078-995d-a6a1ca6e1cb3 | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| ecfef989-f2b9-4f42-8f73-bbd3c2c3ce47 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

Checking the nova logs for the failed node UUID, we can see the following in /var/log/nova/nova-scheduler.log:

2017-11-10 05:30:02.529 1348 DEBUG nova.scheduler.manager [req-6cb9920f-7705-43c9-ad06-42be84e6bf9c a1f3cd9117df43c8ad2a236b6f70e801 d6b72ece1f95470b817ea14f96205691 - default default] Starting to schedule for instances: [u'492f864f-76bf-4acf-9f89-8148b4ed427b'] select_destinations /usr/lib/python2.7/site-packages/nova/scheduler/manager.py:113
2017-11-10 05:30:02.550 1348 DEBUG nova.scheduler.manager [req-6cb9920f-7705-43c9-ad06-42be84e6bf9c a1f3cd9117df43c8ad2a236b6f70e801 d6b72ece1f95470b817ea14f96205691 - default default] Got no allocation candidates from the Placement API. This may be a temporary occurrence as compute nodes start up and begin reporting inventory to the Placement service. select_destinations /usr/lib/python2.7/site-packages/nova/scheduler/manager.py:133
2017-11-10 05:30:33.083 1348 DEBUG oslo_concurrency.lockutils [req-d6621942-d42d-4826-bbbd-f3197a374167 - - - - -] Lock "host_instance" acquired by "nova.scheduler.host_manager.sync_instance_info" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270

In /var/log/nova/nova-conductor.log:

2017-11-10 05:29:02.934 3033 ERROR nova.conductor.manager [req-dd94e11d-a69b-4d29-8ab3-667325074865 a1f3cd9117df43c8ad2a236b6f70e801 d6b72ece1f95470b817ea14f96205691 - default default] Failed to schedule instances: NoValidHost_Remote: No valid host was found.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 232, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 137, in select_destinations
    raise exception.NoValidHost(reason="")

NoValidHost: No valid host was found.
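
The "Got no allocation candidates" message above means Placement found no resource provider that satisfies the flavor's requested resources. A quick way to cross-check (suggested diagnostic commands assuming the Pike undercloud CLI, not taken from this report) is to compare each Ironic node's resource class with the property the compute flavor requests:

(undercloud) $ openstack baremetal node list --fields uuid name resource_class
(undercloud) $ openstack flavor show compute -c properties

A node with an empty resource_class reports no CUSTOM_BAREMETAL inventory to Placement, so it can never match a flavor that requires resources:CUSTOM_BAREMETAL='1', which is what the next comment confirms.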

Comment 3 Ollie Walsh 2017-11-10 13:07:45 UTC
The resource class was only set on one of the Ironic nodes during the upgrade.

/home/stack/undercloud_upgrade.log:

2017-11-09 06:42:05,991 INFO: [2017-11-09 06:42:05,991] (os-refresh-config) [INFO] Completed phase post-configure
2017-11-09 06:42:06,000 INFO: os-refresh-config completed successfully
2017-11-09 06:42:07,623 INFO: Node f99ca41a-9daf-4927-8458-e937de3c93e3 resource class was set to baremetal
2017-11-09 06:42:07,662 INFO: Not creating flavor "baremetal" because it already exists.
2017-11-09 06:42:07,758 INFO: Flavor baremetal updated to use custom resource class baremetal
2017-11-09 06:42:07,876 INFO: Created flavor "control" with profile "control"
2017-11-09 06:42:07,876 INFO: Not creating flavor "compute" because it already exists.
2017-11-09 06:42:07,950 INFO: Flavor compute updated to use custom resource class baremetal
2017-11-09 06:42:08,046 INFO: Created flavor "ceph-storage" with profile "ceph-storage"
2017-11-09 06:42:08,137 INFO: Created flavor "block-storage" with profile "block-storage"
2017-11-09 06:42:08,228 INFO: Created flavor "swift-storage" with profile "swift-storage"
2017-11-09 06:42:08,236 INFO: Configuring Mistral workbooks
2017-11-09 06:42:34,598 INFO: Mistral workbooks configured successfully
2017-11-09 06:42:35,099 INFO: Migrating environment for plan overcloud to Swift.
2017-11-09 06:42:35,212 INFO: Not creating default plan "overcloud" because it already exists.
2017-11-09 06:42:35,212 INFO: Configuring an hourly cron trigger for tripleo-ui logging
2017-11-09 06:42:37,703 INFO: Added _member_ role to admin user
2017-11-09 06:42:37,986 INFO: Starting and waiting for validation groups ['post-upgrade']
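
For context, the "updated to use custom resource class baremetal" messages above correspond to flavor properties along these lines (a sketch of the Pike-era mapping; the exact property values are not shown in this log):

openstack flavor set compute \
  --property resources:CUSTOM_BAREMETAL='1' \
  --property resources:VCPU='0' \
  --property resources:MEMORY_MB='0' \
  --property resources:DISK_GB='0'

With this mapping the scheduler ignores standard VCPU/RAM/disk accounting and requires one unit of CUSTOM_BAREMETAL inventory, which only nodes whose resource_class is set to baremetal provide.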

The node-listing limit should be 0 here: https://review.openstack.org/#/c/490851/9/instack_undercloud/undercloud.py@1414
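
Roughly what the code at that line needs to do, as a minimal python-ironicclient sketch (the function name and structure are illustrative; ironic is assumed to be an already-authenticated ironicclient v1 client):

def set_node_resource_classes(ironic, resource_class='baremetal'):
    # limit=0 tells python-ironicclient to page through and return ALL
    # nodes; with the limit used during the upgrade only a single node
    # came back (see the next comment), so only one node was updated.
    for node in ironic.node.list(limit=0, fields=['uuid', 'resource_class']):
        if node.resource_class:
            continue  # already set, e.g. f99ca41a-... in the log above
        ironic.node.update(node.uuid,
                           [{'op': 'add',
                             'path': '/resource_class',
                             'value': resource_class}])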

Comment 4 Ollie Walsh 2017-11-10 13:09:50 UTC
with limit==-1 [<Node {u'uuid': u'f99ca41a-9daf-4927-8458-e937de3c93e3', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/f99ca41a-9daf-4927-8458-e937de3c93e3', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/f99ca41a-9daf-4927-8458-e937de3c93e3', u'rel': u'bookmark'}], u'resource_class': u'baremetal'}>]

with limit==0 [
 <Node {u'uuid': u'f99ca41a-9daf-4927-8458-e937de3c93e3', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/f99ca41a-9daf-4927-8458-e937de3c93e3', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/f99ca41a-9daf-4927-8458-e937de3c93e3', u'rel': u'bookmark'}], u'resource_class': u'baremetal'}>,
 <Node {u'uuid': u'4ebf6ff1-3f3a-447f-b5c2-ec9c04ced8ce', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/4ebf6ff1-3f3a-447f-b5c2-ec9c04ced8ce', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/4ebf6ff1-3f3a-447f-b5c2-ec9c04ced8ce', u'rel': u'bookmark'}], u'resource_class': None}>,
 <Node {u'uuid': u'a6c3c3fb-0ff2-46dc-a02b-6d6ffe9d74b2', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/a6c3c3fb-0ff2-46dc-a02b-6d6ffe9d74b2', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/a6c3c3fb-0ff2-46dc-a02b-6d6ffe9d74b2', u'rel': u'bookmark'}], u'resource_class': None}>,
 <Node {u'uuid': u'f5dd8219-6b8f-4a39-8a96-6330689d54e2', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/f5dd8219-6b8f-4a39-8a96-6330689d54e2', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/f5dd8219-6b8f-4a39-8a96-6330689d54e2', u'rel': u'bookmark'}], u'resource_class': None}>,
 <Node {u'uuid': u'046cb1f3-5d50-4be8-80c2-1d4ccc58487a', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/046cb1f3-5d50-4be8-80c2-1d4ccc58487a', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/046cb1f3-5d50-4be8-80c2-1d4ccc58487a', u'rel': u'bookmark'}], u'resource_class': None}>,
 <Node {u'uuid': u'782bdc4f-af01-47c4-ac02-d73276d7ab77', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/782bdc4f-af01-47c4-ac02-d73276d7ab77', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/782bdc4f-af01-47c4-ac02-d73276d7ab77', u'rel': u'bookmark'}], u'resource_class': None}>,
 <Node {u'uuid': u'c7c26891-88d1-498f-a84e-c15886ec3198', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/c7c26891-88d1-498f-a84e-c15886ec3198', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/c7c26891-88d1-498f-a84e-c15886ec3198', u'rel': u'bookmark'}], u'resource_class': None}>,
 <Node {u'uuid': u'81f8dd71-e0c6-4be7-b20f-47871c61a2a9', u'links': [{u'href': u'http://192.168.24.1:6385/v1/nodes/81f8dd71-e0c6-4be7-b20f-47871c61a2a9', u'rel': u'self'}, {u'href': u'http://192.168.24.1:6385/nodes/81f8dd71-e0c6-4be7-b20f-47871c61a2a9', u'rel': u'bookmark'}], u'resource_class': None}>]

Comment 5 Dmitry Tantsur 2017-11-13 11:10:52 UTC
Thanks for triaging, I can take care of it.

Comment 6 Dmitry Tantsur 2017-11-13 11:33:06 UTC
Correction: stable/pike patch is https://review.openstack.org/519312

Comment 7 Bob Fournier 2017-11-22 14:47:13 UTC
Merged downstream - https://code.engineering.redhat.com/gerrit/#/c/123953/

Comment 11 errata-xmlrpc 2017-12-13 22:20:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

