Bug 1684798 - When deploying an overcloud, sometimes some neutron ports are created with wrong mac address
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openshift-heat-templates
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Nate Johnston
QA Contact: Candido Campos
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-02 16:03 UTC by David Hill
Modified: 2019-04-12 04:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description David Hill 2019-03-02 16:03:19 UTC
Description of problem:

When deploying an overcloud, some neutron ports are sometimes not created, and the deployment later fails with a timeout because one or more nodes never get an IP address, never PXE boot, etc.


(undercloud) [root@undercloud-0-rhosp13 2abe5542-e520-4b73-9582-9607d3e27c85]# neutron port-list | grep 52:54
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
| 257a80e1-18a3-42f6-a5bc-05bc1c051d3c | CephStorage-port-0      | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:c2:e4:57 | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.9"}       |
| 2f628edf-f035-4c32-a542-22f57c315410 | Controller-port-0       | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:16:81:92 | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.35"}      |
| 45d9e685-3b14-4032-ace5-9df7045b0166 | NovaCompute-port-0      | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:ba:9a:3a | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.6"}       |
| ca615256-e0df-46fe-8d24-c79e73727f06 | Controller-port-0       | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:9a:c0:ee | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.13"}      |
| e81cd2d5-8ec6-413e-b006-032f82d800d0 | NovaCompute-port-0      | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:5b:d0:e4 | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.8"}       |


Version-Release number of selected component (if applicable):


How reproducible:
Random

Steps to Reproduce:
1. Try deploying an overcloud with 6 nodes (3 controllers, 2 computes, 1 ceph)
2.
3.

Actual results:
One or more nodes fail to get an IP address because of a missing neutron port

Expected results:
All neutron ports should always be created

Additional info:
This is random.

Comment 1 David Hill 2019-03-02 16:09:37 UTC
In fact, the problem appears to be that a port is using the wrong MAC address:

/var/lib/neutron/dhcp/2abe5542-e520-4b73-9582-9607d3e27c8/host:
(undercloud) [root@undercloud-0-rhosp13 2abe5542-e520-4b73-9582-9607d3e27c85]# cat host 
fa:16:3e:7c:33:13,host-192-0-2-5.localdomain,192.0.2.5
fa:16:3e:ab:63:69,host-192-0-2-10.localdomain,192.0.2.10
52:54:00:5b:d0:e4,host-192-0-2-8.localdomain,192.0.2.8,set:e81cd2d5-8ec6-413e-b006-032f82d800d0
52:54:00:9a:c0:ee,host-192-0-2-13.localdomain,192.0.2.13,set:ca615256-e0df-46fe-8d24-c79e73727f06
52:54:00:ba:9a:3a,host-192-0-2-6.localdomain,192.0.2.6,set:45d9e685-3b14-4032-ace5-9df7045b0166
52:54:00:16:81:92,host-192-0-2-35.localdomain,192.0.2.35,set:2f628edf-f035-4c32-a542-22f57c315410
52:54:00:c2:e4:57,host-192-0-2-9.localdomain,192.0.2.9,set:257a80e1-18a3-42f6-a5bc-05bc1c051d3c
fa:16:3e:26:ed:83,host-192-0-2-19.localdomain,192.0.2.19 <========== where's that coming from ?

and the ports:
| 257a80e1-18a3-42f6-a5bc-05bc1c051d3c | CephStorage-port-0      | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:c2:e4:57 | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.9"}       |
| 2f628edf-f035-4c32-a542-22f57c315410 | Controller-port-0       | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:16:81:92 | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.35"}      |
| 45d9e685-3b14-4032-ace5-9df7045b0166 | NovaCompute-port-0      | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:ba:9a:3a | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.6"}       |
| ca615256-e0df-46fe-8d24-c79e73727f06 | Controller-port-0       | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:9a:c0:ee | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.13"}      |
| d978f0d8-6512-4648-bc2b-baaf2226271c | Controller-port-0       | 76ea045fa187441e9ad0f6dcc879e54d | fa:16:3e:26:ed:83 | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.19"}      |
| e81cd2d5-8ec6-413e-b006-032f82d800d0 | NovaCompute-port-0      | 76ea045fa187441e9ad0f6dcc879e54d | 52:54:00:5b:d0:e4 | {"subnet_id": "1c44aaa0-6c94-4eba-b311-242227af0293", "ip_address": "192.0.2.8"}       |


The node clearly doesn't have fa:16:3e:26:ed:83, so where is that MAC coming from?
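
For anyone triaging a similar listing, here is a minimal sketch of one way to spot the odd port by its MAC prefix. It assumes that 52:54:00 is the prefix used for the virtual NICs in this lab (the usual QEMU/KVM default) and that fa:16:3e is Neutron's default base_mac prefix, which would suggest Neutron generated that address itself instead of receiving one from the caller. The helper names are hypothetical and the port data is copied from the listing above.

```python
# Hypothetical triage helper: classify port MACs by their first three octets.
# Assumption: 52:54:00 = QEMU/KVM virtual NICs, fa:16:3e = Neutron default base_mac.
KNOWN_PREFIXES = {
    "52:54:00": "QEMU/KVM virtual NIC",
    "fa:16:3e": "Neutron-generated (default base_mac)",
}


def classify_mac(mac):
    """Return a label for the MAC's vendor prefix, or 'unknown'."""
    return KNOWN_PREFIXES.get(mac.lower()[:8], "unknown")


def neutron_generated(ports):
    """Return the (port_id, mac) pairs whose MAC looks Neutron-generated."""
    return [(pid, mac) for pid, mac in ports
            if classify_mac(mac).startswith("Neutron")]


# (port id, MAC) pairs taken from the neutron port listing in this comment.
PORTS = [
    ("257a80e1-18a3-42f6-a5bc-05bc1c051d3c", "52:54:00:c2:e4:57"),
    ("2f628edf-f035-4c32-a542-22f57c315410", "52:54:00:16:81:92"),
    ("45d9e685-3b14-4032-ace5-9df7045b0166", "52:54:00:ba:9a:3a"),
    ("ca615256-e0df-46fe-8d24-c79e73727f06", "52:54:00:9a:c0:ee"),
    ("d978f0d8-6512-4648-bc2b-baaf2226271c", "fa:16:3e:26:ed:83"),
    ("e81cd2d5-8ec6-413e-b006-032f82d800d0", "52:54:00:5b:d0:e4"),
]

for port_id, mac in neutron_generated(PORTS):
    print(port_id, mac)
```

With the data above, only port d978f0d8-6512-4648-bc2b-baaf2226271c is flagged. This does not explain why the address was generated; it only narrows down which port to look at.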

Comment 2 David Hill 2019-03-02 16:13:35 UTC
(undercloud) [root@undercloud-0-rhosp13 2abe5542-e520-4b73-9582-9607d3e27c85]# openstack baremetal port list
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| d258850c-81e7-46bc-a8af-b873cc1801dd | 52:54:00:9a:c0:ee |
| 668ad683-b2e3-4217-ba61-91e2f754eaeb | 52:54:00:16:81:92 |
| 5985c4af-e51d-4838-a6ab-80f57cce29d8 | 52:54:00:0d:a8:cd |
| 745db1c9-4667-4c50-9950-ee65873443bc | 52:54:00:ba:9a:3a |
| e2e89adb-ecd8-4d01-9383-934f7e2f1216 | 52:54:00:c2:e4:57 |
| ab9a062d-413e-4205-bb8d-100e882daeda | 52:54:00:5b:d0:e4 |
| 68bcac88-ae6c-4280-acd7-339676073daf | 52:54:00:c3:5b:fe |
+--------------------------------------+-------------------+


This is the mac that it should be selecting:

| 5985c4af-e51d-4838-a6ab-80f57cce29d8 | 52:54:00:0d:a8:cd |
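
The comparison above can also be done mechanically. The following is a rough sketch, not part of any OpenStack tooling: it takes the MACs from the `openstack baremetal port list` output and the neutron port listing in Comment 1 (both hard-coded here) and reports which neutron MACs are unknown to Ironic and which Ironic MACs have no neutron port. The function and variable names are made up for illustration.

```python
# Hypothetical cross-check between Ironic port MACs and neutron port MACs.
# Data is copied by hand from the CLI output in Comments 1 and 2.
IRONIC_MACS = {
    "52:54:00:9a:c0:ee",
    "52:54:00:16:81:92",
    "52:54:00:0d:a8:cd",
    "52:54:00:ba:9a:3a",
    "52:54:00:c2:e4:57",
    "52:54:00:5b:d0:e4",
    "52:54:00:c3:5b:fe",
}

NEUTRON_PORTS = {
    "257a80e1-18a3-42f6-a5bc-05bc1c051d3c": "52:54:00:c2:e4:57",
    "2f628edf-f035-4c32-a542-22f57c315410": "52:54:00:16:81:92",
    "45d9e685-3b14-4032-ace5-9df7045b0166": "52:54:00:ba:9a:3a",
    "ca615256-e0df-46fe-8d24-c79e73727f06": "52:54:00:9a:c0:ee",
    "d978f0d8-6512-4648-bc2b-baaf2226271c": "fa:16:3e:26:ed:83",
    "e81cd2d5-8ec6-413e-b006-032f82d800d0": "52:54:00:5b:d0:e4",
}


def diff_macs(ironic_macs, neutron_ports):
    """Return (neutron MACs unknown to Ironic, Ironic MACs without a neutron port)."""
    ironic = {mac.lower() for mac in ironic_macs}
    neutron = {mac.lower() for mac in neutron_ports.values()}
    return sorted(neutron - ironic), sorted(ironic - neutron)


unexpected, unused = diff_macs(IRONIC_MACS, NEUTRON_PORTS)
print("neutron MACs not known to Ironic:", unexpected)
print("Ironic MACs without a neutron port:", unused)
```

Note that the diff reports two Ironic MACs without a neutron port (52:54:00:0d:a8:cd and 52:54:00:c3:5b:fe), so it cannot tell on its own which one the failed node should have used; the line above identifies 52:54:00:0d:a8:cd as the expected one.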

Comment 6 Nate Johnston 2019-04-03 20:05:35 UTC
David,

Since neither Candido nor I can reproduce this, are you all right if we close it for now, and reopen it if any of us hits this issue again?

Thanks,

Nate

Comment 17 David Hill 2019-04-08 00:32:43 UTC
I get this error message:

2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [req-a692c82d-ae18-4811-8182-077a75d9f4fd 683e9ef412654f7fbf1514e6894ce944 deae599329fb401b9dc7f325401cf6ae - default default] [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773] Failure prepping block device: VirtualInterfacePlugException: Cannot attach VIF 1b3c473b-6c09-4111-9040-f4d00dbf0754 to the node 21c1542a-1e32-4698-833d-6df44029f154 due to error: Unable to attach VIF 1b3c473b-6c09-4111-9040-f4d00dbf0754, not enough free physical ports. (HTTP 400)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773] Traceback (most recent call last):
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2194, in _build_resources
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     instance, network_info)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1841, in prepare_networks_before_block_device_mapping
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     instance=instance)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     self.force_reraise()
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     six.reraise(self.type_, self.value, self.tb)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1834, in prepare_networks_before_block_device_mapping
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     self.plug_vifs(instance, network_info)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1486, in plug_vifs
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     self._plug_vifs(node, instance, network_info)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1456, in _plug_vifs
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     self._plug_vif(node, port_id)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1438, in _plug_vif
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     raise exception.VirtualInterfacePlugException(msg)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773] VirtualInterfacePlugException: Cannot attach VIF 1b3c473b-6c09-4111-9040-f4d00dbf0754 to the node 21c1542a-1e32-4698-833d-6df44029f154 due to error: Unable to attach VIF 1b3c473b-6c09-4111-9040-f4d00dbf0754, not enough free physical ports. (HTTP 400)
2019-04-07 16:55:22.329 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773] 
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [req-a692c82d-ae18-4811-8182-077a75d9f4fd 683e9ef412654f7fbf1514e6894ce944 deae599329fb401b9dc7f325401cf6ae - default default] [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773] Build of instance 3eb39e77-94c7-4955-ad83-70ae09f46773 aborted: Failure prepping block device.: BuildAbortException: Build of instance 3eb39e77-94c7-4955-ad83-70ae09f46773 aborted: Failure prepping block device.
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773] Traceback (most recent call last):
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1841, in _do_build_and_run_instance
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     filter_properties, request_spec)
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2063, in _build_and_run_instance
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     bdms=block_device_mapping)
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     self.force_reraise()
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     six.reraise(self.type_, self.value, self.tb)
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2015, in _build_and_run_instance
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     block_device_mapping) as resources:
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     return self.gen.next()
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2235, in _build_resources
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]     reason=msg)
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773] BuildAbortException: Build of instance 3eb39e77-94c7-4955-ad83-70ae09f46773 aborted: Failure prepping block device.
2019-04-07 16:55:24.443 7905 ERROR nova.compute.manager [instance: 3eb39e77-94c7-4955-ad83-70ae09f46773]

Comment 18 Nate Johnston 2019-04-10 17:56:25 UTC
David, 

It looks like the error you quote above is being reported by Nova but is actually an Ironic error [1].  That error is raised [2] when _get_free_portgroups_and_ports() returns an empty list [3].  This might be something that the HardProv DFG needs to look at.  It's sort of at the intersection of Neutron, Nova, and Ironic, if I understand it correctly.

Can you post the whole log so I can see what precedes the sections you quoted above?

Thanks,

Nate

[1] https://opendev.org/openstack/nova/src/branch/stable/queens/nova/virt/ironic/driver.py#L1406-L1423
[2] https://opendev.org/openstack/ironic/src/branch/stable/queens/ironic/drivers/modules/network/common.py#L174-L175
[3] https://opendev.org/openstack/ironic/src/branch/stable/queens/ironic/drivers/modules/network/common.py#L87-L141

