Bug 1512713 - DHCP/iPXE client fails to get IP address from DHCP/iPXE server on neutron network on OVS-DPDK environment
Summary: DHCP/iPXE client fails to get IP address from DHCP/iPXE server on neutron network on OVS-DPDK environment
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Vijay Chundury
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-13 22:24 UTC by Aviv Guetta
Modified: 2017-12-18 10:50 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-18 10:50:53 UTC



Description Aviv Guetta 2017-11-13 22:24:09 UTC
Description of problem:

The customer has iPXE server and client virtual machines connected by a single neutron network (no neutron DHCP, no routers).

The DHCP exchange between the two VMs fails.


Version-Release number of selected component (if applicable):

Red Hat OpenStack Platform 10 z-stream 5. OVS-DPDK compute nodes only.

Steps to Reproduce:

Run a DHCP exchange between two VMs, one acting as the client and the other as the server. Check whether the DHCP client successfully gets an IP address.
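A minimal sketch of such a test, assuming the server VM runs dnsmasq and the client VM uses dhclient (the tools, interface name, and subnet here are illustrative, not taken from the original report):

# on the server VM (assumed interface eth0, assumed subnet 192.168.100.0/24)
ip addr add 192.168.100.1/24 dev eth0
dnsmasq --interface=eth0 --bind-interfaces --dhcp-range=192.168.100.50,192.168.100.150,12h

# on the client VM (assumed interface eth0); should obtain a lease if DHCP works
dhclient -v eth0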

Actual results:

DHCP/iPXE client fails to get IP address from the DHCP/iPXE server. 

Expected results:


DHCP/iPXE client gets IP address from the DHCP/iPXE server. 

Additional info:

Additional tests we ran:

- We instantiated another VM (non-iPXE), manually configured an IP address, and it communicated with the iPXE server with no problem (ping).

- We investigated the issue with tcpdump, checking connectivity by observing the BOOTP requests, with the following results (an example capture command is sketched after them):

compute 1 (CPT1) - iPXE server VM:
tcpdump on iPXE server NIC = NOK
tcpdump on br-int (using port mirroring) = OK 
tcpdump on br-ex = OK
compute 2 (CPT2) - iPXE client VM:
tcpdump on br-int (using port mirroring) = OK
tcpdump on br-ex = OK
 
The last part of the path (which we could not check with tcpdump) was between br-int and the iPXE server NIC.
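For reference, captures like the ones above can be narrowed to BOOTP/DHCP traffic with a filter of roughly this shape (the interface name is a placeholder; on OVS-DPDK bridges the capture has to go through a mirror port, as noted above):

# capture only DHCP/BOOTP packets, printing link-layer headers
tcpdump -i <mirror-or-bridge-interface> -nn -e 'udp port 67 or udp port 68'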

- We took another approach and instantiated one new VM (called C1) on CPT1 and one on CPT2 (called C2), with DHCP enabled.
  Afterwards, we ran tcpdump on both of them; here are the results (between VMs):

SC1 (DHCP server) <=> C1 = OK
PL3 (DHCP client) <=> C2 = OK
C2 <=> C1 = OK

So, at the compute-node level:
Broadcast traffic from CPT1 to CPT2 = OK
Broadcast traffic from CPT2 to CPT1 = NOK

- We installed ovs-tcpdump and saw BOOTP traffic only within the CPT1 (iPXE server) node itself.
As mentioned before, we did manage to see broadcast packets on br-int and br-ex on both compute nodes.

Comment 2 Franck Baudin 2017-11-15 23:48:21 UTC
You can also use ovs-tcpdump on vhost-user interfaces, but with a small trick: the interface name is too long, so you need to use a specific option, for instance "--mirror-to test0".

[root@overcloud-compute-0 ~]# virsh dumpxml 2| grep vhu
      <source type='unix' path='/var/run/openvswitch/vhuc359d121-a9' mode='client'/>
      <source type='unix' path='/var/run/openvswitch/vhu8a801025-d6' mode='client'/>
      <source type='unix' path='/var/run/openvswitch/vhu8ffc8d2b-66' mode='client'/>
[root@overcloud-compute-0 ~]# ovs-tcpdump -i vhuc359d121-a9 --mirror-to test0 -c 10 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on test0, link-type EN10MB (Ethernet), capture size 262144 bytes
23:47:41.898750 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
23:47:42.181745 IP6 :: > ff02::1:ff59:a2f3: ICMP6, neighbor solicitation, who has overcloud-compute-0, length 24
23:47:43.183749 IP6 overcloud-compute-0 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
23:47:43.745746 IP6 overcloud-compute-0 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
23:47:47.490332 ARP, Request who-has 192.0.2.9 tell controller-0.storage.localdomain, length 46
23:47:48.492499 ARP, Request who-has 192.0.2.9 tell controller-0.storage.localdomain, length 46
23:47:49.494479 ARP, Request who-has 192.0.2.9 tell controller-0.storage.localdomain, length 46
23:47:50.496537 ARP, Request who-has 192.0.2.9 tell controller-0.storage.localdomain, length 46
23:47:51.498487 ARP, Request who-has 192.0.2.9 tell controller-0.storage.localdomain, length 46
23:47:51.621942 IP gateway > 224.0.0.1: igmp query v2
10 packets captured
10 packets received by filter
0 packets dropped by kernel
[root@overcloud-compute-0 ~]#

Comment 6 Vijay Chundury 2017-11-16 09:32:04 UTC
Kuba (jlibosva@redhat.com) from the Neutron team, on IRC:
vijaykc4: avivgt I read the BZ description; if they use port security, we explicitly ban running a DHCP server on an OpenStack instance.

Aviv,
can you check this point?
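A minimal sketch of how the port-security setting could be checked (the port UUID is a placeholder; these are standard openstack client commands, and on a Newton-era deployment the equivalent neutron port-show / port-update commands may be used instead):

# show whether port security is enabled on the VM's neutron port
openstack port show <port-uuid> -c port_security_enabled

# disabling it requires removing security groups from the port first
openstack port set --no-security-group --disable-port-security <port-uuid>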

Comment 7 Aviv Guetta 2017-11-16 11:54:43 UTC
Vijay, 
We checked the port-security question during the remote session.
It's disabled.
Aviv

Comment 8 Yariv 2017-11-16 14:36:01 UTC
Can you upload the DPDK-relevant YAML files for VXLAN?

Comment 9 David Hill 2017-11-16 14:40:48 UTC
Are you talking about the deployment template YAML files?

Comment 14 Vijay Chundury 2017-11-17 17:35:14 UTC
Terry,
Kuba had port_security_enabled=False verified: port security is disabled, so running a DHCP server on the instance is allowed.

Comment 39 Yariv 2017-11-29 13:21:01 UTC
Aviv,
can you share the guest VM vendor and version?
cat /proc/version
cat /etc/*-release

Comment 46 Franck Baudin 2017-12-01 13:38:51 UTC
Just tested on my setup: you need at least one PMD thread on each NUMA socket. I forced the VM onto NUMA node 1 (via vcpu_pin_set) and did not start any PMD on NUMA socket 1 => no connectivity.

Adding a PMD thread on NUMA socket 1 solves the issue (you need to destroy the faulty VM and start a new one). Bottom line: launch at least one PMD per NUMA socket if you include both sockets in vcpu_pin_set. If this behavior is confirmed, we will need to add a warning in the product documentation.
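A sketch of how PMD placement per NUMA socket can be set and verified (the CPU mask and core-to-socket mapping below are assumptions for illustration; pmd-rxq-show is available in recent OVS releases):

# pin PMD threads to one core per socket, e.g. core 2 (socket 0) and core 26 (socket 1) => mask 0x4000004
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4000004

# confirm that each rx queue (including the vhost-user ports) is serviced by a PMD on its NUMA node
ovs-appctl dpif-netdev/pmd-rxq-show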

Comment 47 Andreas Karis 2017-12-01 17:21:13 UTC
Franck, isn't that this one here? https://access.redhat.com/solutions/3226511

Comment 48 Andreas Karis 2017-12-01 17:32:54 UTC
Yep, my issue was similar: I was isolating all cores on a NUMA node, and then cross-NUMA traffic wouldn't work :-)
https://bugzilla.redhat.com/show_bug.cgi?id=1506031

