Bug 955429 - displayNetwork must have an IP address on host
Summary: displayNetwork must have an IP address on host
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.3
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: 3.4.0
Assignee: Moti Asayag
QA Contact: GenadiC
URL:
Whiteboard: network
Depends On:
Blocks: rhev3.4beta 1142926
Reported: 2013-04-23 03:28 UTC by Kevein Liu
Modified: 2018-12-04 15:15 UTC
CC List: 23 users

Fixed In Version: av3
Doc Type: Bug Fix
Doc Text:
Previously, virtual machines failed to start with "libvirtError: internal error ifname "vnet20" not in key map". This happened because the display network to which the virtual machine was assigned did not have an IP address configured on the host. Now, the engine blocks "setupNetwork" of a display network with no address, and the scheduler attempts to start virtual machines only on hosts on which the display network is configured with an IP address.
Clone Of:
Environment:
Last Closed: 2014-06-09 14:58:53 UTC
oVirt Team: Network
Target Upstream Version:


Attachments
Problematic OVF file (deleted)
2013-04-24 10:03 UTC, Kevein Liu


Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 355283 None None None Never
Red Hat Product Errata RHSA-2014:0506 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update 2014-06-09 18:55:38 UTC
oVirt gerrit 25016 None None None Never

Description Kevein Liu 2013-04-23 03:28:05 UTC
Description of problem:

When trying to start a VM, it failed with the following error:
~~~
Thread-314::ERROR::2013-04-22 08:27:18,770::vm::680::vm.Vm::(_startUnderlyingVm) vmId=`cb5e3ac6-f351-4708-9729-c5287f991783`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 642, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1475, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2645, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error ifname "vnet20" not in key map
~~~

Version-Release number of selected component (if applicable):
* libvirt-0.10.2-18.el6_4.2.x86_64
* vdsm-4.10.2-1.8.el6ev.x86_64
* rhevm-3.1.0-43.el6ev.noarch

How reproducible:
Since the issue happens only occasionally, we don't know how to reproduce it at this time.

Steps to Reproduce:
N/A
  
Actual results:
VM failed to start 

Expected results:
VM should start successfully.

Additional info:
The customer has uploaded the log-collector archive to our FTP server:
  ftp://dropbox.redhat.com/sosreport-LogCollector-m2-20130422090712-b1c2.tar.xz

Comment 2 yanbing du 2013-04-23 09:49:54 UTC
Hi Kevein,
I'm trying to reproduce this bug.
As there are no specific reproduction steps, I just tried to start a VM with more than 20 NICs in RHEVM, but could not reproduce it. :(
I'm using:
# rpm -q libvirt
libvirt-0.10.2-18.el6_4.4.x86_64
# rpm -q vdsm
vdsm-4.10.2-1.9.el6ev.x86_64

I have attached the VM XML file; could you help check it?
BTW, could you please provide the domain XML that encountered this bug?
Thanks!

Comment 3 Michal Privoznik 2013-04-23 13:39:30 UTC
Unfortunately, I cannot access the logs. Kevein, can you please attach logs to the BZ? Hopefully, I will get more insight from the logs.

Comment 10 Kevein Liu 2013-04-24 10:03:03 UTC
Created attachment 739374 [details]
Problematic OVF file

Comment 12 Michal Privoznik 2013-04-24 13:21:33 UTC
I've managed to find and dig out libvirt logs. Here's a short snippet which is causing the trouble:

2013-04-22 00:25:15.071+0000: 43401: error : virNetDevGetIPv4Address:834 : Unable to get IPv4 address for interface vlan111: Cannot assign requested address
2013-04-22 00:25:15.071+0000: 43401: debug : virFileClose:72 : Closed fd 159
2013-04-22 00:25:15.071+0000: 43401: error : qemuBuildCommandLine:6130 : XML error: listen network 'vdsm-vlan111' had no usable address
2013-04-22 00:25:15.071+0000: 43401: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet20" not in key map

So there are two problems:
1) we are overwriting previously reported error
2) why doesn't "vdsm-vlan111" have any usable address
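Michal's first point is that the final `not in key map` message replaced the error that actually caused the failure. When reading a log like the snippet above, the earliest `error` entry for the failing thread is usually the more informative one. A minimal Python sketch that picks it out, assuming the `timestamp: thread: level : function:line : message` layout seen in the snippet; `parse` and `first_error` are hypothetical helper names, not part of libvirt or vdsm.

~~~python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, e.g.:
# 2013-04-22 00:25:15.071+0000: 43401: error : virNetDevGetIPv4Address:834 : Unable to ...
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<thread>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    thread: int
    level: str
    func: str
    msg: str


def parse(lines: Iterable[str]) -> List[LogEntry]:
    """Parse libvirt log lines, skipping anything that does not match."""
    entries: List[LogEntry] = []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if m:
            entries.append(LogEntry(int(m["thread"]), m["level"], m["func"], m["msg"]))
    return entries


def first_error(lines: Iterable[str], thread: Optional[int] = None) -> Optional[LogEntry]:
    """Return the earliest error (optionally for one thread).

    When a later cleanup step overwrites the reported error, the earliest
    one is usually closer to the root cause.
    """
    for entry in parse(lines):
        if entry.level == "error" and (thread is None or entry.thread == thread):
            return entry
    return None
~~~

On the four lines above, `first_error` returns the `virNetDevGetIPv4Address` entry ("Cannot assign requested address"), whereas the last error is the misleading `virNWFilterDHCPSnoopEnd` one.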

For the first problem I've just posted a patch:

https://www.redhat.com/archives/libvir-list/2013-April/msg01738.html

For the second problem, unfortunately, the logs do not contain the network's XML, so I don't know why it has no usable address. Kevein, Dan and others: do you have any idea how to obtain it in case 'virsh net-dumpxml vdsm-vlan111' doesn't work (and even if it does, are we guaranteed it is the very same network)? I think the best solution is to gather logs immediately when the error occurs again. By logs I mean not only libvirt/vdsm logs, but also the routing table and the iptables and ebtables listings.
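To capture that state the moment the error recurs, a small snapshot helper can run the relevant commands in one go. A minimal Python sketch; the command list is an assumption (standard `ip`, `iptables`, `ebtables` and `virsh` invocations, with `vdsm-vlan111` taken from the log above), it would normally need root on the hypervisor, and `snapshot` is a hypothetical name. A missing tool or failing command is recorded as text instead of aborting the whole snapshot.

~~~python
import subprocess
from typing import Dict, List

# Assumed command set; adjust for the host being inspected (usually needs root).
DEFAULT_COMMANDS: Dict[str, List[str]] = {
    "addresses": ["ip", "addr", "show"],
    "routes": ["ip", "route", "show"],
    "iptables": ["iptables", "-L", "-n", "-v"],
    "ebtables": ["ebtables", "-L"],
    "network_xml": ["virsh", "net-dumpxml", "vdsm-vlan111"],
}


def snapshot(commands: Dict[str, List[str]] = DEFAULT_COMMANDS,
             timeout: float = 30.0) -> Dict[str, str]:
    """Run each command and return its output keyed by name.

    Failures are stored as "ERROR ..." strings so that one missing tool
    does not prevent the rest of the snapshot from being collected.
    """
    results: Dict[str, str] = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout, check=False)
        except FileNotFoundError:
            results[name] = f"ERROR: {argv[0]} not found"
            continue
        except subprocess.TimeoutExpired:
            results[name] = f"ERROR: {argv[0]} timed out"
            continue
        if proc.returncode != 0:
            results[name] = f"ERROR ({proc.returncode}): {proc.stderr.strip()}"
        else:
            results[name] = proc.stdout
    return results
~~~

Writing the returned dictionary to a timestamped file alongside the libvirt and vdsm logs keeps everything from one failure together.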

Comment 17 Dan Kenigsberg 2013-04-25 10:23:36 UTC
(In reply to comment #16)
> 
> On[e] thing is vlan111 doesn't have any IPv4 address
> assigned, the other is if it should have one. But I think, once we find
> setupNetwork we will know the XML immediately, isn't that right Dan?

Correct. My guess is that the vlan111 network was configured on host with no IP address. The problem is twofold:

1. Engine should block setupNetwork of display network with no address.

2. Engine should avoid starting VMs whose displayNetwork has no IP address on hosts that somehow lost their address (i.e. bad dhcp server)

> 
> BTW any reason for these comments to be private?

Privacy is viral :-(

Comment 21 Kevein Liu 2013-04-26 07:01:49 UTC
Hi,

This issue has happened again. Could anyone provide a checklist of the information I should gather for further investigation?

Thank you!

Comment 22 Dan Kenigsberg 2013-04-27 20:14:23 UTC
(In reply to comment #21)
> This issue happened again, could anyone provide a check list that I can get
> those information for further investigation?

What is their cluster displayNetwork? Still this vlan111 network? What is the IP configuration for this network on the cluster hosts (static/dhcp/none)? And in particular, on the host that fails to start the VM?

The admin has to ensure that the displayNetwork has an IP address on each and every host.
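Dan's requirement can be spot-checked on each host. libvirt's failing call, `virNetDevGetIPv4Address`, reported "Cannot assign requested address", which is the text for `EADDRNOTAVAIL`: the interface exists but has no IPv4 address. A minimal, Linux-only Python sketch doing a similar lookup with the `SIOCGIFADDR` ioctl; `ipv4_address` is a hypothetical helper, it only sees the interface's primary IPv4 address, and the interface to check (e.g. `vlan111` from the log) is up to the admin.

~~~python
import errno
import fcntl
import socket
import struct
from typing import Optional

SIOCGIFADDR = 0x8915  # Linux ioctl: get an interface's (primary) IPv4 address
IFNAMSIZ = 16


def ipv4_address(ifname: str) -> Optional[str]:
    """Return the primary IPv4 address of ifname, or None if it has none.

    Other failures are re-raised, e.g. ENODEV when the interface
    does not exist at all.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # struct ifreq: 16-byte name, then a union; 256 bytes is ample room.
        req = struct.pack("256s", ifname.encode()[: IFNAMSIZ - 1])
        try:
            res = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, req)
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return None  # same condition libvirt hit for vlan111
            raise
    # sockaddr_in starts at offset 16; sin_addr occupies bytes 20..24.
    return socket.inet_ntoa(res[20:24])
~~~

Running `ipv4_address("vlan111")` on every host in the cluster and flagging any `None` result would have pointed at the failing host before a VM was scheduled there.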

Comment 29 Michal Privoznik 2013-04-29 10:41:07 UTC
Mark et al.,

I am still not fully convinced about where the real bug is. I know we've moved this bug from libvirt to ovirt-engine, but I'd like to be 100% sure. That means we need logs from the network setup process. Do you think it is possible to gather logs of the setupNetwork command from vdsm.log? I still can't find it anywhere. I know we have thousands of logs here, but none of them contains that kind of info.

I just want to make sure somebody really did start a network without an IP address. The other possibility is that the network was started with an IP address assigned, but something has taken it away. Either libvirt itself, or ...

Comment 30 Mark Huth 2013-04-30 01:00:52 UTC
Hi Michal,

I did a grep of all the vdsm.log* files on the hypervisor and setupNetwork wasn't matched in any of them (and I'm sure the relevant logs hadn't been rotated away).
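The same search can be repeated on other hosts. Rotated vdsm logs may be compressed, in which case a plain `grep` would miss matches inside them. A minimal Python sketch that searches `vdsm.log*` and transparently opens `.gz` and `.xz` files; the default directory `/var/log/vdsm` and the compression suffixes are assumptions about the host's logrotate setup, and `grep_logs` is a hypothetical name.

~~~python
import gzip
import lzma
from pathlib import Path
from typing import Iterator, Tuple


def _open_text(path: Path):
    """Open a possibly compressed log file for reading as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    if path.suffix == ".xz":
        return lzma.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def grep_logs(needle: str, log_dir: str = "/var/log/vdsm",
              pattern: str = "vdsm.log*") -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line containing needle."""
    for path in sorted(Path(log_dir).glob(pattern)):
        with _open_text(path) as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line:
                    yield path.name, lineno, line.rstrip("\n")
~~~

For example, `list(grep_logs("setupNetwork"))` returning an empty list supports the conclusion that no setupNetwork call was logged on that host.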

From the rhev-prio-list "New critical issues from China Zhuji" email thread ...

<thread>
> Then customer found the "rhevm" is not the Display network, and made
> it as Display network. So the issue was repaired.

So if I understand correctly the original conclusion of engineering is
correct and what should be fixed is to avoid (and ATM I don't say how)
to run Virtual Machines on a host that has it's Display-Network does
not have an IP + inform this problematic status to the user.
</thread>

We are not sure how the display network was changed in the RHEVM web UI. The customer is sure they didn't change it away from rhevm, but when they changed it back to the rhevm network, the problem was resolved (i.e. VMs could be started again). So it seems the problem was in RHEVM, in that it was passing the wrong displayNetwork to the hypervisor, which prevented VMs from starting.

I hope that information is helpful.  If not, please let me know.

-- Mark

Comment 31 Moti Asayag 2013-08-22 12:56:26 UTC
(In reply to Dan Kenigsberg from comment #17)
> (In reply to comment #16)
> > 
> > On[e] thing is vlan111 doesn't have any IPv4 address
> > assigned, the other is if it should have one. But I think, once we find
> > setupNetwork we will know the XML immediately, isn't that right Dan?
> 
> Correct. My guess is that the vlan111 network was configured on host with no
> IP address. The problem is twofold:
> 
> 1. Engine should block setupNetwork of display network with no address.
> 

This can be achieved by requiring a Static or DHCP boot protocol for the display network. I'd also suggest requiring it in the engine's attach/update network API, which seems to be commonly used by customers.
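The suggestion amounts to a validation rule on the display network's attachment. A minimal Python sketch of such a rule; `BootProtocol`, `NetworkAttachment` and `validate_attachments` are hypothetical stand-ins, not the engine's actual (Java) classes or API.

~~~python
from dataclasses import dataclass
from enum import Enum
from typing import List


class BootProtocol(Enum):
    NONE = "none"
    DHCP = "dhcp"
    STATIC = "static"


@dataclass
class NetworkAttachment:
    network: str
    boot_protocol: BootProtocol
    address: str = ""  # only meaningful for STATIC


def validate_attachments(attachments: List[NetworkAttachment],
                         display_network: str) -> List[str]:
    """Return validation errors; an empty list means the request may proceed."""
    errors: List[str] = []
    for att in attachments:
        if att.network != display_network:
            continue  # only the display network needs an address
        if att.boot_protocol is BootProtocol.NONE:
            errors.append(f"Display network '{att.network}' must use DHCP "
                          f"or static boot protocol")
        elif att.boot_protocol is BootProtocol.STATIC and not att.address:
            errors.append(f"Display network '{att.network}' uses static boot "
                          f"protocol but has no address")
    return errors
~~~

Non-display networks are left alone, since only the display network is used for the `listen network` address lookup that failed in the libvirt log.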

> 2. Engine should avoid starting VMs whose displayNetwork has no IP address
> on hosts that somehow lost their address (i.e. bad dhcp server)
> 

This is trickier because of the notorious race between the DHCP server's response and the getVdsCaps call made after the network command completes, so we might block the operation when we shouldn't. We could, however, consider a dedicated refreshCapabilities call to verify the actual address of the display network when none is reported. However, when running multiple VMs at once, this would cost the host more resources. Once Bug 999947 is implemented, it will be simpler to validate that the display network is properly configured with an IP address.
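The second part, skipping hosts whose display network has no address, could be sketched as a scheduling filter over the network details each host last reported. Because of the DHCP race described here, this sketch rejects a host when the display network is missing, has boot protocol `none`, or is static without an address, but keeps a DHCP host whose lease may simply not have been reported yet. This is one possible policy under those assumptions; `HostNetwork`, `Host`, `can_run_vms` and `filter_hosts` are hypothetical names, not the engine's scheduler API.

~~~python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HostNetwork:
    boot_protocol: str  # "none", "dhcp" or "static"
    address: str = ""


@dataclass
class Host:
    name: str
    networks: Dict[str, HostNetwork] = field(default_factory=dict)


def can_run_vms(host: Host, display_network: str) -> bool:
    """Decide whether the display network on this host looks usable."""
    net = host.networks.get(display_network)
    if net is None:
        return False  # display network not attached to this host
    if net.boot_protocol == "none":
        return False  # configured without any address: VMs cannot start
    if net.boot_protocol == "static":
        return bool(net.address)
    # DHCP: the lease may not be in the last capabilities report yet,
    # so this sketch does not reject the host on that basis alone.
    return True


def filter_hosts(hosts: List[Host], display_network: str) -> List[Host]:
    """Keep only hosts on which VMs using display_network could start."""
    return [h for h in hosts if can_run_vms(h, display_network)]
~~~

Trusting DHCP hosts trades a possible start failure for avoiding false rejections; a stricter variant would refresh the host's capabilities before deciding, at the resource cost mentioned above.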

> > 
> > BTW any reason for these comments to be private?
> 
> Privacy is viral :-(

Comment 32 lpeer 2013-08-27 11:44:33 UTC
Lowering priority given comment #31.
Following the best practice of having an IP address on the display network should work.

Comment 33 Moti Asayag 2014-03-09 13:57:39 UTC
With the suggested fix, a host that has no boot protocol configured for its display network will not be selected by the scheduler to run VMs.

Comment 34 GenadiC 2014-03-18 09:28:48 UTC
Verified in AV3: the VM starts when the display network has an IP address configured, and fails the CanDoAction validation when the display network has no IP address.

Comment 35 errata-xmlrpc 2014-06-09 14:58:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html

