Bug 1115420 - Network connectivity is lost on the hypervisor host when adding it to a cluster if NetworkManager is running
Summary: Network connectivity is lost on the hypervisor host when adding it to a cluster if NetworkManager is running
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
: 3.5.0
Assignee: Antoni Segura Puimedon
QA Contact: Gil Klein
URL:
Whiteboard: network
Depends On:
Blocks:
 
Reported: 2014-07-02 10:29 UTC by Simone Tiraboschi
Modified: 2016-02-10 19:36 UTC (History)
11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-07-08 13:01:47 UTC
oVirt Team: Network




Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1001186 None None None Never
Red Hat Bugzilla 1124876 None None None Never

Internal Links: 1001186 1124876

Description Simone Tiraboschi 2014-07-02 10:29:57 UTC
Description of problem:
On a fully virtualized test setup (two f19 VMs on KVM with nested virtualization, one for the engine and the second as a nested hypervisor host, with only one network interface on each host), network connectivity is lost on the hypervisor host when adding it to a cluster.

On the oVirt engine console everything seems to go well until "Starting vdsm"; then, after a long timeout, it adds "Processing stopped due to timeout" and "SSH session timeout host 'root@f19td5'".

Now if I check, the hypervisor host (f19td5 in my test setup) is no longer reachable from the network.
Checking the status of that host from the SPICE console of the outer KVM host, I found that the ifconfig command reports only the loopback interface.

The Ethernet interface seems to be missing, even after a reboot.

'/bin/systemctl status network' reports:
network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network)
   Active: failed (Result: exit-code) since Wed 2014-07-02 11:57:03 CEST; 28min ago

Jul 02 11:57:03 f19td5.localdomain systemd[1]: Starting LSB: Bring up/down networking...
Jul 02 11:57:03 f19td5.localdomain network[2202]: Bringing up loopback interface:  [  OK  ]
Jul 02 11:57:03 f19td5.localdomain network[2202]: Bringing up interface eth0:  ERROR    : [/etc/sysconfig...ing.
Jul 02 11:57:03 f19td5.localdomain network[2202]: [FAILED]
Jul 02 11:57:03 f19td5.localdomain systemd[1]: network.service: control process exited, code=exited status=1
Jul 02 11:57:03 f19td5.localdomain systemd[1]: Failed to start LSB: Bring up/down networking.
Jul 02 11:57:03 f19td5.localdomain systemd[1]: Unit network.service entered failed state.



Version-Release number of selected component (if applicable):

On the engine host:
ovirt-engine.noarch                                                               3.5.0-0.0.master.20140629172304.git0b16ed7.fc19                                                               @ovirt-3.5-pre

on the hypervisor host:
vdsm.x86_64                            4.14.8.1-0.fc19                  @updates


How reproducible:
I tried more than once, always with the same result: 100% reproducible, at least from my perspective.


Steps to Reproduce:
1. Install ovirt engine on a fresh system 
2. Try to add a hypervisor host

Actual results:
Network connectivity is lost and the host is not added to the cluster

Expected results:
Host becomes part of the cluster

Additional info:

Comment 1 Dan Kenigsberg 2014-07-07 23:29:22 UTC
Have you disabled NetworkManager? In f19 (and f20) it still tries to take over any network device.

Please try again after having run

    /usr/bin/systemctl stop NetworkManager.service
    /usr/bin/systemctl mask NetworkManager.service

on the nodes to be added.

If this is not the case, please attach the output of

    bash -xv /etc/sysconfig/network-scripts/ifup-eth eth0

to understand why this fails.

Comment 2 Simone Tiraboschi 2014-07-08 07:55:57 UTC
Yes, I think it was indeed enabled:

[stirabos@f19t2 ~]$ /usr/bin/systemctl status NetworkManager.service
NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled)
   Active: inactive (dead)


I'll try again with a fresh VM disabling it.

Comment 3 Dan Kenigsberg 2014-07-08 09:43:51 UTC
Please reopen the bug if it's not all about NetworkManager (which should be harmless to Vdsm beginning with Fedora 21).

Comment 4 Simone Tiraboschi 2014-07-08 11:56:35 UTC
On Fedora 20, disabling NetworkManager before trying to add the host works correctly.

But what should we do in the meantime? Fedora 20 uses NetworkManager by default.

Comment 5 Sandro Bonazzola 2014-07-08 12:03:34 UTC
I think that this is a regression: previously, having NetworkManager running didn't cause any issue. And even if it will be harmless in F21, we're not supporting F21; we're supporting F19 and F20, and there it's an issue.

If NetworkManager must be stopped, vdsm should ensure it's stopped, or if not vdsm, then at least host-deploy.

I don't think this can be covered by a release note alone.

Comment 6 Dan Kenigsberg 2014-07-08 13:01:47 UTC
This is not a regression. We could never install vdsm (or set up networking in other circumstances) while NetworkManager was running. Unless configured otherwise (which should be available in F20, not only F21), NM auto-manages any new device and takes it down.

https://bugzilla.redhat.com/show_bug.cgi?id=1001186#c14
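
[Editor's note: the "configured otherwise" setting mentioned above is the kind of directive shipped by the NetworkManager-config-server package discussed later in this thread. A hypothetical NetworkManager.conf fragment (assumed for illustration, not quoted from the package) would look like:]

```ini
# /etc/NetworkManager/NetworkManager.conf (hypothetical fragment)
# Do not auto-create default connections for new devices, so that
# NetworkManager leaves unconfigured interfaces alone.
[main]
no-auto-default=*
```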

Comment 7 Simone Tiraboschi 2014-07-09 07:27:08 UTC
I think that in such a case at least host-deploy should detect NetworkManager and abort, alerting the user to stop it before trying again.
Currently it doesn't provide any hint to the user, and it results in a non-working network configuration. If the host is remote, it's always a mess.
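
[Editor's note: the pre-check proposed here could be sketched as a small POSIX shell helper. This is a hypothetical sketch, not part of the actual host-deploy code; it decides from the NetworkManager unit state, as reported by `systemctl is-active`, whether deployment should abort.]

```shell
# Hypothetical host-deploy pre-check sketch: given the NetworkManager
# unit state (the output of `systemctl is-active NetworkManager.service`),
# print a verdict and return non-zero when deployment should abort.
nm_precheck() {
    state="$1"
    case "$state" in
        active|activating)
            echo "abort: NetworkManager is ${state}; stop and mask it first"
            return 1
            ;;
        *)
            echo "ok: NetworkManager is ${state}"
            return 0
            ;;
    esac
}

# On a real host one would call:
#   nm_precheck "$(systemctl is-active NetworkManager.service)"
nm_precheck inactive
nm_precheck active || echo "deployment would stop here"
```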

Comment 8 Simone Tiraboschi 2014-07-09 07:32:58 UTC
By the way, no problem on my side with closing it on the VDSM front, but at least we should solve it on the host-deploy side.

Comment 9 Dan Kenigsberg 2014-07-25 14:57:22 UTC
Simone, you can reopen and change the component, but I am not sure that we'd have the resources to fix an f19-only bug.

Comment 10 Simone Tiraboschi 2014-07-25 15:00:24 UTC
Unfortunately, we get the same behavior on RHEL 7, CentOS 7, and f20.

Comment 11 Dan Kenigsberg 2014-07-30 08:51:13 UTC
Do you have NetworkManager-config-server installed on these hosts?

Comment 12 Simone Tiraboschi 2014-07-30 15:02:15 UTC
No, I don't; I've only just discovered this package.

This morning sbonazzo told me that he got it working on CentOS 7 with NetworkManager running, simply by enforcing 'NM_CONTROLLED=no' in the network script of the physical interface before starting engine-setup.
I haven't tried that.
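
[Editor's note: the workaround described above amounts to an ifcfg fragment along these lines. The interface name eth0 is assumed from earlier comments, and all keys other than NM_CONTROLLED are illustrative defaults, not quoted from the reporter's setup:]

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (illustrative fragment)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
# Tell NetworkManager to leave this interface alone, so that the legacy
# network service (and later vdsm) can manage it.
NM_CONTROLLED=no
```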

