Bug 1063545 - NetworkManager-0.9.9.0-29.git20140131.fc20 breaks virbr0 handling
Summary: NetworkManager-0.9.9.0-29.git20140131.fc20 breaks virbr0 handling
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1064441
Reported: 2014-02-11 00:43 UTC by James Ralston
Modified: 2018-04-11 15:10 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1063290
Environment:
Last Closed: 2014-02-21 23:17:19 UTC



Description James Ralston 2014-02-11 00:43:35 UTC
+++ This bug was initially created as a clone of Bug #1063290 +++

I have a bridge set up in NetworkManager that has been working for months. When I started up my machine this morning, it failed to start up properly. The physical interface that was slaved to the bridge started up instead and picked up an IPv6 address via SLAAC, but did not configure a DHCP address. em1 should be slaved to br0, but this is what it looked like this morning after booting:

br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::5432:f1ff:fe22:4a70  prefixlen 64  scopeid 0x20<link>
        ether 56:32:f1:22:4a:70  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 648 (648.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::3a60:77ff:fe93:a95d  prefixlen 64  scopeid 0x20<link>
        inet6 2001:470:8:d63:3a60:77ff:fe93:a95d  prefixlen 64  scopeid 0x0<global>
        ether 38:60:77:93:a9:5d  txqueuelen 1000  (Ethernet)
        RX packets 14  bytes 1002 (1002.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 614 (614.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 20  memory 0xfe300000-fe320000  

...if I then do a "systemctl restart NetworkManager", it eventually configures itself properly:

br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.3  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 2001:470:8:d63:5432:f1ff:fe22:4a70  prefixlen 128  scopeid 0x0<global>
        inet6 fe80::5432:f1ff:fe22:4a70  prefixlen 64  scopeid 0x20<link>
        ether 38:60:77:93:a9:5d  txqueuelen 0  (Ethernet)
        RX packets 170  bytes 52874 (51.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 197  bytes 37747 (36.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::3a60:77ff:fe93:a95d  prefixlen 64  scopeid 0x20<link>
        ether 38:60:77:93:a9:5d  txqueuelen 1000  (Ethernet)
        RX packets 217  bytes 60631 (59.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 234  bytes 41163 (40.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 20  memory 0xfe300000-fe320000  

...I did do a yum update yesterday and a few peripheral NM packages got updated, but nothing that looks like an obvious culprit:

Feb 09 15:30:47 Updated: libnm-gtk-0.9.9.0-8.git20140123.fc20.x86_64
Feb 09 15:31:09 Updated: nm-connection-editor-0.9.9.0-8.git20140123.fc20.x86_64
Feb 09 15:32:30 Updated: network-manager-applet-0.9.9.0-8.git20140123.fc20.x86_64

...I tried downgrading them but that didn't help. Attaching my /etc/sysconfig/network-scripts/ifcfg-* files

--- Additional comment from Jeff Layton on 2014-02-10 07:30:27 EST ---

fwiw...

# rpm -qa NetworkManager\*
NetworkManager-vpnc-0.9.8.2-2.fc20.x86_64
NetworkManager-glib-devel-0.9.9.0-28.git20131003.fc20.x86_64
NetworkManager-devel-0.9.9.0-28.git20131003.fc20.x86_64
NetworkManager-0.9.9.0-28.git20131003.fc20.x86_64
NetworkManager-vpnc-gnome-0.9.8.2-2.fc20.x86_64
NetworkManager-glib-0.9.9.0-28.git20131003.fc20.x86_64
NetworkManager-glib-0.9.9.0-28.git20131003.fc20.i686
NetworkManager-openconnect-0.9.8.0-2.fc20.x86_64
NetworkManager-pptp-gnome-0.9.8.2-3.fc20.x86_64
NetworkManager-openvpn-gnome-0.9.9.0-0.1.git20140128.fc20.x86_64
NetworkManager-openvpn-0.9.9.0-0.1.git20140128.fc20.x86_64
NetworkManager-pptp-0.9.8.2-3.fc20.x86_64

--- Additional comment from Jeff Layton on 2014-02-10 07:36:56 EST ---

Logs say this:

Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (br0): carrier is OFF
Feb 10 06:57:31 tlielax NetworkManager[880]: <error> [1392033451.208423] [platform/nm-linux-platform.c:1338] sysctl_get(): error reading /sys/class/net/br0/phys_port_id: Failed to read from file '/sys/class/net/br0/phys_port_id': Operation not supported
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (br0): new Bridge device (driver: 'bridge' ifindex: 3)
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (br0): exported as /org/freedesktop/NetworkManager/Devices/1
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (br0): No existing connection detected.
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (br0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (br0): bringing up device.
Feb 10 06:57:31 tlielax NetworkManager[880]: <warn> (br0): device not up after timeout!
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (em1): carrier is OFF
Feb 10 06:57:31 tlielax NetworkManager[880]: <error> [1392033451.249225] [platform/nm-linux-platform.c:1338] sysctl_get(): error reading /sys/class/net/em1/phys_port_id: Failed to read from file '/sys/class/net/em1/phys_port_id': Operation not supported
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (em1): new Ethernet device (driver: 'e1000e' ifindex: 2)
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (em1): exported as /org/freedesktop/NetworkManager/Devices/2
Feb 10 06:57:31 tlielax dbus[926]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (em1): No existing connection detected.
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (em1): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Feb 10 06:57:31 tlielax NetworkManager[880]: <info> (em1): bringing up device.

...and then a little later:

Feb 10 06:57:35 tlielax kernel: [   35.230530] e1000e: em1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Feb 10 06:57:35 tlielax kernel: [   35.230726] IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
Feb 10 06:57:35 tlielax NetworkManager[880]: <info> (em1): link connected
Feb 10 06:57:35 tlielax NetworkManager[880]: <info> (em1): device state change: unavailable -> disconnected (reason 'carrier-changed') [20 30 40]
Feb 10 06:57:35 tlielax NetworkManager[880]: <info> Auto-activating connection 'br0sl1'.
Feb 10 06:57:35 tlielax NetworkManager[880]: <info> Connection 'br0sl1' auto-activation failed: (1) No compatible disconnected device found for master connection 92ab87e2-4af7-41af-a963-a5fcb8954d34.
Feb 10 06:57:35 tlielax NetworkManager[880]: <info> startup complete

I'm not sure, but the problem seems to be that em1 is picking up an IPv6 address when it should stay unconfigured and slaved to the bridge?

I'm really not sure what changed and caused this to start happening though.
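
For anyone triaging similar logs, the auto-activation failure quoted above can be pulled out mechanically. This is only a minimal sketch: the function name `auto_activation_failures` and the regular expression are illustrative assumptions based on the single log line shown in this comment, not a documented NetworkManager log format.

```python
import re

# Assumed pattern, derived only from the log line quoted in this comment.
FAIL_RE = re.compile(
    r"Connection '([^']+)' auto-activation failed: \((\d+)\) (.*)$"
)

def auto_activation_failures(lines):
    """Return (connection id, error code, message) for each failure line."""
    failures = []
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            failures.append((match.group(1), int(match.group(2)), match.group(3)))
    return failures

# Log excerpt copied from the comment above.
SAMPLE_LOG = [
    "Feb 10 06:57:35 tlielax NetworkManager[880]: <info> Auto-activating connection 'br0sl1'.",
    "Feb 10 06:57:35 tlielax NetworkManager[880]: <info> Connection 'br0sl1' auto-activation failed: (1) No compatible disconnected device found for master connection 92ab87e2-4af7-41af-a963-a5fcb8954d34.",
    "Feb 10 06:57:35 tlielax NetworkManager[880]: <info> startup complete",
]

failures = auto_activation_failures(SAMPLE_LOG)
```

Here the slave connection 'br0sl1' fails because no usable device exists for its master connection, which matches br0 never coming up earlier in the log.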

--- Additional comment from James Ralston on 2014-02-10 14:01:57 EST ---

I'm seeing a similar problem with the updates-testing version of NetworkManager (1:0.9.9.0-29.git20140131.fc20).

The behavior I see is that NetworkManager stays in the "twirling" animation. If I mouse over it, it says:

    Requesting address for 'virbr0'...

If I tell NetworkManager to disconnect from virbr0, it takes the virbr0 interface down entirely, which makes it non-functional for virtual guests:

2014-02-10T13:56:58.985749-05:00 localhost dnsmasq-dhcp[2031]: DHCP packet received on virbr0 which has no address

If I then try to bring the interface back up via NetworkManager, I receive this error dialog:

    Connection activation failed

    (1) Creating object for path '/org/freedesktop/NetworkManager/ActiveConnection/2' failed in libnm-glib.

So it looks like the updates-testing version of NetworkManager has issues with bridge devices, including virbr0.

--- Additional comment from James Ralston on 2014-02-10 14:24:22 EST ---

If I revert to 1:0.9.9.0-28.git20131003.fc20, NetworkManager doesn't acknowledge that virbr0 exists: it doesn't appear in the "Network Connections" list.

I do note that there is no /etc/sysconfig/network-scripts/ifcfg-virbr0 script on my system. I wonder if creating it and adding "NM_CONTROLLED=no" to it will prevent git20140131 from [mis]managing virbr0.

(Even if it does, though, I don't think that's a realistic option: even if libvirtd is wrong for not telling NM to not manage virbr0, breaking everyone who is running libvirtd isn't acceptable.)

--- Additional comment from Jeff Layton on 2014-02-10 14:54:44 EST ---

That problem sounds different, but it may be due to similar causes. My bridge is set up to come up whether libvirtd is running or not (though I have it running to provide a bridge for guests to attach to). The problem you're having sounds more like some sort of strange interaction between libvirtd and NM. It may be best to open your problem as a separate bug since I'm not convinced they're the same issue. The bug owner can always collapse them together if they do turn out to be related.

--- Additional comment from Scott Shambarger on 2014-02-10 15:48:32 EST ---

FYI, you might try downgrading libnl3 to 3.2.21-2 -- my bridge started working again with the older libnl, so there's an incompatibility there someplace.

--- Additional comment from Orion Poplawski on 2014-02-10 16:08:16 EST ---

James - I saw some weirdness with virbr0 and NM until I restarted NM.  I'm not sure it is a good idea to display virbr0, but at least now the status shows as up/connected.

--- Additional comment from Thomas Haller on 2014-02-10 16:17:35 EST ---

I can reproduce this issue.

having the following two connections:


br0:[connection]
br0:id=br0
br0:uuid=7ecb25ca-f89e-44a9-869b-0c66ddba0cb8
br0:interface-name=br0
br0:type=bridge
br0:autoconnect=false
br0:
br0:[ipv6]
br0:method=auto
br0:
br0:[ipv4]
br0:method=auto
br0:
br0:[bridge]
br0:interface-name=br0
br0-em1:[connection]
br0-em1:id=br0-em1
br0-em1:uuid=bf8ec57d-c917-499b-b33f-5bb46581efc7
br0-em1:interface-name=em1
br0-em1:type=ethernet
br0-em1:autoconnect=false
br0-em1:master=br0
br0-em1:slave-type=bridge



The error happens when I `nmcli connection up br0-em1` and the bridge br0 does not exist yet.





Interestingly, the error does not happen with

  - NetworkManager-current-master
  - libnl3-3.2.22-2.fc20.x86_64
  - kernel-3.12.10-300.fc20.x86_64

In that case, I can bring the bridge up and take it down successfully. Only afterwards, the em1 device has NO-CARRIER and state DOWN for about 10 minutes and the interface is effectively unusable; neither a warm reboot nor rmmod helps. Only waiting or a cold reboot makes em1 usable again. I think this is an unrelated kernel bug that I might file later.



The error with
<warn> (br0): device not up after timeout!
does however happen with

  - NetworkManager-current-master
  - libnl3-3.2.24-1.fc20.x86_64
  - kernel-3.12.10-300.fc20.x86_64

So it seems the difference is libnl3, which was pushed to F20/stable just last week. Reassigning the bug to libnl3...
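
The libnl3 builds mentioned so far in this thread can be summarized in a small lookup. This is a rough sketch that only encodes what commenters reported here; the names `REPORTED_WORKING`, `REPORTED_FAILING`, and `classify_libnl3` are hypothetical, and any build not named in the thread is deliberately left as "unknown" rather than guessed.

```python
import re

# Version-release strings exactly as reported in this thread; nothing inferred.
REPORTED_WORKING = {"3.2.21-2", "3.2.22-2"}
REPORTED_FAILING = {"3.2.24-1"}

# Matches Fedora 20 package names such as "libnl3-3.2.22-2.fc20.x86_64".
NVR_RE = re.compile(r"^libnl3-(\d+(?:\.\d+)*-\d+)\.fc20")

def classify_libnl3(package):
    """Classify a libnl3 package name using only the reports in this bug."""
    match = NVR_RE.match(package)
    if not match:
        return "unknown"
    version_release = match.group(1)
    if version_release in REPORTED_WORKING:
        return "reported working"
    if version_release in REPORTED_FAILING:
        return "reported failing"
    return "unknown"

results = {
    pkg: classify_libnl3(pkg)
    for pkg in (
        "libnl3-3.2.22-2.fc20.x86_64",
        "libnl3-3.2.24-1.fc20.x86_64",
        "libnl3-3.2.21-2.fc20",
    )
}
```

The exact-match lookup is intentional: the thread does not establish which intermediate builds are affected, so a range comparison would claim more than is known.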

--- Additional comment from Jeff Layton on 2014-02-10 16:42:14 EST ---

(In reply to Scott Shambarger from comment #6)
> FYI, you might try downgrading libnl3 to 3.2.21-2 -- my bridge started
> working again with the older libnl, so there's an incompatibility there
> someplace.

Yep, thanks! Downgrading to libnl3-3.2.21-2.fc20 fixed the issue for me.

--- Additional comment from Scott Shambarger on 2014-02-10 16:43:04 EST ---

Working bridge (libnl3-3.2.21-2):

<info> (br0): bringing up device.
<debug> [1392067683.902405] [platform/nm-platform.c:767] nm_platform_link_set_up(): link: setting up 'br0' (5)
...
<debug> [1392067683.905160] [platform/nm-linux-platform.c:993] check_cache_items(): cache 0x7f1f28839600 object 0x7f1f2883a690
<debug> [1392067683.905190] [platform/nm-platform.c:2091] log_link(): signal: link changed: br0 (5)

Non-working bridge (libnl3-3.2.24-1):

<info> (br0): bringing up device.
<debug> [1392065455.670394] [platform/nm-platform.c:767] nm_platform_link_set_up(): link: setting up 'br0' (4)
<debug> [1392065455.670948] [platform/nm-platform.c:2091] log_link(): signal: link added: br0 (4)
<warn> (br0): device not up after timeout!

Note the difference: "link changed" works, "link added" doesn't. Could it be that the new netlink library has changed how things are reported?
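
The comparison above can be checked against other debug logs with a short script. This is a sketch under stated assumptions: the helper name `first_link_signal` and the regular expression are illustrative, and they rely only on the debug line shapes quoted in this comment, which may differ in other NetworkManager builds.

```python
import re

# Assumed shape of the platform signal lines, taken from the excerpts above.
SIGNAL_RE = re.compile(r"signal: link (added|changed): (\S+) \((\d+)\)")

def first_link_signal(log_lines, ifname):
    """Return 'added' or 'changed' for the first link signal logged for
    ifname after its 'setting up' line, or None if there is none."""
    seen_set_up = False
    for line in log_lines:
        if f"link: setting up '{ifname}'" in line:
            seen_set_up = True
            continue
        if not seen_set_up:
            continue
        match = SIGNAL_RE.search(line)
        if match and match.group(2) == ifname:
            return match.group(1)
    return None

# Excerpts copied from the two logs above.
WORKING = [
    "<info> (br0): bringing up device.",
    "<debug> [1392067683.902405] [platform/nm-platform.c:767] nm_platform_link_set_up(): link: setting up 'br0' (5)",
    "<debug> [1392067683.905190] [platform/nm-platform.c:2091] log_link(): signal: link changed: br0 (5)",
]
NON_WORKING = [
    "<info> (br0): bringing up device.",
    "<debug> [1392065455.670394] [platform/nm-platform.c:767] nm_platform_link_set_up(): link: setting up 'br0' (4)",
    "<debug> [1392065455.670948] [platform/nm-platform.c:2091] log_link(): signal: link added: br0 (4)",
    "<warn> (br0): device not up after timeout!",
]

working_signal = first_link_signal(WORKING, "br0")
non_working_signal = first_link_signal(NON_WORKING, "br0")
```

A result of 'added' right after the set-up request corresponds to the non-working case shown above; it is only a symptom to look for, not a confirmed root cause.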

--- Additional comment from James Ralston on 2014-02-10 19:40:53 EST ---

Downgrading to libnl3-3.2.21-2.fc20 doesn't change the behavior of NetworkManager-0.9.9.0-29.git20140131.fc20 in beating on the virbr0 interface, so I agree that this is probably a separate issue. I'll open a separate bug report.

--- Additional comment from James Ralston on 2014-02-10 19:42:05 EST ---

@Orion (comment 7): I tried both restarting NetworkManager, and performing a full system reboot. NetworkManager's behavior was still broken.

Comment 1 James Ralston 2014-02-11 01:31:08 UTC
I also tried creating an /etc/sysconfig/network-scripts/ifcfg-virbr0 file and adding "NM_MANAGED=no" to it. It doesn't change NetworkManager's behavior.

Comment 2 James Ralston 2014-02-21 23:17:19 UTC
I've been testing NetworkManager-0.9.9.0-30.git20131003.fc20 the past few days, and it resolves this issue for me, so closing.

