Bug 1685631 - network adapter not activated with vlan
Summary: network adapter not activated with vlan
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: initscripts
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Lukáš Nykrýn
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-05 17:02 UTC by f1outsourcing
Modified: 2019-03-15 11:46 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-15 11:46:01 UTC
Target Upstream Version:



Description f1outsourcing 2019-03-05 17:02:09 UTC
This simple configuration should be activated at boot, but it is not, and I don't know why.

After boot I have:

[@c04 ~]# ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


Then I can bring up the vlan with:

[@c04 ~]# ifup eth4.601
RTNETLINK answers: Numerical result out of range

[@c04 ~]# ifconfig
eth4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether e4:1d:2d:0c:3e:20  txqueuelen 1000  (Ethernet)
        RX packets 4798  bytes 378817 (369.9 KiB)
        RX errors 0  dropped 60  overruns 0  frame 0
        TX packets 55  bytes 8822 (8.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth4.601: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.14  netmask 255.255.255.0  broadcast 10.0.0.255
        ether e4:1d:2d:0c:3e:20  txqueuelen 1000  (Ethernet)
        RX packets 65  bytes 10368 (10.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 59  bytes 9456 (9.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


I have no idea what this RTNETLINK error is about. Also note that both configuration files set MTU=9000, while the activated adapters have MTU 1500.


[@c04 network-scripts]# cat ifcfg-eth4
DEVICE=eth4
ONBOOT=yes
BOOTPROTO=none
MTU=9000
USERCTL=no
IPV6INIT=no
HWADDR="x:x:x:x:x:x"

[@c04 network-scripts]# cat ifcfg-eth4.601
DEVICE="eth4.601"
VLAN=yes
IPADDR=10.0.0.14
NETMASK=255.255.255.0
NETWORK=10.0.0.0
BROADCAST=10.0.0.255
MTU=9000
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
IPV6INIT=no


[@c04 network-scripts]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

[@c04 network-scripts]# lspci | grep -i mel
03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

I don't want an IP address on eth4, just on the VLAN.

Comment 2 f1outsourcing 2019-03-05 17:18:29 UTC
If I first bring up eth4 and then eth4.601, there is no RTNETLINK error and both adapters have MTU 9000. What is wrong with the configuration of eth4 that it does not start at boot?

Comment 3 f1outsourcing 2019-03-05 21:11:51 UTC
If I configure eth0 with an IP address, eth4 and eth4.529 come up as expected.

Comment 4 Lukáš Nykrýn 2019-03-06 10:18:39 UTC
Can you reboot the machine with rc.debug added to the kernel command line and post the output of journalctl -b here?

Comment 5 Lukáš Nykrýn 2019-03-06 10:20:56 UTC
> I have no idea what this rtnetlink error is about. Also notice that the
> configuration files have both mtu 9000, while the activated adapters have
> 1500

That is the kernel reporting that it is unable to set the MTU to that value for the device.
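
A likely reading of "Numerical result out of range", given the output in the description: the kernel does not accept a VLAN MTU larger than the current MTU of its parent device, and eth4 was up with MTU 1500 when eth4.601 asked for 9000. The function below is only a sketch of that check. The name vlan_mtu_fits and its optional third argument (a sysfs root, so the check can be tried against a scratch directory) are made up for illustration and are not part of initscripts.

```shell
#!/bin/sh
# vlan_mtu_fits PARENT WANTED [SYSROOT]   (hypothetical helper)
# Returns 0 if WANTED does not exceed the parent's current MTU,
# 1 if it does, and 2 if the parent device is not present.
# SYSROOT defaults to /sys/class/net.
vlan_mtu_fits() {
    parent=$1
    wanted=$2
    sysroot=${3:-/sys/class/net}
    mtu_file="$sysroot/$parent/mtu"

    # No mtu file means the parent device has not been created (yet).
    [ -r "$mtu_file" ] || return 2

    read -r current < "$mtu_file"
    [ "$wanted" -le "$current" ]
}
```

For example, `vlan_mtu_fits eth4 9000 || echo "raise the MTU on eth4 first"` would print the hint on a system where eth4 is still at 1500.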

Comment 6 f1outsourcing 2019-03-06 22:02:11 UTC
http://www.f1-outsourcing.eu/journalctl-eth4-boot-red.log

Sorry, I tried to delete some lines, but the output was still too much to paste here.

sed -i '/ceph/d' journalctl-eth4-boot-red.log
sed -i '/Ceph/d' journalctl-eth4-boot-red.log
sed -i '/scsi/d' journalctl-eth4-boot-red.log
sed -i '/libvirt/d' journalctl-eth4-boot-red.log
sed -i '/rsyslogd/d' journalctl-eth4-boot-red.log

Comment 7 f1outsourcing 2019-03-07 09:19:11 UTC
(In reply to Lukáš Nykrýn from comment #5)
> > I have no idea what this rtnetlink error is about. Also notice that the
> > configuration files have both mtu 9000, while the activated adapters have
> > 1500
> 
> That is the message from kernel that it is unable to set the MTU to such
> value for the device.

Yes, so what happens is that starting eth4.529 activates eth4 without reading and applying its configuration in ifcfg-eth4 (which sets MTU 9000). If it had read the configuration, eth4 would not have MTU 1500.

Comment 8 Lukáš Nykrýn 2019-03-07 12:02:21 UTC
So the problem is that the driver is pretty slow to initialize, and it creates the device after the network-scripts have already run. That explains the behaviour you saw, where things started to work once you also configured eth0: that caused a delay, so you did not hit the race.

The solution here is to ditch the network-scripts and use NetworkManager. NM handles such dynamic environments much better. Network-scripts date from a time when boot was slow, and by the time they ran they could assume that all hardware was initialised.

There is also one workaround you could use. The network-scripts have an undocumented option called DEVTIMEOUT, which sets the time in seconds that ifup should wait for the device to appear.
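
As a rough illustration, this is the reporter's ifcfg-eth4 from the description with the option added. The value 5 is the one reported to work later in this bug. Putting it in ifcfg-eth4 rather than ifcfg-eth4.601 is an assumption, since the thread does not say which file was changed.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth4
DEVICE=eth4
ONBOOT=yes
BOOTPROTO=none
MTU=9000
USERCTL=no
IPV6INIT=no
HWADDR="x:x:x:x:x:x"
# Assumption: wait up to 5 seconds for the device to appear
DEVTIMEOUT=5
```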

Also, regarding your configuration: the combination of HWADDR="x:x:x:x:x:x" and DEVICE=eth4 tells udev that the device with the specified MAC address should be renamed to eth4 at boot. Since eth is the prefix the kernel uses every time a device appears, this could lead to races between udev and the kernel on setups with multiple NICs. I would suggest using a different prefix there (like "DEVICE=net4").
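
A sketch of what that rename could look like for the two files from the description. The file names ifcfg-net4 and ifcfg-net4.601 follow from the suggested DEVICE value and are an assumption. The HWADDR placeholder is kept as posted, and settings not shown stay as they were.

```shell
# /etc/sysconfig/network-scripts/ifcfg-net4   (was ifcfg-eth4)
DEVICE=net4
HWADDR="x:x:x:x:x:x"
ONBOOT=yes
BOOTPROTO=none
MTU=9000

# /etc/sysconfig/network-scripts/ifcfg-net4.601   (was ifcfg-eth4.601)
DEVICE="net4.601"
VLAN=yes
IPADDR=10.0.0.14
NETMASK=255.255.255.0
MTU=9000
ONBOOT=yes
BOOTPROTO=none
```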

Comment 9 f1outsourcing 2019-03-15 11:22:05 UTC
Thanks for the DEVTIMEOUT=5 hint, that helped indeed.

I still think that the basic configuration should work. If it is of any interest, the specific card is this one, used in Ethernet mode of course.
03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

( I also can't seem to get rid of the InfiniBand crap that is being loaded constantly.
modprobe -r rpcrdma
modprobe -r rdma_ucm
modprobe -r ib_isert
modprobe -r ib_iser
modprobe -r ib_ucm
modprobe -r ib_uverbs
modprobe -r ib_srp
modprobe -r ib_srpt
modprobe -r rdma_cm
modprobe -r ib_ipoib
modprobe -r ib_cm
modprobe -r ib_umad
modprobe -r mlx4_ib
modprobe -r iw_cm )

Comment 10 Lukáš Nykrýn 2019-03-15 11:46:01 UTC
The problem is that the network-scripts were written before hotplug was a thing, so they are not designed to handle today's dynamic use cases. In RHEL 6 we worked around the problem by introducing a udev rule that ran ifup every time a device appeared, but that approach was buggy as hell.

I would really encourage you to try NetworkManager. I know it has a bad reputation from the past, but currently it works great even on servers and, if I remember correctly, it is used in the default setup.
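
For orientation, here is an untested sketch of how the VLAN from the description might be defined with nmcli. The helper make_vlan_cmd and the connection name vlan601 are made up for illustration. The helper only prints the command so it can be reviewed before being run by hand.

```shell
#!/bin/sh
# make_vlan_cmd PARENT VLAN_ID ADDRESS/PREFIX MTU   (hypothetical helper)
# Prints an nmcli command that would define a VLAN connection with a
# static IPv4 address. Nothing is executed.
make_vlan_cmd() {
    parent=$1
    vid=$2
    addr=$3
    mtu=$4
    echo nmcli connection add type vlan \
        con-name "vlan${vid}" ifname "${parent}.${vid}" \
        dev "${parent}" id "${vid}" \
        ip4 "${addr}" \
        802-3-ethernet.mtu "${mtu}"
}

# Values taken from the ifcfg files in the description.
make_vlan_cmd eth4 601 10.0.0.14/24 9000
```

The parent eth4 would presumably also need its own NetworkManager connection with MTU 9000 for the VLAN MTU to be accepted.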

