Bug 1513424 - bonding failover fails first time
Status: CLOSED DUPLICATE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Assignee: Jarod Wilson
QA Contact: Rick Alongi

Reported: 2017-11-15 11:46 UTC by Jeremy Harris
Modified: 2019-01-23 08:20 UTC
CC: 8 users
Last Closed: 2017-11-15 19:20:57 UTC




Links: Red Hat Bugzilla 1496837 (last updated 2019-03-17 07:19:44 UTC)

Description Jeremy Harris 2017-11-15 11:46:24 UTC
Description of problem:

 Active/backup bond, MII monitor, two slaves, both link-up according to
 /proc/net/bonding/bond2.  Virtual-cable pull of the active slave (via the
 HP Virtual Connect host network profile) to test failover.  The bond then
 reports "Currently Active Slave: None".

 This only happens the first time after boot; subsequent failover tests
 work fine.  An older kernel does not show the problem.
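
 For quick checking during the test, the failed state shows up as the
 "Currently Active Slave" line going to "None".  A minimal C sketch that
 reads just that line (the path and bond name are the ones from this
 report; adjust as needed):

/* Print the "Currently Active Slave" line of /proc/net/bonding/bond2.
 * Path and bond name taken from this report; purely a convenience check. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/net/bonding/bond2", "r");
    char line[256];

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f))
        if (strncmp(line, "Currently Active Slave:", 23) == 0)
            fputs(line, stdout);
    fclose(f);
    return 0;
}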

Version-Release number of selected component (if applicable):

Kernel           3.10.0-693.1.1.el7.x86_64
Bonding driver:  v3.7.1 (April 27, 2011)

How reproducible:

 100%

Steps to Reproduce:
1.  Boot
2.  Cable pull
3.  Observe no connectivity via bond

Actual results:

 Unusable network

Expected results:

 Usable network

Additional info:

 Original status:
# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eno53
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eno53
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 5c:b9:01:ca:d2:82
Slave queue ID: 0

Slave Interface: eno54
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 5c:b9:01:ca:d2:8a
Slave queue ID: 0

------------
 Status after cable pull:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eno53
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: 5c:b9:01:ca:d2:82
Slave queue ID: 0

Slave Interface: eno54
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 5c:b9:01:ca:d2:8a
Slave queue ID: 0

----------------
 Status after cable replacement:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eno53
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eno53
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 5c:b9:01:ca:d2:82
Slave queue ID: 0

Slave Interface: eno54
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 5c:b9:01:ca:d2:8a
Slave queue ID: 0


------------
Debug info logged:
Nov 14 20:32:17 hebaip04pc01 kernel: bnx2x 0000:06:00.4 eno53: NIC Link is Down
Nov 14 20:32:17 hebaip04pc01 kernel: bond2: link status definitely down for interface eno53, disabling it
Nov 14 20:32:17 hebaip04pc01 kernel: device eno53 left promiscuous mode
Nov 14 20:32:17 hebaip04pc01 kernel: bond2: now running without any active interface!
Nov 14 20:32:17 hebaip04pc01 kernel: bond2: link status up again after 0 ms for interface eno54
Nov 14 20:32:18 hebaip04pc01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): bond2: link becomes ready
Nov 14 20:33:43 hebaip04pc01 kernel: bnx2x 0000:06:00.4 eno53: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Nov 14 20:33:43 hebaip04pc01 kernel: bond2: link status definitely up for interface eno53, 10000 Mbps full duplex
Nov 14 20:33:43 hebaip04pc01 kernel: bond2: making interface eno53 the new active one
Nov 14 20:33:43 hebaip04pc01 kernel: device eno53 entered promiscuous mode


------------
Kernel git info:
previously working was:  3.10.0-514.10.2.el7.x86_64
this, nonworking:        3.10.0-693.1.1.el7.x86_64 

The bonding driver version is 3.7.1 in both kernels, so the regression must
come from a fix backported between those builds.  There are 70 commits in
that range that mention "bond", 31 that mention "bonding:".

Candidate commits:
0faebcc3bdaf0514e2ec69a8ed6e085e651ffecb bonding: fix active-backup transition
68b981419263e87cd3b4199fd1d843ecf4d2bec9 bonding: correctly update link status during mii-commit phase
175635fc649b1dc488d187c1ce2bb385ffd70954 bonding: improve link-status update in mii-monitoring
6c346434268f57726cf3c9db7c76b46f875a65c2 bonding: split bond_set_slave_link_state into two parts
0cf57683250ad6bc0895c692bcdca35f5120dfd1 bonding: implement lower state change propagation
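
For context, the driver's MII monitor works in two phases, inspect and
commit (the "mii-commit phase" several of the candidate commits touch).
The sketch below is a minimal user-space simulation of that split; the
structures and function names are simplified illustrations, not the real
code in drivers/net/bonding/bond_main.c.  It shows the intended failover
to the healthy backup, and the commit-phase comment marks where a lost
transition would leave "Currently Active Slave: None" as observed above.

/* User-space sketch of a two-phase MII monitor for an active/backup bond.
 * Illustrative only: names and the state machine are simplified. */
#include <stdio.h>
#include <stdbool.h>

enum link_state { LINK_UP, LINK_DOWN, LINK_NOCHANGE };

struct slave {
    const char *name;
    enum link_state link;      /* committed link state */
    enum link_state proposed;  /* recorded during the inspect phase */
};

struct bond {
    struct slave *active;
    struct slave *slaves;
    int n_slaves;
};

/* Phase 1: read PHY status, record proposed transitions only. */
static bool miimon_inspect(struct bond *bond, const bool *phy_up)
{
    bool commit = false;
    for (int i = 0; i < bond->n_slaves; i++) {
        struct slave *s = &bond->slaves[i];
        enum link_state now = phy_up[i] ? LINK_UP : LINK_DOWN;
        s->proposed = (now == s->link) ? LINK_NOCHANGE : now;
        if (s->proposed != LINK_NOCHANGE)
            commit = true;
    }
    return commit;
}

/* Phase 2: apply proposed transitions and pick a new active slave.
 * If a backup's LINK_UP proposal were lost between the phases (the kind
 * of race the candidate commits address), the bond would be left with
 * active == NULL even though a healthy slave exists. */
static void miimon_commit(struct bond *bond)
{
    for (int i = 0; i < bond->n_slaves; i++) {
        struct slave *s = &bond->slaves[i];
        if (s->proposed != LINK_NOCHANGE)
            s->link = s->proposed;
        s->proposed = LINK_NOCHANGE;
    }
    if (bond->active && bond->active->link == LINK_DOWN)
        bond->active = NULL;
    if (!bond->active)
        for (int i = 0; i < bond->n_slaves; i++)
            if (bond->slaves[i].link == LINK_UP) {
                bond->active = &bond->slaves[i];
                break;
            }
}

int main(void)
{
    struct slave slaves[] = {
        { "eno53", LINK_UP, LINK_NOCHANGE },
        { "eno54", LINK_UP, LINK_NOCHANGE },
    };
    struct bond bond = { &slaves[0], slaves, 2 };

    /* Simulated cable pull on the active slave eno53. */
    bool phy_up[] = { false, true };
    if (miimon_inspect(&bond, phy_up))
        miimon_commit(&bond);

    /* Expected: eno54.  The reported bug leaves "None" here instead. */
    printf("Currently Active Slave: %s\n",
           bond.active ? bond.active->name : "None");
    return 0;
}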

