Bug 451787 - domU network broken when bridged to Intel 82575 nic
Summary: domU network broken when bridged to Intel 82575 nic
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Andy Gospodarek
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-06-17 12:39 UTC by David L. Parsley
Modified: 2014-06-29 23:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-24 18:38:48 UTC
Target Upstream Version:



Description David L. Parsley 2008-06-17 12:39:53 UTC
Description of problem:
I recently purchased a Dell 6950 w/ an Intel 82575 4-port add-in card, and
installed RHEL5.2 on it.  I found that when a Xen VM is bridged to one
of the Intel ports, networking for domU is somehow subtly broken.  For
instance, I can ping the VM, but if I try to ssh to it, the three-way handshake
completes, but then I start getting icmp unreachable from the VM.
Bridging to the onboard broadcom interface on the same VLAN, the VM
works fine.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-92.1.1.el5
igb driver: 1.0.8-k2

NOTE: I've tried the latest driver from intel.com - 1.2.24, with the same results.

How reproducible:
Always

Steps to Reproduce:
1. Build a VM on a machine with the Intel 82575 nic
2. Bridge the VM to the Intel nic
3. Try ssh'ing into the VM, or running 'yum update' from the VM console
  
Actual results:
ssh, yum update fail

Expected results:
they should work

Additional info:
I can plug a broadcom NIC into a port on the same vlan, and the VM networking is
fine.

Comment 2 Herbert Xu 2008-07-01 13:07:55 UTC
Could you take a packet dump on peth0 on the host as well as eth0 on the guest
while this is all happening? Thanks!
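A minimal sketch of how such captures might be taken, assuming tcpdump is installed on both the host and the guest. The helper name capture_cmd and the output paths are hypothetical; the function only prints the command so it can be reviewed before being run as root on each side.

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump command line for one interface.
# -n skips name resolution, -s 0 captures full packets, -w writes a pcap file.
capture_cmd() {
    ifname="$1"
    outfile="$2"
    printf 'tcpdump -n -s 0 -i %s -w %s\n' "$ifname" "$outfile"
}

# On the host (dom0), for the physical interface behind the bridge:
capture_cmd peth0 /tmp/dom0-peth0.pcap
# Inside the guest (domU):
capture_cmd eth0 /tmp/domU-eth0.pcap
```

Starting both captures before the ssh attempt and stopping them after the failure should give two files covering the same exchange.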

Comment 3 David L. Parsley 2008-07-08 17:25:55 UTC
Sorry for not getting right back to you. I sent a summary of this issue to the
e1000-devel mailing list, and here's the reply I got:
<quote>
from	Williams, Mitch A <mitch.a.williams@intel.com>
subject	RE: [E1000-devel] Issue with igb/82575 and Xen
	
This is a known issue, which will be fixed in the upcoming release.
You should be able to work around the issue by turning off TX
checksumming:
 $ ethtool -K <ifname> tx off
</quote>

So I tested a VM by bridging it to eth1; again the VM was pingable, but I
couldn't ssh into it.  Then, as root on the host machine: "ethtool -K peth1 tx
off".  After that, ssh into the VM worked fine, and yum update on the VM worked
fine.

So, how best to fix this for now?  I don't think I can use ETHTOOL_OPTS to do
this.  The RH driver is already behind the Intel version; will RH eventually
issue an errata kernel that includes the fix?
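
One possible interim approach, offered as an assumption rather than a Red Hat recommendation, is to apply the workaround from a boot-time script (for example /etc/rc.local) once the Xen bridge is up. In the sketch below, the function name disable_tx_csum and the ETHTOOL override variable are hypothetical; only the `ethtool -K <ifname> tx off` command comes from the reply quoted above.

```shell
#!/bin/sh
# Allow the ethtool binary to be overridden (hypothetical variable, handy for dry runs).
ETHTOOL="${ETHTOOL:-ethtool}"

# Turn off TX checksum offload on each interface named on the command line.
# Stops at the first failure and returns non-zero.
disable_tx_csum() {
    for ifname in "$@"; do
        if "$ETHTOOL" -K "$ifname" tx off; then
            echo "tx checksumming disabled on $ifname"
        else
            echo "could not change tx checksumming on $ifname" >&2
            return 1
        fi
    done
}

# Example call (as root, after the bridge exists):
# disable_tx_csum peth1
```

Because this disables the offload for every guest bridged to that port, it should be removed once a fixed driver is installed.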

Comment 4 Daniel Berrange 2008-07-08 17:37:48 UTC
If this report is correct, then the e1000 driver needs to be fixed.  Turning off
TX checksumming has a significant, detrimental performance impact on Xen
networking, so isn't viable except as a short term workaround while the real
e1000 problem is fixed.


Comment 5 Matthias Saou 2008-11-09 21:36:49 UTC
FWIW, I've had to use this "workaround" for a while with the bnx2 driver (for the integrated ports on many recent Dell servers) in order to have domUs on the same dom0 be able to communicate with each other properly. I didn't know it also affected Intel chips/cards, though.

Comment 8 Andy Gospodarek 2009-02-10 15:53:44 UTC
David, have you tried the latest 5.3 kernels to see if they make any difference?

I don't know that they will, since I don't see anything specific in the upstream changelog that indicates this is fixed, but if this was something that was going to be 'fixed in the upcoming release' in the July time-frame, then there is certainly a chance that our pull to the latest upstream in late August caught this fix.

I also took a look at Intel's changelog for their last two out-of-tree driver updates and don't see anything indicating this was fixed already.

Comment 9 David L. Parsley 2009-02-10 16:20:39 UTC
Yes, with version 1.2.45-k2 of the igb driver, this issue has gone away, thanks.  I verified by bridging a non-production VM to an otherwise-unused Intel port.  It was reachable by both ping and ssh.  To double-check, I did 'ip link set peth1 down' (the Intel interface), and verified that the test VM became unreachable while all other VMs remained reachable.  I also verified tx checksumming:
# ethtool -k peth1
Offload parameters for peth1:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off

I don't think I've missed anything in my testing, so I think we can close this.
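
The tx-checksumming check above could also be scripted. This is a minimal sketch, assuming the `ethtool -k` output format shown in this comment; the function name tx_csum_state is hypothetical.

```shell
#!/bin/sh
# Read `ethtool -k <ifname>` output on stdin and print the tx-checksumming
# state ("on" or "off"). Returns non-zero if the line is not present.
tx_csum_state() {
    awk -F': ' '
        $1 == "tx-checksumming" { print $2; found = 1 }
        END { exit !found }
    '
}

# Example call (as root):
# ethtool -k peth1 | tx_csum_state
```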

Comment 10 Bill Burns 2009-02-24 18:38:48 UTC
Closing this per previous comment. Thanks Andy and David!

