Bug 157647 - tg3 network broken on some chipsets
Summary: tg3 network broken on some chipsets
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-05-13 13:30 UTC by Doug Ledford
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-07-06 19:41:28 UTC



Description Doug Ledford 2005-05-13 13:30:13 UTC
Description of problem:

Certain versions of the tg3 chipset have problems with networking.  The problems
exist in both UP and SMP kernels.  One example is the built-in Broadcom chipset
on Dell PE2650 machines.  Other tg3 chipsets, such as those used in the Netgear
gigabit LAN cards I have, don't exhibit the problem.

Version-Release number of selected component (if applicable):


How reproducible:

Every time.

Steps to Reproduce:
1. Install on a Dell PE2650
2. Attempt any meaningful network transfer
3. Watch with tcpdump on the other host: packets will be sent, but they won't be
properly received on the affected machines.
  
Actual results:
Packet loss, ICMP reassembly timeout messages, piss poor network performance,
generally really sucky networking.

Expected results:
Good gigabit network throughput

Additional info:
This is going through a Netgear gigabit ethernet switch.  Speed may play a
factor.  Tried both NFS via UDP and http via TCP.  Both sucked rocks.  Kernels
tested were 2.6.11-1_1284FC4 and 2.6.11-1_1284FC4smp.  Installed tree was the
re0503.0 tree.  I tried to copy an 80MB file via NFS v3 TCP mount from a RHEL3
server to the FC4 machine and these are the results:

[dledford@pe-fc4 ~]$ time cp /dist/FC4/i386/Fedora/base/stage2.img /tmp


real    7m45.691s
user    0m0.008s
sys     0m0.301s

(this was an interrupted copy, it didn't finish)
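
To put the timing above in perspective, here is a rough sketch of the throughput it implies. It assumes "80MB" means 80 * 10**6 bytes (the exact file size is not given in this report), and because the copy was interrupted the result is only an upper bound, on the order of 1.4 Mbit/s on a gigabit link. The helper name parse_real_time is made up for this illustration.

```python
# Upper bound on throughput for the interrupted copy reported above.
# Assumption: "80MB" is taken as 80 * 10**6 bytes; the copy did not finish,
# so the real transfer rate was lower than the figure computed here.

def parse_real_time(text: str) -> float:
    """Convert a shell `time` value such as '7m45.691s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

FILE_BYTES = 80 * 10**6                    # assumed size of stage2.img
elapsed = parse_real_time("7m45.691s")     # the "real" line from the report
max_bytes_per_sec = FILE_BYTES / elapsed
max_mbit_per_sec = max_bytes_per_sec * 8 / 10**6

print(f"elapsed: {elapsed:.3f} s")
print(f"upper bound: {max_bytes_per_sec / 1000:.1f} kB/s "
      f"({max_mbit_per_sec:.2f} Mbit/s)")
```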

The tcpdump log on the two machines showed lots of these entries:

(From RHEL3 server)
09:12:37.934096 dledford.xsintricity.com.nfs > pe.xsintricity.com.1390227174:
reply ERR 1448 (DF)
09:12:37.934225 pe.xsintricity.com.796 > dledford.xsintricity.com.nfs: . ack
37649 win 32447 <nop,nop,timestamp 696056 134637964,nop,nop,sack sack 1
{39097:40545} > (DF)
09:12:38.142921 dledford.xsintricity.com.nfs > pe.xsintricity.com.2638413729:
reply ERR 1448 (DF)
09:12:38.143063 pe.xsintricity.com.796 > dledford.xsintricity.com.nfs: . ack
40545 win 32580 <nop,nop,timestamp 696265 134637985> (DF)
09:12:38.143105 dledford.xsintricity.com.nfs > pe.xsintricity.com.581951787:
reply ERR 1448 (DF)
09:12:38.143117 dledford.xsintricity.com.nfs > pe.xsintricity.com.3204149527:
reply ERR 1448 (DF)
09:12:38.143146 pe.xsintricity.com.1811703941 > dledford.xsintricity.com.nfs:
144 read [|nfs] (DF)
09:12:38.182803 pe.xsintricity.com.796 > dledford.xsintricity.com.nfs: . ack
41993 win 32580 <nop,nop,timestamp 696305 134637985> (DF)
09:12:38.182853 dledford.xsintricity.com.nfs > pe.xsintricity.com.3725932352:
reply ERR 1448 (DF)
09:12:38.182866 dledford.xsintricity.com.nfs > pe.xsintricity.com.2398206208:
reply ERR 1448 (DF)

(From FC4 client)
09:12:38.314218 IP dledford.xsintricity.com.nfs > pe.xsintricity.com.2829902664:
reply ERR 1448
09:12:38.314232 IP pe.xsintricity.com.796 > dledford.xsintricity.com.nfs: . ack
1251073 win 32447 <nop,nop,timestamp 705569 134638915,nop,nop,sack sack 1
{1252521:1255417} >
09:12:38.314258 IP dledford.xsintricity.com.nfs > pe.xsintricity.com.3935615357:
reply ERR 1448
09:12:38.314270 IP pe.xsintricity.com.796 > dledford.xsintricity.com.nfs: . ack
1251073 win 32447 <nop,nop,timestamp 705569 134638915,nop,nop,sack sack 1
{1252521:1256865} >
09:12:38.314357 IP dledford.xsintricity.com.nfs > pe.xsintricity.com.477524051:
reply ERR 1448
09:12:38.314371 IP pe.xsintricity.com.796 > dledford.xsintricity.com.nfs: . ack
1251073 win 32447 <nop,nop,timestamp 705569 134638915,nop,nop,sack sack 1
{1252521:1258313} >
09:12:38.314403 IP dledford.xsintricity.com.nfs > pe.xsintricity.com.2152342361:
reply ERR 1448
09:12:38.314442 IP pe.xsintricity.com.796 > dledford.xsintricity.com.nfs: . ack
1258313 win 32580 <nop,nop,timestamp 705569 134638915>
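
One way to read the client-side capture above: the cumulative ack stays at 1251073 while the SACK block keeps growing, which suggests a single segment is missing while later data keeps arriving. The sketch below only redoes that arithmetic on the three SACK-bearing lines (rejoined onto single lines); the function name sack_hole is made up for this illustration, and the interpretation is a guess, not a diagnosis.

```python
import re

# The three SACK-bearing ACK lines from the FC4 client capture,
# rejoined onto single lines and trimmed to the TCP part.
ACK_LINES = [
    "ack 1251073 win 32447 <nop,nop,timestamp 705569 134638915,"
    "nop,nop,sack sack 1 {1252521:1255417} >",
    "ack 1251073 win 32447 <nop,nop,timestamp 705569 134638915,"
    "nop,nop,sack sack 1 {1252521:1256865} >",
    "ack 1251073 win 32447 <nop,nop,timestamp 705569 134638915,"
    "nop,nop,sack sack 1 {1252521:1258313} >",
]

PATTERN = re.compile(r"ack (\d+) .*\{(\d+):(\d+)\}")

def sack_hole(line):
    """Return (cumulative ack, hole size in bytes, bytes received past the hole)."""
    ack, left, right = map(int, PATTERN.search(line).groups())
    return ack, left - ack, right - left

holes = [sack_hole(line) for line in ACK_LINES]
for ack, hole, held in holes:
    print(f"ack={ack} hole={hole} bytes, sacked={held} bytes")
```

The hole is 1448 bytes in each case, the same number that appears in the "reply ERR 1448" lines, and the data held past the hole grows by 1448 bytes per line. That is consistent with one full-sized segment not being accepted by the client until the final ack of 1258313.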

Looks like it might possibly be a hardware checksumming problem.  These are the
relevant dmesg lines about the tg3 adapter in use on the client:

tg3.c:v3.25 (March 24, 2005)
ACPI: PCI Interrupt 0000:04:06.0[A] -> GSI 28 (level, low) -> IRQ 217
eth0: Tigon3 [partno(BCM95701A10) rev 0105 PHY(5701)] (PCIX:133MHz:64-bit)
10/100/1000BaseT Ethernet 00:06:5b:3f:c0:8c
eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]

These are the tg3 detection messages from the RHEL3 server:

tg3.c:v3.22RH (February 11, 2005)
PCI: Assigned IRQ 5 for device 00:0a.0
divert: allocating divert_blk for eth1
eth1: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] (PCI:33MHz:32-bit)
10/100/1000BaseT Ethernet 00:09:5b:8c:86:da
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
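
For comparing the two adapters, a small sketch that parses the feature-flag lines printed by the driver on each machine and lists the flags that differ. The strings are copied from the dmesg output above; the function name parse_flags is made up for this illustration.

```python
import re

# Feature-flag lines copied from the two dmesg excerpts in this report.
CLIENT = "eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]"
SERVER = "eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0]"

def parse_flags(line):
    """Map each 'Name[n]' token on a tg3 flag line to its integer value."""
    return {name: int(value) for name, value in re.findall(r"(\w+)\[(\d)\]", line)}

client = parse_flags(CLIENT)
server = parse_flags(SERVER)

# Flags whose value differs, as (client value, server value).
diff = {name: (client[name], server[name])
        for name in client if client[name] != server[name]}
print(diff)
```

Both adapters report RXcsums[1], so receive checksum offload is enabled on each; only LinkChgREG and MIirq differ, along with the driver version and bus type shown in the surrounding lines. One way to test the checksumming guess, not tried in this report, would be to turn receive checksum offload off on the client (for example with `ethtool -K eth0 rx off`) and repeat the copy.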

Comment 1 John W. Linville 2005-05-18 13:57:45 UTC
Doug, latest rawhide has tg3 v3.27...could you give that a try as well? 

Comment 3 Dave Jones 2005-06-27 23:14:42 UTC
Mass update for bugs reported against -test:
Updating version field to FC4 final. Please retest with final FC4 release if you
have not already done so. Thanks.

Comment 4 John W. Linville 2005-07-06 19:41:28 UTC
Closing due to lack of response.  Please re-open if this continues to be a 
problem...thanks! 

