Bug 223505 - LSPP: tcpdump crashes kernel and system goes into debugger.
Summary: LSPP: tcpdump crashes kernel and system goes into debugger.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: ppc64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Herbert Xu
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: RHEL5LSPPCertTracker
Reported: 2007-01-19 19:35 UTC by Joy Latten
Modified: 2007-11-30 22:07 UTC
CC List: 11 users

Fixed In Version: 5.0.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-02-13 17:01:27 UTC
Target Upstream Version:


Attachments
diff between 3013 and 3014 (deleted)
2007-01-23 17:12 UTC, Eric Paris
[PACKET]: Fix skb->cb clobbering between aux and sockaddr (deleted)
2007-01-23 23:46 UTC, Herbert Xu

Description Joy Latten 2007-01-19 19:35:19 UTC
Description of problem:
Just issuing "tcpdump" or "tcpdump -i eth0" on the lspp.63 kernel
causes the kernel to crash and the system to drop into the debugger.

Version-Release number of selected component (if applicable):
tcpdump-3.9.4-8.1

How reproducible:
Happens every time. 

Steps to Reproduce:
1. tcpdump -i eth0 OR tcpdump
  
Actual results:

tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes

(a few packets are picked up)
...
...
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x002d1694
cpu 0x0: Vector: 400 (Instruction Access) at [c00000000277bb10]
    pc: 00000000002d1694
    lr: 00000000002d1694
    sp: c00000000277bd90
   msr: 8000000040009032
  current = 0xc0000000029122f0
  paca    = 0xc000000000464300
    pid   = 1701, comm = tcpdump
enter ? for help
0:mon>t
0:mon> r
R00 = 00000000002d1694   R16 = 0000000000000000
R01 = c00000000277bd90   R17 = 0000000000000000
R02 = c000000000579640   R18 = 00000000ffffffff
R03 = 00000000000000c8   R19 = 0000000000000000
R04 = c00000000277bcd8   R20 = 000000001008dd64
R05 = 0000000000000004   R21 = 00000000100b0000
R06 = 0000000000000000   R22 = 00000000100b0000
R07 = 0000000000000001   R23 = 00000000fd31fe3b
R08 = 000000c80000000e   R24 = c00000000fd50688
R09 = c000000002778000   R25 = c00000000fd50750
R10 = 0000000000000000   R26 = c00000000fd50880
R11 = 0000000000000000   R27 = 0000000000000000
R12 = 000c00010000a8c0   R28 = 0000000000000000
R13 = c000000000464300   R29 = 0000000000000000
R14 = 0000000000000000   R30 = c00000000050bed0
R15 = 0000000000000000   R31 = c00000000f6fbb80
pc  = 00000000002d1694
lr  = 00000000002d1694
msr = 8000000040009032   cr  = 24022482
ctr = 0000000000000000   xer = 0000000000000000   trap =  400
0:mon>

Expected results:
Don't expect to see kernel debugger. :-)

Additional info:
uname -a
Linux XXXXXXXX 2.6.18-1.3015.2.1.el5.lspp.63 #1 SMP Mon Jan 15 16:51:12 EST 2007
ppc64 ppc64 ppc64 GNU/Linux

I think this may be a kernel issue.
The same machine also has the 2.6.18-1.3002.el5 kernel installed, and
tcpdump works fine under that kernel.

Comment 2 Linda Wang 2007-01-22 20:26:58 UTC
Can someone verify that tcpdump works on other Ethernet adapters?
Also, what networking driver/adapter is eth0 attached to?


Comment 3 Joy Latten 2007-01-22 21:44:29 UTC
This occurs on an LPAR that uses the ibmveth driver; that is, eth0 is a
virtual Ethernet adapter.

Comment 4 Tim Burke 2007-01-23 15:30:08 UTC
Just so we understand this correctly.... is the original problem description
stating that this works fine on stock RHEL5RC, but fails on the LSPP specific
kernel?


Comment 5 Linda Wang 2007-01-23 15:35:55 UTC
The last ibmveth change went into 1.2789.el5 for RHEL5. Did tcpdump work on
prior kernels, i.e. the beta2 kernel, etc.?

Comment 6 Eric Paris 2007-01-23 16:04:15 UTC
tcpdump -i eth0 caused a panic on a Cell architecture blade after receiving
about 8 packets.  This was running 2.6.18-4.el5.  Will attempt to switch to
the kernel mentioned in comment #5 and look for a difference.

Comment 7 Eric Paris 2007-01-23 16:13:26 UTC
2.6.18-1.2767.el5 appears to work correctly and without issue

Comment 8 Eric Paris 2007-01-23 16:22:09 UTC
2.6.18-1.2789.el5 also worked fine.  Still working to isolate the problematic patch.

Comment 9 Eric Paris 2007-01-23 16:40:05 UTC
panic was introduced somewhere between 1.3002.el5 and 1.3014.el5

Comment 10 Eric Paris 2007-01-23 16:47:10 UTC
Even better, it appears to work fine on 1.3013.el, so the problem must be
between 3013 and 3014.

Comment 11 Eric Paris 2007-01-23 17:10:46 UTC
I'm going to go back and reverify that this patch is the problem, but the
differences between 3013 and 3014 seem to be the result of this changelog entry:

Related: rhbz#219681 - xen dhcp patch has a new fix for a missing prototype,
round 2.

Adding Herbert to the CC since I believe it is his patch.  This appears to work
just fine on x86/x86_64; on ppc64, however, it goes boom.

Comment 12 Eric Paris 2007-01-23 17:12:07 UTC
Created attachment 146320
diff between 3013 and 3014

Comment 14 James Morris 2007-01-23 20:53:32 UTC
When you're in the debugger, can you get a backtrace of the crash?

Comment 15 Eric Paris 2007-01-23 21:00:11 UTC
No.  Below is what I get.  You can easily access the machine I'm doing this on
internally.

[root@ibm-cell-01 ~]# tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
15:33:13.867488 arp who-has frodo.lab.boston.redhat.com tell
i386-5as.lab.boston.redhat.com
15:33:13.870286 arp who-has frodo.lab.boston.redhat.com tell
ibm-cell-01.lab.boston.redhat.com
15:33:13.872051 IP squad5-lp1.lab.boston.redhat.com >
ibm-cell-01.lab.boston.redhat.com: ICMP echo request, id 27000, seq 44259, length 64
15:33:13.872080 IP ibm-cell-01.lab.boston.redhat.com >
squad5-lp1.lab.boston.redhat.com: ICMP echo reply, id 27000, seq 44259, length 64
15:33:13.882778 arp reply frodo.lab.boston.redhat.com is-at 00:08:02:46:ea:e9
(oui Unknown)
15:33:13.882792 IP ibm-cell-01.lab.boston.redhat.com.cap >
frodo.lab.boston.redhat.com.domain:  22026+ PTR? 10.76.168.192.in-a
cpu 0x1: Vector: 700 (Program Check) at [c00000001b023b10]
    pc: c000000000940004
    lr: c000000000940000
    sp: c00000001b023d90
   msr: 9000000000089032
  current = 0xc000000001f5cb40
  paca    = 0xc000000000464500
    pid   = 2625, comm = tcpdump
enter ? for help
1:mon> t
[c00000001b023d90] c000000000940000 (unreliable)
1:mon>


Comment 16 Herbert Xu 2007-01-23 23:46:40 UTC
Created attachment 146377
[PACKET]: Fix skb->cb clobbering between aux and sockaddr

Both aux data and sockaddr try to use the same buffer, which
obviously doesn't work.  We just happen to have 4 bytes free in
the skb->cb if you take away the maximum length of sockaddr_ll.
That's just enough to store the one piece of info from aux data
that we can't generate at recvmsg(2) time.

This is what the following patch does.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
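
The "4 bytes free" arithmetic in the patch description can be checked with a small userspace sketch. This is not the patch itself. It assumes skb->cb is 48 bytes and MAX_ADDR_LEN is 32, as in kernels of this era. The struct and function names below (sockaddr_ll_sketch, packet_cb_sketch, max_sockaddr_ll_len, cb_bytes_free, origlen_fits) are hypothetical stand-ins, and the saved aux field is assumed to be the original packet length.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants, not taken from kernel headers. */
#define SKB_CB_SIZE  48   /* assumed size of the skb->cb scratch area */
#define MAX_ADDR_LEN 32   /* assumed longest link-layer address */

/* Hypothetical stand-in mirroring the fields of struct sockaddr_ll. */
struct sockaddr_ll_sketch {
    unsigned short sll_family;
    unsigned short sll_protocol;
    int            sll_ifindex;
    unsigned short sll_hatype;
    unsigned char  sll_pkttype;
    unsigned char  sll_halen;
    unsigned char  sll_addr[8];   /* may be extended up to MAX_ADDR_LEN */
};

/* Hypothetical layout: one saved aux field placed ahead of the sockaddr,
 * so the two no longer overlap inside the cb area. */
struct packet_cb_sketch {
    unsigned int              origlen; /* assumed: original packet length */
    struct sockaddr_ll_sketch ll;
};

/* Compile-time check that the layout fits even with the longest address. */
_Static_assert(sizeof(struct packet_cb_sketch) - 8 + MAX_ADDR_LEN
               <= SKB_CB_SIZE, "layout must fit in the cb area");

/* Largest sockaddr_ll: fixed header plus a MAX_ADDR_LEN-byte address. */
size_t max_sockaddr_ll_len(void)
{
    return sizeof(struct sockaddr_ll_sketch) - 8 + MAX_ADDR_LEN;
}

/* Bytes left in the cb area once the largest sockaddr_ll is stored. */
size_t cb_bytes_free(void)
{
    return SKB_CB_SIZE - max_sockaddr_ll_len();
}

/* Returns 1 if a single unsigned int fits in the leftover space. */
int origlen_fits(void)
{
    return sizeof(unsigned int) <= cb_bytes_free();
}
```

Calling these from a small main() on a typical Linux ABI, where the stand-in struct has no padding, should show that exactly one 32-bit value fits next to the largest sockaddr_ll, which is consistent with the description above.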

Comment 20 Jay Turner 2007-01-24 12:56:07 UTC
QE ack for RHEL5.

Comment 22 Don Zickus 2007-01-24 21:45:17 UTC
in 2.6.18-6.el5

Comment 23 Jay Turner 2007-02-13 17:01:27 UTC
Closing out.

