Bug 455843 - Kernel panic at hcd_pci_release+16
Summary: Kernel panic at hcd_pci_release+16
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6.z
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Pete Zaitcev
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 456065
Depends On:
Blocks: 461304
Reported: 2008-07-18 09:14 UTC by Qian Cai
Modified: 2018-10-20 02:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-18 19:26:52 UTC
Target Upstream Version:


Attachments
reproducer (deleted), 2008-07-24 09:00 UTC, Vitaly Mayatskikh
proposed patch (deleted), 2008-07-24 11:43 UTC, Vitaly Mayatskikh
new proposed patch (deleted), 2008-10-27 02:15 UTC, Pete Zaitcev
Full Log of Oops on SGI Altix (deleted), 2008-11-12 10:53 UTC, Qian Cai
proposed patch w/ 471560 (deleted), 2008-12-06 00:38 UTC, Pete Zaitcev


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2009:1024 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update | 2009-05-18 14:57:26 UTC

Description Qian Cai 2008-07-18 09:14:22 UTC
Description of problem:
When running the reproducer of bz450865 (load/unload ohci-hcd module in a loop),
a kernel Oops occurred:

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3663173

Unable to handle kernel paging request at ffffffffa0039dd0 RIP: 
<ffffffff80290a84>{hcd_pci_release+16}
PML4 103027 PGD 105027 PMD 3fc68c067 PTE 0
Oops: 0000 [1] SMP 
CPU 6 
Modules linked in: netconsole netdump md5 ipv6 parport_pc lp parport autofs4
sunrpc ds yenta_socket pcmcia_core loop button battery ac k8_edac edac_mc e1000
sr_mod dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi
mptscsi mptbase usb_storage uhci_hcd ehci_hcd sd_mod scsi_mod
Pid: 3723, comm: hald Not tainted 2.6.9-67.0.22.ELsmp
RIP: 0010:[<ffffffff80290a84>] <ffffffff80290a84>{hcd_pci_release+16}
RSP: 0018:00000102fb217e10  EFLAGS: 00010206
RAX: ffffffffa0039d80 RBX: 00000100dfe6ed00 RCX: 0000000000000030
RDX: 00000100dfe6ed00 RSI: ffffffff801ec6e2 RDI: 00000100dfe6ec78
RBP: ffffffff8040c040 R08: 00000105fc7ba878 R09: ffffffff801ec6e2
R10: ffffffff801ec6e2 R11: ffffffff80290a74 R12: ffffffff8040bf80
R13: ffffffff80416208 R14: 00000103001f7140 R15: 0000000000000000
FS:  0000002a963b1d20(0000) GS:ffffffff804f3980(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa0039dd0 CR3: 00000002fc7b2000 CR4: 00000000000006e0
Process hald (pid: 3723, threadinfo 00000102fb216000, task 00000103fc6477f0)
Stack: ffffffff801ec6b5 ffffffff801ec6e2 00000101fc7d2c00 ffffffff8040bd00 
       ffffffff8040bc40 00000102fc65f4f8 ffffffff802888f6 00000101fc7d2d30 
       ffffffff801ec6b5 ffffffff801ec6e2 
Call Trace:<ffffffff801ec6b5>{kobject_cleanup+84}
<ffffffff801ec6e2>{kobject_release+0} 
       <ffffffff802888f6>{usb_release_dev+60}
<ffffffff801ec6b5>{kobject_cleanup+84} 
       <ffffffff801ec6e2>{kobject_release+0}
<ffffffffa002707c>{:sd_mod:scsi_disk_put+81} 
       <ffffffffa002770d>{:sd_mod:sd_release+112}
<ffffffff801824dd>{blkdev_put+161} 
       <ffffffff8017be4b>{__fput+99} <ffffffff8017aa31>{filp_close+103} 
       <ffffffff8017aaba>{sys_close+130} <ffffffff80110276>{system_call+126} 
       

Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00 
RIP <ffffffff80290a84>{hcd_pci_release+16} RSP <00000102fb217e10>
CR2: ffffffffa0039dd0

The log shows many sda failures. The device appears to be a virtual
floppy:

Jul 18 01:51:08 sun-x4600-01 kernel: usb 2-5: new full speed USB device using
address 4
Jul 18 01:51:08 sun-x4600-01 kernel: scsi1 : SCSI emulation for USB Mass Storage
devices
Jul 18 01:51:08 sun-x4600-01 kernel:   Vendor: AMI       Model: Virtual Floppy  
Rev: 1.00
Jul 18 01:51:08 sun-x4600-01 kernel:   Type:   Direct-Access                    
ANSI SCSI revision: 02
Jul 18 01:51:08 sun-x4600-01 kernel: Attached scsi removable disk sda at scsi1,
channel 0, id 0, lun 0

The machine in question is sun-x4600-01.rhts.bos.redhat.com. I had set up
netdump before the Oops, but I don't know why it failed to capture it.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-67.0.22.EL

How reproducible:
not always

Comment 1 Vitaly Mayatskikh 2008-07-21 07:48:08 UTC
How to reproduce: 

Plug any hardware into an OHCI USB port, then run two loops in parallel:

$ while true; do rmmod ohci-hcd; modprobe ohci-hcd; done

$ while true; do lsusb; done > /dev/null

It's better to run 2-3 lsusb loops simultaneously. This looks to me like a race
condition with respect to procfs.

Comment 2 Vitaly Mayatskikh 2008-07-21 10:10:42 UTC
Hmm, reproducer from #c1 just hangs the kernel (verified on x86_64 and ppc64).
So, this is another bug in ohci.

Comment 3 Qian Cai 2008-07-22 00:41:39 UTC
The same panic happened on another machine, ibm-morrison2.rhts.bos.redhat.com
(x86_64). Vmcore can be found at,
porkchop.devel.redhat.com:/mnt/redhat/qa/qa/qcai/vmcores/vmcore-455843

Hardware information about this machine can be found at,
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3683993

Unable to handle kernel paging request at ffffffffa024f910 RIP: 
<ffffffff802d36f8>{hcd_pci_release+16}
PML4 103027 PGD 105027 PMD 106cad067 PTE 0
Oops: 0000 [1] 
CPU 0 
Modules linked in: nfs lockd nfs_acl netconsole netdump md5 ipv6 parport_pc lp
parport autofs4 sunrpc ds yenta_socket pcmcia_core loop button battery ac
hw_random k8_edac edac_mc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd raid1 raid0
dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 13506, comm: cat Not tainted 2.6.9-67.0.22.EL
RIP: 0010:[<ffffffff802d36f8>] <ffffffff802d36f8>{hcd_pci_release+16}
RSP: 0018:00000100ed8abe90  EFLAGS: 00010202
RAX: ffffffffa024f8c0 RBX: 000001010f9d9d50 RCX: 0000000000000030
RDX: 000001010f9d9d50 RSI: ffffffff8021c010 RDI: 000001010f9d9cc8
RBP: ffffffff804642e0 R08: 000001010e39c840 R09: 00000100ebdc9180
R10: ffffffff8021c010 R11: ffffffff802d36e8 R12: ffffffff80464200
R13: ffffffff8046f268 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a95561b00(0000) GS:ffffffff80555000(0000) knlGS:00000000f7eb06c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa024f910 CR3: 0000000000101000 CR4: 00000000000006e0
Process cat (pid: 13506, threadinfo 00000100ed8aa000, task 00000100eb370130)
Stack: ffffffff8021bfe3 ffffffff8021c010 0000010103f07c00 ffffffff80463f40 
       ffffffff80463e60 0000010103f2eef8 ffffffff802c94ba 0000010103f07d58 
       ffffffff8021bfe3 ffffffff8021c010 
Call Trace:<ffffffff8021bfe3>{kobject_cleanup+84}
<ffffffff8021c010>{kobject_release+0} 
       <ffffffff802c94ba>{usb_release_dev+60}
<ffffffff8021bfe3>{kobject_cleanup+84} 
       <ffffffff8021c010>{kobject_release+0} <ffffffff801df3c4>{sysfs_release+54} 
       <ffffffff801906f4>{__fput+99} <ffffffff8018ed24>{filp_close+103} 
       <ffffffff8018ee6d>{sys_close+322} <ffffffff80110a9e>{system_call+126} 
       

Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00 
RIP <ffffffff802d36f8>{hcd_pci_release+16} RSP <00000100ed8abe90>
CR2: ffffffffa024f910


Comment 4 Vitaly Mayatskikh 2008-07-24 06:46:18 UTC
*** Bug 456065 has been marked as a duplicate of this bug. ***

Comment 5 Vitaly Mayatskikh 2008-07-24 09:00:09 UTC
Created attachment 312538 [details]
reproducer

This is a common bug for all USB host controller drivers (ohci, ehci, uhci). It
causes the kernel to oops or hang.

Comment 6 Vitaly Mayatskikh 2008-07-24 11:43:00 UTC
Created attachment 312543 [details]
proposed patch

Comment 11 RHEL Product and Program Management 2008-09-03 13:15:11 UTC
Updating PM score.

Comment 12 Pete Zaitcev 2008-10-27 02:15:27 UTC
Created attachment 321570 [details]
new proposed patch

This patch has two parts:
 1. Allow kfree() if hcd_free is NULL, and relocate usb_hcd so it's legal
 2. Add the "dead" HCD stub so we don't use hc_driver freed with the module
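
For readers without access to the attachment, here is a minimal userspace sketch of the idea behind the two parts above. It is not the attached patch: every name in it (hc_driver_sketch, hcd_sketch, dead_driver, hcd_detach_driver, and so on) is hypothetical, and a heap allocation stands in for module memory that disappears on rmmod.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ops table that lives in module memory. */
struct hc_driver_sketch {
    const char *description;
    void (*hcd_free)(void *hcd);    /* may be NULL: release falls back to free() */
};

/* Hypothetical stand-in for the refcounted host controller object. */
struct hcd_sketch {
    int refcount;
    const struct hc_driver_sketch *driver;
};

/* Part 2: a "dead" stub owned by the core, so it outlives any driver module. */
const struct hc_driver_sketch dead_driver = { "dead", NULL };

struct hcd_sketch *hcd_create(const struct hc_driver_sketch *drv)
{
    struct hcd_sketch *hcd = malloc(sizeof(*hcd));
    if (hcd == NULL)
        return NULL;
    hcd->refcount = 1;
    hcd->driver = drv;
    return hcd;
}

void hcd_get(struct hcd_sketch *hcd)
{
    hcd->refcount++;
}

/* Called on driver removal, before the module's memory goes away. */
void hcd_detach_driver(struct hcd_sketch *hcd)
{
    hcd->driver = &dead_driver;
}

/* Drops one reference; returns 1 if this call released the object. */
int hcd_put(struct hcd_sketch *hcd)
{
    assert(hcd->refcount > 0);
    if (--hcd->refcount > 0)
        return 0;
    if (hcd->driver->hcd_free != NULL)
        hcd->driver->hcd_free(hcd);   /* driver-specific release */
    else
        free(hcd);                    /* Part 1: plain free when no callback */
    return 1;
}

/* Scenario from this bug: a file handle pins the object across module unload. */
int demo_unload_with_open_handle(void)
{
    struct hc_driver_sketch *module_ops = malloc(sizeof(*module_ops));
    struct hcd_sketch *hcd;
    int released;

    if (module_ops == NULL)
        return -1;
    module_ops->description = "ohci-like";
    module_ops->hcd_free = NULL;

    hcd = hcd_create(module_ops);
    if (hcd == NULL) {
        free(module_ops);
        return -1;
    }
    hcd_get(hcd);               /* e.g. lsusb holds the device open */

    hcd_detach_driver(hcd);     /* driver removal swaps in the stub */
    released = hcd_put(hcd);    /* driver drops its reference: not the last one */
    assert(released == 0);
    free(module_ops);           /* "rmmod": the ops table is gone */

    /* Last close: release runs against dead_driver, not freed memory. */
    return hcd_put(hcd);
}
```

Without the hcd_detach_driver() call, the final hcd_put() would read hcd->driver->hcd_free from the freed ops table, which is the userspace analogue of the fault in hcd_pci_release.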

Comment 18 Qian Cai 2008-11-12 10:51:39 UTC
While testing on a RHEL 4.7 Zstream kernel, I have seen the following Oops on an SGI Altix machine. Do you think it is the same issue as this one?

11/11/08 14:36:59  JobID:35787 Test:/kernel/errata/4.6.z/450865 Response:1
11/11/08 14:36:59  testID:1061889 start:
ACPI: PCI interrupt 0002:01:02.0[A]: no GSI
ACPI: PCI interrupt 0002:01:02.1[B]: no GSI
ACPI: PCI interrupt 0012:01:02.0[A]: no GSI
...
ohci_hcd 0012:01:02.0: init err
ohci_hcd 0012:01:02.0: can't start
ohci_hcd 0012:01:02.0: init error -16
ohci_hcd: probe of 0012:01:02.0 failed with error -16
...
ACPI: PCI interrupt 0012:01:02.1[B]: no GSI
ACPI: PCI interrupt 0002:01:02.0[A]: no GSI
ACPI: PCI interrupt 0002:01:02.1[B]: no GSI
...
Unable to handle kernel paging request at virtual address a00000020021a080
cat[6124]: Oops 8813272891392 [1]
Modules linked in: nfsd exportfs nfs lockd nfs_acl netconsole netdump md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core vfat fat loop button ehci_hcd tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod

Pid: 6124, CPU 2, comm:                  cat
psr : 0000101008126010 ifs : 8000000000000205 ip  : [<a000000100424c50>]    Not tainted
ip is at hcd_pci_release+0x50/0xc0
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000069559a99
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001004187c0 b6  : a000000100424c00 b7  : a000000100012970
f6  : 1003e0000000000000000 f7  : 1003e0000000000004000
f8  : 1003e0000000000000000 f9  : 1000d8000000000000000
f10 : 1003e0000000000000001 f11 : 0fffffefdfffff0102000
r1  : a0000001009e0fd0 r2  : a00000010079b108 r3  : a00000020021a080
r8  : a00000020021a030 r9  : a000000100366080 r10 : 0000000000000001
r11 : a0000001002551a0 r12 : e00000301144fe20 r13 : e000003011448000
r14 : e000003015e23c78 r15 : e000003015e23e58 r16 : e000003015e23d38
r17 : 000000000000002e r18 : e00000b0f67c8190 r19 : a0007fff65270000
r20 : 0000000006009d98 r21 : 0000000000c013b3 r22 : e000003011448dd4
r23 : a0000001007f4738 r24 : a0000001007f4738 r25 : 0000000000000000
r26 : 0000000000000001 r27 : 0000001008126010 r28 : 4000000000002300
r29 : 00001213081a6010 r30 : 0000000000004000 r31 : 0000000000004000

Call Trace:

 [<a000000100016e40>] show_stack+0x80/0xa0
                                sp=e00000301144f9b0 bsp=e000003011449378
 [<a000000100017750>] show_regs+0x890/0x8c0
                                sp=e00000301144fb80 bsp=e000003011449330
 [<a00000010003e9b0>] die+0x150/0x240
                                sp=e00000301144fba0 bsp=e0000030114492f0
 [<a000000100064920>] ia64_do_page_fault+0x8e0/0xbe0
                                sp=e00000301144fba0 bsp=e000003011449288
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
                                sp=e00000301144fc50 bsp=e000003011449288
 [<a000000100424c50>] hcd_pci_release+0x50/0xc0
                                sp=e00000301144fe20 bsp=e000003011449260
 [<a0000001004187c0>] usb_host_release+0x60/0x80
                                sp=e00000301144fe20 bsp=e000003011449240
 [<a000000100366100>] class_dev_release+0x80/0x120
                                sp=e00000301144fe20 bsp=e000003011449220
 [<a000000100255130>] kobject_cleanup+0x170/0x1e0
                                sp=e00000301144fe20 bsp=e0000030114491e0
 [<a0000001002551c0>] kobject_release+0x20/0x40
                                sp=e00000301144fe20 bsp=e0000030114491c0
 [<a000000100256350>] kref_put+0xf0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011449198
 [<a000000100254f90>] kobject_put+0x30/0x60
                                sp=e00000301144fe20 bsp=e000003011449178
 [<a000000100366440>] class_device_put+0x20/0x40
                                sp=e00000301144fe20 bsp=e000003011449158
 [<a000000100418730>] usb_bus_put+0x30/0x60
                                sp=e00000301144fe20 bsp=e000003011449138
 [<a00000010040ee30>] usb_release_dev+0x190/0x220
                                sp=e00000301144fe20 bsp=e000003011449118
 [<a000000100360370>] device_release+0x70/0x120
                                sp=e00000301144fe20 bsp=e0000030114490f8
 [<a000000100255130>] kobject_cleanup+0x170/0x1e0
                                sp=e00000301144fe20 bsp=e0000030114490c0
 [<a0000001002551c0>] kobject_release+0x20/0x40
                                sp=e00000301144fe20 bsp=e0000030114490a0
 [<a000000100256350>] kref_put+0xf0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011449078
 [<a000000100254f90>] kobject_put+0x30/0x60
                                sp=e00000301144fe20 bsp=e000003011449058
 [<a000000100255170>] kobject_cleanup+0x1b0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011449020
 [<a0000001002551c0>] kobject_release+0x20/0x40
                                sp=e00000301144fe20 bsp=e000003011449000
 [<a000000100256350>] kref_put+0xf0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011448fd0
 [<a000000100254f90>] kobject_put+0x30/0x60
                                sp=e00000301144fe20 bsp=e000003011448fb0
 [<a0000001001c47e0>] sysfs_release+0xa0/0x1e0
                                sp=e00000301144fe20 bsp=e000003011448f80
 [<a00000010012b780>] __fput+0x380/0x3e0
                                sp=e00000301144fe20 bsp=e000003011448f30
 [<a00000010012b820>] fput+0x40/0x60
                                sp=e00000301144fe30 bsp=e000003011448f10
 [<a0000001001280e0>] filp_close+0xc0/0x1a0
                                sp=e00000301144fe30 bsp=e000003011448ee0
 [<a000000100128310>] sys_close+0x150/0x1c0
                                sp=e00000301144fe30 bsp=e000003011448e68
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000301144fe30 bsp=e000003011448e68
 [<a000000000010640>] 0xa000000000010640
                                sp=e000003011450000 bsp=e000003011448e68

Comment 19 Qian Cai 2008-11-12 10:53:10 UTC
Created attachment 323322 [details]
Full Log of Oops on SGI Altix

Comment 20 Vitaly Mayatskikh 2008-11-12 11:30:15 UTC
The trace path is the same as in the original report. I think this is the same issue.

Comment 22 Pete Zaitcev 2008-11-20 04:59:22 UTC
Test kernel 2.6.9-78.18.EL.bz455843.1 is available here (with ia64):
  http://people.redhat.com/zaitcev/ftp/455843/

Feel free to let me know if more packages are needed, e.g. kernel-devel
for any specific arch.

Comment 23 Qian Cai 2008-11-21 10:30:14 UTC
Tested on altix4.rhts.bos.redhat.com with the new kernel using the reproducer from comment #5, and it no longer panics. Only the following messages appeared on the serial console:

bus 3: replacing with dummies
bus 4: replacing with dummies
bus 5: replacing with dummies
bus 6: replacing with dummies
bus 1: replacing with dummies
bus 2: replacing with dummies

With the old kernel, the reproducer caused the panic almost immediately. I have also run the following test with the new kernel for around an hour without seeing any issue:

while :; do rmmod ohci-hcd; modprobe ohci-hcd; done

Comment 24 Pete Zaitcev 2008-12-06 00:27:59 UTC
New test kernel 2.6.9-78.18.EL.bz455843.4 is available at the same location.
Cai, Ulrich, and anyone interested in this bug, please test.

The .4 incorporates fixes for bug 471560 and a fix for a failure case
(it has actually happened on the box that Cai provided for me).
Otherwise it's the same as .1.

Comment 25 Pete Zaitcev 2008-12-06 00:38:31 UTC
Created attachment 325952 [details]
proposed patch w/ 471560

This is built into .bz455843.4.

Comment 28 Linda Wang 2008-12-17 20:17:28 UTC
Patch posted on Wed, 10 Dec 2008 18:42:46 -0700. Moving to POST, with dev ack.

Comment 29 RHEL Product and Program Management 2008-12-17 20:20:19 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 30 Qian Cai 2008-12-31 10:38:27 UTC
I have tested the new kernel 2.6.9-78.18.EL.bz455843.4 on several machines, and have not seen any problem,

https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=40744

Thanks Pete!

Comment 31 Qian Cai 2009-01-04 01:34:30 UTC
Also, running the test for 3 hours on various machines over the weekend did not show any issue.

Comment 32 Pete Zaitcev 2009-01-04 03:20:05 UTC
That's great to know. Unfortunately, Prarit was sceptical, so I'm having
trouble drumming up reviews for it. Thread:
 http://post-office.corp.redhat.com/archives/rhkernel-list/2008-December/msg00467.html

Comment 34 Vivek Goyal 2009-01-15 14:03:54 UTC
Committed in 78.29.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 37 Han Pingtian 2009-04-20 05:00:41 UTC
I have reproduced this bug on ibm-morrison2.rhts.bos.redhat.com with RHEL4-U7,
kernel version 2.6.9-78.ELsmp:

cannot read device descriptor No such device (19)
Unable to handle kernel paging request at ffffffffa01c7dd0 RIP:
<ffffffff80299004>{hcd_pci_release+16}

PML4 103027 PGD 105027 PMD edd0b067 PTE 770cb163
Oops: 0000 [1] SMP
CPU 2
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core cpufreq_powersave loop button battery ac hw_random k8_edac edac_mc tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 29401, comm: lsusb Not tainted 2.6.9-78.ELsmp
RIP: 0010:[<ffffffff80299004>] <ffffffff80299004>{hcd_pci_release+16}
RSP: 0018:000001007a17be70  EFLAGS: 00010206
RAX: ffffffffa01c7d80 RBX: 000001010b347d00 RCX: 0000000000000030
RDX: 000001010b347d00 RSI: ffffffff801edc9a RDI: 000001010b347c78
RBP: ffffffff80418740 R08: 0000000000000001 R09: ffffffff801edc9a
R10: ffffffff801edc9a R11: ffffffff80298ff4 R12: ffffffff80418680
R13: ffffffff80427488 R14: 000001010aee6178 R15: 00000000ffffffff
FS:  0000002a958a5b00(0000) GS:ffffffff8050d380(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa01c7dd0 CR3: 00000000edfa2000 CR4: 00000000000006e0
Process lsusb (pid: 29401, threadinfo 000001007a17a000, task 000001007bdb77f0)
Stack: ffffffff801edc6d ffffffff801edc9a 0000010037dd6c00 ffffffff80418400
       ffffffff80418340 00000100edd055b8 ffffffff80290e42 0000010037dd6d30
       ffffffff801edc6d 0000007fbffff501
Call Trace:<ffffffff801edc6d>{kobject_cleanup+84} <ffffffff801edc9a>{kobject_release+0}
       <ffffffff80290e42>{usb_release_dev+60} <ffffffff801edc6d>{kobject_cleanup+84}
       <ffffffff8029a195>{usbdev_release+173} <ffffffff8017c920>{__fput+100}
       <ffffffff8017b501>{filp_close+103} <ffffffff8017b58b>{sys_close+131}
       <ffffffff801102f6>{system_call+126}

Code: 4c 8b 58 50 41 ff e3 c3 55 48 89 fd 53 51 48 8b 9f 30 01 00
RIP <ffffffff80299004>{hcd_pci_release+16} RSP <000001007a17be70>
CR2: ffffffffa01c7dd0
 <0>Kernel panic - not syncing: Oops
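
A quick consistency check on the three x86_64 register dumps in this report (description, comment 3, comment 37), offered as a hedged reading rather than a definitive analysis. The Code: bytes 4c 8b 58 50 41 ff e3 decode to mov 0x50(%rax),%r11 followed by jmp *%r11, so the faulting address in CR2 should equal RAX + 0x50, with RAX apparently pointing into module address space. The struct, the function names, and the MODULE_AREA_START constant below are assumptions made for this sketch. The register values are copied from the dumps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* disp8 of "4c 8b 58 50" (mov 0x50(%rax),%r11), taken from the Code: line. */
#define CALLBACK_DISP 0x50u

/* Assumption: start of the x86_64 module mapping area on these kernels
 * (the :sd_mod: frames at ffffffffa002.... are consistent with it). */
#define MODULE_AREA_START 0xffffffffa0000000ULL

/* RAX and CR2 as printed in one oops register dump. */
struct oops_regs {
    const char *where;
    uint64_t rax;
    uint64_t cr2;
};

const struct oops_regs oops_samples[] = {
    { "description", 0xffffffffa0039d80ULL, 0xffffffffa0039dd0ULL },
    { "comment 3",   0xffffffffa024f8c0ULL, 0xffffffffa024f910ULL },
    { "comment 37",  0xffffffffa01c7d80ULL, 0xffffffffa01c7dd0ULL },
};

/* True if the fault address is exactly the callback slot read through RAX. */
int fault_is_callback_load(const struct oops_regs *r)
{
    return r->cr2 == r->rax + CALLBACK_DISP;
}

/* True if RAX lies at or above the assumed start of the module area. */
int rax_in_module_area(const struct oops_regs *r)
{
    return r->rax >= MODULE_AREA_START;
}

/* Counts the samples that satisfy both checks. */
size_t count_consistent(const struct oops_regs *s, size_t n)
{
    size_t i, hits = 0;

    assert(s != NULL);
    for (i = 0; i < n; i++) {
        if (fault_is_callback_load(&s[i]) && rax_in_module_area(&s[i]))
            hits++;
    }
    return hits;
}
```

Calling count_consistent(oops_samples, 3) checks all three dumps. A match on each is what one would expect if hcd_pci_release is fetching a callback from a driver structure that belonged to the unloaded module.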

Comment 38 Pete Zaitcev 2009-04-20 05:17:56 UTC
No surprise here, the -78.EL does not have the fix. The fix was
committed in -78.29.EL, see Vivek's comment #34.
What was the need to test the -78?

Comment 39 Han Pingtian 2009-04-21 09:09:42 UTC
(In reply to comment #38)
> No surprise here, the -78.EL does not have the fix. The fix was
> committed in -78.29.EL, see Vivek's comment #34.
> What was the need to test the -78?
Sorry for the confusing comment.
I am just trying to verify the fix. First, I had to make sure the bug
could be reproduced on the test machine.
In the end, I reproduced it on altix4.rhts.bos.redhat.com (loading/unloading ehci_hcd
while running "lsusb" in parallel) under 2.6.9-78.EL within 1 minute:


Unable to handle kernel paging request at virtual address a0000002001b7d38
lsusb[6601]: Oops 8813272891392 [1]
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core vfat fat loop button ohci_hcd tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod

Pid: 6601, CPU 0, comm:                lsusb
psr : 0000101008126010 ifs : 8000000000000205 ip  : [<a000000100424790>]    Not tainted
ip is at hcd_pci_release+0x50/0xc0
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000005559a99
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000100418300 b6  : a000000100424740 b7  : a000000100012970
f6  : 000000000000000000000 f7  : 000000000000000000000
f8  : 000000000000000000000 f9  : 000000000000000000000
f10 : 000000000000000000000 f11 : 000000000000000000000
r1  : a0000001009e0ea0 r2  : a00000010079b040 r3  : a0000002001b7d38
r8  : a0000002001b7ce8 r9  : a000000100365bc0 r10 : 0000000000000001
r11 : a000000100254ce0 r12 : e00000b0084cfe20 r13 : e00000b0084c8000
r14 : e0000030f7acd140 r15 : e0000030f7acd320 r16 : e0000030f7acd200
r17 : 0000000000000011 r18 : e0000030f7fa0110 r19 : a0007fff65270000
r20 : 00000000161ec7f8 r21 : 0000000002c3d8ff r22 : e00000b0084c8dd4
r23 : a0000001007f45f8 r24 : a0000001007f45f8 r25 : 0000000000000000
r26 : 0000000000000001 r27 : 0000001008126010 r28 : 400000000000b020
r29 : 00001213081a6018 r30 : 0000000000004000 r31 : 0000000000004000

Call Trace:
 [<a000000100016e40>] show_stack+0x80/0xa0
                                sp=e00000b0084cf9b0 bsp=e00000b0084c9338
 [<a000000100017750>] show_regs+0x890/0x8c0
                                sp=e00000b0084cfb80 bsp=e00000b0084c92f0
 [<a00000010003e9b0>] die+0x150/0x240
                                sp=e00000b0084cfba0 bsp=e00000b0084c92b0
 [<a000000100064920>] ia64_do_page_fault+0x8e0/0xbe0
                                sp=e00000b0084cfba0 bsp=e00000b0084c9248
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
                                sp=e00000b0084cfc50 bsp=e00000b0084c9248
 [<a000000100424790>] hcd_pci_release+0x50/0xc0
                                sp=e00000b0084cfe20 bsp=e00000b0084c9220
 [<a000000100418300>] usb_host_release+0x60/0x80
                                sp=e00000b0084cfe20 bsp=e00000b0084c9200
 [<a000000100365c40>] class_dev_release+0x80/0x120
                                sp=e00000b0084cfe20 bsp=e00000b0084c91d8
 [<a000000100254c70>] kobject_cleanup+0x170/0x1e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c91a0
 [<a000000100254d00>] kobject_release+0x20/0x40
                                sp=e00000b0084cfe20 bsp=e00000b0084c9180
 [<a000000100255e90>] kref_put+0xf0/0x1e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c9158
 [<a000000100254ad0>] kobject_put+0x30/0x60
                                sp=e00000b0084cfe20 bsp=e00000b0084c9138
 [<a000000100365f80>] class_device_put+0x20/0x40
                                sp=e00000b0084cfe20 bsp=e00000b0084c9118
 [<a000000100418270>] usb_bus_put+0x30/0x60
                                sp=e00000b0084cfe20 bsp=e00000b0084c90f8
 [<a00000010040e970>] usb_release_dev+0x190/0x220
                                sp=e00000b0084cfe20 bsp=e00000b0084c90d8
 [<a00000010035feb0>] device_release+0x70/0x120
                                sp=e00000b0084cfe20 bsp=e00000b0084c90b8
 [<a000000100254c70>] kobject_cleanup+0x170/0x1e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c9080
 [<a000000100254d00>] kobject_release+0x20/0x40
                                sp=e00000b0084cfe20 bsp=e00000b0084c9060
 [<a000000100255e90>] kref_put+0xf0/0x1e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c9038
 [<a000000100254ad0>] kobject_put+0x30/0x60
                                sp=e00000b0084cfe20 bsp=e00000b0084c9018
 [<a0000001003601a0>] put_device+0x20/0x40
                                sp=e00000b0084cfe20 bsp=e00000b0084c8ff0
 [<a00000010040f010>] usb_put_dev+0x30/0x60
                                sp=e00000b0084cfe20 bsp=e00000b0084c8fd0
 [<a000000100427770>] usbdev_release+0x1f0/0x220
                                sp=e00000b0084cfe20 bsp=e00000b0084c8f80
 [<a00000010012b620>] __fput+0x380/0x3e0
                                sp=e00000b0084cfe20 bsp=e00000b0084c8f30
 [<a00000010012b6c0>] fput+0x40/0x60
                                sp=e00000b0084cfe30 bsp=e00000b0084c8f10
 [<a000000100127f80>] filp_close+0xc0/0x1a0
                                sp=e00000b0084cfe30 bsp=e00000b0084c8ee0
 [<a0000001001281b0>] sys_close+0x150/0x1c0
                                sp=e00000b0084cfe30 bsp=e00000b0084c8e68
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000b0084cfe30 bsp=e00000b0084c8e68
 [<a000000000010640>] 0xa000000000010640
                                sp=e00000b0084d0000 bsp=e00000b0084c8e68
Kernel panic - not syncing: Fatal exception

Then I installed the latest RHEL4-U8 kernel, 2.6.9-88.EL. The test has been
running for about 3 hours, and the bug has not been reproduced. So I think the fix
works. I will change the status to VERIFIED. Thanks!

Comment 41 errata-xmlrpc 2009-05-18 19:26:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

