Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 235463 - Unplugging and replugging firewire devices kernel panics system
Summary: Unplugging and replugging firewire devices kernel panics system
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Kristian Høgsberg
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-04-05 22:02 UTC by Jarod Wilson
Modified: 2008-03-04 17:12 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-03-04 17:12:18 UTC


Attachments (Terms of Use)

Description Jarod Wilson 2007-04-05 22:02:44 UTC
Description of problem:
For starters, my firewire devices aren't brought up at all by the firewire
stack, using the onboard firewire port on a PowerBook G4. I unplugged the
devices and then plugged them back in to see if they'd maybe do something
different the second time, and I wound up seeing xmon for the first time ever on
my PowerBook. :)

Version-Release number of selected component (if applicable):
kernel-2.6.20-1.2945.fc7.ppc

Comment 1 Jarod Wilson 2007-04-05 22:06:54 UTC
Hrm... This might not be ppc-specific. I just unplugged my fw hd and cd-rw from
an i686 box, plugged them back in and powered them back up. Kernel panic.
007-04-05 18:03:37 fw_sbp2: reconnected to unit fw1.0 (1 retries)
2007-04-05 18:03:39 fw_core: Unsolicited response (source ffc0, tlabel 1f)
2007-04-05 18:03:39 fw_core: Unsolicited response (source ffc0, tlabel 19)
2007-04-05 18:04:32 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-05 18:04:32 fw_sbp2: management write failed, rcode 0x12
2007-04-05 18:04:32 fw_sbp2: reconnected to unit fw1.0 (1 retries)
2007-04-05 18:04:38 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-05 18:04:38 fw_sbp2: management write failed, rcode 0x12
2007-04-05 18:04:38 fw_sbp2: reconnected to unit fw1.0 (1 retries)
2007-04-05 18:04:43 fw_core: Unsolicited response (source ffc0, tlabel 14)
2007-04-05 18:04:43 fw_core: Unsolicited response (source ffc0, tlabel 15)
2007-04-05 18:04:43 BUG: unable to handle kernel paging request at virtual
address 6b6b6b6b
2007-04-05 18:04:43  printing eip:
2007-04-05 18:04:43 6b6b6b6b
2007-04-05 18:04:43 *pde = 00000000
2007-04-05 18:04:43 Oops: 0000 [#1]
2007-04-05 18:04:43 SMP 
2007-04-05 18:04:43 last sysfs file: /block/sr2/size
2007-04-05 18:04:43 Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm
l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4
xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp
ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0
parport_pc lp parport serio_raw 3c59x mii pcspkr i2c_piix4 fw_ohci fw_core
i2c_core ata_piix ata_generic libata sr_mod cdrom sg aic7xxx scsi_transport_spi
sd_mod scsi_mod raid456 xor raid1 ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
2007-04-05 18:04:43 CPU:    0
2007-04-05 18:04:43 EIP:    0060:[<6b6b6b6b>]    Not tainted VLI
2007-04-05 18:04:43 EFLAGS: 00010283   (2.6.20-1.3045.fc7 #1)
2007-04-05 18:04:43 EIP is at 0x6b6b6b6b
2007-04-05 18:04:43 eax: ea876014   ebx: ea876014   ecx: 00000002   edx: f7f19000
2007-04-05 18:04:43 esi: f7f19000   edi: f6700ca0   ebp: c077efbc   esp: c077efb0
2007-04-05 18:04:43 ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
2007-04-05 18:04:43 Process swapper (pid: 0, ti=c077e000 task=c06f24e0
task.ti=c0736000)
2007-04-05 18:04:43 Stack: f8c0e8c6 f6700c80 f7f19dac c077efd0 f8c0e034 f7f19dd4
00000000 00000008 
2007-04-05 18:04:43        c077efe0 c042c16c 00000001 c072bb28 c077eff8 c042c06b
00000000 c0736f28 
2007-04-05 18:04:43        c0736000 00000046 c0736f48 c0407420 
2007-04-05 18:04:43 Call Trace:
2007-04-05 18:04:43  [<c04061e9>] show_trace_log_lvl+0x1a/0x2f
2007-04-05 18:04:43  [<c0406299>] show_stack_log_lvl+0x9b/0xa3
2007-04-05 18:04:43  [<c0406459>] show_registers+0x1b8/0x289
2007-04-05 18:04:43  [<c0406657>] die+0x12d/0x242
2007-04-05 18:04:43  [<c0617666>] do_page_fault+0x3ee/0x4ba
2007-04-05 18:04:43  [<c0615dfc>] error_code+0x7c/0x84
2007-04-05 18:04:43  [<f8c0e034>] context_tasklet+0x34/0x4a [fw_ohci]
2007-04-05 18:04:43  [<c042c16c>] tasklet_action+0x5b/0xc3
2007-04-05 18:04:43  [<c042c06b>] __do_softirq+0x6f/0xe2
2007-04-05 18:04:43  [<c0407420>] do_softirq+0x61/0xd0
2007-04-05 18:04:43  =======================
2007-04-05 18:04:43 Code:  Bad EIP value.
2007-04-05 18:04:43 EIP: [<6b6b6b6b>] 0x6b6b6b6b SS:ESP 0068:c077efb0
2007-04-05 18:04:43 Kernel panic - not syncing: Fatal exception in interrupt


Comment 2 Kristian Høgsberg 2007-04-06 22:56:14 UTC
Looking at all the "unsolicited response" messages, it looks like you're hitting
the bug I just fixed in rawhide.  kernel-2.6.20-1.3051.fc7 has the fix, please
give it a try when you get the time.  I was re-starting DMA when it was already
running in some cases, which can explain both the "unsolicted response" messages
and the crash.


Comment 3 Jarod Wilson 2007-04-10 17:25:53 UTC
Slightly better with 2.6.20-1.3054.fc7, in that upon replugging the devices and
powering them up (cd-rw first), I do briefly see both devices being recognized,
but I'm still getting a kernel panic (part of it never made it to the console
though):

2007-04-10 13:17:42 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-10 13:17:46 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-10 13:17:47 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-10 13:17:48 fw_core: created new fw device fw1 (3 config rom retries)
2007-04-10 13:17:48 fw_sbp2: logged in to sbp2 unit fw1.0 (0 retries)
2007-04-10 13:17:48 fw_sbp2:  - management_agent_address:    0xfffff0030000
2007-04-10 13:17:48 fw_sbp2:  - command_block_agent_address: 0xfffff0100000
2007-04-10 13:17:48 fw_sbp2:  - status write address:        0x000100000000
2007-04-10 13:17:48 scsi 3:0:0:0: CD-ROM            TEAC     CD-W540E        
1.0B PQ: 0 ANSI: 0
2007-04-10 13:17:50 sr2: scsi3-mmc drive: 48x/48x writer cd/rw xa/form2 cdda tray
2007-04-10 13:17:50 sr 3:0:0:0: Attached scsi generic sg4 type 5
2007-04-10 13:17:52 fw_core: giving up on config rom for node id ffc1
2007-04-10 13:17:59 fw_core: phy config: card 0, new root=ffc2, gap_count=7
2007-04-10 13:17:59 fw_sbp2: orb reply timed out, rcode=0x12
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 8
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 2
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 12
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 3
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 5
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 Buffer I/O error on device sr2, logical block 1
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:17:59 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 end_request: I/O error, dev sr2, sector 4
2007-04-10 13:18:00 fw_sbp2: reconnected to unit fw1.0 (1 retries)
2007-04-10 13:18:14 fw_core: Unsolicited response (source ffc0, tlabel 8)
2007-04-10 13:18:14 fw_core: Unsolicited response (source ffc0, tlabel 9)
2007-04-10 13:18:14 BUG: unable to handle kernel paging request at virtual
address 6b6b6b6b
2007-04-10 13:18:14  printing eip:
2007-04-10 13:18:14 f8c533f5
2007-04-10 13:18:14 *pde = 00000000
2007-04-10 13:18:14 Oops: 0000 [#1]
2007-04-10 13:18:14 SMP 
2007-04-10 13:18:14 last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
2007-04-10 13:18:14 Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm
l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4
xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp
ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0
parport_pc lp parport serio_raw ata_piix ata_generic fw_ohci pcspkr fw_core
libata i2c_piix4


Comment 4 Kristian Høgsberg 2007-04-10 21:30:18 UTC
Sigh... taking out of modified.

Comment 5 Jarod Wilson 2007-04-10 21:50:54 UTC
So to further clarify the issue... Things behave nicely if the drives aren't
powered off when unplugging and replugging the firewire cables, but things go
south in a hurry if the drives are powered off, unplugged, plugged back in and
then powered back up.

For posterity, here's the full trace captured the last go 'round:

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
 printing eip:
f8c863f5
*pde = 00000000
Oops: 0000 [#1]
SMP 
last sysfs file:
/devices/pci0000:00/0000:00:09.0/fw1/fw1.0/host23/target23:0:0/23:0:0:0/rev
Modules linked in: fw_sbp2 radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc
nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack
nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter
ip6_tables x_tables ipv6 dm_mirror dm_multipath dm_mod raid0 parport_pc lp
parport serio_raw fw_ohci fw_core ata_piix i2c_piix4 pcspkr ata_generic 3c59x
i2c_core libata mii sr_mod cdrom sg aic7xxx scsi_transport_spi sd_mod scsi_mod
raid456 xor raid1 ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<f8c863f5>]    Not tainted VLI
EFLAGS: 00010002   (2.6.20-1.3054.fc7 #1)
EIP is at fw_core_handle_response+0x95/0x108 [fw_core]
eax: 0000ffc0   ebx: 6b6b6b5f   ecx: ede1000c   edx: 6b6b6b6b
esi: 0000001e   edi: f6333470   ebp: c077ff4c   esp: c077ff1c
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process swapper (pid: 0, ti=c077f000 task=c06f34e0 task.ti=c0737000)
Stack: 00000296 f7e52218 00000000 c077ff58 f6333000 ffc27920 ffc06000 00000286 
       0000ffc0 f6333000 f5ddda18 f6333d84 c077ffa0 f8c429af f5ddda14 00000002 
       0000005a ffc27920 ffc06000 00000000 c1e68a90 0000000c f5ddda18 00000000 
Call Trace:
 [<c04061e9>] show_trace_log_lvl+0x1a/0x2f
 [<c0406299>] show_stack_log_lvl+0x9b/0xa3
 [<c0406459>] show_registers+0x1b8/0x289
 [<c0406657>] die+0x12d/0x242
 [<c0618026>] do_page_fault+0x3ee/0x4ba
 [<c06167bc>] error_code+0x7c/0x84
 [<f8c429af>] handle_ar_packet+0xe0/0xeb [fw_ohci]
 [<f8c437e4>] ar_context_tasklet+0xb6/0xc4 [fw_ohci]
 [<c042c0a8>] tasklet_action+0x5b/0xc3
 [<c042bfa7>] __do_softirq+0x6f/0xe2
 [<c0407420>] do_softirq+0x61/0xd0
 =======================
Code: 89 f8 8b 55 ec e8 0b 02 99 c7 8b 45 e4 c1 e8 04 83 e0 0f 83 f8 07 74 36 83
f8 0b 74 31 31 c9 31 d2 83 f8 06 75 32 eb 19 8d 5a f4 <8b> 53 0c 0f 18 02 90 8b
45 e0 8d 4b 0c 83 c0 20 39 c1 75 9e eb 
EIP: [<f8c863f5>] fw_core_handle_response+0x95/0x108 [fw_core] SS:ESP 0068:c077ff1c
Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp.c:549/smp_call_function() (Not tainted)
 [<c04061e9>] show_trace_log_lvl+0x1a/0x2f
 [<c04067ad>] show_trace+0x12/0x14
 [<c0406831>] dump_stack+0x16/0x18
 [<c0417baa>] smp_call_function+0x67/0xcd
 [<c0417c2e>] smp_send_stop+0x1e/0x31
 [<c0427443>] panic+0x54/0x195
 [<c0406737>] die+0x20d/0x242
 [<c0618026>] do_page_fault+0x3ee/0x4ba
 [<c06167bc>] error_code+0x7c/0x84
 [<f8c429af>] handle_ar_packet+0xe0/0xeb [fw_ohci]
 [<f8c437e4>] ar_context_tasklet+0xb6/0xc4 [fw_ohci]
 [<c042c0a8>] tasklet_action+0x5b/0xc3
 [<c042bfa7>] __do_softirq+0x6f/0xe2
 [<c0407420>] do_softirq+0x61/0xd0
 =======================


Comment 6 Jarod Wilson 2007-04-17 17:38:56 UTC
Okay, I'm no longer seeing the kernel panic with 2.6.20-1.3079.fc7. I still have
to unplug and replug the drives at least once after they're powered up to get
them usable on the system, but this is definitely progress...

Comment 7 Stefan Richter 2007-04-28 14:17:43 UTC
Kristian, some (better) SBP-2 devices with extra PSU power their PHY from bus
power when the PSU is switched off.  That way they still act as a repeater when
switched off but plugged in (provided another node powers the bus), and
daisy-chained devices can still be accessed.

When such an SBP-2 device is switched on, it powers up the link layer controller
and spins up the disk, but doesn't make a config ROM available yet.  After the
disk is up and running and the LLC scanned it, it constructs and exposes the
config ROM.  (And usually, but not necessarily, it issues a bus reset to
announce it to the bus.  The SBP-2 spec says it doesn't have to, or actually
/should not/, issue a bus reset if the ROM becomes available within 5s after
power reset.)

Perhaps fw-core's device discovery code ignores such devices for whatever reason.

Comment 8 Stefan Richter 2008-03-04 16:10:04 UTC
Jarod, is anything of this bug still relevant?

(The still unfixed http://bugzilla.kernel.org/show_bug.cgi?id=9617 looks different.)

Comment 9 Jarod Wilson 2008-03-04 17:12:18 UTC
Whoa, I filed this forever ago, forgot all about it... :)

No, haven't seen the panic in ages, and teh unplug-replug dance in comment #6 I
believe was remedied by the login orb config rom read timeout patch.


Note You need to log in before you can comment on or make changes to this bug.