Bug 230805 - Soft Lockup detected when loading cyclades firmware
Summary: Soft Lockup detected when loading cyclades firmware
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Aristeu Rozanski
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-03-02 23:24 UTC by Andreas Thienemann
Modified: 2008-02-07 21:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-02-07 21:06:43 UTC



Description Andreas Thienemann 2007-03-02 23:24:01 UTC
Description of problem:
When loading the firmware for the Cyclades-Z serial port multiplexer card on a
4-way (2x dual-core) Opteron, the system quickly becomes unresponsive and crashes.

Calling dmesg right after loading the firmware with cyzload -f cyzfirm.bin shows
the following:


Cyclades driver 2.3.2.20 2004/02/25 18:14:16
        built Nov  9 2006 18:53:40
Cyclades-8Zo/PCI #1: 0xff500000-0xff57ffff, 8 channels starting from port 0.
BUG: soft lockup detected on CPU#1!

Call Trace:
 [<ffffffff80069632>] show_trace+0x34/0x47
 [<ffffffff80069657>] dump_stack+0x12/0x17
 [<ffffffff800b4d8b>] softlockup_tick+0xdb/0xed
 [<ffffffff8009432e>] update_process_times+0x42/0x68
 [<ffffffff8007427c>] smp_local_timer_interrupt+0x23/0x47
 [<ffffffff80074938>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
DWARF2 unwinder stuck at apic_timer_interrupt+0x66/0x6c
Leftover inexact backtrace:
 <IRQ>  [<ffffffff88451e8d>] :cyclades:cyz_poll+0x2fc/0x77f
 [<ffffffff88451b91>] :cyclades:cyz_poll+0x0/0x77f
 [<ffffffff80093b54>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c0e>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c6b0>] call_softirq+0x1c/0x28
 [<ffffffff8006a7eb>] do_softirq+0x2c/0x85
 [<ffffffff80068df5>] default_idle+0x0/0x50
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068e1e>] default_idle+0x29/0x50
 [<ffffffff800472a9>] cpu_idle+0x95/0xb8
 [<ffffffff8007409a>] start_secondary+0x45a/0x469

BUG: soft lockup detected on CPU#1!

Call Trace:
 [<ffffffff80069632>] show_trace+0x34/0x47
 [<ffffffff80069657>] dump_stack+0x12/0x17
 [<ffffffff800b4d8b>] softlockup_tick+0xdb/0xed
 [<ffffffff8009432e>] update_process_times+0x42/0x68
 [<ffffffff8007427c>] smp_local_timer_interrupt+0x23/0x47
 [<ffffffff80074938>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
DWARF2 unwinder stuck at apic_timer_interrupt+0x66/0x6c
Leftover inexact backtrace:
 <IRQ>  [<ffffffff88451ea4>] :cyclades:cyz_poll+0x313/0x77f
 [<ffffffff88451b91>] :cyclades:cyz_poll+0x0/0x77f
 [<ffffffff80093b54>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c0e>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c6b0>] call_softirq+0x1c/0x28
 [<ffffffff8006a7eb>] do_softirq+0x2c/0x85
 [<ffffffff80068df5>] default_idle+0x0/0x50
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068e1e>] default_idle+0x29/0x50
 [<ffffffff800472a9>] cpu_idle+0x95/0xb8
 [<ffffffff8007409a>] start_secondary+0x45a/0x469

[root@sysiphus2 bin]# dmesg

The second dmesg call hangs, as the kernel seems to have locked up.
The message about the cyclades driver at the top comes from the successful
module insertion with modprobe cyclades.

Version-Release number of selected component (if applicable):
kernel-2.6.18-1.2747.el5

How reproducible:
Always

Steps to Reproduce:
1. modprobe cyclades
2. cyzload -f cyzfirm.bin (the firmware file)
3. lean back, wait a few seconds.
  
If there's anything I can do to help debugging this, just say so. thx.
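
For triage, the module frames in a trace like the one above can be pulled out mechanically. The following is a minimal Python sketch, assuming the RHEL 5 era message format shown in this report ("BUG: soft lockup detected on CPU#N!" followed by frames of the form `:module:symbol+0xoff/0xsize`). The function name `summarize_lockups` and the `SAMPLE` excerpt are illustrative only and are not part of any kernel tooling.

```python
import re

# Banner printed by the soft lockup detector in this report,
# e.g. "BUG: soft lockup detected on CPU#1!"
LOCKUP_RE = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")

# Frames that belong to a loadable module, e.g.
# "[<ffffffff88451e8d>] :cyclades:cyz_poll+0x2fc/0x77f"
MODULE_FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+:(\w+):(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def summarize_lockups(dmesg_text):
    """Return one dict per soft lockup report: the CPU and its module frames."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        banner = LOCKUP_RE.search(line)
        if banner:
            # A new report starts at every banner line.
            current = {"cpu": int(banner.group(1)), "module_frames": []}
            reports.append(current)
            continue
        frame = MODULE_FRAME_RE.search(line)
        if frame and current is not None:
            _addr, module, symbol, offset, _size = frame.groups()
            current["module_frames"].append((module, symbol, int(offset, 16)))
    return reports


# Short excerpt of the first trace in this report.
SAMPLE = """\
BUG: soft lockup detected on CPU#1!
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
 <IRQ>  [<ffffffff88451e8d>] :cyclades:cyz_poll+0x2fc/0x77f
 [<ffffffff88451b91>] :cyclades:cyz_poll+0x0/0x77f
 [<ffffffff80093b54>] run_timer_softirq+0x133/0x1b0
"""

if __name__ == "__main__":
    for report in summarize_lockups(SAMPLE):
        print(report["cpu"], report["module_frames"])
```

On the excerpt, the only module frames are in `cyz_poll`, which matches the observation that the CPU is stuck in the driver's timer-driven poll routine rather than in core kernel code.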

Comment 1 Aristeu Rozanski 2007-09-12 12:26:33 UTC
Hi Andreas,
just to make sure: have you tested this card with this firmware successfully on
another machine, or perhaps on another RHEL version?


Comment 2 Andreas Thienemann 2007-09-12 12:39:23 UTC
Hey Aristeu,

not yet. I'll have a new 5.1beta CSB build available soonish though. I'll try
the card on it then.

Comment 3 Aristeu Rozanski 2007-09-14 13:03:14 UTC
Andreas, I've looked at where this problem could be coming from, and my
suspicion of a firmware problem has grown. Are you willing to test a kernel that
will be a bit more verbose when the problem happens?



Comment 4 Andreas Thienemann 2007-09-14 22:02:10 UTC
Hello,

no problem, I'll gladly try the debugging kernel. I think I'll have access to
the test-rig next week or the week after. So the answer might take a bit
unfortunately. :(

Comment 5 Andreas Thienemann 2007-09-18 20:45:26 UTC
Aristeu, I got access to the test-rig today but could only install the rhel5
2.6.18-8 kernel on the box. The problem is the same though, down to the stacktrace.
I'll have access to the system from now on, so I'd be glad to try your debugging
kernel.


Comment 6 Aristeu Rozanski 2007-09-18 21:05:39 UTC
Andreas, have you tried other firmware versions?


Comment 7 Andreas Thienemann 2007-09-18 21:26:59 UTC
I tried the latest firmware there is, which is from 2005.
ftp://ftp.cyclades.com/pub/cyclades/async/linux/cyc_async-700-1.tar.gz



Comment 8 Aristeu Rozanski 2007-09-25 19:37:28 UTC
Andreas, the test kernels are in:
http://people.redhat.com/arozansk/cyclades/
Try to capture the dmesg output and attach it here (using a serial console may help).


Comment 9 Andreas Thienemann 2007-09-25 22:59:23 UTC
(In reply to comment #8)

> http://people.redhat.com/arozansk/cyclades/
> try to get the dmesg output and attach here (using serial console may help)

I just tried your kernel and it's looking quite good right now. The system is
stable, firmware has been loaded into the card and the cyclades.ko module has
been loaded as well.


serial output is rather sparse right now:

Cyclades driver 2.3.2.20 2004/02/25 18:14:16
        built Sep 19 2007 13:54:39
Cyclades-8Zo/PCI #1: 0xff500000-0xff57ffff, 8 channels starting from port 0.


I'll leave the machine running over night and do some tests tomorrow but right
now it's looking quite good as the box locked up in the past nearly instantly
after loading the firmware.

Comment 10 Aristeu Rozanski 2007-10-02 13:56:22 UTC
Andreas, any news? Is it still running?



Comment 11 Aristeu Rozanski 2008-01-07 16:46:53 UTC
Andreas, any updates on this one?


Comment 12 Andreas Thienemann 2008-01-07 16:53:51 UTC
Sorry, missed that. Thx for the heads-up.

The system has been running fine in production use for some time now.

Comment 13 Aristeu Rozanski 2008-01-07 17:00:46 UTC
OK, this is strange. The patch does nothing but warn when it gets too many
packets from the card. Please try it with the newest RHEL5 kernel you have
access to and tell me how it goes.
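
To illustrate what "nothing but warn" means here, the following is a rough Python model of a poll loop that counts packets and logs a warning once a threshold is crossed, without changing how the packets are handled. It is a sketch only and not the actual test patch: `poll_once`, `MAX_PACKETS_PER_POLL`, and the callback parameters are hypothetical names, and the real driver code is C inside `cyz_poll`.

```python
import logging

log = logging.getLogger("cyz_poll_sketch")

# Hypothetical threshold; the value used by the real test kernel is not
# stated in this report.
MAX_PACKETS_PER_POLL = 64


def poll_once(fetch_packet, handle_packet, limit=MAX_PACKETS_PER_POLL):
    """Drain all pending packets, warning once if more than `limit` arrive.

    Returns (handled, warned). The loop is deliberately NOT cut short:
    like the patch described above, it only reports the condition.
    """
    handled = 0
    warned = False
    while True:
        packet = fetch_packet()
        if packet is None:  # nothing left to read from the card
            break
        handle_packet(packet)
        handled += 1
        if handled > limit and not warned:
            log.warning("more than %d packets in a single poll", limit)
            warned = True
    return handled, warned
```

Because a loop like this never exits while the card keeps producing packets, a misbehaving firmware could plausibly keep the CPU inside the poll routine long enough to trip the soft lockup detector; a warning-only change would not be expected to alter that, which is why the result in comment 9 is surprising.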


Comment 14 Aristeu Rozanski 2008-02-07 16:50:32 UTC
Andreas, any news?


Comment 15 Andreas Thienemann 2008-02-07 20:37:25 UTC
Sorry, forgot about that. Newest kernels are fine. It has now been running
for a month without much trouble on 2.6.18-59.el5bbHPmgmtxen, a special
xen build from brew.

I'd say, close that bug.

