Bug 229274 - RHEL-3 kernel can hang while doing Sysrq on the keyboard during heavy network traffic
Summary: RHEL-3 kernel can hang while doing Sysrq on the keyboard during heavy network...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.8
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Red Hat Kernel Manager
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-02-19 21:18 UTC by Chris Lalancette
Modified: 2007-11-17 01:14 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-19 18:38:35 UTC
Target Upstream Version:


Attachments
Debugging netconsole patch (deleted)
2007-02-19 21:32 UTC, Chris Lalancette

Description Chris Lalancette 2007-02-19 21:18:43 UTC
Description of problem:
The RHEL-3 kernel can hang when a Sysrq-? sequence is pressed on the keyboard
while there is heavy network traffic.  To reproduce:

1)  Boot a single-processor system with the SMP kernel (or, alternatively, boot
with maxcpus=1)
2)  Ping flood the victim machine from 2 (or more) separate boxes
3)  Enable sysrq on the victim machine (echo 1 > /proc/sys/kernel/sysrq)
4)  Press Alt-Sysrq-t repeatedly

The box then spins forever.

What happens is that the network code takes dev->xmit_lock in softirq context
to process the incoming/outgoing network packets.  However, interrupts are
*not* disabled while the lock is held.  So if you hit the Sysrq sequence while
dev->xmit_lock is held, the Sysrq handling calls printk, which leads to the
write_netconsole_msg() function in drivers/net/netconsole.c.  That function
also tries to take dev->xmit_lock.  The interrupted lock holder cannot run
again to release it, so we deadlock.

Jeff suggested that write_netconsole_msg() could do a spin_trylock() together
with a check for xmit_owner != current_cpu; if we don't get the lock (because
someone else is already holding it), we just drop the current packet.
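
As a rough illustration of the trylock-and-drop idea, here is a small userspace
sketch.  It is not the netconsole code and not a proposed patch: struct sim_dev,
sim_trylock(), sim_unlock() and sim_write_netconsole_msg() are hypothetical
stand-ins, and a C11 atomic_flag is assumed as a substitute for the kernel
spinlock behind dev->xmit_lock.  The sketch only records the owner; it drops the
message on any failed trylock rather than acting on xmit_owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SIM_NO_OWNER (-1)

/* Hypothetical model of the device state involved in the deadlock. */
struct sim_dev {
    atomic_flag  xmit_lock;   /* stands in for dev->xmit_lock */
    atomic_int   xmit_owner;  /* CPU holding the lock, or SIM_NO_OWNER */
    atomic_ulong sent;        /* messages that went out */
    atomic_ulong dropped;     /* messages dropped because the lock was busy */
};

#define SIM_DEV_INIT { ATOMIC_FLAG_INIT, SIM_NO_OWNER, 0, 0 }

/* Try to take the lock once; never spin. */
static bool sim_trylock(struct sim_dev *dev, int cpu)
{
    if (atomic_flag_test_and_set(&dev->xmit_lock))
        return false;                     /* already held by someone */
    atomic_store(&dev->xmit_owner, cpu);
    return true;
}

static void sim_unlock(struct sim_dev *dev)
{
    /* Unlocking a lock nobody owns would be a bug in the caller. */
    assert(atomic_load(&dev->xmit_owner) != SIM_NO_OWNER);
    atomic_store(&dev->xmit_owner, SIM_NO_OWNER);
    atomic_flag_clear(&dev->xmit_lock);
}

/*
 * Model of the suggested write path: if the lock cannot be taken
 * immediately, drop the message instead of spinning on it.
 * Returns true if the message was "sent", false if it was dropped.
 */
static bool sim_write_netconsole_msg(struct sim_dev *dev, int cpu)
{
    if (!sim_trylock(dev, cpu)) {
        dev->dropped++;
        return false;
    }
    dev->sent++;                          /* the real code would transmit here */
    sim_unlock(dev);
    return true;
}
```

In this model, calling sim_write_netconsole_msg() while the same CPU already
holds the lock (the situation a Sysrq during packet processing creates) returns
false and bumps the dropped counter instead of spinning forever.  The cost is
that some console output is lost under load.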

Comment 1 Chris Lalancette 2007-02-19 21:32:08 UTC
Created attachment 148369
Debugging netconsole patch

For the morbidly curious, this is the patch used to show what happens when you
get into this situation.

Comment 2 RHEL Product and Program Management 2007-10-19 18:38:35 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

