Bug 108092 - 2.4.20-20.7 has kernel stack trace deadlocks - gcc-2.96-113
Summary: 2.4.20-20.7 has kernel stack trace deadlocks - gcc-2.96-113
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
Duplicates: 91566
Depends On: 87659
Reported: 2003-10-27 16:39 UTC by Howard Owen
Modified: 2015-01-04 22:03 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2004-01-05 03:22:07 UTC

Attachments
Script to deadlock kernels built with gcc-2.96-113 (deleted)
2003-10-27 16:44 UTC, Howard Owen
Sample console messages when kernel deadlocks. (deleted)
2003-10-27 16:44 UTC, Howard Owen

System: Red Hat Product Errata
ID: RHBA-2003:394
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated 2.4 kernel fixes various bugs
Last Updated: 2003-12-23 05:00:00 UTC

Description Howard Owen 2003-10-27 16:39:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686) Gecko/20030807 Galeon/1.3.5

Description of problem:
Kernels built with gcc-2.96-113 are vulnerable to a deadlock when the current
process has traversed 5 successive symlinks on an NFS file system, and the
kernel takes an IRQ. There being less than 1KiB remaining on the process's kernel
stack, the service routine prints a stack trace rather than servicing the
interrupt. If the system console is on a serial port, this results in more IRQs
from the UART, which results in a deadlock.

Somehow, gcc-2.96-113 makes this condition far more likely. Kernels built with
gcc-2.96-112 or gcc3 do not exhibit the problem. The default rsize/wsize of 4KiB
also makes the problem less likely to occur. With rsize=wsize=16KiB, the problem
shows up reliably.
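
The low-stack condition described above can be modeled with a little arithmetic. The following Python sketch is an illustration only, not kernel code: it assumes an 8KiB per-process kernel stack area with a reserved region at its low end, and the names `THREAD_SIZE`, `RESERVED_BYTES`, `stack_bytes_free` and `irq_would_print_trace` are hypothetical. Only the 1KiB threshold comes from this report.

```python
# Illustrative model of the "less than 1KiB remaining" check; not kernel code.
# All constants except LOW_WATER_MARK are assumptions made for this sketch.
THREAD_SIZE = 8192       # assumed size of the per-process kernel stack area
RESERVED_BYTES = 1700    # hypothetical size of data kept at the low end of the area
LOW_WATER_MARK = 1024    # the 1KiB threshold mentioned in the report


def stack_bytes_free(stack_pointer: int, reserved: int = RESERVED_BYTES) -> int:
    """Return how many bytes the stack can still grow downward.

    The stack is assumed to grow toward lower addresses, so the offset of
    the stack pointer inside its THREAD_SIZE-aligned area, minus the
    reserved region, is the remaining headroom.
    """
    offset_in_area = stack_pointer & (THREAD_SIZE - 1)
    return offset_in_area - reserved


def irq_would_print_trace(stack_pointer: int, reserved: int = RESERVED_BYTES) -> bool:
    """True when the remaining headroom is below the 1KiB threshold."""
    return stack_bytes_free(stack_pointer, reserved) < LOW_WATER_MARK


if __name__ == "__main__":
    base = 0xC1234000  # hypothetical THREAD_SIZE-aligned stack area address
    for headroom in (4096, 512):
        sp = base + RESERVED_BYTES + headroom
        print(headroom, stack_bytes_free(sp), irq_would_print_trace(sp))
```

In this model, anything that makes each stack frame larger (for example, a compiler that allocates more stack per function) moves the stack pointer closer to the threshold for the same call chain, which is consistent with the compiler dependence reported here.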

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. On an NFS file system with rsize=wsize=16KiB, run the attached Perl script
with --jobs=30 --net.
2. Start three to eight large transfers off the box using scp.
3. Watch the serial console
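
The attached script is no longer available. As a rough illustration of the symlink condition from the description only (not a reconstruction of that script), the Python sketch below builds a chain of five symlinks and reads a file through the outermost one. It takes "5 successive symlinks" to mean a chain in which each link points at the previous one, which is an assumption; the function name `build_symlink_chain` and the file names are hypothetical.

```python
import os
import tempfile


def build_symlink_chain(directory: str, depth: int = 5) -> str:
    """Create a regular file and `depth` symlinks, each pointing at the
    previous path, and return the path of the outermost symlink."""
    target = os.path.join(directory, "target.dat")
    with open(target, "w") as handle:
        handle.write("payload\n")

    previous = target
    for index in range(1, depth + 1):
        link = os.path.join(directory, f"link{index}")
        os.symlink(previous, link)  # link{index} -> previous path
        previous = link
    return previous


if __name__ == "__main__":
    # A temporary local directory is used here; to resemble the report, the
    # directory would have to live on an NFS mount with rsize=wsize=16KiB.
    with tempfile.TemporaryDirectory() as workdir:
        outer = build_symlink_chain(workdir)
        with open(outer) as handle:  # resolving `outer` follows all five links
            print(handle.read(), end="")
```

Run on its own, this only exercises symlink resolution. It is not claimed to trigger the deadlock, which per the report also needs concurrent load and interrupt activity on an affected kernel.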

Actual Results: Kernel stack trace messages appear on the serial console and
the system deadlocks.

Expected Results: Load should go up to above 30. The large copies should run to
completion. The system should stay up.

Additional info: Sample stack trace

Comment 1 Howard Owen 2003-10-27 16:44:04 UTC
Created attachment 95517 [details]
Script to deadlock kernels built with gcc-2.96-113

With cwd in an NFS mount with rsize=wsize=16KiB, run this script with --jobs=30.

Start 3-8 large file transfers off the box. (I use scp with a 500MiB file.)
Watch the serial console. The system will deadlock before the transfers complete.

Comment 2 Howard Owen 2003-10-27 16:44:53 UTC
Created attachment 95518 [details]
Sample console messages when kernel deadlocks.

Comment 3 Howard Owen 2003-10-27 16:50:08 UTC
I placed the severity at "security" because this is essentially a
denial-of-service attack on the affected kernel. A local user with normal
privileges can deadlock the system.

Comment 4 Joshua Jensen 2003-12-23 19:21:10 UTC
I noticed that a new kernel errata mentions this bug...
but it doesn't say why.  Does this mean that the kernel *wasn't*
compiled with gcc-2.96-113?  If so, what version does Red Hat recommend?

Comment 5 Howard Owen 2003-12-23 19:33:21 UTC
The new kernel was compiled with gcc-2.96-126, which is an unreleased
version. I haven't tested this kernel yet, but I will soon. Since they
call out this bug, I'm assuming the unreleased gcc addresses the issue.

But if you want to build your own kernel, the workaround of
downgrading to gcc-2.96-112 is the only solution I'm aware of. Perhaps
Red Hat will fix bug #87659 by releasing the updated gcc before 7.3
end-of-life next week. The fix for this bug is well over half a loaf,
however, since most installations won't be running custom kernels.

Comment 6 Erling Jacobsen 2003-12-29 22:10:11 UTC
I haven't found a gcc-2.96-124 myself, but I _did_ find a SRPM of
gcc-2.96-124 as an update to the 2.1 enterprise version of RHL.
One of the changes from -113 to -124 is apparently a fix for some
"excessive stack usage caused by the -fno-strict-aliasing patch".
Doesn't that sound relevant? I'm no expert, but I think it would
be interesting to take the relevant new patches from gcc-2.96-124
and stick them into gcc-2.96-113, rebuild gcc, and use that to rebuild
the kernel.

Comment 7 Howard Owen 2003-12-31 22:52:42 UTC
The patch you are apparently referring to,
gcc-strict-alias-optimization2.patch, seems to address the problem
when applied to the gcc-2.96-113 SRPM. At least, a variant of my crash
script doesn't crash a kernel built with the resulting gcc. The patch
applied cleanly and the gcc build went smoothly. However I'm not in a
position to judge if this patch, applied in isolation, is a good
general fix for production systems.

One for Progeny, I guess.

Comment 8 Dave Jones 2004-01-05 04:16:49 UTC
*** Bug 91566 has been marked as a duplicate of this bug. ***
