|Summary:||2.4.20-20.7 has kernel stack trace deadlocks - gcc-2.96-113|
|Product:||[Retired] Red Hat Linux||Reporter:||Howard Owen <howen>|
|Component:||kernel||Assignee:||Dave Jones <davej>|
|Status:||CLOSED ERRATA||QA Contact:||Brian Brock <bbrock>|
|Version:||7.3||CC:||jason, k.georgiou, linuxcub, pfrields, riel|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2004-01-05 03:22:07 UTC||Type:||---|
|Bug Depends On:||87659|
Description Howard Owen 2003-10-27 16:39:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686) Gecko/20030807 Galeon/1.3.5

Description of problem:
Kernels built with gcc-2.96-113 are vulnerable to a deadlock when the current process has traversed 5 successive symlinks on an NFS file system and the kernel takes an IRQ. Because less than 1KiB remains on the process's kernel stack at that point, the service routine prints a stack trace rather than servicing the interrupt. If the system console is on a serial port, printing the trace generates more IRQs from the UART, which results in a deadlock. Somehow, gcc-2.96-113 makes this condition far more likely. Kernels built with gcc-2.96-112 or gcc3 do not exhibit the problem. The default rsize/wsize of 4KiB also makes the problem less likely to occur. With rsize=wsize=16KiB, the problem shows up reliably.

Version-Release number of selected component (if applicable):
kernel-2.4.20-20.7

How reproducible:
Always

Steps to Reproduce:
1. On an NFS file system with rsize=wsize=16KiB, run the attached perl script with --jobs=30 --net.
2. Start three to eight large transfers off the box using scp.
3. Watch the serial console.

Actual Results:
Kernel stack trace messages appear on the serial console and the system deadlocks.

Expected Results:
Load should go up to above 30. The large copies should run to completion. The system should stay up.

Additional info:
Sample stack trace
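The low-stack condition in this report can be illustrated with a toy model. This is only a sketch, not kernel code: the 8 KiB stack size, the space reserved for per-task data, and the frame sizes are all hypothetical values chosen for illustration. Only the 1KiB threshold comes from the report. It shows why a compiler that emits larger stack frames would reach the threshold at a shallower call depth.

```python
# Toy model of the low-stack check described in this report; not kernel code.
# Assumptions (NOT from the report): an 8 KiB per-process kernel stack and a
# hypothetical 1.5 KiB reserved at the bottom of it for per-task data.
STACK_SIZE = 8 * 1024      # assumed kernel stack size in bytes
RESERVED = 1536            # hypothetical per-task data sharing the stack area
WARN_THRESHOLD = 1024      # the 1KiB limit mentioned in the report


def free_stack(stack_used: int) -> int:
    """Bytes still free after `stack_used` bytes of stack are consumed."""
    return STACK_SIZE - RESERVED - stack_used


def irq_would_print_trace(stack_used: int) -> bool:
    """True when an interrupt arriving now would find less than 1KiB free."""
    return free_stack(stack_used) < WARN_THRESHOLD


def depth_at_warning(frame_size: int) -> int:
    """Smallest call depth at which the check fires for a given frame size."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    depth = 0
    while not irq_would_print_trace(depth * frame_size):
        depth += 1
    return depth


if __name__ == "__main__":
    # Hypothetical frame sizes: larger frames trip the check sooner.
    for size in (1000, 1200):
        print(size, depth_at_warning(size))
```

In this model a modest growth in frame size is enough to move the warning one call level closer. That is consistent with the observation that the compiler version changes how likely the problem is, but it does not establish the actual cause.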
Comment 1 Howard Owen 2003-10-27 16:44:04 UTC
Created attachment 95517 [details] Script to deadlock kernels built with gcc-2.96-113
With cwd in an NFS mount with rsize=wsize=16KiB, run this script with --jobs=30 --net. Start 3-8 large file transfers off the box. (I use scp with a 500MiB file.) Watch the serial console. The system will deadlock before the transfers complete.
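For readers without access to the attachment, the symlink condition from the description can be sketched as below. This is not the attached perl script and does not generate load or network traffic. It only builds a chain of five symlinks and opens a file through it, on the assumption that "5 successive symlinks" means links that each point at the previous one. The function name, file names, and use of a temporary directory are all hypothetical; an actual reproduction would need the working directory to be on the NFS mount.

```python
import os
import tempfile


def make_symlink_chain(base_dir: str, depth: int = 5) -> str:
    """Create a file plus `depth` symlinks, each pointing at the previous one.

    Returns the path of the outermost link. Opening it makes the kernel
    follow `depth` symlinks in a row before reaching the file.
    """
    target = os.path.join(base_dir, "target.txt")
    with open(target, "w") as fh:
        fh.write("payload\n")

    previous = target
    for i in range(1, depth + 1):
        link = os.path.join(base_dir, f"link{i}")
        os.symlink(previous, link)  # link{i} -> link{i-1} (or the file)
        previous = link
    return previous


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        outer = make_symlink_chain(tmp)
        with open(outer) as fh:
            print(fh.read().strip())
```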
Comment 2 Howard Owen 2003-10-27 16:44:53 UTC
Created attachment 95518 [details] Sample console messages when kernel deadlocks.
Comment 3 Howard Owen 2003-10-27 16:50:08 UTC
I placed the severity at "security" because this is essentially a denial-of-service attack on the affected kernel. A local user with normal privileges can deadlock the system.
Comment 4 Joshua Jensen 2003-12-23 19:21:10 UTC
I noticed that a new kernel erratum, https://rhn.redhat.com/errata/RHBA-2003-394.html, mentions this bug... but it doesn't say why. Does this mean that the kernel *wasn't* compiled with gcc-2.96-113? If so, what version does Red Hat recommend?
Comment 5 Howard Owen 2003-12-23 19:33:21 UTC
The new kernel was compiled with gcc-2.96-126, which is an unreleased version. I haven't tested this kernel yet, but I will soon. Since they call out this bug, I'm assuming the unreleased gcc addresses the issue. But if you want to build your own kernel, the workaround of downgrading to gcc-2.96-112 is the only solution I'm aware of. Perhaps Red Hat will fix bug #87659 by releasing the updated gcc before 7.3 end-of-life next week. The fix for this bug is well over half a loaf, however, since most installations won't be running custom kernels.
Comment 6 Erling Jacobsen 2003-12-29 22:10:11 UTC
I haven't found a gcc-2.96-124 myself, but I _did_ find an SRPM of gcc-2.96-124 as an update to the 2.1 enterprise version of RHL. One of the changes from -113 to -124 is apparently a fix for some "excessive stack usage caused by the -fno-strict-aliasing patch". Doesn't that sound relevant? I'm no expert, but I think it would be interesting to take the relevant new patches from gcc-2.96-124 and stick them into gcc-2.96-113, rebuild gcc, and use that to rebuild the kernel.
Comment 7 Howard Owen 2003-12-31 22:52:42 UTC
The patch you are apparently referring to, gcc-strict-alias-optimization2.patch, seems to address the problem when applied to the gcc-2.96-113 SRPM. At least, a variant of my crash script doesn't crash a kernel built with the resulting gcc. The patch applied cleanly and the gcc build went smoothly. However, I'm not in a position to judge whether this patch, applied in isolation, is a good general fix for production systems. One for Progeny, I guess.